# CLAUDE.md This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. ## Project Overview This is a Claude Code MCP plugin that automatically applies Anthropic's [prompt caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching) to reduce token costs in coding workflows. It wraps the Anthropic SDK to inject `cache_control` markers intelligently and reports cache savings. **Core value proposition**: In agentic coding sessions, large system prompts, tool definitions, and document context are re-sent on every turn. This plugin identifies these stable, reusable segments and marks them for caching, achieving up to 90% cost reduction on repeated content. ## Engineering Mindset All code in this repo follows [BEST_PRACTICES.md](BEST_PRACTICES.md). Read it before writing any code. Key rules that affect every file: - stdout is the MCP wire — never `console.log()` in tool handlers, use `stderr` - Every tool handler must catch all errors and return `{ isError: true, ... }` — never throw - Every tool must define `outputSchema` — no schema, no merge - The irony rule: this plugin must never flood context itself — tool responses must be small and structured ## Cross-Platform Requirements This plugin runs on **Linux, macOS, and Windows**. Every code path and every test must pass on all three. CI runs a 9-job matrix (3 OS × 3 Node versions). ### Rules — always follow these | Topic | Wrong | Correct | |---|---|---| | Path construction | `dir + '/' + file` | `path.join(dir, file)` | | Home directory | `~` or `process.env.HOME` | `os.homedir()` | | Temp directory | `'/tmp'` | `os.tmpdir()` | | Config path | hardcoded string | `path.join(os.homedir(), '.claude', 'settings.json')` | | Line endings | `\r\n` | `\n` (`.gitattributes` enforces LF on all platforms) | | Process signals | `SIGTERM` only | `SIGINT` + `SIGTERM` (`SIGTERM` is not available on Windows) | | File permissions | `fs.chmod(...)` alone | guard with `if (process.platform !== 'win32')` | | Shell commands | `sh -c ...` | avoid; use Node `fs`/`child_process` with `shell: true` only when necessary | ### Path utilities All file path logic lives in `src/utils/paths.ts`. Never resolve paths inline — add a helper there and import it. This is the file the smoke-test job validates on each OS. ### Writing cross-platform tests - Use `path.join` for any expected path strings in assertions - For OS-specific behavior, use `if (process.platform === 'win32')` guards inside tests — do not skip tests entirely - Fixture files use LF line endings; don't assert on `\r\n` ## Tech Stack - **Runtime**: Node.js with TypeScript - **MCP framework**: `@modelcontextprotocol/sdk` - **Anthropic SDK**: `@anthropic-ai/sdk` - **Build**: `tsc` → `dist/` - **Test**: `vitest` - **Package manager**: `npm` ## Commands ```bash npm run build # tsc compile → dist/ npm run dev # tsx watch mode for development npm run test # vitest run npm run test:watch # vitest watch npm run lint # eslint src/ npm run typecheck # tsc --noEmit ``` To run a single test file: ```bash npx vitest run src/__tests__/cache-analyzer.test.ts ``` To run the MCP server locally (for Claude Code integration): ```bash node dist/index.js ``` ## Architecture ``` src/ ├── index.ts # MCP server entry point — registers tools and starts server ├── cache-analyzer.ts # Segments prompts into cacheable vs. volatile blocks ├── prompt-optimizer.ts # Injects cache_control markers into message arrays ├── token-tracker.ts # Accumulates usage stats across turns (cache hits/misses) ├── api-proxy.ts # Thin wrapper around Anthropic SDK that applies optimization └── __tests__/ ``` ### Key Design Decisions **Caching strategy**: Anthropic's cache is keyed on the exact prefix of a conversation up to the `cache_control` breakpoint. The optimizer inserts breakpoints at: 1. The end of the system prompt (almost always static) 2. The end of tool definitions block (static within a session) 3. The end of large document/code injections (static within a task) Breakpoints must be placed so that the prefix before each one never changes between turns. Placing a breakpoint too late (inside volatile content) causes a cache miss and wastes the creation cost. **Cache control placement rules** (enforced in `cache-analyzer.ts`): - Only the last `user` or `system` message in a stable prefix gets a breakpoint - Tool definitions are always cacheable if they don't change mid-session - Content blocks shorter than ~1000 tokens are not worth caching (overhead > savings) **Token tracking**: The Anthropic API response includes `usage.cache_creation_input_tokens` and `usage.cache_read_input_tokens`. `token-tracker.ts` accumulates these across turns and exposes a summary via the `get_cache_stats` MCP tool. ### MCP Tools Exposed | Tool | Description | |------|-------------| | `optimize_messages` | Takes a raw messages array, returns one with `cache_control` injected | | `get_cache_stats` | Returns cumulative token savings for the current session | | `reset_cache_stats` | Resets the session statistics | | `analyze_cacheability` | Dry-run: shows which segments would be marked and estimated savings | ### Configuration (`.prompt-cache.json` in project root) ```json { "minTokensToCache": 1024, "cacheToolDefinitions": true, "cacheSystemPrompt": true, "maxCacheBreakpoints": 4 } ``` ## Prompt Caching Primer (Anthropic API) - Cache lifetime: 5 minutes (ephemeral), extended by hits - Supported models: Claude 3+ (Haiku, Sonnet, Opus) - Cache is per-model, per-workspace - `cache_creation_input_tokens` cost = 1.25× normal input rate - `cache_read_input_tokens` cost = 0.1× normal input rate — the 90% savings - Max 4 `cache_control` breakpoints per request ## Adding to Claude Code In `~/.claude/settings.json`: ```json { "mcpServers": { "prompt-caching-mcp": { "command": "node", "args": ["/path/to/prompt-caching/dist/index.js"] } } } ```