The mechanic that surprises everyone
Large language models are stateless. They don't "remember" your session between turns. So to let the model act on the conversation so far, the client has to send the whole context — your prompts, the files it read, every tool result, and all prior replies — with each request.
In a chat that's cheap, because the conversation is short. In agentic coding it's not: Claude Code reads files, runs commands, and accumulates large tool outputs, so the context balloons. A session that has read a few large files and run a dozen commands can carry hundreds of thousands of tokens, and that whole payload is re-sent every single turn.
Where the money actually goes
Every API call in a session is priced in a few buckets. Your dashboard shows the total; what matters for cutting the bill is the attribution:
| Bucket | What it is | Why it grows |
|---|---|---|
| Cache read | The context re-sent every turn so the model can "see" the conversation. | Grows as the session gets longer — usually the biggest line on a long session. |
| Cache write | New context (a file you open, a tool result) written into the prompt cache the first time. | Spikes when you pull in big files or verbose command output. |
| Output | The tokens the model actually generates — the code and explanations. | Often a surprisingly small slice of a big bill. |
| Fresh input | Uncached prompt tokens. | Usually small once caching kicks in. |
The key insight: on a long session, cache read — re-sent context — tends to dominate, even though it's invisible on most dashboards. tokenscope exists to surface exactly this split.
Why caching helps but doesn't make it free
Prompt caching is the thing that keeps this from being catastrophic. Anthropic's documented multipliers (verify them for your tier, they can change):
- Cache read = 0.1× the input price. Re-sending cached context is roughly a tenth of the normal input cost.
- Cache write = 1.25× (5-minute cache) or 2× (1-hour cache) the input price, paid once when context is first written.
So caching is a big discount — but 0.1× of a 400k-token context, re-paid on hundreds of turns, still adds up. The math is simply large context × many turns. Caching lowers the per-token rate; it doesn't change the fact that you re-send the whole thing every turn.
See it for your own sessions
tokenscope is a local, read-only CLI. It parses your Claude Code session logs
(~/.claude/projects/**/*.jsonl) and shows the output / cache-read / cache-write / fresh
split, the per-turn context-growth curve, and concrete "trim this" insights. Nothing is uploaded.
Run it on your latest session:
npx @wartzar-bee/tokenscope
Or paste a report in your browser →
Read-only · local · no network · no telemetry.