Why is Claude Code so expensive?

Short version: it usually isn't the model writing code. As an agentic-coding session grows, the entire conversation context is re-sent to the model on every turn. You pay to re-read the same files, tool output, and history again and again — and that cost grows the longer the session runs.

The mechanic that surprises everyone

Large language models are stateless. They don't "remember" your session between turns. So to let the model act on the conversation so far, the client has to send the whole context — your prompts, the files it read, every tool result, and all prior replies — with each request.

In a chat that's cheap, because the conversation is short. In agentic coding it's not: Claude Code reads files, runs commands, and accumulates large tool outputs, so the context balloons. A session that has read a few large files and run a dozen commands can carry hundreds of thousands of tokens, and that whole payload is re-sent every single turn.

The counter-intuitive part: a turn where the model writes just a few lines of code can still cost a lot — because before it writes those lines, the entire (large) context was sent to it again.
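The accumulation described above can be sketched in a few lines. This is a simplified model, not a measurement: it assumes the context grows by a fixed, hypothetical number of tokens per turn (`per_turn_growth`) and that the full context is sent on every turn.

```python
def resent_tokens(per_turn_growth: int, turns: int, initial_context: int = 0) -> int:
    """Total input tokens sent over a session when the whole context is re-sent each turn.

    Simplified model: the context grows by a constant amount per turn.
    """
    total = 0
    context = initial_context
    for _ in range(turns):
        context += per_turn_growth  # new prompt, tool output, and reply join the context
        total += context            # the full context is sent again on this turn
    return total


# A session that adds 2,000 tokens per turn for 100 turns ends with a
# 200,000-token context, but sends 10,100,000 input tokens in total.
print(resent_tokens(per_turn_growth=2_000, turns=100))
```

Under this model, doubling the number of turns roughly quadruples the total, which is why long sessions cost disproportionately more than short ones.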

Where the money actually goes

Every API call in a session is priced in a few buckets. Your dashboard shows the total; what matters for cutting the bill is the attribution:

Bucket | What it is | Why it grows
Cache read | The context re-sent every turn so the model can "see" the conversation. | Grows as the session gets longer; usually the biggest line on a long session.
Cache write | New context (a file you open, a tool result) written into the prompt cache the first time. | Spikes when you pull in big files or verbose command output.
Output | The tokens the model actually generates: the code and explanations. | Often a surprisingly small slice of a big bill.
Fresh input | Uncached prompt tokens. | Usually small once caching kicks in.

The key insight: on a long session, cache read — re-sent context — tends to dominate, even though it's invisible on most dashboards. tokenscope exists to surface exactly this split.
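To make the attribution concrete, here is a minimal sketch that prices the four buckets and reports each one's share of the bill. The base prices (`INPUT_PRICE_PER_MTOK`, `OUTPUT_PRICE_PER_MTOK`) and the token counts in the example are hypothetical placeholders, and the two cache multipliers are the ones this guide quotes for Anthropic's pricing; substitute the real rates for your model.

```python
# Hypothetical base prices in USD per million tokens; substitute your model's real rates.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00
CACHE_READ_MULT = 0.1    # cache-read multiplier quoted in this guide
CACHE_WRITE_MULT = 1.25  # 5-minute cache-write multiplier quoted in this guide


def bucket_costs(cache_read: int, cache_write: int, output: int, fresh_input: int) -> dict:
    """Return the USD cost of each bucket for the given token counts."""
    in_rate = INPUT_PRICE_PER_MTOK / 1_000_000
    out_rate = OUTPUT_PRICE_PER_MTOK / 1_000_000
    return {
        "cache_read": cache_read * in_rate * CACHE_READ_MULT,
        "cache_write": cache_write * in_rate * CACHE_WRITE_MULT,
        "output": output * out_rate,
        "fresh_input": fresh_input * in_rate,
    }


def shares(costs: dict) -> dict:
    """Return each bucket's fraction of the total cost."""
    total = sum(costs.values())
    return {name: (cost / total if total else 0.0) for name, cost in costs.items()}


# Made-up token counts for a long session.
costs = bucket_costs(cache_read=10_000_000, cache_write=200_000, output=50_000, fresh_input=5_000)
for name, share in shares(costs).items():
    print(f"{name:12s} ${costs[name]:.3f}  {share:.1%}")
```

With these made-up numbers, cache read comes to roughly two thirds of the total even though it is billed at a tenth of the input rate, while output is a much smaller slice.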

Why caching helps but doesn't make it free

Prompt caching is what keeps this from being catastrophic. Anthropic's documented multipliers (verify them for your tier; they can change):

  • Cache read = 0.1× the input price. Re-sending cached context is roughly a tenth of the normal input cost.
  • Cache write = 1.25× (5-minute cache) or 2× (1-hour cache) the input price, paid once when context is first written.

So caching is a big discount — but 0.1× of a 400k-token context, re-paid on hundreds of turns, still adds up. The math is simply large context × many turns. Caching lowers the per-token rate; it doesn't change the fact that you re-send the whole thing every turn.
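The "large context × many turns" arithmetic can be written out as a small sketch. It assumes, for simplicity, that the context stays at a constant size for the whole session, and it uses a hypothetical base input price of $3 per million tokens; only the 0.1× cache-read multiplier comes from the text above.

```python
def reread_cost(context_tokens: int, turns: int, input_price_per_mtok: float,
                cache_read_mult: float = 0.1) -> tuple:
    """Return (cached_cost, uncached_cost) in USD for re-sending a fixed-size context.

    Simplification: the context does not grow during the session.
    """
    uncached = context_tokens * turns * input_price_per_mtok / 1_000_000
    cached = uncached * cache_read_mult
    return cached, uncached


# 400k-token context, 200 turns, hypothetical $3 per million input tokens.
cached, uncached = reread_cost(400_000, 200, input_price_per_mtok=3.00)
print(f"with cache reads: ${cached:.2f}   without caching: ${uncached:.2f}")
```

Under these assumptions the session re-sends 80 million tokens: about $240 at the full input rate, and about $24 at the cache-read rate. Caching cuts the bill by a factor of ten, but the remainder still scales with context size times turn count.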

See it for your own sessions

tokenscope is a local, read-only CLI. It parses your Claude Code session logs (~/.claude/projects/**/*.jsonl) and shows the output / cache-read / cache-write / fresh split, the per-turn context-growth curve, and concrete "trim this" insights. Nothing is uploaded.
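For a sense of what such a parse involves, here is a minimal sketch that sums the four buckets from JSONL lines. It is not tokenscope's implementation. It assumes each log record may carry a `message.usage` object with the field names used by Anthropic's Messages API (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`, `output_tokens`); the exact log schema is an assumption and may differ between Claude Code versions. The helper names `sum_usage` and `iter_log_lines` are made up for this example.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator

# Assumed usage field names (Anthropic Messages API) mapped to this guide's buckets.
USAGE_FIELDS = {
    "cache_read_input_tokens": "cache_read",
    "cache_creation_input_tokens": "cache_write",
    "output_tokens": "output",
    "input_tokens": "fresh_input",
}


def sum_usage(lines: Iterable[str]) -> Counter:
    """Sum token usage per bucket across JSONL lines, skipping anything unparseable."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        message = record.get("message") if isinstance(record, dict) else None
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue  # record has no usage data (for example, a user prompt)
        for field, bucket in USAGE_FIELDS.items():
            value = usage.get(field)
            if isinstance(value, int):
                totals[bucket] += value
    return totals


def iter_log_lines(root: Path) -> Iterator[str]:
    """Yield every line of every .jsonl file under root (read-only)."""
    for path in sorted(root.glob("**/*.jsonl")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            yield from handle
```

Calling `sum_usage(iter_log_lines(Path.home() / ".claude" / "projects"))` would then give per-bucket totals across all local sessions, provided the logs match the assumed shape.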

Run it on your latest session:

  npx @wartzar-bee/tokenscope

Or paste a report in your browser.

Read-only · local · no network · no telemetry.