Claude Code context window cost

A large context window is convenient — but the window isn't free to fill. Whatever the model is carrying gets re-sent every turn, so the size of your live context maps almost directly onto your per-turn cost.

Capacity vs. what you actually use

It helps to separate two things that both get called the "context window":

  • The window's capacity — the maximum tokens a model can accept. Having a large ceiling costs nothing by itself.
  • Your live context — how much of that window is actually filled right now: the files, tool output, and history the model is carrying. This is what you pay for, because it's what's re-sent each turn.

So a big window is only expensive if you fill it. The cost lever is your live context size, not the model's maximum.

How window usage becomes dollars

Once context is cached, each turn re-reads it at 0.1× the input price. The per-turn cost of just carrying context is therefore about:

per-turn carry cost ≈ live context tokens × input price × 0.1

Two contexts, same model, same number of turns — the bigger live context costs proportionally more on every single turn. Double the resident context and you roughly double the carry cost for the rest of the session. Over hundreds of turns that proportionality is what turns a comfortable window into a large bill. (Input price varies by model; see the bill-reduction guide for the default rates.)
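
To make the proportionality concrete, here is a minimal sketch of the carry-cost formula above. The price of $3 per million input tokens is a placeholder chosen for illustration, not a quoted rate, and the function name is hypothetical; substitute the input price for your model.

```python
def carry_cost_per_turn(live_tokens, input_price_per_mtok, cache_read_multiplier=0.1):
    """Approximate cost of re-reading cached context for one turn.

    live_tokens: tokens currently held in the live context
    input_price_per_mtok: input price in dollars per million tokens (assumed value)
    cache_read_multiplier: cached reads billed at 0.1x the input price, per the guide
    """
    return live_tokens / 1_000_000 * input_price_per_mtok * cache_read_multiplier


PRICE = 3.0  # placeholder input price in $/million tokens, for illustration only

small = carry_cost_per_turn(100_000, PRICE)
large = carry_cost_per_turn(200_000, PRICE)

print(f"100k live context: ${small:.3f} per turn")
print(f"200k live context: ${large:.3f} per turn")
print(f"Over 300 turns: ${small * 300:.2f} vs ${large * 300:.2f}")
```

Under these assumptions, doubling the live context doubles the per-turn figure, and the gap widens linearly with the number of turns.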

Why letting the window fill up is a trap

Because the carry cost is paid every turn, context that goes in early is re-billed the most times. Filling a large window near the start of a long session means you re-pay for all of it on every subsequent turn. Most of the cost is not incurred at the moment you read the big file; it comes from the long tail of turns that re-send it.
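
The long-tail effect can be sketched as follows. This is a simplified model: it ignores the one-time cost of first reading the file and assumes the file stays in context until the session ends. The function name, the $3 per million token price, and the turn counts are all illustrative assumptions.

```python
def tail_cost(tokens, added_at_turn, total_turns,
              input_price_per_mtok, cache_read_multiplier=0.1):
    """Approximate cost of re-sending a piece of context on every later turn.

    Simplification: the initial read is not counted, and the content is
    assumed to remain in the live context until the session ends.
    """
    resend_turns = max(total_turns - added_at_turn, 0)
    per_turn = tokens / 1_000_000 * input_price_per_mtok * cache_read_multiplier
    return per_turn * resend_turns


PRICE = 3.0  # placeholder input price in $/million tokens, for illustration only

# The same 50k-token file, read early vs. late in a 200-turn session.
early = tail_cost(50_000, added_at_turn=10, total_turns=200, input_price_per_mtok=PRICE)
late = tail_cost(50_000, added_at_turn=190, total_turns=200, input_price_per_mtok=PRICE)

print(f"Read at turn 10:  ${early:.2f} in carry cost")
print(f"Read at turn 190: ${late:.2f} in carry cost")
```

In this sketch the file is identical in both cases; only the number of turns that re-send it changes.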

A roomy window invites you to dump in more files and bigger output "just in case." Each addition is re-sent every turn after. Keep the live context to what the current task actually needs.

Keeping live context small

  • Compact or restart when context has grown — it directly lowers the live size for the turns ahead. See when to use /compact.
  • Be selective about what you read in. Targeted files beat reading whole trees; grep or summarize huge output instead of pasting it whole.
  • Disable tools that inject large context on every turn if you're not using them.

All of these reduce the one number that drives the carry cost: how much the model is holding at once.
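
As a rough illustration of why compacting helps, the sketch below sums the carry cost over a session whose context grows steadily, with and without one compaction. The growth rate, the post-compaction size, and the $3 per million token price are hypothetical, and the cost of producing the compacted summary itself is not modeled.

```python
def session_carry_cost(context_sizes, input_price_per_mtok, cache_read_multiplier=0.1):
    """Sum the approximate carry cost given the live context size at each turn."""
    total_tokens = sum(context_sizes)
    return total_tokens / 1_000_000 * input_price_per_mtok * cache_read_multiplier


def grow(start, per_turn, turns):
    """Live context sizes for a run of turns with linear growth (a toy model)."""
    return [start + per_turn * i for i in range(turns)]


PRICE = 3.0  # placeholder input price in $/million tokens, for illustration only

# Hypothetical session: starts at 20k tokens and grows 2k tokens per turn.
no_compact = grow(20_000, 2_000, 100)

# Same session, but compacted back to 20k tokens after turn 50 (assumed size).
compacted = grow(20_000, 2_000, 50) + grow(20_000, 2_000, 50)

print(f"No compaction:   ${session_carry_cost(no_compact, PRICE):.2f}")
print(f"Compact at 50:   ${session_carry_cost(compacted, PRICE):.2f}")
```

The numbers depend entirely on the assumed growth pattern; the point is only that lowering the live size mid-session reduces every turn that follows.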

See your context curve

tokenscope plots your context size per turn (peak, average, and where it grew), so you can see how full your window got and when. It reads your local logs only — nothing is uploaded.

npx @wartzar-bee/tokenscope

Paste a report in your browser →

Read-only, local, no upload, no telemetry.