Capacity vs. what you actually use
It helps to separate two things people both call "context window":
- The window's capacity — the maximum tokens a model can accept. Having a large ceiling costs nothing by itself.
- Your live context — how much of that window is actually filled right now: the files, tool output, and history the model is carrying. This is what you pay for, because it's what's re-sent each turn.
So a big window is only expensive if you fill it. The cost lever is your live context size, not the model's maximum.
How window usage becomes dollars
Once context is cached, each turn re-reads it at 0.1× the input price. The per-turn cost of just carrying context is therefore about:
per-turn carry cost ≈ live context tokens × input price × 0.1
Two contexts, same model, same number of turns — the bigger live context costs proportionally more on every single turn. Double the resident context and you roughly double the carry cost for the rest of the session. Over hundreds of turns that proportionality is what turns a comfortable window into a large bill. (Input price varies by model; see the bill-reduction guide for the default rates.)
Why letting the window fill up is a trap
Because the carry cost is paid every turn, context that goes in early is re-billed the most times. Filling a large window near the start of a long session means you re-pay for all of it on every subsequent turn. The cost isn't the moment you read the big file — it's the long tail of turns that re-send it.
Keeping live context small
- Compact or restart when context has grown — it directly lowers the live size for the turns ahead. See when to use /compact.
- Be selective about what you read in. Targeted files beat reading whole trees; grep or summarize huge output instead of pasting it whole.
- Disable tools that inject large context on every turn if you're not using them.
All of these reduce the one number that drives the carry cost: how much the model is holding at once.
See your context curve
tokenscope plots your context size per turn (peak, average, and where it grew), so you can see how full your window got and when. It reads your local logs only — nothing is uploaded.
npx @wartzar-bee/tokenscope
Paste a report in your browser →
Read-only, local, no upload, no telemetry.