Home › Guides › Reduce your Claude Code bill

How to reduce your Claude Code bill

Most of a Claude Code bill is re-sent context, not the model writing code. So the biggest savings come from one idea: keep the context the model carries small. Here's a concrete checklist, roughly in order of impact.

First, measure — don't guess

Before changing your workflow, find out where the money is actually going. On most long sessions the dominant line is cache read (context re-sent every turn), but the only way to know your split is to look at your own logs.

npx @wartzar-bee/tokenscope Paste a report in your browser →

Local, read-only, no upload. It prints the output / cache-read / cache-write split, the context-growth curve, and which tools filled your context.

The checklist

1. Compact or restart long sessions

Cache-read cost scales with context size × number of turns. Once a session has accumulated a large context, every remaining turn re-pays for it. Running /compact collapses the history into a summary so subsequent turns carry far less; starting a fresh session for a new task resets it entirely. This is usually the single highest-leverage habit. See when to use /compact.

2. Don't keep huge files and tool output resident

Every file the model reads and every verbose command it runs gets written into context, then re-sent on subsequent turns. A few habits help:

  • Point the model at the relevant files or functions rather than asking it to read everything.
  • Avoid piping enormous logs, full test output, or giant generated files into the conversation — summarize or grep first.
  • If a big file was needed once but isn't anymore, a compact or fresh session sheds it.

3. Trim the tools that bloat context

Some tools and MCP servers inject large schemas, directory listings, or verbose results on every relevant turn. tokenscope's tool breakdown shows which tools were called most; if a tool you rarely use is adding a lot of context, disabling it removes that weight from every turn.

4. Match the model to the task

Input price drives every cache-read and cache-write charge, so the model you choose multiplies your whole context cost. Documented default rates (verify for your tier — they can change):

Model familyInput / 1MOutput / 1M
claude-opus-4$15$75
claude-sonnet-4$3$15
claude-haiku-4$1$5

Reserve the most expensive model for the reasoning-heavy work; a cheaper model for routine edits and exploration cuts the cost of the exact same context proportionally. (These are defaults; always confirm current pricing for your plan.)

5. Let caching work for you

Prompt caching already discounts re-sent context to 0.1× the input price, but the cache has a short lifetime. Long idle gaps can let it expire, forcing a re-write (1.25× for the 5-minute cache, for the 1-hour cache). Working in focused bursts rather than leaving a session idle for long stretches keeps more of your context on the cheap cache-read path.

What not to over-optimize

The model's output — the code it writes — is usually a small slice of the bill, so asking for terser answers rarely moves the needle. Don't sacrifice useful explanations to save output tokens; spend your effort on context size instead. Measure first so you optimize the line that's actually big.