Reduce Your AI Coding Energy Consumption: A Sustainable Developer's Playbook
Seven concrete tactics to cut the energy and carbon footprint of your Claude Code and Codex usage — model selection, prompt caching, session hygiene, and extended thinking discipline.
You can cut the energy footprint of your Claude Code and Codex usage by 40 to 70 percent without slowing down, losing quality, or sacrificing the agentic workflows that make modern AI coding useful. The levers are behavioural, not infrastructural — model selection, prompt caching discipline, session concision, and judicious use of extended thinking. This playbook covers the seven tactics that actually move the number, in rough order of impact, with concrete heuristics for when each one applies.
If you want to understand the energy physics underneath these recommendations first, see “How Much Energy Does AI Coding Use?”. If you want to measure your progress, the Eco Score on your LobsterOne dashboard tracks exactly these levers.
1. Route Tasks to the Smallest Capable Model
The single biggest sustainability lever is model discipline. Claude Opus uses roughly 7–10× the energy per output token that Haiku does, based on parameter counts and the best public energy benchmarks. Sonnet sits in between at roughly 4× Haiku. A developer who defaults every prompt to Opus is burning compute on tasks that Haiku would handle equally well.
The heuristics:
- Haiku: grep-style search across a repo, file summarisation, routing between agents, classification, short-form extraction. Anything where the answer structure is predictable and the reasoning depth is shallow.
- Sonnet: the default for most coding tasks — implementing a feature, writing tests, refactoring, debugging typical issues, reviewing a diff. This is where you should live.
- Opus: architectural decisions, complex debugging with multiple interacting systems, anything requiring the model to hold and reason over a large mental model of the codebase. Worth the energy cost when the task actually needs it.
Claude Code supports model selection per interaction. Use it. If you are using an IDE integration, check whether the default can be configured — the out-of-the-box default is usually tuned for capability, not sustainability.
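If you drive these tools through your own scripts, the heuristics are easy to encode as a routing function. A minimal sketch in TypeScript, where `pickModel`, the task categories, and the model ID strings are all hypothetical placeholders to adapt to your own setup:

```typescript
// Hypothetical task categories and a routing helper. The model ID
// strings are placeholders; substitute whatever your provider exposes.
type TaskKind =
  | "search" | "summarise" | "classify"         // shallow, predictable
  | "implement" | "test" | "refactor" | "debug" // typical coding work
  | "architecture" | "deep-debug";              // heavy reasoning

function pickModel(task: TaskKind): string {
  switch (task) {
    case "search":
    case "summarise":
    case "classify":
      return "claude-haiku"; // placeholder ID: the ~1x energy baseline
    case "architecture":
    case "deep-debug":
      return "claude-opus"; // placeholder ID: ~7-10x Haiku per output token
    default:
      return "claude-sonnet"; // placeholder ID: ~4x Haiku, the default home
  }
}
```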
2. Keep Your Context Stable Between Turns
Prompt caching is the second-biggest lever, and it is mostly invisible. When you send a turn to Claude, the model caches the prefilled context (system prompt, CLAUDE.md, prior conversation, loaded files). The next turn, if the cached prefix is unchanged, skips recomputation entirely and pays about one tenth of the normal input cost.
Anything that changes the prefix invalidates the cache:
- Editing CLAUDE.md mid-session
- Switching between projects without starting a new session
- Loading a different set of files into context
- Changing the system prompt or persona
- Manual context resets
A workflow with stable context accumulates 30–100× cache reuse ratios. A chaotic workflow runs under 10×. That difference roughly doubles the per-turn energy of the chaotic workflow, because full-price prefill replaces cache reads.
The practical discipline: change scope at session boundaries, not mid-conversation. If you need to swap projects, close the session and start a fresh one. Your Eco Score’s cache efficiency sub-score tracks this directly.
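If you call the API directly rather than through Claude Code, you can make the stable prefix explicit. A minimal sketch, assuming the `@anthropic-ai/sdk` Messages API and its `cache_control` breakpoints; the model ID and the user prompt are placeholders:

```typescript
import Anthropic from "@anthropic-ai/sdk";
import fs from "node:fs/promises";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// The stable prefix, assembled once per session and never edited mid-session.
const projectContext = await fs.readFile("CLAUDE.md", "utf8");

// Mark the stable prefix with a cache_control breakpoint. As long as this
// block is byte-identical between turns, later turns read it from cache
// at roughly one tenth of the normal input cost.
const response = await client.messages.create({
  model: "claude-sonnet-4-5", // placeholder model ID
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: projectContext,
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "Add a null check to parseConfig." }],
});
```

The discipline the API makes explicit is the same one Claude Code applies implicitly: everything above the breakpoint must stay byte-identical between turns, which is exactly why editing CLAUDE.md mid-session costs you.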
3. Start Fresh Sessions Instead of Running Marathons
A 200-million-token session is not twice as expensive as a 100-million-token session. It is closer to five times as expensive, because attention compute is O(n²) in the context length and the cache becomes less effective as the conversation drifts. A fresh session that covers the same task in 30 million tokens is structurally more efficient than a drifting session that accumulates to 300 million.
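A toy cost model makes the superlinear growth concrete. Assume, ignoring caching, that each turn's cost is proportional to the full context it must process; summed over a session, that grows quadratically. The numbers below are illustrative, not measured:

```typescript
// Toy cost model: every turn reprocesses the full accumulated context,
// so total work grows roughly quadratically with session length.
function relativeSessionCost(turns: number, tokensPerTurn: number): number {
  let cost = 0;
  for (let t = 1; t <= turns; t++) {
    cost += t * tokensPerTurn; // context length at turn t
  }
  return cost;
}

// One 100-turn marathon vs ten fresh 10-turn sessions covering the same work:
const marathon = relativeSessionCost(100, 2_000);   // ~10.1M token-units
const fresh = 10 * relativeSessionCost(10, 2_000);  // ~1.1M token-units
console.log((marathon / fresh).toFixed(1));         // ~9.2: fresh sessions win
```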
The warning signs of a session that should have been ended an hour ago:
- You keep asking the model to “forget about the previous part”
- Recent turns are answering questions tangential to the original goal
- The context window is 50%+ full and you are not referring back to the early parts
- You have already shipped the thing you were working on but kept the session going
None of these are dramatic. They are the normal drift of a productive work session. The fix is to treat session endings like commit boundaries — intentional, frequent, and cheap.
For session analytics and drift detection at the individual and team level, see our session analytics guide.
4. Use Extended Thinking Deliberately
Extended thinking (Claude’s “thinking” mode, OpenAI’s o3-class reasoning) multiplies the decode cost of a query by 5–10×. The model generates a large volume of internal reasoning tokens before producing the final answer. For genuinely hard problems — algorithmic design, non-obvious debugging, architectural trade-offs — the extra compute is justified. For the routine 80% of coding tasks, it is energy waste.
Turn it off by default. Turn it on when:
- The problem requires the model to reason across multiple layers (data model → API → UI) simultaneously
- You have a complex bug whose cause is not obvious
- You need the model to weigh trade-offs, not just execute a pattern
- The stakes warrant it — production incident, critical algorithm, safety-sensitive code
Turn it off when:
- You are writing a test
- You are refactoring
- You are summarising
- You are reading code
- The answer pattern is predictable
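If you are driving the API yourself, extended thinking is an explicit per-request parameter, so gating it on task difficulty is a one-line decision. A sketch assuming the Anthropic SDK's `thinking` parameter; the model ID, budget figure, and `ask` helper are placeholders:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Gate extended thinking on task difficulty instead of leaving it on globally.
async function ask(prompt: string, hardProblem: boolean) {
  return client.messages.create({
    model: "claude-sonnet-4-5", // placeholder model ID
    max_tokens: 16_000, // must exceed the thinking budget when thinking is on
    ...(hardProblem
      ? { thinking: { type: "enabled" as const, budget_tokens: 8_000 } }
      : {}),
    messages: [{ role: "user", content: prompt }],
  });
}
```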
5. Ask for Diffs, Not Full Rewrites
When you need a small change to a file, ask for the change, not a rewrite. “Update the `retryWithBackoff` function to double the delay between retries” will produce a focused diff. “Rewrite `utils.ts` to improve the retry logic” will produce a full-file regeneration of code that is 95% unchanged. The output tokens on a full-file rewrite are often 10–50× what the actual change required.
The prompts that produce concise output:
- “Change X to Y in `file.ts`”
- “Add an early-return guard at the top of function X”
- “Update `handleSubmit` to also validate the email format before submission”
The prompts that produce bloated output:
- “Improve `file.ts`”
- “Make this better”
- “Clean this up”
Specific prompts are also better prompts, so this tactic is roughly free — the sustainability benefit is a consequence of being specific, not a tradeoff against it.
6. Batch Related Questions Into One Turn
Each conversational turn carries the full prefill cost of your context, regardless of how much you actually ask. Ten small turns cost roughly ten times the prefill energy of one turn that asks ten things. For related questions — reviewing a diff across multiple files, asking for several small refactors, gathering information about the same subsystem — combine them into a single prompt.
This does not mean writing 50-question mega-prompts. It means not sending a separate turn to ask “what does this function do?” followed by another turn asking “what calls it?” when you could have asked both at once.
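In API terms, the same discipline looks like one request carrying several related questions instead of a loop that pays the prefill on every iteration. A sketch reusing the hypothetical `ask` helper from tactic 4; the question strings are illustrative:

```typescript
// Anti-pattern: each iteration re-pays the per-turn prefill.
for (const q of ["What does resolveRoute do?", "What calls it?"]) {
  await ask(q, false);
}

// Better: one turn, one prefill, both answers.
await ask(
  [
    "Two questions about resolveRoute:",
    "1. What does it do?",
    "2. What calls it?",
  ].join("\n"),
  false,
);
```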
7. Monitor Your Drift
Behaviours that start out efficient tend to drift. You start with disciplined Sonnet use and slowly default to Opus for everything because it feels safer. Your session hygiene erodes as a Friday afternoon debugging session becomes a four-hour marathon. Your cache hit rate tanks the week you are juggling three projects at once.
The counterfactual is hard to see without instrumentation. The Wh-per-turn chart on your Climate page is designed to surface it. If your energy-per-turn is rising while your output-per-turn stays flat, that is measurable drift. The divergence banner above the chart quantifies it.
A rough personal rule: if your composite Eco Score drops 10 points in a week, something changed in your workflow worth investigating.
The One Anti-Pattern to Avoid
Do not treat Eco Score as a KPI to optimise blindly. The score is a feedback loop, not a target. The highest-scoring behaviour is to use Haiku for everything — which would produce worse code and cost you more in engineering time than you saved in energy. The score is useful because it makes the invisible visible, not because maximising it is a goal.
The goal is to use the right amount of the right model for the work in front of you. The score rewards that because using the right model for the job is genuinely efficient. Use the score to notice when you have drifted into using the wrong model for the job — then correct, don’t chase.
Frequently Asked Questions
How much can I realistically reduce my AI coding energy consumption?
For a developer who currently defaults to Opus with thinking mode always on, a disciplined workflow (Haiku/Sonnet as default, Opus for hard problems, thinking mode reserved for the 10% of tasks that need it) typically cuts per-task energy by 60–80%. For a developer already using Sonnet as the default, the remaining headroom is 20–40% mostly in cache discipline and session concision.
Does using Haiku for simple tasks meaningfully hurt quality?
For the tasks Haiku is suited to — grep, summarisation, classification, routing — no. Haiku is excellent at these. The quality gap opens up when you push Haiku into tasks that require multi-step reasoning or architectural judgement, which is exactly where Sonnet or Opus belong.
Is it worth the cognitive overhead of picking a model per task?
After a week or two it stops feeling like overhead — you develop intuition for which task needs which model. Most tools support setting a default (Sonnet is a good one) and overriding per interaction. The cognitive cost is much smaller than it feels at first.
What about Cursor, Cody, and other AI coding tools?
The same principles apply. Model selection, cache discipline, and session concision are the three levers that matter regardless of which tool you use — they reflect the underlying physics of LLM inference, not any one tool’s UX. Some tools make the levers easier to pull than others.
How do I know my cache is actually being hit?
The token breakdown in your LobsterOne dashboard splits input tokens into fresh vs cache-read. If cache-read is 10× or more than fresh input, you are caching well. If they are comparable, your cache is being invalidated often — trace back to what you are changing between turns.
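If you also call the API directly, each response's usage block tells you the same story. A sketch assuming the Anthropic SDK's usage fields (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`), reusing `response` from the caching sketch in tactic 2:

```typescript
// Compute the cache-read-to-fresh ratio from a single API response.
const u = response.usage;
const fresh = u.input_tokens + (u.cache_creation_input_tokens ?? 0);
const cached = u.cache_read_input_tokens ?? 0;
console.log(`cache ratio: ${(cached / Math.max(fresh, 1)).toFixed(1)}x`);
// 10x or more: caching well. Near 1x: the prefix changes almost every turn.
```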
Pierre Sauvignon, Founder of LobsterOne. Building tools that make AI-assisted development visible, measurable, and fun.
Related Articles

Introducing the Eco Score: The First Sustainability Metric for AI-Assisted Coding
Every Claude Code and Codex turn has an energy cost. The Eco Score makes your AI coding climate impact visible — with a composite 0–100 score, three sub-scores, and a leaf rating you can actually improve.

How Much Energy Does AI Coding Use? A Developer's Guide to LLM Carbon Footprint
The public data on Claude Code and Codex energy consumption — Wh per token, per session, per workday — triangulated from Epoch AI, Google's Gemini disclosure, and peer-reviewed benchmarks. What the numbers actually mean for developers.

AI-Assisted Coding Workflow Patterns That Ship Faster
Five proven workflow patterns for AI-assisted development — scaffold-then-refine, test-first-then-implement, review-loop, spike-and-stabilize, pair-with-AI.