Claude Code doesn’t run everything on one model. It has tiers. A heavy “thinking” model carries the plan and the hard reasoning. Cheaper models do the background grunt work: summaries, quick edits, the subagents your Task tool spins up. And you pay Anthropic’s prices for all of it.
That split is the whole opportunity. The reasoning is where you want a strong model. The grunt work is where you want fast and cheap. There’s no rule that says both have to come from the same vendor — or from Anthropic at all.
So I put a LiteLLM proxy in the middle. The planning tier routes to a fast GLM 5.2 on Baseten. The coding, background, and subagent tiers route to cheap budget models on OpenCode (Zen for the mid tier, Go for background work and subagents). Claude Code stays completely stock: it thinks it’s talking to Anthropic, sends normal Anthropic requests, and LiteLLM translates each one to whatever backend I mapped that tier to.
This is a writeup of an actual setup — real config shape and ports, placeholder secrets. Substitute your own keys and model ids; provider catalogs change often, so don’t copy the model strings blind.
Why this shape
- One proxy, many backends. Claude Code only knows how to talk to a single Anthropic-shaped endpoint. LiteLLM fronts that endpoint and fans out per tier.
- Cheapest capable model per tier. Heavy turns go to GLM 5.2. Background and quick turns go to a cheap, fast model. You pay for thinking where you need it and almost nothing where you don’t.
- Claude Code stays stock. No patches, no forks, no flags. Two env vars point it at the proxy, a few more pin each tier to a routable name, and you’re done.
1. Install LiteLLM
Use a venv so the proxy is isolated. uv is fast; plain python -m venv works too.
uv venv .venv
uv pip install --python .venv 'litellm[proxy]'
The [proxy] extra pulls in the proxy server (the litellm --config … mode). Without it you only get the Python library.
Security: pin your version. LiteLLM’s PyPI releases 1.82.7 and 1.82.8 shipped credential-stealing malware. This setup uses 1.90.0; pin it explicitly (uv pip install --python .venv 'litellm[proxy]==1.90.0') so no upgrade or downgrade can land you on the bad pair.
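If you want a quick guard against the flagged releases, here is a minimal sketch that reports the litellm version installed for whatever interpreter runs it. The file name check_litellm.py is hypothetical, and the bad-version set is only the two releases named above, not a live advisory feed.

```python
from importlib import metadata

# The two releases this writeup flags as malicious; extend the set if more turn up.
BAD_LITELLM_VERSIONS = {"1.82.7", "1.82.8"}


def is_safe_litellm(version: str) -> bool:
    """True unless `version` is one of the flagged releases."""
    return version.strip() not in BAD_LITELLM_VERSIONS


def check_installed() -> str:
    """Describe the litellm version installed for the running interpreter."""
    try:
        installed = metadata.version("litellm")
    except metadata.PackageNotFoundError:
        return "litellm is not installed in this environment"
    if is_safe_litellm(installed):
        return f"litellm {installed}: not one of the flagged releases"
    return f"litellm {installed}: FLAGGED RELEASE, reinstall a pinned version"


if __name__ == "__main__":
    print(check_installed())
```

Run it as .venv/bin/python check_litellm.py so it inspects the proxy’s venv rather than your system Python.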
2. Drop your keys in .env
The launcher sources .env, so provider keys live there:
BASETEN_API_KEY=baseten-XXXXXXXX
OPENCODE_API_KEY=opencode-XXXXXXXX # works for both Zen and Go plans
LITELLM_MASTER_KEY=sk-litellm-local # the proxy's OWN auth (not a provider key)
# Critical for OpenCode Go — see the gotchas section.
LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=True
LITELLM_MASTER_KEY is how the proxy authenticates you. Claude Code sends it as ANTHROPIC_AUTH_TOKEN, so the two values have to match (this writeup uses sk-litellm-local for both). That last env var isn’t optional if you use OpenCode Go; skip ahead to the gotchas if you want to know why before you trust it.
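A hypothetical preflight check for .env, assuming the plain KEY=VALUE format shown above (no quoting, no export lines, no multi-line values). It only confirms that the three keys are non-empty and the Go flag is set; it cannot tell whether a key is actually valid.

```python
from pathlib import Path

REQUIRED_KEYS = ("BASETEN_API_KEY", "OPENCODE_API_KEY", "LITELLM_MASTER_KEY")
GO_FLAG = "LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES"


def parse_env(text: str) -> dict[str, str]:
    """Parse plain KEY=VALUE lines; assumes no quotes, `export`, or multi-line values."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop a trailing " # comment", which bash also ignores when sourcing.
        env[key.strip()] = value.split(" #", 1)[0].strip()
    return env


def preflight(env: dict[str, str]) -> list[str]:
    """List what is missing; an empty list means the file looks complete."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not env.get(key)]
    if env.get(GO_FLAG) != "True":
        problems.append(f"{GO_FLAG} is not True; Go-backed tiers will 404")
    return problems


if __name__ == "__main__":
    path = Path(".env")
    if path.exists():
        print("\n".join(preflight(parse_env(path.read_text()))) or ".env looks complete")
    else:
        print("no .env in the current directory")
```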
3. Write the routing config — this is the whole idea
litellm.config.yaml is a list of model_name → backend mappings. The model_name is the tier Claude Code asks for; litellm_params.model is the real backend, prefixed with the provider so LiteLLM knows how to talk to it (baseten/… for Baseten, openai/… for any OpenAI-compatible endpoint).
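To make the mapping concrete, here is a toy model of what the proxy does with the model field: look up the requested tier, then split the backend string at its first slash into a provider and the id sent upstream. This is an illustration under that assumption, not LiteLLM’s code; Backend, split_backend, and route are hypothetical names, and the table copies the config in this step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    provider: str  # prefix LiteLLM reads, e.g. "openai" or "baseten"
    model: str     # id forwarded to that provider


# Tier name Claude Code requests -> backend string, copied from this step's config.
ROUTES = {
    "claude-opus-4-8": "baseten/zai-org/GLM-5.2",
    "claude-sonnet-4-6": "openai/deepseek-v4-flash",
    "claude-haiku-4-5": "openai/kimi-k2.7-code",
    "subagent": "openai/deepseek-v4-flash",
}


def split_backend(model_string: str) -> Backend:
    """Split 'provider/model-id' at the FIRST slash; later slashes stay in the id."""
    provider, sep, model = model_string.partition("/")
    if not sep or not model:
        raise ValueError(f"{model_string!r} has no provider prefix")
    return Backend(provider, model)


def route(requested: str) -> Backend:
    """Resolve a requested tier name, roughly as the proxy's model_list lookup would."""
    if requested not in ROUTES:
        raise LookupError(f"no deployment named {requested!r}")
    return split_backend(ROUTES[requested])
```

Note how baseten/zai-org/GLM-5.2 keeps its inner slash: only the first segment is treated as the provider.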
This is where the brain/hands split lives:
model_list:
  # THINKING TIER → fast GLM 5.2 on Baseten. This is the model that plans.
  - model_name: claude-opus-4-8
    litellm_params:
      model: baseten/zai-org/GLM-5.2
      api_key: os.environ/BASETEN_API_KEY

  # MID TIER → OpenCode Zen (OpenAI-compatible)
  - model_name: claude-sonnet-4-6
    litellm_params:
      model: openai/deepseek-v4-flash
      api_base: https://opencode.ai/zen/v1
      api_key: os.environ/OPENCODE_API_KEY

  # FAST / BACKGROUND TIER → OpenCode Go (OpenAI-compatible, cheap)
  - model_name: claude-haiku-4-5
    litellm_params:
      model: openai/kimi-k2.7-code
      api_base: https://opencode.ai/zen/go/v1
      api_key: os.environ/OPENCODE_API_KEY

  # SUBAGENTS (Claude Code's Task tool) → cheap model on Go.
  # Claude Code sends CLAUDE_CODE_SUBAGENT_MODEL's value as `model`; map it here.
  - model_name: subagent
    litellm_params:
      model: openai/deepseek-v4-flash
      api_base: https://opencode.ai/zen/go/v1
      api_key: os.environ/OPENCODE_API_KEY

litellm_settings:
  drop_params: true  # silently drop Anthropic params the OpenAI backends reject

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
A few things that bite people:
- drop_params: true matters because Claude Code sends Anthropic-only params that the OpenAI backends will reject. This strips them silently.
- The openai/ or baseten/ prefix on model: is not optional. LiteLLM reads the provider from that prefix. Leave it off and you get LLM Provider NOT provided at startup; the deployment is silently skipped, so every later request for that tier 404s with “no healthy deployments.”
- The separate subagent entry exists because Claude Code’s Task tool sends CLAUDE_CODE_SUBAGENT_MODEL’s value verbatim as the model field. Point that env var at the literal string subagent (step 5) and the proxy routes every subagent wherever you want; here, a cheap model on Go.
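These rules are easy to check mechanically. A minimal lint sketch, assuming you have already parsed the YAML into Python (for example with PyYAML’s yaml.safe_load) and pass in its model_list. The prefix and subagent checks come from the bullets above; the api_base and api_key checks are extra inferences of mine (an openai/ entry without api_base would presumably go to OpenAI’s default endpoint instead of OpenCode).

```python
KNOWN_PREFIXES = {"openai", "baseten"}  # the providers this writeup uses


def lint_model_list(model_list: list[dict]) -> list[str]:
    """Flag config entries that would break routing; an empty list means no findings."""
    problems: list[str] = []
    names: set[str] = set()
    for entry in model_list:
        name = entry.get("model_name", "<unnamed>")
        names.add(name)
        params = entry.get("litellm_params", {})
        model = params.get("model", "")
        prefix, sep, _ = model.partition("/")
        if not sep or prefix not in KNOWN_PREFIXES:
            problems.append(f"{name}: {model!r} lacks an openai/ or baseten/ prefix")
        # Extra check (an inference): OpenCode-backed openai/ entries need their own api_base.
        if prefix == "openai" and not params.get("api_base"):
            problems.append(f"{name}: openai/ entry has no api_base")
        # Extra check: keep secrets in the environment, not in the YAML.
        if not str(params.get("api_key", "")).startswith("os.environ/"):
            problems.append(f"{name}: api_key is not an os.environ/ reference")
    if "subagent" not in names:
        problems.append("no 'subagent' entry for CLAUDE_CODE_SUBAGENT_MODEL to reach")
    return problems
```

Feed it yaml.safe_load(open("litellm.config.yaml"))["model_list"] and fix anything it prints before starting the proxy.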
Want it even cheaper? Make a second config, litellm.config.go.yaml, where every tier goes through OpenCode Go (no Baseten, no Zen), using Go-catalog model ids. Run it on a different port (e.g. 30181) so it sits alongside the main proxy. Now you can flip between “GLM 5.2 brain” and “all-cheap” with a single alias.
4. Start the proxy
A tiny launcher that loads keys and runs the proxy:
#!/usr/bin/env bash
set -euo pipefail
cd "$(dirname "$0")"
set -a; source .env; set +a
exec .venv/bin/litellm --config litellm.config.yaml --port 30180
Save it as start-proxy.sh, chmod +x, run it, leave it running:
./start-proxy.sh # listens on http://localhost:30180
The set -a; source .env; set +a dance exports every var from .env so the proxy actually sees the keys, then flips export back off to keep your shell clean.
Startup prints register_model: … not in built-in cost map warnings for your custom model strings. Harmless — LiteLLM just has no pricing table for them, so its cost tracking shows 0.
5. Point Claude Code at the proxy
Two ways in. The first is file-driven and tied to the repo dir; the second works from anywhere.
Settings file — .claude/settings.json, read once at startup:
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:30180",
    "ANTHROPIC_AUTH_TOKEN": "sk-litellm-local",
    "ANTHROPIC_MODEL": "claude-opus-4-8",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "claude-haiku-4-5",
    "CLAUDE_CODE_SUBAGENT_MODEL": "subagent"
  }
}
Then cd into the repo and run claude. Restart Claude Code after any edit to this file, since it’s only read at startup. Those ANTHROPIC_DEFAULT_* vars pin each Claude Code tier to a model_name the proxy actually knows, so the /model picker always lands on something routable. Skip them and you risk sending an id the proxy can’t map.
Inline env — works from any directory, overrides settings.json for that one run:
ANTHROPIC_BASE_URL=http://localhost:30180 \
ANTHROPIC_AUTH_TOKEN=sk-litellm-local \
ANTHROPIC_MODEL=claude-opus-4-8 \
ANTHROPIC_DEFAULT_SONNET_MODEL=claude-sonnet-4-6 \
ANTHROPIC_DEFAULT_HAIKU_MODEL=claude-haiku-4-5 \
CLAUDE_CODE_SUBAGENT_MODEL=subagent \
claude
For something repeatable, alias it in ~/.zshrc:
alias ccgo='ANTHROPIC_BASE_URL=http://localhost:30181 ANTHROPIC_AUTH_TOKEN=sk-litellm-local ANTHROPIC_MODEL=claude-opus-4-8 ANTHROPIC_DEFAULT_SONNET_MODEL=claude-sonnet-4-6 ANTHROPIC_DEFAULT_HAIKU_MODEL=claude-haiku-4-5 CLAUDE_CODE_SUBAGENT_MODEL=subagent claude'
(Swap the port: 30180 for the Zen/Baseten proxy, 30181 for the all-Go one. One alias shape works against either port only if both configs expose the same model_name entries.)
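Whichever way you set the variables, every tier pin has to name a model_name the proxy serves. A small sketch that compares the current environment against the four names from the config. PROXY_MODEL_NAMES is hard-coded as an assumption, so update it if your config differs; and because it reads os.environ, it sees inline or alias variables, not values set in .claude/settings.json.

```python
import os

# Assumed to match the model_name values in litellm.config.yaml; edit if yours differ.
PROXY_MODEL_NAMES = {"claude-opus-4-8", "claude-sonnet-4-6", "claude-haiku-4-5", "subagent"}

TIER_VARS = (
    "ANTHROPIC_MODEL",
    "ANTHROPIC_DEFAULT_SONNET_MODEL",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL",
    "CLAUDE_CODE_SUBAGENT_MODEL",
)


def check_pins(env: dict[str, str], model_names: set[str]) -> dict[str, str]:
    """Map each problematic tier variable to a short explanation."""
    report: dict[str, str] = {}
    for var in TIER_VARS:
        value = env.get(var)
        if value is None:
            report[var] = "unset: Claude Code may send an id the proxy can't map"
        elif value not in model_names:
            report[var] = f"{value!r} is not a model_name in the proxy config"
    return report


if __name__ == "__main__":
    problems = check_pins(dict(os.environ), PROXY_MODEL_NAMES)
    for var, why in problems.items():
        print(f"{var}: {why}")
    if not problems:
        print("every tier pin is routable")
```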
6. Verify it works
With the proxy running, hit it directly with an Anthropic-shaped request:
.venv/bin/python - <<'PY'
import json, urllib.request
body = json.dumps({"model": "claude-opus-4-8", "max_tokens": 16,
"messages": [{"role": "user", "content": "reply with OK"}]}).encode()
req = urllib.request.Request("http://localhost:30180/v1/messages", data=body,
headers={"x-api-key": "sk-litellm-local", "anthropic-version": "2023-06-01",
"content-type": "application/json"})
print(urllib.request.urlopen(req, timeout=30).read().decode())
PY
Expect a 200 with an Anthropic-shaped content block. Swap the model for claude-sonnet-4-6, claude-haiku-4-5, or subagent to prove each tier independently.
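To check all four tiers in one pass, here is a sketch that reuses the request shape from the snippet above and prints each tier’s status and reply text. It keeps only blocks of type text, on the assumption that a reasoning backend may also return other block types; the 200-character error excerpt is an arbitrary choice.

```python
import json
import urllib.error
import urllib.request

PROXY = "http://localhost:30180"
MASTER_KEY = "sk-litellm-local"
TIERS = ("claude-opus-4-8", "claude-sonnet-4-6", "claude-haiku-4-5", "subagent")


def build_request(model: str) -> urllib.request.Request:
    """Anthropic-shaped /v1/messages request, same body and headers as above."""
    body = json.dumps({
        "model": model,
        "max_tokens": 16,
        "messages": [{"role": "user", "content": "reply with OK"}],
    }).encode()
    return urllib.request.Request(
        f"{PROXY}/v1/messages",
        data=body,  # passing data makes urllib send a POST
        headers={
            "x-api-key": MASTER_KEY,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )


def reply_text(response: dict) -> str:
    """Join the text blocks of an Anthropic-shaped messages response."""
    return "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def probe(model: str) -> str:
    try:
        with urllib.request.urlopen(build_request(model), timeout=30) as resp:
            return f"{model}: {resp.status} {reply_text(json.load(resp))!r}"
    except urllib.error.HTTPError as err:
        return f"{model}: HTTP {err.code} {err.read().decode(errors='replace')[:200]}"
    except OSError as err:  # proxy not running, refused connection, timeout
        return f"{model}: unreachable ({err})"


if __name__ == "__main__":
    for tier in TIERS:
        print(probe(tier))
```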
Using the split in practice
Once it’s wired up, the brain/hands split isn’t just a config diagram — it changes how a session feels.
The heavy reasoning rides on GLM 5.2. That’s the Opus tier, which is what Claude Code leans on when it’s actually thinking — working through Plan mode, holding the shape of a change in its head, deciding what to do. The cheap models carry the volume: the Haiku tier’s background chores and every subagent your Task tool fans out. You’re paying for a strong reasoner on the turns that need one and pennies on the turns that don’t.
If you want to be deliberate about it, lean into the tiers: draft the approach in Plan mode on the Opus/GLM tier, then /model down to a cheaper tier to grind out the execution. Plan with the smart model, code with the cheap one. That’s the whole pitch, and now it’s a config file instead of vendor lock-in.
Gotchas (the painful ones)
OpenCode Go 404s on /v1/responses. Every request to a Go-backed tier logs Client error '404 Not Found' for url 'https://opencode.ai/zen/go/v1/responses', then cascades into 429 No deployments available as the failing deployment goes into cooldown. The cause: LiteLLM’s Anthropic pass-through converts /v1/messages into an OpenAI Responses API call (/v1/responses). Zen implements that endpoint; Go does not — only chat/completions and messages. The fix is the env var from step 2:
LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=True
It’s global but only touches the Anthropic→OpenAI path, so Zen and Baseten keep working. The tell: when you see a 404 … /v1/<endpoint> against a provider, you’re calling an API surface that provider doesn’t implement — reach for the chat-completions toggle before anything else.
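If you’re reading proxy logs, the error strings quoted so far map fairly directly to causes. A hypothetical triage helper that encodes them; the patterns are heuristics matched against the messages in this writeup, not an exhaustive list of LiteLLM errors.

```python
import re

GO_FLAG = "LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES"


def triage(log_line: str) -> str:
    """Heuristically map a proxy log line to a likely cause from this writeup."""
    if "LLM Provider NOT provided" in log_line:
        return "prefix: add openai/ or baseten/ to that entry's model:"
    # A 404 against /v1/<endpoint> means the provider lacks that API surface.
    endpoint = re.search(r"404 Not Found.*?(/v1/[A-Za-z_/]+)", log_line)
    if endpoint:
        return f"unsupported endpoint: provider lacks {endpoint.group(1)}; try {GO_FLAG}=True"
    # The 429 is a symptom: an earlier failure pushed the deployment into cooldown.
    if "No deployments available" in log_line:
        return "cooldown: a deployment failed earlier; find the first error above this one"
    return "unrecognized"
```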
Catalog drift. You’ll hit ModelError: Model <x> is not supported or There are no healthy deployments for this model because provider catalogs change often and they are not symmetric between plans. OpenCode Zen’s model list is different from Go’s, and an id that looks newer can still belong to the other plan. Check the live catalog and copy the exact id:
# Zen
curl -s https://opencode.ai/zen/v1/models -H "Authorization: Bearer $OPENCODE_API_KEY"
# Go
curl -s https://opencode.ai/zen/go/v1/models -H "Authorization: Bearer $OPENCODE_API_KEY"
Match the id field exactly in your model: line (after the openai/ prefix, or baseten/… for Baseten). A 403 instead of a model error means a backend key is wrong, not the model string.
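To compare your config against the live catalog without eyeballing JSON, here is a sketch that fetches the Go catalog and lists config model strings whose id is missing. It assumes the endpoint returns an OpenAI-style list ({"data": [{"id": ...}]}), consistent with matching the id field above; the two model strings in the main block are this writeup’s Go entries.

```python
import json
import os
import urllib.error
import urllib.request

GO_MODELS_URL = "https://opencode.ai/zen/go/v1/models"


def catalog_ids(payload: dict) -> set[str]:
    """Collect ids from an OpenAI-style list (assumed shape): {"data": [{"id": ...}]}."""
    return {item["id"] for item in payload.get("data", []) if "id" in item}


def missing_models(model_strings: list[str], ids: set[str], prefix: str = "openai/") -> list[str]:
    """Config model strings whose id, once the prefix is stripped, is not in the catalog."""
    return [m for m in model_strings if m.removeprefix(prefix) not in ids]


def fetch_ids(url: str, api_key: str) -> set[str]:
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return catalog_ids(json.load(resp))


if __name__ == "__main__":
    key = os.environ.get("OPENCODE_API_KEY")
    if not key:
        print("set OPENCODE_API_KEY first")
    else:
        try:
            ids = fetch_ids(GO_MODELS_URL, key)
            wanted = ["openai/kimi-k2.7-code", "openai/deepseek-v4-flash"]
            print(missing_models(wanted, ids) or "all ids present")
        except urllib.error.HTTPError as err:
            # A 403 points at the key rather than the model string.
            print(f"HTTP {err.code}" + (": check OPENCODE_API_KEY" if err.code == 403 else ""))
        except (OSError, ValueError) as err:
            print(f"could not read catalog: {err}")
```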
TL;DR
uv venv .venv && uv pip install --python .venv 'litellm[proxy]==1.90.0' # 1. install, pinned (avoid 1.82.7/1.82.8)
# 2. put BASETEN_API_KEY / OPENCODE_API_KEY / LITELLM_MASTER_KEY +
# LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=True in .env
# 3. write litellm.config.yaml: thinking tier → baseten/zai-org/GLM-5.2, rest → openai/… on OpenCode
./start-proxy.sh # 4. start it (port 30180)
ANTHROPIC_BASE_URL=http://localhost:30180 ANTHROPIC_AUTH_TOKEN=sk-litellm-local claude # 5. use it (plus the tier pins from step 5)
Claude Code never knows the difference. You just stopped paying Opus prices for grunt work.