Gateway Model Routing
An Anthropic-compatible gateway serves inference and publishes the model catalogue, so one config knob drives both the inference target and the model picker.
The Pattern
A traditional harness ships with a hard-coded model list and uses a base-URL override only to redirect inference traffic. Gateway-served models then have to be added manually with custom-model env vars or settings flags. The pattern decouples model identity from the harness binary: when the inference endpoint and the catalogue come from the same gateway, model choice follows the same configuration path as model invocation.
Claude Code 2.1.126 (2026-05-01) ships this pattern as a built-in. From the changelog: "The /model picker now lists models from your gateway's /v1/models endpoint when ANTHROPIC_BASE_URL points at an Anthropic-compatible gateway."
The Discovery Contract
The harness queries the gateway at startup, applies a namespace filter, and renders discovered entries in /model alongside built-ins (Claude Code: LLM gateway). Four contract points matter:
graph LR
H[Harness startup] --> Q["GET /v1/models"]
Q --> F[Filter: id starts with<br>claude or anthropic]
F --> C["Cache to ~/.claude/cache/<br>gateway-models.json"]
C --> P["/model picker:<br>built-ins + From gateway"]
Q -.->|fail| Cached[Last cached list]
Cached -.->|empty| Built[Built-in list]
- Trigger: opt-in by flag and URL. Discovery runs only when CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 is set and ANTHROPIC_BASE_URL points at a non-Anthropic host exposing the Anthropic Messages format (Claude Code: Model configuration). It does not run when the flag is unset, for Bedrock or Vertex pass-through endpoints, or when the base URL is unset or points at api.anthropic.com.
- Auth: the discovery request reuses the inference credentials, sending ANTHROPIC_AUTH_TOKEN as a bearer token or ANTHROPIC_API_KEY as x-api-key, plus headers from ANTHROPIC_CUSTOM_HEADERS. One known gap: when credentials come only from an apiKeyHelper script rather than an env var, discovery races the async helper and fires unauthenticated, so gateway models silently never appear (anthropics/claude-code#56675). Set ANTHROPIC_AUTH_TOKEN or ANTHROPIC_API_KEY directly to avoid it.
- Filter: only IDs starting with claude or anthropic are added to the picker. Each entry is labelled "From gateway" using the response's display_name field.
- Failure mode: on request failure or a missing endpoint, the picker falls back to the previously cached list, then to the built-in list. The harness keeps working.
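The four contract points can be read as a small decision procedure. The Python sketch below is illustrative only and is not Claude Code's implementation. The function names (should_discover, auth_headers, filter_models, resolve_picker) are hypothetical, the preference for the bearer token over the API key is an assumption, and the Bedrock/Vertex exclusion and ANTHROPIC_CUSTOM_HEADERS handling are left out.

```python
from urllib.parse import urlparse

ALLOWED_PREFIXES = ("claude", "anthropic")


def should_discover(env):
    """Trigger: flag set and base URL pointing at a non-Anthropic host."""
    if env.get("CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY") != "1":
        return False
    host = urlparse(env.get("ANTHROPIC_BASE_URL", "")).hostname
    return host is not None and host != "api.anthropic.com"


def auth_headers(env):
    """Auth: reuse inference credentials (token-first order is an assumption)."""
    if env.get("ANTHROPIC_AUTH_TOKEN"):
        return {"Authorization": "Bearer " + env["ANTHROPIC_AUTH_TOKEN"]}
    if env.get("ANTHROPIC_API_KEY"):
        return {"x-api-key": env["ANTHROPIC_API_KEY"]}
    return {}


def filter_models(entries):
    """Filter: keep only IDs that start with claude or anthropic."""
    return [
        # Falling back to the ID when display_name is absent is an assumption.
        {"id": e["id"], "label": e.get("display_name", e["id"])}
        for e in entries
        if e.get("id", "").startswith(ALLOWED_PREFIXES)
    ]


def resolve_picker(fetched, cached, built_ins):
    """Failure mode: fetched list, else cached list, else built-ins only.

    fetched is None when the request failed; cached is assumed to hold
    entries that were already filtered on an earlier successful run.
    """
    gateway = filter_models(fetched) if fetched is not None else (cached or [])
    return built_ins + gateway
```

Passing the environment in as a dict keeps each step a pure function, so the fallback order can be checked without a live gateway.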
Gateway Requirements
Anthropic documents a minimum API contract for any gateway in front of Claude Code: it must expose /v1/messages and /v1/messages/count_tokens, and it must forward the anthropic-beta and anthropic-version request headers. "Failure to forward headers or preserve body fields may result in reduced functionality or inability to use Claude Code features" (Claude Code: LLM gateway).
Two header behaviours affect gateway operators specifically:
- X-Claude-Code-Session-Id is sent on every request so proxies can aggregate per-session traffic without parsing the body.
- An attribution block is prepended to the system prompt. The Anthropic API strips it before processing, so first-party prompt caching is unaffected, but a gateway running its own cache keyed on the full request body will see drift. Set CLAUDE_CODE_ATTRIBUTION_HEADER=0 to omit it (Claude Code: LLM gateway).
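For a gateway author, the forwarding requirement amounts to an allow-list that must include the two Anthropic headers. The following is a minimal sketch, assuming a hypothetical proxy that receives request headers as a plain dict. The helper names headers_to_forward and session_key are invented for illustration and do not correspond to any particular gateway's API.

```python
REQUIRED_FORWARD = ("anthropic-beta", "anthropic-version")


def _lowered(incoming):
    """HTTP header names are case-insensitive, so normalise before lookup."""
    return {name.lower(): value for name, value in incoming.items()}


def headers_to_forward(incoming):
    """Return the headers the gateway must pass upstream unchanged."""
    headers = _lowered(incoming)
    return {name: headers[name] for name in REQUIRED_FORWARD if name in headers}


def session_key(incoming):
    """Read the per-session ID from the headers, without parsing the body."""
    return _lowered(incoming).get("x-claude-code-session-id")
```

A real proxy would forward more than these two headers. The sketch isolates only the ones whose loss degrades harness features.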
Capability Declaration
Discovery puts a model in the picker; it does not tell the harness what features that model supports. Claude Code matches IDs against built-in patterns to enable effort levels, extended thinking, and adaptive reasoning. Gateway-discovered IDs that do not match leave those features off (Claude Code: Model configuration).
For pinned defaults, declare capabilities explicitly via ANTHROPIC_DEFAULT_OPUS_MODEL_SUPPORTED_CAPABILITIES (and the Sonnet/Haiku equivalents). Values include effort, xhigh_effort, max_effort, thinking, adaptive_thinking, and interleaved_thinking. The companion _NAME and _DESCRIPTION variables override the picker label and take effect under any custom ANTHROPIC_BASE_URL (Claude Code: Model configuration).
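A typo in a capability name is easy to miss, so a pre-flight check on the variable can help. This is a sketch of such a lint, not Claude Code's own parser. It assumes the value is a comma-separated list, as in the example configuration, and treats the six values named above as the known set even though that list may not be exhaustive. The function name parse_capabilities is hypothetical.

```python
KNOWN_CAPABILITIES = {
    "effort", "xhigh_effort", "max_effort",
    "thinking", "adaptive_thinking", "interleaved_thinking",
}


def parse_capabilities(raw):
    """Split a comma-separated capability string into (known, unknown) lists."""
    names = [part.strip() for part in raw.split(",") if part.strip()]
    known = [name for name in names if name in KNOWN_CAPABILITIES]
    unknown = [name for name in names if name not in KNOWN_CAPABILITIES]
    return known, unknown
```

Any name returned in the unknown list is worth checking against the current Model configuration documentation before deploying.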
When This Backfires
- Single-vendor, single-team workloads. A gateway adds an extra hop, an auth surface, and a binary in the supply chain. Without per-team budgets, multi-vendor routing, or centralised audit, the operational cost outweighs the discovery benefit.
- Non-Anthropic IDs. OpenAI- or Gemini-style IDs published through an Anthropic-compatible facade are filtered out by the namespace check. The fallback is a single manual entry via ANTHROPIC_CUSTOM_MODEL_OPTION, which undermines the "single source of truth" framing the pattern is sold on.
- Header-stripping proxies. Any gateway that drops anthropic-beta or anthropic-version silently degrades harness features. The request succeeds; the harness runs in reduced-functionality mode.
- Third-party trust surface. Anthropic does not endorse, maintain, or audit LiteLLM, and LiteLLM's PyPI versions 1.82.7 and 1.82.8 shipped credential-stealing malware (Claude Code: LLM gateway; BerriAI/litellm#24518). Standing up a gateway adds a supply-chain dependency that has to be pinned and monitored.
Example
A team running LiteLLM as a unified gateway in front of Claude Code uses one variable to switch both inference and discovery:
export ANTHROPIC_BASE_URL=https://litellm-server:4000
export ANTHROPIC_AUTH_TOKEN=sk-litellm-static-key
LiteLLM's unified Anthropic-format endpoint serves /v1/messages for inference and /v1/models for discovery. With CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 set, Claude Code 2.1.126 queries the gateway on startup, filters returned IDs to those beginning with claude or anthropic, and adds them to /model labelled "From gateway." If the gateway exposes a custom Bedrock-routed Opus deployment with an ID like claude-opus-4-7-bedrock-prod, it appears in the picker without rebuilding the harness.
One caveat with LiteLLM specifically: discovery parses only the Anthropic-native /v1/models shape (type: "model", display_name, top-level has_more/first_id). LiteLLM currently returns the OpenAI shape (object: "model", Unix created), which Claude Code does not parse, so its models are filtered out until LiteLLM ships an Anthropic-format response (BerriAI/litellm#27180). Until then, the fallback is a manual ANTHROPIC_CUSTOM_MODEL_OPTION entry.
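The difference between the two response shapes can be checked before pointing the harness at a gateway. The sketch below classifies a decoded /v1/models payload using only the fields named above. The function name models_response_shape is hypothetical, and the heuristic is an assumption about what distinguishes the shapes, not a description of how Claude Code parses the response.

```python
def models_response_shape(payload):
    """Classify a decoded /v1/models body as anthropic, openai, or unknown."""
    data = payload.get("data") or []
    first = data[0] if data else {}
    # Anthropic-native: entries carry type "model", pagination is top-level.
    if first.get("type") == "model" and "has_more" in payload:
        return "anthropic"
    # OpenAI-style: entries carry object "model" and a Unix created timestamp.
    if first.get("object") == "model":
        return "openai"
    return "unknown"
```

A gateway that reports openai here would be expected to need the manual ANTHROPIC_CUSTOM_MODEL_OPTION entry.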
For deployments that need effort levels enabled on the gateway-served model:
export ANTHROPIC_DEFAULT_OPUS_MODEL='claude-opus-4-7-bedrock-prod'
export ANTHROPIC_DEFAULT_OPUS_MODEL_NAME='Opus via Gateway'
export ANTHROPIC_DEFAULT_OPUS_MODEL_SUPPORTED_CAPABILITIES='effort,xhigh_effort,thinking,adaptive_thinking'
This is the gateway version of pinning a Bedrock ARN (Claude Code: Model configuration). The pinned ID participates in the opus alias, the picker shows the friendly name, and the harness enables effort and thinking for the model.
Key Takeaways
- Gateway model routing decouples model choice from harness binary by treating an Anthropic-compatible gateway as both inference target and catalogue source.
- Discovery is opt-in by flag and URL, namespace-filtered (claude/anthropic only), and degrades gracefully through cached and built-in fallbacks.
- The harness contract requires /v1/messages, /v1/messages/count_tokens, and forwarded anthropic-beta/anthropic-version headers; gateways that violate this silently disable features.
- Capability detection is separate from discovery: declare effort and thinking support via _SUPPORTED_CAPABILITIES for IDs the harness does not recognise.
- The pattern adds an auth surface and a supply-chain dependency; reserve it for workloads that already need centralised auth, budgets, or multi-vendor routing.
Related
- Cross-Vendor Competitive Routing — platform-level fan-out across vendors; gateway routing is the infrastructure layer that makes single-harness multi-vendor practical.
- Cost-Aware Agent Design — within-harness tier selection that runs on top of gateway-discovered models.
- Model Deprecation Lifecycle — operational wrapper for migrating gateway-routed model IDs.
- Per-Model Harness Tuning — per-model configuration once a gateway exposes multiple options.
- Managed vs Self-Hosted Harness — trade-off frame that gateways sit inside.
- Copilot CLI BYOK Local Models — comparable BYOK pattern in a different harness.