Production MCP Agent Stack
Moving an MCP agent from prototype to production means sequencing six decisions that constrain each other. The individual patterns are well documented. Real deployments go wrong in the sequencing and in the gotchas that only appear when patterns combine.
Anthropic's production MCP guidance (April 2026) frames MCP as "the critical layer" for cloud-resident agents. The compositional order it skips — which decision forecloses which, which combinations silently break — is what this page captures.
The Six-Axis Decision Space
| Axis | Option A | Option B | Option C |
|---|---|---|---|
| Server location | Local / stdio | Remote (HTTP/SSE) | — |
| Tool grouping | Flat 1:1 with API | Intent-grouped | Code-orchestrated (search + execute) |
| Schema delivery | Eager (all tools loaded) | Deferred via tool search (`defer_loading: true`) | — |
| Result processing | Raw-to-context | Programmatic tool calling (sandboxed) | — |
| OAuth client registration | Static / pre-registered | Dynamic Client Registration (DCR) | Client ID Metadata Documents (CIMD) |
| Token storage | Per-session credentials | Vault with refresh | — |
Each axis has a defensible answer in isolation; the production question is which combinations compose cleanly.
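The sketch below checks one candidate configuration against the combination rules on this page. Those rules are: remote forces OAuth; CIMD needs IdP support; code orchestration and programmatic calling need a sandbox; programmatic calling is not ZDR-eligible; and tool search excludes `input_examples`. `StackConfig`, `composition_problems`, and the enum names are hypothetical, not part of any MCP SDK.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    LOCAL = "local"
    REMOTE = "remote"


class Grouping(Enum):
    FLAT = "flat"
    INTENT = "intent"
    CODE = "code"  # code-orchestrated (search + execute)


class Registration(Enum):
    NONE = "none"  # acceptable only for local servers
    STATIC = "static"
    DCR = "dcr"
    CIMD = "cimd"


@dataclass
class StackConfig:
    location: Location
    grouping: Grouping
    tool_search: bool          # deferred schema delivery (defer_loading: true)
    programmatic: bool         # programmatic tool calling for result processing
    registration: Registration
    uses_input_examples: bool  # any tool in the catalog carries input_examples
    sandbox_available: bool
    zdr_required: bool
    idp_supports_cimd: bool


def composition_problems(cfg: StackConfig) -> list[str]:
    """Return the cross-axis conflicts in a configuration (empty list = composes cleanly)."""
    problems = []
    if cfg.location is Location.REMOTE and cfg.registration is Registration.NONE:
        problems.append("remote server requires OAuth client registration")
    if cfg.registration is Registration.CIMD and not cfg.idp_supports_cimd:
        problems.append("CIMD chosen but the IdP does not support it")
    needs_sandbox = cfg.programmatic or cfg.grouping is Grouping.CODE
    if needs_sandbox and not cfg.sandbox_available:
        problems.append("code orchestration / programmatic calling needs a sandbox")
    if cfg.programmatic and cfg.zdr_required:
        problems.append("programmatic calling is not ZDR-eligible")
    if cfg.tool_search and cfg.uses_input_examples:
        problems.append("tool search cannot surface tools with input_examples")
    return problems
```

For example, a remote, code-orchestrated stack with CIMD, a sandbox, and programmatic calling still reports one problem if the tenant requires ZDR.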
Decision Sequencing
Resolve in this order — each locks the option space for the next.
```mermaid
graph TD
    A[Server location] -->|Remote| B[OAuth required]
    A -->|Local| B2[Credential file or no auth]
    B --> C[CIMD vs DCR]
    C --> D[Token storage: vault or per-session]
    A --> E[Tool grouping]
    E -->|Flat or grouped| F[Schema delivery]
    E -->|Code-orchestrated| G[Sandbox availability]
    F -->|Tool search| H[Result processing]
    G --> H
    H -->|Programmatic| I[Sandbox + ZDR check]
```
- Server location. Remote-first is the only configuration that reaches web, mobile, and cloud-hosted agents (Anthropic, April 2026). Remote forces OAuth; local can use filesystem credentials.
- OAuth flow. The 2025-11-25 MCP spec adds CIMD as the recommended registration mechanism — faster first-time flow and fewer re-auth prompts than DCR. Caveat: CIMD provider support is still uneven (Keycloak experimental; WorkOS, Auth0, Authlete shipping through 2026), and practitioners report real deployments often support both — CIMD for fast-moving distributed clients, DCR for vetted high-governance ones (Scalekit, 2026). Pick CIMD-first only when your IdP supports it.
- Token storage. For multi-user cloud agents, Claude Managed Agents vaults register tokens once, inject them at session creation, and refresh automatically.
- Tool grouping. Flat 1:1 API mirrors degrade at scale — LongFuncEval (2025) reports 7–85% selection-accuracy drops as catalogs grow. Intent-grouping (toolset agentization) shrinks the 1-of-N problem; code-orchestration is the extreme form.
- Schema delivery. Tool search with `defer_loading: true` cuts tool-definition tokens by 85%+, but it can only retrieve from whatever catalog you ship.
- Result processing. Programmatic tool calling cuts tokens ~37% on multi-step workflows, but it needs a sandbox and is not Zero Data Retention (ZDR) eligible.
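The vault pattern registers tokens once, injects them at session creation, and refreshes them automatically. It can be sketched as below. This is not the Claude Managed Agents API. `TokenVault`, `Token`, and the caller-supplied `refresh_fn` are hypothetical; in practice `refresh_fn` would perform the OAuth refresh-token grant against your IdP. Refreshing slightly before expiry (`skew_seconds`) avoids handing a session a token that lapses mid-call.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Token:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix epoch seconds


class TokenVault:
    """Hypothetical per-user, per-server token store with refresh-before-expiry."""

    def __init__(
        self,
        refresh_fn: Callable[[str], Token],
        skew_seconds: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._tokens: dict[tuple[str, str], Token] = {}
        self._refresh_fn = refresh_fn  # e.g. performs the OAuth refresh_token grant
        self._skew = skew_seconds
        self._clock = clock

    def register(self, user_id: str, server_url: str, token: Token) -> None:
        """Store credentials once, after the user completes the OAuth flow."""
        self._tokens[(user_id, server_url)] = token

    def credentials_for_session(self, user_id: str, server_url: str) -> str:
        """Return a usable access token to inject when a session is created."""
        key = (user_id, server_url)
        token = self._tokens[key]  # KeyError: the user never authorized this server
        if token.expires_at - self._skew <= self._clock():
            token = self._refresh_fn(token.refresh_token)
            self._tokens[key] = token
        return token.access_token
```

The injectable `clock` keeps the refresh logic testable without waiting for real expiry.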
Cross-Pattern Gotchas
Failure modes that matter in production only appear when patterns combine.
Dynamic fetching nukes the prompt cache — unless it's tool search. Rebuilding the tool list per step invalidates the cache prefix because tool definitions sit atop the hierarchy (tools → system → messages). See the dynamic tool fetching anti-pattern. Tool search with defer_loading: true sidesteps this — deferred tools are excluded from the cacheable prefix (Anthropic advanced tool use).
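Here is a toy model of the cache rule above. It hashes only what sits in the cacheable prefix: non-deferred tool definitions first, then the system prompt. It mirrors the behavior this page describes, not Anthropic's actual caching implementation. `cacheable_prefix_key` is a hypothetical helper.

```python
import hashlib
import json


def cacheable_prefix_key(tools: list[dict], system: str) -> str:
    """Hash the tools -> system prefix, skipping tools marked defer_loading."""
    # Deferred tools are excluded from the prefix, per the rule described above.
    eager = [tool for tool in tools if not tool.get("defer_loading", False)]
    # Serialize as a list so tools come first, then system, matching prefix order.
    payload = json.dumps([eager, system], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Suppose a naive agent adds `enable_dnssec` to its eager tool list on step two. The key changes, so the cached prefix is lost. Adding the same tool with `defer_loading: True` leaves the key unchanged.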
Tool search and input_examples are mutually exclusive per catalog. Server-side tool search cannot surface tools that carry input_examples (error handling). Catalogs that rely on examples need standard calling or client-side search.
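A pre-flight check for this rule might look like the sketch below. It works on plain tool-definition dicts that use the field names `defer_loading` and `input_examples`. `examples_blocking_search` is a hypothetical helper. It treats any deferred tool as a sign that the catalog uses tool search, then returns the tools whose `input_examples` conflict with it.

```python
def examples_blocking_search(catalog: list[dict]) -> list[str]:
    """Names of tools whose input_examples conflict with tool search in this catalog."""
    uses_tool_search = any(tool.get("defer_loading", False) for tool in catalog)
    if not uses_tool_search:
        return []  # standard calling: examples are fine
    return [tool["name"] for tool in catalog if tool.get("input_examples")]
```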
Retrieval quality binds at very large catalogs. Independent testing across 4,027 tools reports 56% (regex) and 64% (BM25) accuracy on straightforward queries (Arcade.dev, December 2025) — well below Anthropic's internal benchmarks. Plan custom client-side retrieval past a few thousand tools.
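Custom client-side retrieval can start from a lexical baseline. The sketch below is a generic Okapi BM25 ranker over tool names and descriptions, using the common `k1`/`b` defaults and a Lucene-style IDF. It is not Anthropic's server-side BM25 tool search, and the tool names in the test are made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric runs; underscores split names like "dns_enable_dnssec".
    return re.findall(r"[a-z0-9]+", text.lower())


class ToolIndex:
    """Okapi BM25 over each tool's name plus description."""

    def __init__(self, tools: dict[str, str], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.docs = {name: tokenize(f"{name} {desc}") for name, desc in tools.items()}
        self.avgdl = sum(len(doc) for doc in self.docs.values()) / len(self.docs)
        n_docs = len(self.docs)
        df = Counter(term for doc in self.docs.values() for term in set(doc))
        # Lucene-style IDF: always positive, even for very common terms.
        self.idf = {t: math.log(1 + (n_docs - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def search(self, query: str, k: int = 5) -> list[str]:
        """Return up to k tool names with a positive BM25 score, best first."""
        terms = tokenize(query)
        scores: dict[str, float] = {}
        for name, doc in self.docs.items():
            tf = Counter(doc)
            length_norm = self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            score = sum(
                self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + length_norm)
                for t in terms
                if t in tf
            )
            if score > 0:
                scores[name] = score
        return sorted(scores, key=scores.get, reverse=True)[:k]
```

Measure any ranker like this against real queries from your own traffic before choosing it over the server-side option.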
Programmatic calling is not ZDR-eligible (data retention). The model also never sees intermediate tool results, so it cannot reason over them; only stdout returns.
Intent-grouping benefits from trajectory data. Regroup from real co-invocation traces once traffic lands (toolset agentization).
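One simple way to regroup from traces is to count how often each pair of tools appears in the same trajectory. Pairs that co-occur at least `min_count` times are merged (single-linkage via union-find). This is an illustrative heuristic, not the method from the toolset agentization work. `group_by_coinvocation` and the trace format (one list of tool names per trajectory) are assumptions.

```python
from collections import Counter
from itertools import combinations


def group_by_coinvocation(traces: list[list[str]], min_count: int = 2) -> list[set[str]]:
    """Merge tools that co-occur in at least min_count traces into one group."""
    pair_counts: Counter = Counter()
    tools: set[str] = set()
    for trace in traces:
        used = sorted(set(trace))  # count each pair once per trace
        tools.update(used)
        pair_counts.update(combinations(used, 2))

    parent = {tool: tool for tool in tools}

    def find(tool: str) -> str:
        while parent[tool] != tool:
            parent[tool] = parent[parent[tool]]  # path halving
            tool = parent[tool]
        return tool

    for (a, b), count in pair_counts.items():
        if count >= min_count:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for tool in tools:
        groups.setdefault(find(tool), set()).add(tool)
    return sorted(groups.values(), key=sorted)  # deterministic order
```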
Example: Cloudflare's Two-Tool MCP Server
Cloudflare's MCP server is the reference extreme of intent-grouping + code-orchestration. The API covers ~2,500 endpoints across Workers, DNS, Zero Trust, and the dashboard — a flat mirror would consume tens of thousands of tokens in definitions alone.
The design exposes two tools — search and execute — in roughly 1K tokens total (Anthropic, April 2026). Programmatic calling compounds the win: for "enable DNSSEC on all zones where it's disabled," the agent loops in a sandbox and returns only changed zones, instead of pulling thousands of records into context.
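Under programmatic tool calling, the DNSSEC loop might take the shape of the script below. `list_zones` and `enable_dnssec` are hypothetical stand-ins backed by an in-memory fake store, not Cloudflare API calls. In a real deployment, the sandbox would route those calls to the MCP server. What matters is the shape: the full zone list is iterated inside the sandbox, and only the changed zone IDs are printed, so only that stdout reaches the model.

```python
import json


def make_fake_store(n_zones: int) -> dict[str, bool]:
    """Hypothetical account: zone ID -> DNSSEC enabled (every third zone starts enabled)."""
    return {f"zone-{i}": i % 3 == 0 for i in range(n_zones)}


def list_zones(store: dict[str, bool]) -> list[dict]:
    # Stand-in for a read tool call; inside the sandbox this never enters model context.
    return [{"id": zone_id, "dnssec_enabled": on} for zone_id, on in store.items()]


def enable_dnssec(store: dict[str, bool], zone_id: str) -> None:
    # Stand-in for a mutating tool call.
    store[zone_id] = True


def enable_dnssec_where_disabled(store: dict[str, bool]) -> list[str]:
    """Enable DNSSEC on every zone that lacks it; return only the zones changed."""
    changed = []
    for zone in list_zones(store):
        if not zone["dnssec_enabled"]:
            enable_dnssec(store, zone["id"])
            changed.append(zone["id"])
    return changed


if __name__ == "__main__":
    account = make_fake_store(3000)
    # Only this printed line (stdout) would return to the model.
    print(json.dumps(enable_dnssec_where_disabled(account)))
```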
Every layer lines up: remote server → intent grouping at its extreme → deferred schemas unnecessary → programmatic calling for large result sets → OAuth + vault on auth.
When Not to Deploy the Full Stack
The stack earns its complexity at cloud-hosted multi-user scale. Overkill when:
- Under ~20 stable tools, single agent, single tenant. Intent-grouping and tool search add round-trips with no token benefit; direct API or CLI is simpler.
- Air-gapped or on-prem with no sandbox. Programmatic calling is inert without trusted code execution.
- You need retrieval accuracy above ~80% on a large catalog. Server-side tool search drops below 65% at 4,000+ tools (Arcade.dev); plan custom retrieval, or split the catalog.
- Catalogs that depend on `input_examples`. Tool search is mutually exclusive with examples; pick one per catalog.
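The checklist above can be encoded as a quick pre-deployment check. `Deployment` and `reasons_to_skip_full_stack` are hypothetical. The thresholds (20 tools, 4,000 tools, 0.8 accuracy) are this page's approximate figures, not hard limits.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    tool_count: int
    agents: int
    tenants: int
    sandbox_available: bool
    required_retrieval_accuracy: float  # 0..1, on the catalog you will ship
    relies_on_input_examples: bool


def reasons_to_skip_full_stack(d: Deployment) -> list[str]:
    """Return the checklist items that make (parts of) the full stack overkill."""
    reasons = []
    if d.tool_count < 20 and d.agents == 1 and d.tenants == 1:
        reasons.append("small single-agent, single-tenant catalog: call the API or CLI directly")
    if not d.sandbox_available:
        reasons.append("no trusted sandbox: programmatic calling is inert")
    if d.tool_count >= 4000 and d.required_retrieval_accuracy > 0.8:
        reasons.append("server-side tool search measured below 65% here: custom retrieval or split")
    if d.relies_on_input_examples:
        reasons.append("catalog depends on input_examples: skip tool search for it")
    return reasons
```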
Key Takeaways
- Six decisions — server location, tool grouping, schema delivery, result processing, OAuth, token storage — constrain each other; sequence matters more than any individual choice.
- Remote-first server forces OAuth, which forces the CIMD-vs-DCR decision, which shapes whether you need a vault.
- Tool search with `defer_loading: true` is the one form of dynamic tool loading that does not break the prompt cache; naive dynamic fetching does.
- Programmatic calling and `input_examples` + tool search are the two composition traps: verify sandbox availability and example-vs-search per catalog before committing.
- Cloudflare's two-tool server over ~2,500 endpoints is the reference case for how far intent-grouping and code-orchestration scale when every layer lines up.
Related
- MCP Server Design — remote-first, primitive choice, schema design on the server side.
- MCP Client Design — host-side lifecycle, caching, and multi-server routing.
- Toolset Agentization — intent-grouping as a sub-agent pattern with trajectory-based adaptation.
- Advanced Tool Use — tool search (`defer_loading`), programmatic calling, and input examples in depth.
- Dynamic Tool Fetching Breaks KV Cache — the load-bearing gotcha that tool search sidesteps.
- MCP Protocol — the open standard these patterns build on.