
MCP Server Design: Building Agent-Friendly Servers

A well-designed MCP server makes the right tool call obvious. A poorly designed one burns tokens on retries, confuses routing, and forces blind debugging.

First Decision: Tool, Resource, or Prompt?

Picking the wrong primitive creates friction before naming or schema design matters.

| Primitive | Controlled By | Use When | Example |
| --- | --- | --- | --- |
| Tool | Model (agent invokes) | Agent takes action or fetches dynamic data | create_issue, search_logs |
| Resource | Application (client attaches) | Read-only context the agent sees but cannot invoke | Project config, schema, env info |
| Prompt | User (slash command) | Reusable multi-step workflows | /summarize-pr, /deploy-staging |

Resources support audience and priority annotations for client-side filtering; tools can return resource_link references instead of embedding full content.
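
As a minimal sketch of the resource_link idea, a tool can return a short text summary plus a link to the full content rather than inlining it. The helper name build_log_export_result, the URI, and the annotation values are assumptions chosen for illustration.

```python
import json


def build_log_export_result(uri: str, name: str, size_bytes: int) -> dict:
    """Build a tool result that points at a resource instead of embedding it."""
    return {
        "content": [
            # Short summary the agent can act on immediately.
            {"type": "text", "text": f"Exported {size_bytes} bytes of logs to {name}."},
            {
                # Link the client may fetch later; the payload is not inlined.
                "type": "resource_link",
                "uri": uri,
                "name": name,
                "mimeType": "application/json",
                # Hints for client-side filtering (illustrative values).
                "annotations": {"audience": ["assistant"], "priority": 0.3},
            },
        ],
        "isError": False,
    }


result = build_log_export_result("file:///exports/auth-api.json", "auth-api.json", 48213)
print(json.dumps(result, indent=2))
```

The summary stays a few dozen tokens however large the export grows; the client decides whether to fetch the linked content.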

Tool Naming

The spec allows 1--128 characters using A-Z a-z 0-9 _ - . with no spaces. Conventions that work:

  • snake_case -- used by >90% of public MCP servers (zazencodes analysis)
  • verb_noun pattern -- search_customer_orders not query_db_orders
  • 32 characters or fewer -- descriptive but still matches tool search
  • No version numbers or abbreviations -- search_products not prod_lookup_v2

Tool search matches names and descriptions; opaque names cause routing failures (Anthropic).
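
The naming rules above can be checked mechanically. The sketch below separates the hard spec limit from the conventions; check_tool_name and its messages are hypothetical, and it cannot tell whether the first word is really a verb or whether a word is an abbreviation.

```python
import re

# Hard limit from the spec: 1-128 characters from A-Z a-z 0-9 _ - .
SPEC_NAME = re.compile(r"^[A-Za-z0-9_.-]{1,128}$")
# Convention: lowercase snake_case with at least two words (verb_noun).
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")
# Convention: no trailing version suffix such as _v2.
VERSION_SUFFIX = re.compile(r"_v\d+$")
MAX_CONVENTION_LENGTH = 32


def check_tool_name(name: str) -> list[str]:
    """Return a list of problems; an empty list means the name passes."""
    if not SPEC_NAME.fullmatch(name):
        return ["violates spec: use 1-128 characters from A-Z a-z 0-9 _ - ."]
    problems = []
    if len(name) > MAX_CONVENTION_LENGTH:
        problems.append(f"longer than {MAX_CONVENTION_LENGTH} characters")
    if not SNAKE_CASE.fullmatch(name):
        problems.append("not snake_case with at least two words (verb_noun)")
    if VERSION_SUFFIX.search(name):
        problems.append("ends with a version suffix")
    return problems


for candidate in ["search_customer_orders", "prod_lookup_v2", "searchLogs", "search logs"]:
    print(candidate, "->", check_tool_name(candidate) or "ok")
```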

Schema Design

inputSchema must be a valid JSON Schema object (use {"type":"object","additionalProperties":false} for parameterless tools). Schemas define types and constraints but not format conventions or domain usage -- supplement with examples. In Anthropic tests, 1--5 realistic examples raised accuracy from 72% to 90% (Advanced Tool Use).

What Good Schema Design Looks Like

{
  "name": "search_logs",
  "description": "Search application logs by time range and severity. Returns max 100 entries. Use list_services first to get valid service names. Do NOT use for metrics -- use query_metrics instead.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "service": {
        "type": "string",
        "description": "Service name from list_services (e.g., 'auth-api', 'payment-worker')"
      },
      "severity": {
        "type": "string",
        "enum": ["debug", "info", "warn", "error", "fatal"],
        "default": "error"
      },
      "since": {
        "type": "string",
        "description": "ISO 8601 timestamp. Must be within last 30 days. Example: '2026-03-01T00:00:00Z'"
      }
    },
    "required": ["service"],
    "additionalProperties": false
  }
}

Enums reduce guesswork, defaults handle common cases, descriptions pair constraints with examples, and negative guidance tells the agent when not to call.
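
In JSON Schema, default is an annotation: validators do not fill it in, so the server has to apply it. A minimal sketch of the server side, using only the standard library, is shown here; normalize_arguments is a hypothetical helper, and the schema is a trimmed copy of the example above with severity left optional so that its default can apply.

```python
SCHEMA = {
    "type": "object",
    "properties": {
        "service": {"type": "string"},
        "severity": {
            "type": "string",
            "enum": ["debug", "info", "warn", "error", "fatal"],
            "default": "error",
        },
        "since": {"type": "string"},
    },
    "required": ["service"],
    "additionalProperties": False,
}


def normalize_arguments(arguments: dict, schema: dict) -> dict:
    """Reject unknown or missing parameters, apply defaults, check enums."""
    props = schema["properties"]
    unknown = sorted(set(arguments) - set(props))
    if unknown and schema.get("additionalProperties") is False:
        raise ValueError(
            f"Unknown parameter(s): {', '.join(unknown)}. Allowed: {', '.join(props)}."
        )
    missing = [key for key in schema.get("required", []) if key not in arguments]
    if missing:
        raise ValueError(f"Missing required parameter(s): {', '.join(missing)}.")
    # Start from declared defaults, then let caller-supplied values win.
    normalized = {key: spec["default"] for key, spec in props.items() if "default" in spec}
    normalized.update(arguments)
    for key, value in normalized.items():
        allowed = props[key].get("enum")
        if allowed is not None and value not in allowed:
            raise ValueError(f"Invalid {key} '{value}'. Must be one of: {', '.join(allowed)}.")
    return normalized


print(normalize_arguments({"service": "auth-api"}, SCHEMA))
```

Each failure message names the offending parameter and the allowed values, so the same text can be returned to the agent as a tool execution error.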

Output Schema and Annotations

outputSchema enables structured content validation. When a tool declares one, return both structuredContent (validated against the schema) and the same data serialized as JSON in a text content block for backwards compatibility. Tool annotations (readOnlyHint, destructiveHint, idempotentHint, openWorldHint) are hints only; clients should not rely on them when the server is untrusted. Set idempotentHint: true for tools following the idempotent operations pattern.
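
A sketch of how the dual return might look. The add_issue_label tool, its schema, and build_structured_result are hypothetical; only the MCP field names (outputSchema, annotations, structuredContent, content) come from the protocol.

```python
import json

# Hypothetical tool: adding a label that already exists changes nothing,
# so the call is additive (not destructive) and idempotent.
ADD_ISSUE_LABEL_TOOL = {
    "name": "add_issue_label",
    "description": "Add one label to an existing issue.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "issue_id": {"type": "string", "description": "Issue key, e.g. 'OPS-142'"},
            "label": {"type": "string", "description": "Label to add, e.g. 'incident'"},
        },
        "required": ["issue_id", "label"],
        "additionalProperties": False,
    },
    "outputSchema": {
        "type": "object",
        "properties": {
            "issue_id": {"type": "string"},
            "labels": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["issue_id", "labels"],
    },
    "annotations": {
        "readOnlyHint": False,
        "destructiveHint": False,
        "idempotentHint": True,
        "openWorldHint": False,
    },
}


def build_structured_result(payload: dict) -> dict:
    """Return the payload both as structuredContent and as serialized JSON text."""
    return {
        "structuredContent": payload,
        "content": [{"type": "text", "text": json.dumps(payload)}],
        "isError": False,
    }


result = build_structured_result({"issue_id": "OPS-142", "labels": ["incident", "auth"]})
print(json.dumps(result, indent=2))
```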

Error Handling

MCP has two error channels:

flowchart LR
    A[Tool Call] --> B{Error Type}
    B -->|Structural| C["Protocol Error<br/>JSON-RPC -32602/-32603<br/>Unknown tool, malformed request, server error"]
    B -->|Business Logic| D["Tool Execution Error<br/>isError: true in result<br/>Validation failures, not-found"]
    D --> E["Agent reads error,<br/>self-corrects, retries"]

Protocol errors (JSON-RPC codes) are for the client. Tool execution errors (isError: true) are for the agent; the spec states these should contain "actionable feedback that language models can use to self-correct and retry."
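
The two channels produce differently shaped JSON-RPC responses. The following is a minimal sketch, assuming a hypothetical in-memory tool registry and service list; handle_tool_call is illustrative rather than taken from any SDK.

```python
from typing import Any

KNOWN_TOOLS = {"search_logs"}                     # hypothetical tool registry
VALID_SERVICES = {"auth-api", "payment-worker"}   # hypothetical service list


def protocol_error(request_id: int, code: int, message: str) -> dict[str, Any]:
    """Structural failure: a JSON-RPC error object, handled by the client."""
    return {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}


def tool_execution_error(request_id: int, feedback: str) -> dict[str, Any]:
    """Business-logic failure: a normal result with isError set, read by the agent."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {"content": [{"type": "text", "text": feedback}], "isError": True},
    }


def handle_tool_call(request_id: int, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    if name not in KNOWN_TOOLS:
        return protocol_error(request_id, -32602, f"Unknown tool: {name}")
    service = arguments.get("service")
    if service not in VALID_SERVICES:
        valid = ", ".join(sorted(VALID_SERVICES))
        return tool_execution_error(
            request_id,
            f"Service '{service}' not found. Valid services: {valid}. "
            "Call list_services to refresh the list.",
        )
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {"content": [{"type": "text", "text": f"0 entries for {service}"}], "isError": False},
    }


print(handle_tool_call(1, "delete_logs", {}))
print(handle_tool_call(2, "search_logs", {"service": "billing"}))
```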

Actionable Error Pattern

| Error Style | Agent Can Self-Correct? |
| --- | --- |
| "Error" | No |
| "Invalid date format" | Maybe |
| "Invalid departure date: must be in the future. Current date is 2026-03-13." | Yes |

Include what was wrong, the constraint, and context to fix it -- the poka-yoke principle applied to errors, eliminating guesswork that drives retry loops (Anthropic).
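
A sketch of a validator that produces the third style of message. check_departure_date is a hypothetical helper; today is passed in explicitly so the output is deterministic.

```python
from datetime import date
from typing import Optional


def check_departure_date(raw: str, today: date) -> Optional[str]:
    """Return an actionable error message, or None if the date is acceptable."""
    try:
        departure = date.fromisoformat(raw)
    except ValueError:
        # What was wrong, the constraint, and an example to copy.
        return (
            f"Invalid departure date '{raw}': must be an ISO 8601 date (YYYY-MM-DD). "
            f"Current date is {today.isoformat()}."
        )
    if departure <= today:
        # The constraint plus the context the agent needs to pick a valid value.
        return (
            f"Invalid departure date '{raw}': must be in the future. "
            f"Current date is {today.isoformat()}."
        )
    return None


today = date(2026, 3, 13)
print(check_departure_date("2026-03-01", today))
print(check_departure_date("tomorrow", today))
print(check_departure_date("2026-04-01", today))
```

Echoing the rejected value and the current date means the agent does not need a second tool call to learn what "the future" means.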

Token Efficiency

Large tool catalogs can consume tens of thousands of tokens before the agent processes a request -- a server problem, not just a client problem.

The Scale of the Problem

| Approach | Tokens | Success Rate |
| --- | --- | --- |
| All tools loaded upfront (2,500 endpoints) | ~1,170,000 | Variable |
| Tool search (top-k matching) | ~8,700 | Comparable |
| Code Mode (typed SDK + 2 tools) | ~1,000 | Not reported |
| Dynamic Toolsets (search/describe/execute) | 96% reduction | 100% reported |

Sources: Anthropic, Cloudflare, Speakeasy.
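
To see how a catalog's footprint scales, a server author can estimate it before shipping. The sketch below assumes roughly four characters per token, which is a crude heuristic and not a real tokenizer; estimate_catalog_tokens and the sample tool are hypothetical.

```python
import json

CHARS_PER_TOKEN = 4  # ASSUMPTION: rough heuristic, not a real tokenizer


def estimate_catalog_tokens(tools: list[dict]) -> int:
    """Approximate the tokens consumed by sending every tool definition upfront."""
    serialized = json.dumps(tools, separators=(",", ":"))
    return len(serialized) // CHARS_PER_TOKEN


sample_tool = {
    "name": "search_logs",
    "description": "Search application logs by time range and severity.",
    "inputSchema": {
        "type": "object",
        "properties": {"service": {"type": "string", "description": "Service name"}},
        "required": ["service"],
        "additionalProperties": False,
    },
}

# Compare a small server with a large one built from similar definitions.
for count in (5, 50, 500):
    print(count, "tools ->", estimate_catalog_tokens([sample_tool] * count), "tokens (estimate)")
```

For real numbers, swap the heuristic for the tokenizer of the target model.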

Server-Side Mitigations

  • Keep tool lists small. Single responsibility per server; non-overlapping toolsets.
  • Design for lazy discovery. Agents discover tools contextually, not upfront (Bui 2026). Write clear server instructions so tool search finds yours.
  • Make responses clearable. Return only what the agent needs next. Tool result clearing is "one of the safest lightest touch forms of compaction" (Anthropic).
  • Schemas dominate per-tool token cost. Trim optional fields; consider $ref deduplication for shared types.
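
Lazy discovery depends on a search step that matches a query against tool names and descriptions. Below is a toy sketch of top-k keyword matching; real tool search implementations differ, and search_tools, the stopword list, and the catalog entries are all assumptions.

```python
import re

STOPWORDS = {"a", "an", "the", "for", "by", "and", "in", "of", "to"}

# Hypothetical catalog: name plus description is all the search step sees.
CATALOG = [
    {"name": "search_logs", "description": "Search application logs by time range and severity."},
    {"name": "query_metrics", "description": "Query time-series metrics for a service."},
    {"name": "search_products", "description": "Search the product catalog by keyword."},
    {"name": "prod_lookup_v2", "description": "Lookup."},
]


def tokenize(text: str) -> set[str]:
    """Lowercase words; underscores and punctuation act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def search_tools(query: str, catalog: list[dict], k: int = 3) -> list[str]:
    """Return up to k tool names ranked by keyword overlap with the query."""
    query_words = tokenize(query)
    scored = []
    for tool in catalog:
        words = tokenize(tool["name"] + " " + tool["description"])
        score = len(query_words & words)
        if score:
            scored.append((score, tool["name"]))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


print(search_tools("search error logs", CATALOG))
print(search_tools("find products by name", CATALOG))
```

In the second query the opaque prod_lookup_v2 never surfaces, because neither its name nor its one-word description shares a term with the request.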

When This Backfires

The checklist assumes a stable, internally-owned API. Conditions that break that assumption:

  • Enums vs. evolving upstream APIs. Enumerated values (enum) encode a snapshot; when the upstream adds one, agents hit validation failures until redeploy. Thin string types trade strict validation for durability.
  • Schemas do not cover input sanitization. The STDIO execution model in Anthropic's official MCP SDKs runs commands even when the local process fails to start, exposing servers to command injection unless the author sanitizes inputs (OX Security, SecurityWeek). Argument sanitization is the mitigation, not richer schemas.
  • Description drift. Hand-written descriptions are an artifact to keep in sync. Auto-generated wrappers lose prose quality but cannot drift.
  • Over-consolidation hurts routing. One polymorphic tool pushes disambiguation into the schema; the right ceiling depends on description distinctness, not count.
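
Two of these trade-offs can be sketched together: a thin string pattern in place of an enum, and argument sanitization before anything reaches a process boundary. The logctl command and build_log_command are hypothetical. The function only builds an argument list, which would be passed to subprocess.run without shell=True so that no shell interprets the values.

```python
import re

# Pattern checks instead of a fixed enum: new upstream values pass without a redeploy.
SERVICE_NAME = re.compile(r"^[a-z0-9][a-z0-9-]{0,62}$")
SEVERITY = re.compile(r"^[a-z]{1,16}$")


def build_log_command(service: str, severity: str) -> list[str]:
    """Validate inputs and return an argv list for a hypothetical 'logctl' CLI."""
    if not SERVICE_NAME.fullmatch(service):
        raise ValueError(
            f"Invalid service '{service}': use lowercase letters, digits, and hyphens, "
            "starting with a letter or digit (e.g. 'auth-api')."
        )
    if not SEVERITY.fullmatch(severity):
        raise ValueError(
            f"Invalid severity '{severity}': use 1-16 lowercase letters (e.g. 'error')."
        )
    # Each value is its own argv element; nothing is concatenated into a shell string.
    return ["logctl", "query", "--service", service, "--severity", severity]


print(build_log_command("auth-api", "error"))
for hostile in ("auth-api; rm -rf /", "--help"):
    try:
        build_log_command(hostile, "error")
    except ValueError as exc:
        print("rejected:", exc)
```

Requiring the first character to be alphanumeric also blocks values that would be parsed as command-line options.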

Server Design Checklist

[ ] Each tool follows verb_noun snake_case naming
[ ] Every parameter has a description with constraints and examples
[ ] Enums and defaults are used wherever possible
[ ] Tool descriptions state when NOT to use the tool
[ ] Errors include the constraint, the violation, and recovery context
[ ] Read-only context is exposed as resources, not tools
[ ] Tool list is under 15 tools per server
[ ] Responses return only what the agent needs for its next decision
[ ] Server has clear instructions for tool search discoverability
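
Parts of the checklist can be linted automatically. The sketch below covers three items (parameter descriptions, negative guidance, tool count); lint_server is hypothetical, and the negative-guidance check is a crude keyword heuristic that only looks for the word "not".

```python
MAX_TOOLS_PER_SERVER = 15  # checklist target: stay under this number


def lint_server(tools: list[dict]) -> list[str]:
    """Return human-readable findings for a list of tool definitions."""
    findings = []
    if len(tools) >= MAX_TOOLS_PER_SERVER:
        findings.append(
            f"server exposes {len(tools)} tools; checklist target is under {MAX_TOOLS_PER_SERVER}"
        )
    for tool in tools:
        name = tool["name"]
        # Crude heuristic: negative guidance usually contains the word "not".
        if "not" not in tool.get("description", "").lower().split():
            findings.append(f"{name}: description does not say when NOT to use the tool")
        properties = tool.get("inputSchema", {}).get("properties", {})
        for param, spec in properties.items():
            if not spec.get("description"):
                findings.append(f"{name}.{param}: missing description")
    return findings


tools = [
    {
        "name": "search_logs",
        "description": "Search application logs. Do NOT use for metrics.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "service": {"type": "string", "description": "Service name, e.g. 'auth-api'"},
                "severity": {"type": "string", "enum": ["debug", "info", "warn", "error", "fatal"]},
            },
        },
    }
]
print(lint_server(tools))
```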

Key Takeaways

  • Pick the right primitive first: tools for agent-invoked actions, resources for read-only context, prompts for user-triggered workflows.
  • Use verb_noun snake_case names of 32 characters or fewer; add 1–5 realistic examples, which raised accuracy from 72% to 90% in Anthropic's tests.
  • Put self-correction into error messages: include the constraint, the violation, and recovery context so agents can retry without a human.
  • Keep tool catalogs small: large lists can burn tens of thousands of tokens before the agent processes a single request. Design for lazy discovery.
  • Enums and additionalProperties: false reduce guesswork; negative guidance ("do NOT use for metrics") prevents misrouting.