Semantic Tool Output: Designing for Agent Readability

Return human-readable, contextually filtered output from agent tools to reduce hallucination and improve downstream call accuracy.

Also known as

Tool Output Design, Token-Efficient Tool Design, Agent-Friendly Output. For the cost angle — designing tool outputs to minimize token consumption — see Token-Efficient Tool Design.

Why Output Design Matters

Agents reason over tool output as natural language. When tools return opaque identifiers, machine-oriented fields, or oversized payloads, they waste context and make it harder for the model to extract what matters (Anthropic, Writing effective tools for agents). Because the tool author controls that format whichever model is calling, output design is a reliability lever independent of model capability.

Principles

Replace Identifiers with Semantic Equivalents

UUIDs, MIME types, and internal codes are opaque to agents:

{"id": "a3f4b2c1-...", "type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document"}

Replace them with the natural-language fields the agent will reason about:

{"name": "Q3 Budget Review", "file_type": "Word document"}

Resolving alphanumeric UUIDs to semantic language (or even a 0-indexed scheme) "significantly improves Claude's precision in retrieval tasks by reducing hallucinations" (Anthropic). The agent can then reference the object by name without miscopying an identifier.
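A minimal sketch of the resolution step, assuming the tool already knows each file's display name. The MIME lookup table, the `semantic_view` helper, and the placeholder UUID are all hypothetical. UUIDs are swapped for 0-based positions the agent can cite back, while the tool keeps the reverse mapping privately so a follow-up call can translate an index into the real identifier.

```python
# Hypothetical lookup table: MIME type -> phrase an agent can reason about.
FRIENDLY_TYPES = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "Word document",
    "application/pdf": "PDF",
    "text/csv": "CSV spreadsheet",
}


def semantic_view(files: list[dict]) -> tuple[list[dict], dict[int, str]]:
    """Replace UUIDs and MIME types with agent-readable fields.

    Returns the agent-facing rows plus a private index -> UUID map that the
    tool layer keeps, so later calls can accept the short index instead of
    asking the agent to copy a UUID.
    """
    rows, index_to_id = [], {}
    for i, f in enumerate(files):
        index_to_id[i] = f["id"]
        rows.append({
            "index": i,
            "name": f["name"],
            # Fall back to the raw MIME string rather than guessing a label.
            "file_type": FRIENDLY_TYPES.get(f["type"], f["type"]),
        })
    return rows, index_to_id


# Placeholder UUID for illustration only.
files = [{
    "id": "11111111-2222-3333-4444-555555555555",
    "name": "Q3 Budget Review",
    "type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}]
rows, ids = semantic_view(files)
print(rows)  # [{'index': 0, 'name': 'Q3 Budget Review', 'file_type': 'Word document'}]
```

The agent only ever sees `index`, `name`, and `file_type`; the UUID never enters the context window, so there is nothing to miscopy.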

Return Only Contextually Relevant Fields

Omit data the agent will never use. A tool returning 40 fields when the agent needs name and email wastes context on 38 irrelevant fields, and every extra field is a hallucination opportunity — the agent may reference, invent, or misinterpret fields it was not asked to act on. Anthropic's tool-use guidance is to return only high-signal information and include only the fields Claude needs for its next step (Claude API docs, Tool use). Design schemas around decision-relevant fields, and expose expanded output as an optional mode.
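One way to make the decision-relevant schema explicit is a per-tool allowlist applied before the result is serialized. In this sketch, `CONTACT_FIELDS`, `lookup_contact`, and the `expanded` flag are hypothetical names; the point is that the default path returns two fields and the full 40-field record is available only on request.

```python
from typing import Any

# Fields the agent needs for a "contact this person" task.
CONTACT_FIELDS = ("name", "email")


def lookup_contact(record: dict[str, Any], expanded: bool = False) -> dict[str, Any]:
    """Return only decision-relevant fields unless the caller opts in."""
    if expanded:
        # Optional mode: the full record, for the rare task that needs it.
        return dict(record)
    # Default: project onto the allowlist, skipping fields the record lacks
    # instead of emitting nulls the agent might misread.
    return {k: record[k] for k in CONTACT_FIELDS if k in record}


# A wide record: 2 useful fields plus 38 internal ones.
record = {"name": "Ada Lovelace", "email": "ada@example.com"}
record.update({f"internal_field_{i}": i for i in range(38)})

print(len(record))                                 # 40
print(lookup_contact(record))                      # {'name': 'Ada Lovelace', 'email': 'ada@example.com'}
print(len(lookup_contact(record, expanded=True)))  # 40
```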

Implement Pagination and Filtering at the Tool Layer

Tools that return full datasets shift filtering to the agent, which either hallucinates the filter or loads the entire dataset into context. Anthropic recommends "some combination of pagination, range selection, filtering, and/or truncation with sensible default parameter values" for any response that can grow large (Anthropic). In practice:

Accept filter parameters (status=open, created_after=2024-01-01) and apply them before returning results.
Return a page of results with a cursor, not an unbounded list.
Provide sensible defaults (limit=20) that prevent accidental context flooding.
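The three practices above can be sketched over an in-memory list, assuming tickets carry a `status` and an ISO-format `created_at` date. The function name `list_tickets`, the `MAX_LIMIT` ceiling, and the cursor scheme (the next offset encoded as a string) are illustrative assumptions; a real backend would push the filter into its query and may use a different cursor format.

```python
from datetime import date
from typing import Optional

DEFAULT_LIMIT = 20  # sensible default that caps context usage
MAX_LIMIT = 100     # hard ceiling even if the agent asks for more


def list_tickets(tickets: list[dict], status: Optional[str] = None,
                 created_after: Optional[str] = None,
                 limit: int = DEFAULT_LIMIT,
                 cursor: Optional[str] = None) -> dict:
    # Apply filters before anything is returned to the agent.
    rows = tickets
    if status is not None:
        rows = [t for t in rows if t["status"] == status]
    if created_after is not None:
        cutoff = date.fromisoformat(created_after)
        rows = [t for t in rows if date.fromisoformat(t["created_at"]) > cutoff]

    # Assumed cursor scheme: the offset of the next unseen row, as a string.
    start = int(cursor) if cursor else 0
    limit = max(1, min(limit, MAX_LIMIT))
    page = rows[start:start + limit]
    next_start = start + len(page)
    return {
        "results": page,
        "total_matches": len(rows),
        # None tells the agent there is nothing left to fetch.
        "next_cursor": str(next_start) if next_start < len(rows) else None,
    }


tickets = [
    {"id": i, "status": "open" if i % 2 else "closed", "created_at": f"2024-01-{i + 1:02d}"}
    for i in range(30)
]
page = list_tickets(tickets, status="open", created_after="2024-01-01", limit=5)
print([t["id"] for t in page["results"]], page["next_cursor"])  # [1, 3, 5, 7, 9] 5
```

Returning `total_matches` alongside the page lets the agent decide whether to narrow the filter instead of paging through everything.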

Use Enums for Response Granularity

When an agent needs different levels of detail at different points, expose a response_format enum (Anthropic) rather than always returning full or minimal output:

{"response_format": "concise"}  // summary fields only
{"response_format": "detailed"} // full record with all fields

The agent selects the appropriate format based on its current context budget and task requirements.
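A sketch of how the enum might be declared and honored. The tool definition follows the JSON Schema shape that tool-use APIs such as Anthropic's accept under `input_schema`; the tool name `get_document`, its description, and the concise field set are hypothetical. Defaulting to `concise` keeps the cheap path the common one.

```python
from enum import Enum


class ResponseFormat(str, Enum):
    CONCISE = "concise"    # summary fields only
    DETAILED = "detailed"  # full record with all fields


# Hypothetical tool definition; "enum" restricts the agent to the two modes.
GET_DOCUMENT_TOOL = {
    "name": "get_document",
    "description": "Fetch a document. Use 'concise' unless you need the full body.",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "response_format": {
                "type": "string",
                "enum": [f.value for f in ResponseFormat],
                "default": ResponseFormat.CONCISE.value,
            },
        },
        "required": ["name"],
    },
}

CONCISE_FIELDS = ("name", "file_type", "owner", "last_modified")


def render(doc: dict, response_format: str = "concise") -> dict:
    # An unknown mode raises ValueError; a production tool would convert that
    # into an actionable error message for the agent.
    fmt = ResponseFormat(response_format)
    if fmt is ResponseFormat.DETAILED:
        return doc
    return {k: doc[k] for k in CONCISE_FIELDS if k in doc}


doc = {"name": "Q3 Budget Review", "file_type": "Word document", "owner": "Finance",
       "last_modified": "2024-11-02", "body": "Full text of the review", "revision_history": []}
print(render(doc))  # concise: the four summary fields, no body or history
```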

Make Errors Actionable

Error responses should tell the agent what went wrong and how to fix it:

{"error": "Invalid date range: end_date must be after start_date. Received start=2024-03-01, end=2024-02-01."}

Not:

{"error": "400 Bad Request"}

Errors should "clearly communicate specific and actionable improvements, rather than opaque error codes or tracebacks" (Anthropic), letting the agent self-correct on the next call without human intervention.
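A minimal validator in that style, assuming ISO-format date strings. The function name `validate_date_range` is hypothetical and the `{"error": ...}` envelope simply mirrors the example above. Each message names the offending parameters, states the rule, and echoes the values received, which is what lets the agent repair its next call.

```python
from datetime import date
from typing import Optional


def validate_date_range(start_date: str, end_date: str) -> Optional[dict]:
    """Return an actionable error payload, or None if the range is valid."""
    try:
        start = date.fromisoformat(start_date)
        end = date.fromisoformat(end_date)
    except ValueError:
        return {"error": (
            "Invalid date format: start_date and end_date must be YYYY-MM-DD. "
            f"Received start={start_date}, end={end_date}."
        )}
    if end <= start:
        return {"error": (
            "Invalid date range: end_date must be after start_date. "
            f"Received start={start_date}, end={end_date}."
        )}
    return None


print(validate_date_range("2024-03-01", "2024-02-01"))
# {'error': 'Invalid date range: end_date must be after start_date. Received start=2024-03-01, end=2024-02-01.'}
```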

Why It Works

LLMs are trained on next-token prediction and tend to perform better with formats that match their training data (Anthropic). UUIDs and MIME strings are arbitrary character sequences that carry no meaning the model can reason from; agents handle natural-language names and identifiers significantly more successfully than cryptic ones, which reduces hallucinations in retrieval tasks. Returning only decision-relevant fields removes irrelevant signals the model might reference or misattribute, keeping the tool result tightly scoped to what the next action actually requires.

When This Backfires

Semantic filtering at the tool layer has failure modes:

Under-specification: a task-specific schema omits a field the agent unexpectedly needs. The agent either hallucinates the value or makes an extra round-trip — sometimes more expensive than returning the full record once.
Concise/detailed mismatch: when response_format is exposed and the agent picks concise for a task that needs the detailed record, it operates on incomplete data without knowing it. Prompting the agent to reason about its data needs before calling the tool reduces this risk.
Schema drift: a "clean default" shaped by the first use case becomes misaligned as new tasks arrive, unless you version the schema or gate expansion behind opt-in.

When output scope is genuinely unpredictable across callers, a richer default with well-named fields is safer than a narrow schema that forces multiple calls.

Anti-Pattern: Developer-Convenience Output

Tools built for developer debugging often return everything — raw database records, full object graphs, internal identifiers, debug fields. That is fine for a developer reading a terminal. It is the wrong default for an agent consuming output in a context window. The fix is not to strip developer-useful data, but to separate concerns: a debug mode for developer use, a clean default for agent use.
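A sketch of that separation, with hypothetical names throughout: one internal fetch, a clean projection returned by default, and the raw record attached only when a developer sets `debug=True`. The assumption is that the `debug` flag is left out of the schema the agent sees, so the agent cannot opt into the noisy payload by accident.

```python
def fetch_raw_order(order_id: str) -> dict:
    # Stand-in for a database call; returns everything, including internals.
    return {
        "id": order_id,
        "status_code": 3,
        "customer_name": "Acme Corp",
        "_shard": "db-eu-2",
        "_query_ms": 41,
    }


# Internal status codes translated into words the agent can use directly.
STATUS_LABELS = {1: "pending", 2: "paid", 3: "shipped"}


def get_order(order_id: str, debug: bool = False) -> dict:
    raw = fetch_raw_order(order_id)
    clean = {
        "customer": raw["customer_name"],
        "status": STATUS_LABELS.get(raw["status_code"], "unknown"),
    }
    if debug:
        # Developer mode: keep the clean view and attach the raw record beside it.
        return {**clean, "debug": raw}
    return clean


print(get_order("ord_1"))  # {'customer': 'Acme Corp', 'status': 'shipped'}
```

Keeping the raw record under a single `debug` key means developer tooling still sees everything, while the default response stays the same shape the agent was designed around.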

Example

A get_customer tool returns a full database record by default:

{
  "id": "cust_8f3a91b2-47c1-4e2d-b891-3c5d7a2e0f14",
  "name": "Acme Corp",
  "email": "billing@acme.com",
  "plan": "enterprise",
  "stripe_id": "cus_NffrFeUfNV2Hib",
  "created_at": "2023-06-15T09:30:00Z",
  "updated_at": "2024-11-02T14:22:31Z",
  "metadata": {"segment": "mid-market", "csm_id": "emp_442"},
  "feature_flags": ["beta_dashboard", "v2_api"],
  "billing_address": { "line1": "123 Main St", "city": "Portland", "state": "OR", "zip": "97201" },
  "mrr_cents": 249900
}

An agent asked to "email Acme Corp their current plan details" needs four values: the name, the email address, the plan, and its price. Returning all eleven fields forces it to parse irrelevant data and risks it hallucinating references to stripe_id or feature_flags in the email. Redesign the tool to return a semantic, filtered response:

{
  "name": "Acme Corp",
  "email": "billing@acme.com",
  "plan": "Enterprise",
  "monthly_price": "$2,499.00"
}

The agent now has exactly what it needs: a human-readable name, the contact address, the plan, and a formatted price, with no opaque identifiers to misinterpret.
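The redesign can be sketched as a single projection function over a trimmed copy of the record. The plan-name capitalization and the cents-to-dollars formatting are assumptions about how this hypothetical `get_customer` tool presents data; `mrr_cents` is treated as the monthly price, as the example implies, and is assumed to be non-negative.

```python
from typing import Any


def format_cents(cents: int) -> str:
    # 249900 -> "$2,499.00"; integer math avoids float rounding surprises.
    # Assumes a non-negative amount.
    dollars, remainder = divmod(cents, 100)
    return f"${dollars:,}.{remainder:02d}"


def customer_for_agent(record: dict[str, Any]) -> dict[str, str]:
    """Map a raw customer record to the four fields the email task needs."""
    return {
        "name": record["name"],
        "email": record["email"],
        "plan": record["plan"].capitalize(),  # "enterprise" -> "Enterprise"
        "monthly_price": format_cents(record["mrr_cents"]),
    }


raw = {
    "id": "cust_8f3a91b2-47c1-4e2d-b891-3c5d7a2e0f14",
    "name": "Acme Corp",
    "email": "billing@acme.com",
    "plan": "enterprise",
    "stripe_id": "cus_NffrFeUfNV2Hib",
    "mrr_cents": 249900,
}
print(customer_for_agent(raw))
# {'name': 'Acme Corp', 'email': 'billing@acme.com', 'plan': 'Enterprise', 'monthly_price': '$2,499.00'}
```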

Key Takeaways

Replace opaque identifiers with semantic equivalents the agent can reference naturally.
Return only the fields that are decision-relevant for the tool's purpose.
Apply filtering and pagination at the tool layer, not in the agent's reasoning.
Use a response_format enum to let the agent match output depth to context budget.
Write error messages that diagnose the problem and specify the correction.

Related

Agent-Computer Interface (ACI) — semantic output is one of four ACI design principles; affordances, constraints, and error prevention are the other three
Token-Efficient Tool Design
Graceful Tool-Output Truncation: The PARTIAL Signal — what to return when filtered output still overflows
Terminal Tool Output Compression — harness-side filtering when the tool itself cannot be redesigned
CLI Scripts as Agent Tools: Return Only What Matters
Machine-Readable Error Responses for AI Agents (RFC 9457)
Poka-Yoke for Agent Tools
Context Compression Strategies