Tool Description Quality
Tool descriptions, not just tool implementations, determine whether agents select the right tool for a task. Treat descriptions as a prompt engineering surface: their quality directly affects task success rate.
Also known as
Tool Selection Guidance, Selection Signals
Selection as a Reasoning Step
Agents do not browse a tool catalog before acting. They select tools by reasoning about which available tool best matches their current intent. A poorly described tool is invisible for use cases its description fails to communicate — even if the implementation would handle them correctly.
Per Anthropic's multi-agent research system post, improving tool ergonomics — including descriptions — reduced task completion time by 40% for agents using the updated tools.
The mechanism: tool descriptions are embedded into the agent's context at the reasoning step. Richer, distinctive descriptions create stronger semantic signals that align agent intent with the correct tool. Research on tool-level retrieval for multi-agent systems confirms this: coarse descriptions cluster functionally different tools together in embedding space, making correct selection unreliable.
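The clustering effect can be illustrated with a toy similarity measure. The sketch below uses bag-of-words cosine similarity as a crude stand-in for a learned embedding model, and all four descriptions are hypothetical. It only shows the direction of the effect: coarse descriptions of two different tools overlap heavily, while distinctive ones do not.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase word counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical descriptions for two functionally different tools.
coarse = {
    "search_issues": "Search for issues in the project tracker.",
    "list_issues": "List issues in the project tracker.",
}
distinctive = {
    "search_issues": "Find issues by keyword or field filter such as status:open.",
    "list_issues": "Return every issue in one project, unfiltered, in creation order.",
}

coarse_sim = cosine_similarity(*(bag_of_words(d) for d in coarse.values()))
distinct_sim = cosine_similarity(*(bag_of_words(d) for d in distinctive.values()))
print(f"coarse pair: {coarse_sim:.2f}, distinctive pair: {distinct_sim:.2f}")
```

A real system would use an embedding model rather than word counts, so the absolute numbers here carry no meaning beyond the comparison.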
Instruct Agents to Examine Tools First
When a tool set includes both generic and specialized tools, agents tend to match on the first plausible tool — often a generic one. Making the preference explicit in the system prompt counters this: "Before acting, review your available tools and select the one that best matches the task. Prefer specialized over generic tools." An agent that defaults to a generic search tool when a specialized domain-specific tool is available produces lower-quality results.
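One way to keep this instruction from being dropped is to append it programmatically wherever system prompts are assembled. The following is a minimal sketch; `build_system_prompt` and the base prompt text are hypothetical names chosen for illustration, not part of any framework.

```python
# The instruction quoted above, kept in one place so every agent gets the same wording.
TOOL_REVIEW_INSTRUCTION = (
    "Before acting, review your available tools and select the one that best "
    "matches the task. Prefer specialized over generic tools."
)


def build_system_prompt(base_prompt: str) -> str:
    """Append the tool-review instruction unless it is already present."""
    if TOOL_REVIEW_INSTRUCTION in base_prompt:
        return base_prompt
    return f"{base_prompt.rstrip()}\n\n{TOOL_REVIEW_INSTRUCTION}"


prompt = build_system_prompt("You are a support agent for an issue tracker.")
print(prompt)
```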
MCP Server Tool Descriptions
MCP servers expose many tools at once. Unclear descriptions at this scale cause systematic misuse: every agent makes the same wrong selection decision, compounding across all invocations. For MCP tools:
- Each tool description must be independently self-contained — agents may not have context from adjacent tools
- Do not assume agents read related tools before selecting the current one
- Include domain context in each description, not just in a top-level server description
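The checklist above can be partly automated with a simple lint pass over a server's descriptions. The sketch below is a heuristic under stated assumptions: the cross-reference phrases and the `domain_terms` argument are illustrative choices, and a real check would need patterns tuned to the server in question.

```python
import re

# Phrases that suggest a description leans on neighbouring tools or server docs.
# This list is an illustrative assumption, not an exhaustive rule set.
CROSS_REFERENCE_PATTERNS = [
    r"\bsee above\b",
    r"\bas above\b",
    r"\bsame as\b",
    r"\bsee the server description\b",
]


def lint_description(description: str, domain_terms: list[str]) -> list[str]:
    """Return human-readable warnings for one tool description."""
    warnings = []
    lowered = description.lower()
    for pattern in CROSS_REFERENCE_PATTERNS:
        if re.search(pattern, lowered):
            warnings.append(f"relies on outside context: matches {pattern!r}")
    if not any(term.lower() in lowered for term in domain_terms):
        warnings.append("no domain context: none of the domain terms appear")
    return warnings


weak = "Same as above, but for closed ones."
strong = "Search for issues in the project tracker by keyword or field filter."
print(lint_description(weak, ["issue", "project tracker"]))
print(lint_description(strong, ["issue", "project tracker"]))
```

The weak description trips both the cross-reference and the domain-context checks, while the self-contained one passes with no warnings.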
Testing Tool Selection
Tool selection failures are often invisible during development — an agent calling the wrong tool with a plausible-looking result won't surface the error until compared against ground truth. To test selection:
- Instrument agent traces and log which tool was selected for each task type
- Compare selected tools against ground truth for a representative set of test cases
- Refine descriptions based on observed misselection patterns, not intuition about what descriptions should say
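The three steps above can be sketched as a small report over logged traces. The trace format, task type names, and ground-truth mapping below are hypothetical; a real harness would read them from whatever tracing system the agent uses.

```python
from collections import Counter

# Hypothetical trace records: (task type, tool the agent selected).
traces = [
    ("find_by_keyword", "search_issues"),
    ("find_by_keyword", "list_issues"),
    ("find_by_keyword", "search_issues"),
    ("enumerate_project", "list_issues"),
    ("enumerate_project", "list_issues"),
]

# Ground truth: the tool a reviewer expects for each task type.
expected_tool = {
    "find_by_keyword": "search_issues",
    "enumerate_project": "list_issues",
}


def selection_report(traces, expected_tool):
    """Return overall accuracy and a count of (expected, selected) mismatches."""
    misselections = Counter()
    correct = 0
    for task_type, selected in traces:
        expected = expected_tool[task_type]
        if selected == expected:
            correct += 1
        else:
            misselections[(expected, selected)] += 1
    accuracy = correct / len(traces) if traces else 0.0
    return accuracy, misselections


accuracy, misselections = selection_report(traces, expected_tool)
print(f"selection accuracy: {accuracy:.0%}")
for (expected, selected), count in misselections.most_common():
    print(f"expected {expected}, got {selected}: {count}")
```

Grouping mismatches by (expected, selected) pair points at which two descriptions need to be differentiated, which is more actionable than a single accuracy number.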
Iterating on Descriptions
Description iteration follows the pattern of prompt iteration: observe, identify failures, change, measure. The most common failure mode: a description accurate enough to describe what the tool does but not specific enough to tell the agent when to prefer it over alternatives.
The fix is positive selection signals: "Use this tool when X" and "Prefer this over [other tool] when Y." These are instructions to the agent, not documentation of the interface.
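A small helper can make these signals hard to forget by requiring them as arguments. This is a sketch only: `compose_description` and its parameters are hypothetical, and the wording templates simply mirror the two phrasings quoted above.

```python
def compose_description(capability, use_when, prefer_over=None):
    """Join a capability statement with explicit selection instructions.

    prefer_over maps an alternative tool name to the condition under which
    this tool should be chosen over it.
    """
    parts = [capability.strip(), f"Use this tool when {use_when.strip()}."]
    for other_tool, condition in (prefer_over or {}).items():
        parts.append(f"Prefer this over {other_tool} when {condition.strip()}.")
    return " ".join(parts)


description = compose_description(
    capability="Search for issues in the project tracker.",
    use_when="you need to find issues by keyword or filter",
    prefer_over={"list_issues": "you have a search term or filter criteria"},
)
print(description)
```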
Example
The following pair shows the same MCP tool with a weak description and an improved one. The weak version is accurate but leaves selection decisions to the agent.
# Before: accurate but minimal; the agent must guess when to use it
{
    "name": "search_issues",
    "description": "Search for issues in the project tracker.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"}
        },
        "required": ["query"]
    }
}

# After: includes query syntax, return shape, and when to prefer it over alternatives
{
    "name": "search_issues",
    "description": (
        "Search for issues in the project tracker. "
        "Returns a list of issues matching the query, each with id, title, status, and assignee. "
        "Supports field filters: status:open, status:closed, assignee:<username>, label:<name>. "
        "Use this tool to find issues by keyword or filter. "
        "Prefer this over list_issues when you have a search term or filter criteria. "
        "Use list_issues instead when you need all issues in a project without filtering."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search keywords and/or field filters. Example: 'login page status:open assignee:alice'"
            }
        },
        "required": ["query"]
    }
}
The improved description answers three questions: what the tool does, what it returns, and when to use it instead of list_issues. The query syntax example in the parameter description eliminates trial and error on filter format.
When This Backfires
Each description adds tokens on every invocation. Three conditions where this matters:
- Large MCP servers (50+ tools): verbose descriptions can push tool context above 10k tokens. Use retrieval-based selection (embedding search to select a subset) over in-context enumeration.
- High-frequency loops: verbose descriptions add cost with diminishing returns after selection stabilizes.
- Genuinely similar tools: description quality cannot resolve near-identical tools — consolidate or differentiate at the implementation level. See Consolidate Agent Tools.
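Retrieval-based selection, mentioned in the first bullet, can be sketched as ranking tools against the task and exposing only the top few. The example below substitutes word overlap for embedding search so that it stays self-contained; the catalog, the task string, and `select_tools` are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase word set; a stand-in for a real embedding model."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def select_tools(task, tools, top_k=2):
    """Rank tools by word overlap with the task and keep the top_k names.

    `tools` maps tool name to description. Only the selected subset would be
    placed in the agent's context.
    """
    task_words = tokenize(task)
    ranked = sorted(
        tools,
        key=lambda name: len(task_words & tokenize(tools[name])),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical catalog; a large MCP server would have many more entries.
tools = {
    "search_issues": "Find issues in the project tracker by keyword or filter.",
    "list_issues": "Return all issues in a project without filtering.",
    "create_branch": "Create a git branch in the repository.",
    "send_email": "Send an email message to a recipient.",
}

selected = select_tools("find open issues for alice by keyword", tools)
print(selected)
```

Because ranking depends entirely on the description text, this approach raises the stakes for description quality rather than lowering them: a vague description is never retrieved in the first place.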
Key Takeaways
- Tool description quality is a direct performance lever: in Anthropic's multi-agent research system, improving tool ergonomics (including descriptions) reduced task completion time by 40%
- Instruct agents in the system prompt to review their available tools first and to prefer specialized over generic tools
- MCP server tools require self-contained descriptions; do not assume agents read adjacent tool docs
- Test tool selection explicitly by logging which tools are selected for which tasks
- Add positive selection signals ("use this when...") not just capability descriptions
- At large tool set sizes (50+ tools), prefer retrieval-based selection over in-context enumeration to manage token cost