Tool Engineering
Design, expose, and manage the tools that agents use to act on the world -- from description quality and schema design through MCP servers, skills, hooks, and specialized tool patterns.
Fundamentals
Core principles for designing agent tools that are discoverable, unambiguous, and cost-effective.
- Tool Engineering — Design agent tools like APIs -- with documentation, examples, edge-case handling, and mistake-proofing -- not as boilerplate wrappers around existing functions
- Tool Minimalism and High-Level Prompting — Expose fewer, non-overlapping tools and provide goal-oriented instructions rather than step-by-step procedures
- Tool Description Quality — Tool descriptions -- not just implementations -- determine whether agents select the right tool; treat them as prompt engineering surfaces
- Write Tool Descriptions Like Onboarding Docs — Write tool descriptions assuming the agent has never seen the system -- include implicit context, query syntax, domain terms, and resource relationships
- Advanced Tool Use: Scaling Agent Tool Libraries — Three API-level features for managing hundreds of tools without drowning in context or losing selection accuracy
Tool Design
Structural patterns for tool interfaces, schemas, error handling, and output formatting.
- Designing for Agent Consumers (Agent Experience) — Treat the agent as a first-class consumer of your public SDK, CLI, API, and docs; AX is the umbrella discipline for the surface-design tactics, distinct from harness engineering
- Agent-Computer Interface (ACI) — Tools are the agent's UI; the same principles that make human interfaces usable make agent tools effective
- Function-Level Debugger Interfaces for Coding Agents — Re-expose interactive debuggers at the function frame instead of the source line so LLM agents pay one turn per call, not one turn per step
- Semantic Tool Output — Return human-readable, contextually filtered output from agent tools to reduce hallucination and improve downstream call accuracy
- Typed Schemas at Agent Boundaries — Formal schemas at every agent-to-agent interface establish explicit contracts that prevent state mismanagement and silent failures
- Poka-Yoke for Agent Tools — Redesign tool interfaces so the wrong call cannot compile -- prevention over documentation
- Consolidate Agent Tools — Prefer fewer, higher-level tools that match how agents reason about tasks over many narrow tools that mirror API endpoint boundaries
- Toolset Agentization — Group frequently co-used tools into specialized sub-agents so the top-level planner chooses among fewer, coarser actions at each routing step
- Machine-Readable Error Responses (RFC 9457) — Request structured errors from HTTP APIs using Accept headers to replace brittle HTML parsing with deterministic control flow
- Headless-First Services: APIs for Agent Consumers — Expose the full product surface through API, MCP, and CLI so agents acting on behalf of users can complete any flow the GUI supports
- Tool Necessity Probing — Read tool-call decisions from the pre-generation hidden state with a linear probe — AUROC 0.89–0.96, 48% fewer tool calls at 1.7% accuracy loss
- Chance-Corrected Shortlist Depth Sizing — Bits-over-Random measures whether retrieval at depth K beats random at depth K; the missing chance-corrected metric for sizing fixed or adaptive tool-retrieval shortlists
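
To make the Machine-Readable Error Responses entry above concrete, here is a minimal sketch of how a tool wrapper might consume an RFC 9457 problem document. It assumes the caller has already sent `Accept: application/problem+json` and holds the response's content type and body; the five field names come from RFC 9457, while `parse_problem`, `next_action`, and the status-to-action mapping are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional

PROBLEM_JSON = "application/problem+json"  # media type registered by RFC 9457


@dataclass
class Problem:
    """The five standard members of an RFC 9457 problem document."""
    type: str = "about:blank"  # RFC default when "type" is absent
    title: Optional[str] = None
    status: Optional[int] = None
    detail: Optional[str] = None
    instance: Optional[str] = None


def parse_problem(content_type: str, body: bytes) -> Optional[Problem]:
    """Return a Problem if the response is a problem document, else None."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != PROBLEM_JSON:
        return None  # e.g. an HTML error page; the caller must fall back
    data = json.loads(body)
    return Problem(
        type=data.get("type", "about:blank"),
        title=data.get("title"),
        status=data.get("status"),
        detail=data.get("detail"),
        instance=data.get("instance"),
    )


def next_action(problem: Optional[Problem]) -> str:
    """Hypothetical control flow keyed on the structured status code."""
    if problem is None:
        return "escalate"  # unstructured error: hand it to a human or the LLM
    if problem.status == 429:
        return "retry_later"
    if problem.status in (401, 403):
        return "reauthenticate"
    return "report"
```

The point of the sketch is that the branch in `next_action` depends only on typed fields, so the agent never has to scrape an HTML error page to decide what to do next.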
MCP (Model Context Protocol)
Architecture and design guidance for MCP servers and clients -- the open protocol for agent-tool communication.
- MCP Client/Server Architecture — Architectural best practices covering transport selection, tool granularity, error handling, capability negotiation, and security
- MCP Client Design — Host-side logic for connecting to servers, negotiating capabilities, routing tool calls, caching descriptions, and degrading gracefully on failure
- MCP Elicitation — How MCP servers collect structured user input mid-task, and how Elicitation and ElicitationResult hooks let you automate, validate, or block those requests
- MCP LLM Sampling — How MCP servers request host LLM inference mid-execution via sampling/createMessage, creating hybrid tools that combine deterministic logic with embedded AI reasoning
- MCP Server Design — A server author's checklist for tool naming, schema design, error handling, resource exposure, and token efficiency
- Production MCP Agent Stack — Sequence six MCP decisions -- server location, tool grouping, schema delivery, result processing, auth, token storage -- into a coherent production deployment
- MCP Tool Result Persistence via _meta Annotation — Mark individual MCP tool outputs as durable through Claude Code's compaction pipeline with _meta["anthropic/maxResultSizeChars"]
- Proprietary-to-Open-Standard Migration — When a proprietary extension system gets replaced by an open protocol, rebuild on the standard rather than port the old architecture
- Scoped MCP Server Discovery: Most-Specific-Wins Resolution — Resolve duplicate MCP server definitions across user, workspace, and project config files using a most-specific-wins precedence rule
- MCP alwaysLoad: Classifying Servers as Eager or Just-in-Time — Decide which MCP servers earn unconditional context residence and which stay deferred behind tool search, using token cost, hit rate, and selection accuracy as signals
- Documentation-Grounding MCP Servers for Vendor SDKs — Wire a vendor's docs-MCP endpoint to ground code generation in current API surfaces, only when SDK churn, token budget, and trifecta posture all permit it
- Tool Cloning and Provenance Assessment — Raw repository counts overstate the diversity of MCP and Skills marketplaces because many entries are cloned, lightly modified, or template-derived — pair Jaccard and ssdeep before drawing ecosystem conclusions
- Hint-Driven Concurrency for Read-Only MCP Tools — The MCP readOnlyHint annotation became a concurrency dispatcher input once Codex CLI 0.134.0 shipped parallel execution for read-only tool calls -- a wall-clock win that only holds when annotations are audited and trusted
- Push-Event MCP Channels: Inverting the Pull-Tool Polarity — An MCP server that declares claude/channel flips the polarity from pull-on-demand to push-when-it-happens, gated by sender allowlist and an always-open session -- useful when warm context is worth keeping
- Auth-Isolation as the MCP-vs-CLI Selection Heuristic — Reach for MCP when authenticated access needs its credentials kept out of the agent's context window; choose a CLI when there is no auth boundary to protect
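
The most-specific-wins rule from the Scoped MCP Server Discovery entry above can be sketched as a layered merge. This is an illustration only: the entry names the user, workspace, and project scopes but not their order, so the precedence in `SCOPE_ORDER` is an assumption, and `resolve_servers` and the config shapes are hypothetical.

```python
from typing import Dict, Tuple

ServerMap = Dict[str, dict]  # server name -> server definition

# Assumed precedence, least specific first. The ordering is an assumption
# made for this sketch, not something the source states.
SCOPE_ORDER = ["user", "workspace", "project"]


def resolve_servers(configs: Dict[str, ServerMap]) -> Dict[str, Tuple[str, dict]]:
    """Merge per-scope server maps; a later (more specific) scope overrides.

    Returns a mapping of server name -> (winning scope, definition) so the
    caller can report where each definition came from.
    """
    resolved: Dict[str, Tuple[str, dict]] = {}
    for scope in SCOPE_ORDER:
        for name, definition in configs.get(scope, {}).items():
            resolved[name] = (scope, definition)  # overwrite less specific entry
    return resolved


# Usage: "search" is defined twice; the project-level definition wins.
example = resolve_servers({
    "user": {"search": {"command": "search-v1"}, "notes": {"command": "notes"}},
    "project": {"search": {"command": "search-v2"}},
})
```

Keeping the winning scope alongside the definition is a design choice that makes duplicate definitions easy to surface in diagnostics rather than silently shadowing them.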
Skills
Packaging domain knowledge and reusable capabilities as agent skills with reliable invocation and lifecycle governance.
- Skill as Knowledge Pattern — Design skills as pure knowledge containers -- domain rules, heuristics, and reference material -- not executable behavior, so they remain portable across agents
- CLI-First Skill Design — Design agent skills as CLI tools so the same interface serves both humans debugging locally and agents automating through shell tool calls
- Skill Authoring Patterns — Practical patterns for building, testing, and troubleshooting agent skills -- categories, description craft, implementation patterns, and debugging
- SKILL.md Frontmatter Reference — All SKILL.md frontmatter fields: invocation control, subagent delegation, tool restriction, hooks, and argument handling
- Skill Context Isolation — Run a skill in an isolated subagent context so its auxiliary tokens never enter the main chat; the parent receives only the distilled result
- Skill Library Evolution — How agent skill libraries grow, get pruned, and evolve through versioning, quality gates, and lifecycle governance
- Skill Tool Runtime Enforcement — Use the Skill tool to load command prompts at invocation time rather than telling agents to read the file -- eliminates stale instructions and path drift
- Google ADK Skills — How Google ADK implements the Agent Skills standard via SkillToolset, inline models.Skill, and three auto-generated tools mapped to L1/L2/L3 progressive disclosure
- Interpreter Skills — Ship a SKILL.md plus an importable module so the model decides when the behavior fires while the runtime executes a reviewed, testable function -- the named, versionable unit on top of an embedded code interpreter
Hooks & Lifecycle
Deterministic interception points that enforce policy, automate side effects, and audit agent behavior without relying on model compliance.
- Hooks and Lifecycle Events — Hooks run deterministic code at defined points in an agent's execution -- before and after tool calls, at session boundaries -- enabling enforcement and audit
- Conditional Hook Execution — Use the if field on hook handlers to filter by tool name and arguments, eliminating subprocess spawns for non-matching calls
- Hook Catalog — A reference catalog of high-value hooks grouped by purpose: CLI enforcement, destructive operation guardrails, sandboxing, and workflow automation
- On-Demand Skill Hooks — Register PreToolUse hooks through a skill invocation to arm strict guardrails for a single session without imposing friction on every workflow
- PostToolUse BSD/GNU Detection — Catch BSD/GNU CLI incompatibilities at runtime with a PostToolUse hook, feed fixes back via additionalContext, and persist knowledge to CLAUDE.md
- StopFailure Hook: Observability for API Error Termination — The StopFailure hook fires when a Claude Code turn ends due to an API error, giving harnesses a deterministic signal to log failures, alert operators, and feed external recovery workflows
- PreCompact Hook: Vetoing Compaction at Lifecycle Boundaries — Claude Code's PreCompact hook can now block compaction outright, deferring context compression until the agent reaches a safer checkpoint
- PostToolUse continueOnBlock: Refusal With a Load-Bearing Reason — Feed a hook's rejection reason back to the agent as a continuation signal instead of stopping the turn, turning routable policy violations into guided corrections
- Terminal Tool Output Compression — Harness-side post-processing collapses predictable shell-output noise (lockfile diffs, ls -l, npm install progress, unchanged diff hunks) before the model sees it, with a banner that lets the agent opt out per call
- MessageDisplay Hook: Transforming Assistant Text at the Display Boundary — A Claude Code 2.1.152 hook event fires on every outbound assistant message and lets a hook rewrite or hide the text before the user sees it -- the display-side analogue of PostToolUse output replacement
- PostToolBatch Hook: Once-Per-Decision-Cycle Injection at the Batch Boundary — PostToolBatch fires exactly once per parallel tool batch, before the next model call -- the cardinality-matched injection point for conventions and validations that would otherwise duplicate N times across PostToolUse
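
The Terminal Tool Output Compression entry above describes collapsing predictable shell noise before the model sees it. A rough sketch of that idea follows; it is not the harness's actual implementation. The `NOISE` pattern, the banner wording, and the `raw` opt-out parameter are all hypothetical stand-ins for whatever the real post-processor uses.

```python
import re
from typing import List, Pattern

# Hypothetical noise patterns: download/progress chatter and npm notices.
NOISE = re.compile(r"^(Downloading\b|npm (warn|notice)\b|\s*\d+%)", re.IGNORECASE)


def compress_output(text: str, noise: Pattern[str] = NOISE, raw: bool = False) -> str:
    """Collapse each run of consecutive noise lines to its first line.

    raw=True is the (assumed) per-call opt-out: the text is returned untouched.
    """
    if raw:
        return text
    result: List[str] = []
    hidden = 0   # noise lines dropped in the current run
    total = 0    # noise lines dropped overall
    in_run = False
    for line in text.splitlines():
        if noise.search(line):
            if in_run:
                hidden += 1          # drop repeats after the first noise line
            else:
                result.append(line)  # keep the first line of each run
                in_run = True
            continue
        if hidden:
            result.append(f"[{hidden} similar lines omitted]")
            total += hidden
            hidden = 0
        in_run = False
        result.append(line)
    if hidden:  # the output ended inside a noise run
        result.append(f"[{hidden} similar lines omitted]")
        total += hidden
    if total:
        # Banner tells the agent that compression happened and how to undo it.
        result.insert(0, f"[compressed: {total} lines omitted; pass raw=True for full output]")
    return "\n".join(result)
```

Keeping the first line of each run preserves a hint of what was dropped, and the banner is only emitted when something was actually removed.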
Specialized Tools
Purpose-built tool patterns for file operations, web research, CLI integration, and editor-level assistance.
- Batch File Operations via Bash Scripts — Consolidate multiple file writes into a single bash script execution to reduce per-call overhead, token consumption, and sequential latency
- Browser Automation for Research — When an agent's HTTP client is blocked by CDN bot detection, switch to browser automation tools like Playwright to fetch content
- CLI Scripts as Agent Tools — Write thin wrapper scripts that pre-filter system output so agents receive a decision-ready summary rather than raw command output
- Cross-Repo Agent Search — Expose a GitHub-API-backed text-search tool to reach code outside the workspace, and compose it with local indexed search under remote-index trade-offs
- Filesystem-Based Tool Discovery — Structure MCP tools as files in a directory tree and let the agent load only the definitions it needs, reducing token overhead by up to 98%
- Indexed Regex Search for Agent Tools — Back an agent's regex search with a trigram or suffix-array index so query latency stays bounded on large repositories, at the cost of freshness machinery
- Next Edit Suggestions — A proactive editing paradigm where the AI predicts both where and what to edit next, between reactive autocomplete and autonomous agent mode
- Override Interactive Commands — Suppress interactive prompts with a one-line instruction override so the same command definition serves both human-in-the-loop and automated execution
- Self-Healing Tool Routing — Route agent tool calls through a cost-weighted graph; recompute paths on failure and escalate to the LLM only when no feasible path exists
- Terminal Tools for Agents: send_to_terminal and Background Interaction — Use VS Code's send_to_terminal tool and backgroundNotifications setting to give agents bidirectional control over background terminal processes
- Unix CLI as Native Tool Interface — A single run(command) tool backed by Unix CLI can replace large function catalogs, leveraging pretraining on shell usage and built-in discovery primitives
- Web Search Agent Loop — Instead of firing a single query, wrap retrieval in a cycle of search, evaluate, refine, and synthesize -- giving the agent autonomy to decide when evidence is sufficient
- Lexical-First Retrieval for Agentic Search — A tuned BM25 index paired with a frontier LLM and deep retrieval can match or beat dense retrieval on deep-research benchmarks -- when the agent loop is strong enough to filter the ranking noise
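
As an illustration of the Indexed Regex Search entry above, the sketch below backs a regex search with a toy in-memory trigram index: posting lists narrow the candidate files, and the regex runs only on those candidates. It makes simplifying assumptions: `TrigramIndex` is a hypothetical class, the caller supplies a literal that every match must contain (production engines derive this from the regex itself), and there is no freshness handling for changed or deleted files.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def trigrams(text: str) -> Set[str]:
    """All overlapping 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}                       # path -> content
        self.postings: Dict[str, Set[str]] = defaultdict(set)  # trigram -> paths

    def add(self, path: str, content: str) -> None:
        """Index a file. Re-adding a changed file would leave stale postings."""
        self.docs[path] = content
        for gram in trigrams(content):
            self.postings[gram].add(path)

    def search(self, pattern: str, required_literal: str) -> List[str]:
        """Return paths matching pattern.

        required_literal is a substring every match must contain; its
        trigrams are intersected to shortlist candidates before the regex runs.
        """
        grams = trigrams(required_literal)
        if grams:
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            candidates = set(self.docs)  # literal too short: scan everything
        regex = re.compile(pattern)
        return sorted(p for p in candidates if regex.search(self.docs[p]))


# Usage: only files containing every trigram of "parse_" are regex-checked.
index = TrigramIndex()
index.add("a.py", "def parse_config(path): pass")
index.add("b.py", "def render(): pass")
index.add("c.py", "parse_configuration = None")
hits = index.search(r"def parse_\w+\(", "parse_")
```

The index gives candidates, not answers: a file can contain every trigram without matching the pattern, which is why the final regex pass over the shortlist is still required.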