Tool Engineering

Design, expose, and manage the tools that agents use to act on the world -- from description quality and schema design through MCP servers, skills, hooks, and specialized tool patterns.

Fundamentals

Core principles for designing agent tools that are discoverable, unambiguous, and cost-effective.

Tool Engineering — Design agent tools like APIs -- with documentation, examples, edge-case handling, and mistake-proofing -- not as boilerplate wrappers around existing functions
Tool Minimalism and High-Level Prompting — Expose fewer, non-overlapping tools and provide goal-oriented instructions rather than step-by-step procedures
Tool Description Quality — Tool descriptions -- not just implementations -- determine whether agents select the right tool; treat them as prompt engineering surfaces
Write Tool Descriptions Like Onboarding Docs — Write tool descriptions assuming the agent has never seen the system -- include implicit context, query syntax, domain terms, and resource relationships
Advanced Tool Use: Scaling Agent Tool Libraries — Three API-level features for managing hundreds of tools without drowning in context or losing selection accuracy

Tool Design

Structural patterns for tool interfaces, schemas, error handling, and output formatting.

Designing for Agent Consumers (Agent Experience) — Treat the agent as a first-class consumer of your public SDK, CLI, API, and docs; the AX discipline that routes the surface-design tactics, distinct from harness engineering
Agent-Computer Interface (ACI) — Tools are the agent's UI; the same principles that make human interfaces usable make agent tools effective
Function-Level Debugger Interfaces for Coding Agents — Re-expose interactive debuggers at the function frame instead of the source line so LLM agents pay one turn per call, not one turn per step
Semantic Tool Output — Return human-readable, contextually filtered output from agent tools to reduce hallucination and improve downstream call accuracy
Typed Schemas at Agent Boundaries — Formal schemas at every agent-to-agent interface establish explicit contracts that prevent state mismanagement and silent failures
Poka-Yoke for Agent Tools — Redesign tool interfaces so the wrong call cannot compile -- prevention over documentation
Consolidate Agent Tools — Prefer fewer, higher-level tools that match how agents reason about tasks over many narrow tools that mirror API endpoint boundaries
Toolset Agentization — Group frequently co-used tools into specialized sub-agents so the top-level planner chooses among fewer, coarser actions at each routing step
Machine-Readable Error Responses (RFC 9457) — Request structured errors from HTTP APIs using Accept headers to replace brittle HTML parsing with deterministic control flow
Headless-First Services: APIs for Agent Consumers — Expose the full product surface through API, MCP, and CLI so agents acting on behalf of users can complete any flow the GUI supports
Tool Necessity Probing — Read tool-call decisions from the pre-generation hidden state with a linear probe -- AUROC 0.89–0.96, 48% fewer tool calls at 1.7% accuracy loss
Chance-Corrected Shortlist Depth Sizing — Bits-over-Random measures whether retrieval at depth K beats random at depth K; the missing chance-corrected metric for sizing fixed or adaptive tool-retrieval shortlists
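
As an illustration of the Machine-Readable Error Responses entry above, here is a minimal sketch of turning an RFC 9457 problem-details body into a control-flow decision. It assumes the client already sent `Accept: application/problem+json` and has the response's Content-Type and body as strings. The `Problem` class and the `next_action` policy are hypothetical names chosen for this example, not part of any library.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Problem:
    """The standard members defined by RFC 9457."""
    type: str
    title: Optional[str]
    status: Optional[int]
    detail: Optional[str]
    instance: Optional[str]


def parse_problem(content_type: str, body: str) -> Optional[Problem]:
    """Return a Problem if the response carries problem details, else None."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/problem+json":
        return None
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    return Problem(
        type=data.get("type", "about:blank"),  # RFC 9457 default when absent
        title=data.get("title"),
        status=data.get("status"),
        detail=data.get("detail"),
        instance=data.get("instance"),
    )


def next_action(problem: Optional[Problem]) -> str:
    """Hypothetical policy: map a parsed problem to an agent decision."""
    if problem is None:
        return "fallback"  # not a structured error; caller decides what to do
    if problem.status == 429:
        return "retry_later"
    if problem.status in (401, 403):
        return "reauthenticate"
    return "report"
```

The agent branches on `status` and `type` instead of scraping an HTML error page, which is the deterministic control flow the entry describes.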

MCP (Model Context Protocol)

Architecture and design guidance for MCP servers and clients -- the open protocol for agent-tool communication.

MCP Client/Server Architecture — Architectural best practices covering transport selection, tool granularity, error handling, capability negotiation, and security
MCP Client Design — Host-side logic for connecting to servers, negotiating capabilities, routing tool calls, caching descriptions, and degrading gracefully on failure
MCP Elicitation — How MCP servers collect structured user input mid-task, and how Elicitation and ElicitationResult hooks let you automate, validate, or block those requests
MCP LLM Sampling — How MCP servers request host LLM inference mid-execution via sampling/createMessage, creating hybrid tools that combine deterministic logic with embedded AI reasoning
MCP Server Design — A server author's checklist for tool naming, schema design, error handling, resource exposure, and token efficiency
Production MCP Agent Stack — Sequence six MCP decisions -- server location, tool grouping, schema delivery, result processing, auth, token storage -- into a coherent production deployment
MCP Tool Result Persistence via _meta Annotation — Mark individual MCP tool outputs as durable through Claude Code's compaction pipeline with _meta["anthropic/maxResultSizeChars"]
Proprietary-to-Open-Standard Migration — When a proprietary extension system gets replaced by an open protocol, rebuild on the standard rather than port the old architecture
Scoped MCP Server Discovery: Most-Specific-Wins Resolution — Resolve duplicate MCP server definitions across user, workspace, and project config files using a most-specific-wins precedence rule
MCP alwaysLoad: Classifying Servers as Eager or Just-in-Time — Decide which MCP servers earn unconditional context residence and which stay deferred behind tool search, using token cost, hit rate, and selection accuracy as signals
Documentation-Grounding MCP Servers for Vendor SDKs — Wire a vendor's docs-MCP endpoint to ground code generation in current API surfaces, only when SDK churn, token budget, and trifecta posture all permit it
Tool Cloning and Provenance Assessment — Raw repository counts overstate the diversity of MCP and Skills marketplaces because many entries are cloned, lightly modified, or template-derived -- pair Jaccard and ssdeep before drawing ecosystem conclusions
Hint-Driven Concurrency for Read-Only MCP Tools — The MCP readOnlyHint annotation became a concurrency dispatcher input once Codex CLI 0.134.0 shipped parallel execution for read-only tool calls -- a wall-clock win that only holds when annotations are audited and trusted
Push-Event MCP Channels: Inverting the Pull-Tool Polarity — An MCP server that declares claude/channel flips the polarity from pull-on-demand to push-when-it-happens, gated by sender allowlist and an always-open session -- useful when warm context is worth keeping
Auth-Isolation as the MCP-vs-CLI Selection Heuristic — Reach for MCP when authenticated access needs its credentials kept out of the agent's context window; choose a CLI when there is no auth boundary to protect
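
The most-specific-wins rule from the Scoped MCP Server Discovery entry above can be sketched as a layered merge. This is an assumption-laden illustration: the scope ordering (user, then workspace, then project as the most specific) and the choice to replace a whole definition rather than deep-merge it are guesses for the example, and `resolve_servers` is a hypothetical helper, not a host API.

```python
from typing import Any, Dict

# Assumed precedence, least specific first. A real host may order scopes differently.
SCOPE_ORDER = ("user", "workspace", "project")


def resolve_servers(configs: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Resolve duplicate server names so the most specific scope wins.

    `configs` maps a scope name to {server_name: definition}.
    Later (more specific) scopes overwrite earlier ones wholesale.
    """
    resolved: Dict[str, Dict[str, Any]] = {}
    for scope in SCOPE_ORDER:
        for name, definition in configs.get(scope, {}).items():
            # Record the winning scope so the choice stays auditable.
            resolved[name] = {"scope": scope, "definition": definition}
    return resolved


if __name__ == "__main__":
    example = {
        "user": {"github": {"command": "github-mcp-old"}, "docs": {"command": "docs-mcp"}},
        "project": {"github": {"command": "github-mcp-pinned"}},
    }
    for name, entry in resolve_servers(example).items():
        print(name, entry["scope"], entry["definition"]["command"])
```

Keeping the winning scope next to each definition makes it easy to explain to a user why a particular server definition was selected.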

Skills

Packaging domain knowledge and reusable capabilities as agent skills with reliable invocation and lifecycle governance.

Skill as Knowledge Pattern — Design skills as pure knowledge containers -- domain rules, heuristics, and reference material -- not executable behavior, so they remain portable across agents
CLI-First Skill Design — Design agent skills as CLI tools so the same interface serves both humans debugging locally and agents automating through shell tool calls
Skill Authoring Patterns — Practical patterns for building, testing, and troubleshooting agent skills -- categories, description craft, implementation patterns, and debugging
SKILL.md Frontmatter Reference — All SKILL.md frontmatter fields: invocation control, subagent delegation, tool restriction, hooks, and argument handling
Skill Context Isolation — Run a skill in an isolated subagent context so its auxiliary tokens never enter the main chat; the parent receives only the distilled result
Skill Library Evolution — How agent skill libraries grow, get pruned, and evolve through versioning, quality gates, and lifecycle governance
Skill Tool Runtime Enforcement — Use the Skill tool to load command prompts at invocation time rather than telling agents to read the file -- eliminates stale instructions and path drift
Google ADK Skills — How Google ADK implements the Agent Skills standard via SkillToolset, inline models.Skill, and three auto-generated tools mapped to L1/L2/L3 progressive disclosure
Interpreter Skills — Ship a SKILL.md plus an importable module so the model decides when the behavior fires while the runtime executes a reviewed, testable function -- the named, versionable unit on top of an embedded code interpreter

Hooks & Lifecycle

Deterministic interception points that enforce policy, automate side effects, and audit agent behavior without relying on model compliance.

Hooks and Lifecycle Events — Hooks run deterministic code at defined points in an agent's execution -- before and after tool calls, at session boundaries -- enabling enforcement and audit
Conditional Hook Execution — Use the if field on hook handlers to filter by tool name and arguments, eliminating subprocess spawns for non-matching calls
Hook Catalog — A reference catalog of high-value hooks grouped by purpose: CLI enforcement, destructive operation guardrails, sandboxing, and workflow automation
On-Demand Skill Hooks — Register PreToolUse hooks through a skill invocation to arm strict guardrails for a single session without imposing friction on every workflow
PostToolUse BSD/GNU Detection — Catch BSD/GNU CLI incompatibilities at runtime with a PostToolUse hook, feed fixes back via additionalContext, and persist knowledge to CLAUDE.md
StopFailure Hook: Observability for API Error Termination — The StopFailure hook fires when a Claude Code turn ends due to an API error, giving harnesses a deterministic signal to log failures, alert operators, and feed external recovery workflows
PreCompact Hook: Vetoing Compaction at Lifecycle Boundaries — Claude Code's PreCompact hook can now block compaction outright, deferring context compression until the agent reaches a safer checkpoint
PostToolUse continueOnBlock: Refusal With a Load-Bearing Reason — Feed a hook's rejection reason back to the agent as a continuation signal instead of stopping the turn, turning routable policy violations into guided corrections
Terminal Tool Output Compression — Harness-side post-processing collapses predictable shell-output noise (lockfile diffs, ls -l, npm install progress, unchanged diff hunks) before the model sees it, with a banner that lets the agent opt out per call
MessageDisplay Hook: Transforming Assistant Text at the Display Boundary — A Claude Code 2.1.152 hook event fires on every outbound assistant message and lets a hook rewrite or hide the text before the user sees it -- the display-side analogue of PostToolUse output replacement
PostToolBatch Hook: Once-Per-Decision-Cycle Injection at the Batch Boundary — PostToolBatch fires exactly once per parallel tool batch, before the next model call -- the cardinality-matched injection point for conventions and validations that would otherwise duplicate N times across PostToolUse
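
To make the Terminal Tool Output Compression entry above concrete, the following is a rough sketch of harness-side post-processing that collapses runs of noisy lines and leaves a banner with an opt-out. The noise patterns, the banner wording, and the `raw` flag are all hypothetical choices for illustration; a real harness would tune its own patterns for lockfile diffs, `ls -l` listings, and similar output.

```python
import re
from typing import List

# Hypothetical noise patterns: npm warnings/notices and bare percentage progress lines.
NOISE_PATTERNS = [
    re.compile(r"^npm (warn|notice)\b", re.IGNORECASE),
    re.compile(r"^\s*\d{1,3}%"),
]


def _banner(count: int) -> str:
    return f"[{count} noisy lines collapsed; pass raw=True for full output]"


def compress_output(text: str, raw: bool = False) -> str:
    """Collapse each run of consecutive noisy lines into a single banner line."""
    if raw:
        return text  # per-call opt-out: return the output untouched
    out: List[str] = []
    run = 0  # length of the current run of noisy lines
    for line in text.splitlines():
        if any(p.search(line) for p in NOISE_PATTERNS):
            run += 1
            continue
        if run:
            out.append(_banner(run))
            run = 0
        out.append(line)
    if run:  # flush a run that reaches the end of the output
        out.append(_banner(run))
    return "\n".join(out)
```

The banner states how much was hidden and how to get it back, so the agent can request the raw output when the collapsed lines turn out to matter.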

Specialized Tools

Purpose-built tool patterns for file operations, web research, CLI integration, and editor-level assistance.

Batch File Operations via Bash Scripts — Consolidate multiple file writes into a single bash script execution to reduce per-call overhead, token consumption, and sequential latency
Browser Automation for Research — When an agent's HTTP client is blocked by CDN bot detection, switch to browser automation tools like Playwright to fetch content
CLI Scripts as Agent Tools — Write thin wrapper scripts that pre-filter system output so agents receive a decision-ready summary rather than raw command output
Cross-Repo Agent Search — Expose a GitHub-API-backed text-search tool to reach code outside the workspace, and compose it with local indexed search under remote-index trade-offs
Filesystem-Based Tool Discovery — Structure MCP tools as files in a directory tree and let the agent load only the definitions it needs, reducing token overhead by up to 98%
Indexed Regex Search for Agent Tools — Back an agent's regex search with a trigram or suffix-array index so query latency stays bounded on large repositories, at the cost of freshness machinery
Next Edit Suggestions — A proactive editing paradigm where the AI predicts both where and what to edit next, between reactive autocomplete and autonomous agent mode
Override Interactive Commands — Suppress interactive prompts with a one-line instruction override so the same command definition serves both human-in-the-loop and automated execution
Self-Healing Tool Routing — Route agent tool calls through a cost-weighted graph; recompute paths on failure and escalate to the LLM only when no feasible path exists
Terminal Tools for Agents: send_to_terminal and Background Interaction — Use VS Code's send_to_terminal tool and backgroundNotifications setting to give agents bidirectional control over background terminal processes
Unix CLI as Native Tool Interface — A single run(command) tool backed by Unix CLI can replace large function catalogs, leveraging pretraining on shell usage and built-in discovery primitives
Web Search Agent Loop — Instead of firing a single query, wrap retrieval in a cycle of search, evaluate, refine, and synthesize -- giving the agent autonomy to decide when evidence is sufficient
Lexical-First Retrieval for Agentic Search — A tuned BM25 index paired with a frontier LLM and deep retrieval can match or beat dense retrieval on deep-research benchmarks -- when the agent loop is strong enough to filter the ranking noise
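
The idea behind the Self-Healing Tool Routing entry above can be sketched with a shortest-path search that is recomputed whenever a step fails. This is only an illustration under stated assumptions: the graph shape, the `call` callback that reports success or failure of a step, and the convention of returning `None` to signal escalation to the LLM are all hypothetical, and edge costs are assumed to be strictly positive.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: positive cost}


def cheapest_path(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    """Dijkstra's algorithm; returns None when the goal is unreachable."""
    queue: List[Tuple[float, str, List[str]]] = [(0.0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in seen:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None


def route(
    graph: Graph, start: str, goal: str, call: Callable[[str, str], bool]
) -> Optional[List[str]]:
    """Follow the cheapest path; on a failed step, drop that edge and recompute.

    Returns the nodes actually visited, or None when no feasible path remains
    (the point at which a harness would escalate to the LLM).
    """
    graph = {node: dict(edges) for node, edges in graph.items()}  # private copy
    current, visited = start, [start]
    while current != goal:
        path = cheapest_path(graph, current, goal)
        if path is None:
            return None
        nxt = path[1]  # safe: current != goal, so the path has at least two nodes
        if call(current, nxt):
            current = nxt
            visited.append(nxt)
        else:
            del graph[current][nxt]  # mark the failed edge as unusable
    return visited
```

For example, with edges `plan -> api` (cost 1), `plan -> scrape` (cost 5), and both leading to `done`, a failure on `plan -> api` makes `route` fall back to the `scrape` path without involving the model.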