Agent Design

Architecture, delegation, memory, control, reliability, and harness patterns for building effective agents.

Core Design

Foundational architecture decisions — how to structure agents, delegate work, and separate concerns.

Agent-First Software Design — Architect systems where AI agents are the primary consumers, using machine-readable APIs and structured outputs instead of visual UIs
The Agent Stack Bet: Architectural Decisions for Production Agents — Four architectural bets — identity, universal context, durability, and platform — that move production-agent concerns from application code to the platform layer, with the conditions under which each bet pays off
Emergent Architecture in AI-Driven Codebases — AI coding agents produce codebases with measurable architectural biases — pattern replication, abstraction bloat, and stack convergence — that compound across agent runs
Agentic AI Architecture: From Prompt-Response to Goal-Directed Systems — Reference architecture separating cognitive reasoning from execution, a topology taxonomy for multi-agent coordination, and an enterprise hardening checklist
Agent Development Lifecycle for Agent Products — A four-phase meta-lifecycle (build, test, deploy, monitor) where verdict-labelled production traces feed the next evaluation and build cycle
Agentic Flywheel: Self-Improving Agent Systems — A closed loop where agents analyze their own traces and metrics to generate harness improvements that make all future agent work better
Agent-Discoverable Slash Commands — Slash commands become model-callable primitives when the agent's planner can read their descriptions and invoke them mid-loop, collapsing the boundary between human-invoked shortcuts and agent-invoked tools
Agents vs Commands: Separation of Role and Workflow — Commands define what to do; agents define who does it — separating orchestration from expertise lets you change either without touching the other
Agentless vs Autonomous: When Simple Beats Complex — Simple two-phase workflows often outperform complex autonomous agents — empirical evidence for starting with constrained approaches rather than maximizing AI autonomy
Agent Terminology Disambiguation for AI Coding Systems — Eight overlapping terms — including workflow, autonomous agent, AI assistant, RAG pipeline, and workflow engine — describe distinct systems with distinct failure modes; matching the term to what you are building changes which patterns apply
Classical SE Patterns as Agent Design Analogues — Classical GoF patterns and SOLID principles have direct structural analogues in agent systems, with the concern shift moving from reuse to control and safety
Coding Agent Scope Expansion: When to Extend Beyond the Codebase — Extending a coding agent into browser, ops, and knowledge-work only works when the coding scaffold — loop, verification, eval, credential boundary — transfers into the new domain
Cognitive Reasoning vs Execution: A Two-Layer Agent Architecture — Separate the agent layer that decides what to do from the layer that acts — typed tool interfaces enforce the boundary and make each independently testable
CoALA Decision-Making Loop as an Orchestration Lens — The propose → evaluate → select → act loop from CoALA, used as a vocabulary for locating where each orchestration tactic improves an agent — not a prescription for runtime structure
CoALA Structured Action Space: Internal vs External Actions — Split the agent's actions into internal (reason, retrieve, learn) and external (ground) — the boundary surfaces cost, reversibility, and permission profiles that monolithic tool lists hide
Discrete Phase Separation — Prevent context contamination by running research, planning, and execution in separate conversations — only distilled artifacts cross phase boundaries
Domain-Scoped Parallel Exploration for Multi-File Change Localization — Partition a localization agent's exploration along domain seams when a change spans subsystems — context isolation, not parallelism, is the active ingredient
Trained Repository Explorer Sub-Agent (FastContext) — Delegate repo exploration to a trained 4B–30B sub-agent that returns file-path + line-range citations to the solver — context isolation specialised to repo exploration with citation-precision training
The Delegation Decision: When to Use an Agent vs Do It Yourself — Agent delegation has overhead; match task characteristics to agent strengths rather than delegating everything or nothing
Delegation Threshold Calibration for Orchestrator Agents — Calibrate the orchestrator's delegate-or-handle-inline threshold against handoff cost, review tax, and the 15× token multiplier of true multi-agent setups
Empowerment Over Automation — AI tools should skip tedious work while preserving your autonomy over architectural decisions, domain logic, and creative choices
Eval Strategy by Agent Generation: A Structure-to-Eval Locator — Six structural levels of agent architecture — prompt, chain, ReAct loop, workflow graph, modern loop, harness — each open an eval surface the prior level cannot see
Execution-First Delegation — Instead of scripting steps, specify the outcome and the boundaries; the agent determines how
Execution Lineage: DAG of Artifacts vs Agent Loops — Represent revisable AI-native work as a DAG of artifact-producing computations with explicit dependencies, stable boundaries, and identity-based replay — so unrelated edits don't perturb the final output
Externalization in LLM Agents: Memory, Skills, Protocols, and Harness — Reliable agents externalize cognitive burdens into persistent infrastructure — four components that transform hard internal problems into tractable retrieval and composition tasks
Governed Sources of Truth for Analytics Agents (Structure Over Access) — Layer semantic models, lineage, and skill routers between an analytics agent and the warehouse; raw corpus access alone moved Anthropic's accuracy by less than a point
Inversion Analysis: Surface Capabilities Competitors Cannot Replicate — Inversion asks what your architecture enables that others cannot replicate, producing novel integrations rather than feature parity
Open Agent School Pattern Mapping — Map the Open Agent School academic pattern taxonomy to practical coding-agent primitives like maxTurns, PreToolUse hooks, and CLAUDE.md memory
Persona-as-Code: Defining Agent Roles as Structured Documents — Encode each agent's domain, responsibilities, constraints, output artifacts, and scope exclusions as a Markdown file so roles are explicit, auditable, and composable
ReAct (Reason + Act): Interleaved Reasoning-Action Loops — Interleave a thought, a tool call, and an observation each step, re-conditioning on real evidence — pays back on sparse-feedback tool-grounded tasks, wastes inference on predictable workflows
Role Orchestration on a Single Model — Invoke the same frozen small model in three distinct roles — summariser, agent, corrector — to roughly double task-goal completion on constrained hardware without additional training
Graph of Thoughts: Directed Graph Reasoning for Multi-Path Problems — Model reasoning as a directed graph to aggregate insights across independent paths — the Aggregate operation that neither CoT nor Tree of Thoughts can express
Petri Net of Thoughts: Formal Process Models as Prompting Scaffolds — Use Petri net formalism to derive reasoning structure from process evidence, giving each LLM call a state-aware prompt constrained by formally defined transitions
Self-Discover Reasoning: LLM-Composed Reasoning Structures — Enable the model to compose a task-specific reasoning plan from a library of atomic modules before solving, outperforming fixed strategies like Chain-of-Thought on complex analytical tasks
Runtime Scaffold Evolution — Agents synthesize, modify, and deploy custom tools during active problem-solving rather than relying on a fixed toolkit
Layered Domain Architecture — Pin one intra-domain layer order (Types → Config → Repo → Service → Runtime → UI) with downward-only dependencies enforced by a linter so agents place new code in the same slot every session
Model-Neutral Agent Architecture: Model Portability Over Cloud Portability — A conditional architectural bet — model portability pays back faster than cloud portability when cross-vendor capability churns quarterly and the team has portable evals; inverts on single-CLI platforms and workloads bound to vendor-native features
Separation of Knowledge and Execution — Structure agent systems in three layers — skills (knowledge), agents (execution), and commands (orchestration) — so each layer changes independently
Solver-Externalized Constraint Reasoning (MaxSAT/SMT Encoding) — Have the agent emit solver code for z3, python-sat, or OR-Tools instead of reasoning through constraints in prose — then verify the solver's output against the original intent
Structured Agentic Software Engineering (SASE) — A framework for transitioning from AI-augmented to goal-directed agentic SE, with structured artifacts (MRPs, CRPs, BriefingScript) that close the speed-vs-trust gap
Task-Specific Agents vs Role-Based Agents — Build agents for specific tasks rather than generic roles — narrow scope produces more precise output and reduces context confusion
Three Reasoning Spaces: Plan, Bead, and Code — Treat plan space, bead space, and code space as explicit gates — transitioning between them deliberately prevents architecture drift during implementation
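
The Layered Domain Architecture entry above describes one pinned layer order with downward-only dependencies enforced by a linter. A minimal sketch of what such a check could look like follows. The `find_violations` helper and the `(importer, imported)` edge format are hypothetical, and treating same-layer imports as allowed is an assumption of this sketch, not something the entry states.

```python
# Layer order taken from the entry above, lowest layer first.
LAYERS = ["types", "config", "repo", "service", "runtime", "ui"]
RANK = {name: i for i, name in enumerate(LAYERS)}


def find_violations(imports):
    """Return import edges that point upward in the layer order.

    `imports` is a hypothetical list of (importer_layer, imported_layer)
    pairs; a real linter would derive these from the source tree.
    Same-layer imports are allowed here (an assumption of this sketch).
    """
    return [
        (src, dst)
        for src, dst in imports
        if RANK[dst] > RANK[src]  # a layer may not import a layer above it
    ]


edges = [("service", "repo"), ("ui", "types"), ("repo", "service")]
print(find_violations(edges))  # → [('repo', 'service')]
```

Only the `repo` to `service` edge is flagged, because it is the only one that reaches up the order.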

Memory & State

How agents persist, retrieve, and synthesize information across turns and sessions.

Memory Retrieval as a Control Decision — Treat memory injection as a control decision — abstain, gate before action, or utility-rank retrieved memory rather than always injecting top-k
Agent Memory Patterns: Learning Across Conversations — Persist knowledge across conversations using scoped memory systems so agents accumulate institutional knowledge rather than starting fresh every session
Code-Native Memory Substrates — Root agent memory in codebase artifacts — typed VCS units, AST diffs, and git-backed task graphs — so structure replaces lossy natural-language summaries
Executable Memory: User-State as Code for Personalized Agents — Compile user memory into typed code with rule functions — beats retrieval 99% vs 6-43% on aggregate queries, loses to text retrieval on plain recall
AX/UX/DX Triad: Three Experience Layers in Agent Systems — Treat Agent Experience, User Experience, and Developer Experience as separate design surfaces — conflating them degrades all three
CoALA Memory Taxonomy as a Classifier for Harness Artifacts — Apply CoALA's four memory types as a classifier over the artifacts an agent harness already has; the value is diagnostic, not prescriptive
Durable Interactive Artifacts: Agent Output Outside the Transcript — Treat agent outputs as persistent, re-openable, re-runnable workspace objects separate from the chat transcript — the transcript holds reasoning, the artifact holds state
Episodic Memory Retrieval — Retrieve relevant past interaction episodes — not isolated facts — so agents recall what was tried, what failed, and what worked when facing similar problems
Generative Agents Memory Stream — Three-layer architecture (observation stream, scored retrieval, reflection synthesis) for maintaining coherent behavior across long-running, high-observation-density agent sessions
Handoff Skill: Structured Context Transfer Between Agent Sessions — A model-invocable skill that compacts the current session into a temp-file handoff document a fresh agent can pick up — invoked at an explicit transfer point, distinct from harness-detected recap and from raw-transcript forwarding
Layered Mutability: Governing Persistent Self-Modifying Agents — A five-layer lens (pretraining, alignment, self-narrative, memory, weight-level adaptation) for deciding where governance attaches in persistent agents and when compositional drift will ratchet past a visible-layer rollback
Structured Task-State Ledger for Tool-Calling Agents (LedgerAgent) — Maintain task state as a typed dictionary outside the prompt and gate write tools against executable policy predicates — pays back on multi-turn, policy-bound, write-heavy workflows where pass^k reliability matters
Memory Synthesis from Execution Logs — Extract causal lessons from agent execution traces — what worked, what failed, which approaches were abandoned and why — so every run makes future runs more effective
RAG over Thinking Traces — For reasoning-intensive tasks, swap the document corpus for prior thinking trajectories — the same retrieve-then-generate pipeline beats both no-RAG and document-RAG with flat or lower inference cost
Session Initialization Ritual: How Agents Orient Themselves — A mandatory startup sequence that every agent session executes before touching code — verify state, orient to progress, confirm baseline health, then act
Session Recap: Goal-Shaped Handoff at Context Boundaries — A structured, agent-authored artifact at compaction, resume, or fork boundaries that preserves goal-state rather than text-density — restoring why and what-next across discontinuities
Memory Transfer Learning: Cross-Domain Memory Reuse — How coding agents transfer learned memories across different task domains, why abstraction level determines transferability, and when cross-domain memory causes negative transfer
Subtask-Level Memory for Software Engineering Agents — Store and retrieve memory at individual reasoning stages, not whole sessions, to prevent misguided retrieval when tasks share surface similarity
Tiered Memory Architecture: Episodic-to-Semantic Consolidation — Separate raw episode storage from a curated semantic tier and promote facts between them only on observed re-use — pays off for long operation windows, adds cost without benefit elsewhere

Control & Orchestration

Patterns for steering agent behavior, detecting convergence, and managing execution flow.

Background Todo Agent — Route the agent's todo-list maintenance loop to a small background model so the frontier model spends its attention budget on the active sub-task instead of bookkeeping
In-Agent Task Prioritization: Ranking the Next Action — Rank pending work by a composite score (urgency, value, dependency, blast radius, staleness) — distinct from routing and scheduling — so the agent's scarce attention lands on the item that pays back most per turn
Controlling Agent Output: Concise Answers, Not Essays — Matching the agent's response format to what you actually need reduces noise and preserves context budget
Critic Agent Pattern — A second model reviews the primary agent's plan before execution begins, catching structural errors early when recovery is cheap
Deterministic Orchestration for Structured Modernization — When the workflow shape is stable, encoding orchestration in code reserves the LLM for translation choices — comparable accuracy at up to 3.5x lower token cost
LLM-as-Code Agentic Programming for Agent Harnesses — When the workflow shape is enumerable, the program holds control flow and the LLM is a callable component; context scales with call-tree depth instead of accumulated steps
AST-Grounded Critic Loop for Documentation Maintenance — Combine AST-anchored retrieval with a critic-guided Reflexion loop, treating doc-vs-code consistency as a structural verification problem rather than a text-generation problem
DSPy: Programmatic Prompt Optimization — Replace hand-written prompts with optimizable modules: define a metric, supply training examples, and an optimizer searches prompt and few-shot space automatically for compound pipelines
GEPA: Reflective Prompt Evolution with Pareto Selection — Evolve prompts by reflecting on execution traces in natural language and keeping the Pareto frontier of candidates per-instance — tuned for rich textual feedback and sample-efficient gains over MIPROv2 and RL
Evaluator-Optimizer Pattern — Two distinct LLM roles in a loop: a generator produces output and an evaluator critiques it, feeding structured feedback back until a quality threshold is met
Event-Driven Agent Routing — Route work between agents and human teams by reacting to status-change events rather than maintaining a central coordinator that owns the full workflow
Goal Monitoring and Progress Tracking — Planning tells the agent what to do; monitoring tells you whether it actually did it and whether it wandered off
Grill Me: Developer-Initiated Plan Interrogation — Direct the agent to challenge your plan rather than execute it, surfacing hidden assumptions and decision gaps before implementation begins
Classifier-Subagent Run Mode: Per-Call Permission Routing — A three-tier router — allowlist, sandbox, classifier subagent — for shell, MCP, and fetch calls, with project-specific permissions.json custom instructions steering the classifier's allow/refactor/escalate verdict across Cursor, Claude Code, and Codex
Inference-Time Tool-Call Reviewer — A reviewer agent inspects each provisional tool call before dispatch, gated by Helpfulness-Harmfulness metrics that quantify when feedback adds net value
Interactive Clarification for Underspecified Tasks — Agents that explore the codebase first and ask targeted clarification questions recover up to 74% of the performance lost to underspecified inputs
Issue Requirements Preprocessing — Transforming raw issue descriptions into structured requirements before code generation improves patch resolution rates by 17% on average
Minimum-Sufficient Control Ladder: Escalate by Failure Mode — An ordered algorithm for adding agent control mechanisms — Tool Use, Reflection, Evaluator-Optimizer, Human-in-the-Loop, Parallelization — only when a named failure mode justifies the next rung
Plan Compliance in Agents: Measure What They Execute, Not What You Wrote — Agents silently deviate from instructed plans; plan quality, phase alignment, and periodic reminders determine whether the plan you wrote actually runs
Proactive Idle-Time Anticipation (ProAct) — Predict likely next user needs from dialogue history plus persistent memory during the idle window between turns, then prefetch evidence under a value gate — pays back only when need chains are predictable and push fatigue is tolerable
Progressive Disclosure for Agent Definitions — Keep agent definitions minimal — identity and scope only — and load detailed task knowledge on demand through skills rather than front-loading everything
Prompted Uncertainty Decomposition for Clarification Routing — Elicit action confidence and request uncertainty as two separate prompted scalars so a black-box agent asks the user only when ambiguity lives in the spec
Self-Reporting Loops: Autonomous Routines That File Their Own Backlog — Scheduled and autonomous runs file out-of-scope observations to the tracker so signal survives the session boundary, contingent on a trusted substrate, deduplication, and severity routing
Specialized Agent Roles — Assign distinct specializations to parallel agents so they complement rather than compete on the same problems
Sprint Contracts — A pre-coding agreement between planner, generator, and evaluator agents that converts vague goals into graded scoring dimensions before implementation begins — preventing evaluator rationalization
Steering Running Agents: Mid-Run Redirection and Follow-Ups — Send a mid-execution message that redirects tool calls without discarding the context already built
Tool Preamble: User-Visible Status Updates Before Tool Calls — A short visible message before tool execution in multi-step agent runs reduces perceived latency without altering behaviour; apply at phase boundaries, not per call
Verification-Gated Agent Autonomy via Automated Review — Pair broader agent autonomy with an automated review gate that screens output — trust scales through verification rather than per-action human approval, under specific conditions
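
The Evaluator-Optimizer entry above describes a generator and an evaluator in a loop, with feedback flowing back until a quality threshold is met. The sketch below shows one way that loop might be shaped. The `generate` and `evaluate` callables, the numeric score, and the `max_rounds` cap are all hypothetical stand-ins for model calls and are not taken from the linked page.

```python
def evaluator_optimizer(generate, evaluate, threshold=0.8, max_rounds=3):
    """Loop a generator and an evaluator until the score meets the threshold.

    `generate(feedback)` returns a candidate; `evaluate(candidate)` returns
    (score, feedback). Both are hypothetical callables standing in for LLM
    calls. The best-scoring candidate seen so far is returned.
    """
    feedback = None
    best, best_score = None, float("-inf")
    for _ in range(max_rounds):
        candidate = generate(feedback)
        score, feedback = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
        if score >= threshold:
            break  # quality bar met; stop spending inference
    return best, best_score
```

The round cap is a design choice in this sketch: without it, an evaluator that never passes the threshold would loop indefinitely.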

Reliability

Making agents robust — backpressure, idempotency, cost awareness, error recovery, and self-correction.

Agent Circuit Breaker — Wrap external tools with per-tool failure-tracking state machines that block calls during degraded states, preventing token waste on retry loops
Self-Healing Production Agent — A closed-loop pipeline that detects post-deploy regressions, triages causality, and dispatches a sub-agent to open a fix PR — with human review at the merge gate
The Advisor Strategy: Frontier Model as Strategic Advisor — Pair a cost-effective executor model with a frontier advisor that provides strategic guidance on hard decisions — within a single API call, no orchestration required
Agent Backpressure: Automated Feedback for Self-Correction — Automated tooling — type systems, test suites, linters, CI pipelines — creates feedback loops that agents use to self-correct without human intervention
Behavioral Drivers of Coding Agent Success and Failure — Four observable failure clusters and three behavioral patterns that predict success — derived from trajectory analysis of 19 agents across 8 frameworks and 14 LLMs
Cross-Vendor Competitive Routing — Assign competing vendor agents to the same task, collect independent results, and let a human or automated gate select the winner
Decoupled Search Grounding: A Vendor-Agnostic Grounding Boundary — Lift retrieval out of the reasoning model and into an MCP-compatible gateway so provider, caching, and evidence rendering become independently tunable controls — pays off only when strict output contracts, cacheable query mix, and real multi-vendor routing all hold
Dual-Budget Control for Search Agents — Under hard limits on both tool calls and generated tokens, score each candidate action by Value-of-Information per unit budget and spend the next unit on the highest-ranking action
Effective Feedback Compute (EFC) for Harness Comparison — A trace-level scaling coordinate that credits feedback only when it is informative, valid, non-redundant, and retained — replacing raw tokens or tool calls when comparing two harnesses on the same multi-turn task
Gateway Model Routing — Treat an Anthropic-compatible gateway as both inference target and model catalogue, so a single config knob controls what the harness can call and what it shows in the picker
Exception Handling and Recovery Patterns — Agents fail; the question is whether they fail forward (recover and continue) or fail catastrophically (corrupt state, lose progress, repeat work)
Feedback as Capability Equalizer — Weaker models with high-quality iterative feedback outperform stronger models without it — invest in feedback loop quality before upgrading the model
Five-Failure-Layers Diagnostic — Before swapping models, force every observed agent failure through a fixed harness-layer attribution — task spec, context, execution environment, verification, state — so "the model is dumb" resolves to a specific gap
Heuristic-Based Effort Scaling in Agent Prompts — Encode resource allocation rules in system prompts so agents spend proportional effort — few tool calls for simple lookups, many subagents for complex research
Idempotent Agent Operations: Safe to Retry — Design agent operations so that running the same task twice produces the same end state — not duplicate artifacts, conflicting state, or compounded errors
Interactive Effort Sliders: Per-Turn Reasoning-Budget Controls — Expose reasoning budget as an interactive, per-turn operator control — the third option alongside static effort config and heuristic effort scaling
Long-Running Agents: Durability, Checkpoints, and Resumability — The operational shape of agents that work for hours, days, or weeks — three walls (finite context, no persistent state, unreliable self-grading) and the five primitives that recur across Anthropic, Cursor, and Google designs
Remote Agent Host Sessions over SSH and Dev Tunnels — Run the agent loop on a remote host whose lifecycle is decoupled from the client editor — SSH attach, dev tunnel reverse, or cloud worker — so sessions survive laptop sleep, network drops, and editor restarts
Per-User Supervisor Process for Background Agent Sessions — A managed-daemon model where a per-user supervisor spawns each background agent session as a detached process, reconnects via on-disk roster, evicts idle non-pinned sessions, and restarts in place onto an updated binary
Observation Contract Preservation — Tool outputs like presigned URLs and session tokens are contract-bound — preserve their bytes and respect their expiry on the second call, or the chain fails silently
Per-Call Budget Hints on Tool Invocations — Lift the reasoning or returned-token cap on individual tool calls — narrowly, when the call is infrequent and information-dense — instead of re-tuning the global default
Per-Tool Extended Reasoning Opt-In: Tool-Call-Scoped Budgets — A single tool call opts itself into deeper reasoning via a per-call parameter, leaving the turn's global reasoning effort unchanged for every other step
Progressive Spend Threshold Alerting for Agent Cost Governance — Pair a soft cost cap with progressive alerts at fixed budget percentages so operators get graceful intervention windows before agent work hits a hard cutoff
Reasoning Budget Allocation: The Reasoning Sandwich — Allocate maximum reasoning compute to planning and verification phases, reduced compute to execution — rather than using a fixed level throughout
Rollback-First Design: Every Agent Action Should Be Reversible — Before choosing how an agent will perform an action, choose how you will undo it — if recovery costs more than one command, reconsider the approach
RubricRefine: Pre-Execution Rubric Refinement for Code-Mode Tool Use — Generate a task- and registry-specific rubric, score candidate tool-use code against explicit contract checks, and repair failures before any execution — for multi-step tool sequences where contract violations run silently to completion
Selective Checkpoint Restore Across Code and Conversation State — Restore the agent's mental model and re-edit, restore the edits and re-plan, or reset both — when code and conversation are stored separately, the choice is three actions, not one
Specialized Small Language Models as Agent Sub-Tools — Hide a small fine-tuned model behind a tool-call interface so a large orchestrator can offload high-volume narrow operations — search, exploration, terminal output filtering — without spending its own context budget
Tail Control for Agent Workflows: Engineering for the Failure Tail, Not the Average — Engineer non-deterministic agent workflows for worst-case usability — per-step p95 timeouts, hedged re-draws, graceful degradation — because the failure tail sets the reliability behind an API, not the median run
Task Feasibility Awareness: Stop Before You Start — An up-front check that the current tools can satisfy a task at all, halting the infeasible before the agent burns a long chain — the inverse of premature completion
Tenant Model Policy: Organization-Scoped Rules for AI Model Selection — An admin-tier policy plane that decides which AI models an org or tenant can invoke — only safe when policy is strict-priority, denials are explicit, and rules age with model deprecations
Wink: Classifying and Auto-Correcting Coding Agent Misbehaviors — An async trajectory-observer system that classifies misbehaviors into three categories and injects targeted course-corrections
WIP=1 and Little's Law: Kanban Throughput Theory for Agent Task Design — Cap an agent's active task count at one until verification passes; Little's Law makes the cycle-time consequence quantitative
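
The Agent Circuit Breaker entry above describes per-tool failure-tracking state machines that block calls during degraded states. A minimal single-threaded sketch follows. The class name, the `max_failures` and `cooldown_s` parameters, and the `call_tool` wrapper are hypothetical, and the closed, open, half-open state model is the conventional circuit-breaker design, assumed here and not confirmed by the linked page.

```python
import time


class ToolCircuitBreaker:
    """Per-tool breaker: closed, open after repeated failures, half-open after a cooldown."""

    def __init__(self, max_failures=3, cooldown_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable so the breaker can be tested without sleeping
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self):
        """Return True if a call may be attempted now."""
        if self.opened_at is None:
            return True
        # After the cooldown, let a probe call through (half-open).
        return self.clock() - self.opened_at >= self.cooldown_s

    def record(self, ok):
        """Update state with the outcome of a call."""
        if ok:
            self.failures = 0
            self.opened_at = None
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()  # open, or re-open after a failed probe


def call_tool(breaker, tool, *args):
    """Run `tool` through the breaker; return (ok, result_or_reason)."""
    if not breaker.allow():
        return False, "circuit open: skipping call"
    try:
        result = tool(*args)
    except Exception as exc:
        breaker.record(False)
        return False, f"tool failed: {exc}"
    breaker.record(True)
    return True, result
```

Keeping one breaker instance per tool means a degraded tool is skipped cheaply while the agent's other tools stay available.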

Harness & Tools

The runtime infrastructure that hosts and constrains agent execution.

Agent Composition Patterns: Chains, Fan-Out, Pipelines, Supervisors — Multi-agent workflows follow four structural patterns — sequential chains, parallel fan-out, staged pipelines, and supervisor-coordinator — each suited to different task structures
Production Hosting Topology for Self-Hosted Agent SDK Runtimes — Pick a container-lifecycle pattern, autoscale on token rate, mediate credentials through a sidecar proxy, and route long-running sessions by consistent hashing so a self-hosted Agent SDK survives real concurrency, multi-tenancy, and prompt injection
Cloud-Agent Three-Layer State Decoupling — Split a cloud agent's state across agent loop, machine state, and conversation state so pods, sessions, and threads each migrate, hibernate, and recover independently
Dual-Write Append-Mirror for Agent Transcript Externalization — Write the agent transcript to local disk first and forward each batch to a remote store as a best-effort mirror — so a store outage degrades the externalization, not the agent
Agent Harness: Initializer and Coding Agent — Structure long-running agent work as two distinct phases — an initializer that prepares the environment, and a coding agent that picks up reliably from wherever any prior session left off
Agent Runtime Middleware: Per-Call Interception Pipeline — Compose cross-cutting concerns as ordered pre/post handlers around every model and tool call, with a placement matrix for middleware vs. hooks vs. tool wrappers vs. prompt rules
Agent Pushback Protocol — Agents evaluate requests at both implementation and requirements level, surface concerns, and wait for explicit confirmation before executing
Model a Single Agent Turn as Many Inference and Tool-Call Iterations — A single user-facing turn is an iterative sequence of model inference and tool execution steps, not a single round-trip inference call
Delta Channels: Bounded Checkpoint Storage for Append-Only Agent State — Store only the per-step diff and write a full snapshot every K steps so long-session checkpoint storage stays O(N) instead of O(N²) and resume latency stays bounded
Deferred Permission Pattern — Use a PreToolUse hook's defer decision to pause a headless Claude Code session at a tool call and resume it after out-of-band human approval
Most-Restrictive-Wins Fusion for Parallel Agent Control Returns — The deny > defer > ask > allow merge function that fuses parallel hook decisions, classifier verdicts, and permission rules into a single agent-control answer
Tool Confirmation Carousel: Batched UI for Per-Call Approvals — A carousel control reviews multiple pending tool calls in one navigable surface instead of scattered modals — useful only for residual approvals that allowlists and sandboxes cannot absorb
Six-Shape Approval Response Taxonomy — The Claude Agent SDK exposes six distinct responses to a tool-approval prompt (approve, approve with changes, approve and remember, reject, suggest alternative, redirect entirely) composed from three callback knobs over a binary protocol
Harness Design Dimensions and Archetypes — Five dimensions and five archetypes from a 70-project empirical study — a population-level lens for reading harness choices and predicting where effort is missing
Harness Engineering — The discipline of designing agent environments — layered architecture, mechanical enforcement, legibility — so agents reliably produce correct results
Harness Impermanence — Author agent scaffolding as depreciating capital — design for low cost of removal so native model capability can replace it cleanly
Fleet Harness Attribution — Pin model and task, swap whole harnesses, and measure pass rate alongside input-token consumption across a model fleet to attribute outcomes to the harness layer rather than the model
Isometric Harness Ablation — Pin the model, remove one harness subsystem at a time, measure the score drop — the resulting per-subsystem table ranks investment priorities
Lane-Based Execution Queueing — Isolate concurrent agent tasks into named queues with per-lane concurrency limits to prevent output interleaving, race conditions, and deadlocks
Managed vs Self-Hosted Agent Harness — Decision framework for choosing between managed agent services and self-hosted harnesses based on compliance, memory ownership, model routing, and ops capacity
Multi-Shape BYOK Provider — One BYOK provider that natively speaks Chat Completions, Responses, and Messages — with the API family declared per endpoint — replaces single-shape compatibility adapters that silently down-translate provider-specific capability
Per-Model Harness Tuning — Treat the backing model as a first-class harness variable — express prompt, tool, and middleware deltas as declarative model-keyed overrides instead of forcing one configuration to work everywhere
Recursive Agent Harnesses (RAH) — A parent agent generates and runs a script that spawns subagent harnesses in parallel — each with its own tools and context — making the recursive unit a full harness rather than a bare model call
Scoped Browser DevTools Access for Runtime Diagnosis — Give the coding agent a read-oriented Chrome DevTools Protocol attachment for diagnosing runtime, network, and console errors — but only when the agent's other tools cannot close the lethal trifecta against the imported DOM
Temporary Compensatory Mechanisms — Design scaffolding that compensates for current model limitations as removable layers, not load-bearing architecture
The Think Tool — A mid-stream reasoning checkpoint that fires between tool calls, giving agents an explicit space to reflect on tool output before deciding the next action
VS Code Agents App: Agent-Native Parallel Task Execution — Run multiple agent sessions simultaneously across projects — each session inherits workspace custom instructions and MCP servers, enabling practical fan-out task execution
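
The Most-Restrictive-Wins Fusion entry above names a merge function with the precedence deny > defer > ask > allow. One way to sketch that merge is shown below. The `fuse` name and the plain-string decision values are hypothetical, and the fallback for an empty input is an assumption of this sketch, not something the entry specifies.

```python
# Precedence from the entry above, least restrictive first: deny > defer > ask > allow.
PRECEDENCE = ["allow", "ask", "defer", "deny"]
SEVERITY = {decision: i for i, decision in enumerate(PRECEDENCE)}


def fuse(decisions):
    """Merge parallel control decisions; the most restrictive one wins.

    An empty input falls back to "ask" here, which is an assumption of this
    sketch and not part of the entry.
    """
    decisions = list(decisions)
    if not decisions:
        return "ask"
    return max(decisions, key=SEVERITY.__getitem__)


print(fuse(["allow", "ask", "allow"]))   # → ask
print(fuse(["allow", "deny", "defer"]))  # → deny
```

Because the merge takes the maximum over a fixed ordering, the result does not depend on the order in which hooks, classifiers, or permission rules return.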