Agent Design

Architecture, delegation, memory, control, reliability, and harness patterns for building effective agents.

Core Design

Foundational architecture decisions — how to structure agents, delegate work, and separate concerns.

Memory & State

How agents persist, retrieve, and synthesize information across turns and sessions.

Control & Orchestration

Patterns for steering agent behavior, detecting convergence, and managing execution flow.

Reliability

Making agents robust — backpressure, idempotency, cost awareness, error recovery, and self-correction.

Harness & Tools

The runtime infrastructure that hosts and constrains agent execution.

  • Agent Composition Patterns: Chains, Fan-Out, Pipelines, Supervisors — Multi-agent workflows follow four structural patterns — sequential chains, parallel fan-out, staged pipelines, and supervisor-coordinator — each suited to different task structures
  • Production Hosting Topology for Self-Hosted Agent SDK Runtimes — Pick a container-lifecycle pattern, autoscale on token rate, mediate credentials through a sidecar proxy, and route long-running sessions by consistent hashing so a self-hosted Agent SDK survives real concurrency, multi-tenancy, and prompt injection
  • Cloud-Agent Three-Layer State Decoupling — Split a cloud agent's state across agent loop, machine state, and conversation state so pods, sessions, and threads each migrate, hibernate, and recover independently
  • Dual-Write Append-Mirror for Agent Transcript Externalization — Write the agent transcript to local disk first and forward each batch to a remote store as a best-effort mirror — so a store outage degrades the externalization, not the agent
  • Agent Harness: Initializer and Coding Agent — Structure long-running agent work as two distinct phases — an initializer that prepares the environment, and a coding agent that picks up reliably from wherever any prior session left off
  • Agent Runtime Middleware: Per-Call Interception Pipeline — Compose cross-cutting concerns as ordered pre/post handlers around every model and tool call, with a placement matrix for middleware vs. hooks vs. tool wrappers vs. prompt rules
  • Agent Pushback Protocol — Agents evaluate requests at both implementation and requirements level, surface concerns, and wait for explicit confirmation before executing
  • Model a Single Agent Turn as Many Inference and Tool-Call Iterations — A single user-facing turn is an iterative sequence of model inference and tool execution steps, not a single round-trip inference call
  • Delta Channels: Bounded Checkpoint Storage for Append-Only Agent State — Store only the per-step diff and write a full snapshot every K steps so long-session checkpoint storage stays O(N) instead of O(N²) and resume latency stays bounded
  • Deferred Permission Pattern — Use PreToolUse hook defer decisions to pause headless Claude Code sessions at tool calls and resume them after out-of-band human approval
  • Most-Restrictive-Wins Fusion for Parallel Agent Control Returns — The deny > defer > ask > allow merge function that fuses parallel hook decisions, classifier verdicts, and permission rules into a single agent-control answer
  • Tool Confirmation Carousel: Batched UI for Per-Call Approvals — A carousel control reviews multiple pending tool calls in one navigable surface instead of scattered modals — useful only for residual approvals that allowlists and sandboxes cannot absorb
  • Six-Shape Approval Response Taxonomy — The Claude Agent SDK exposes six distinct responses to a tool-approval prompt (approve, approve with changes, approve and remember, reject, suggest alternative, redirect entirely) composed from three callback knobs over a binary protocol
  • Harness Design Dimensions and Archetypes — Five dimensions and five archetypes from a 70-project empirical study — a population-level lens for reading harness choices and predicting where effort is missing
  • Harness Engineering — The discipline of designing agent environments — layered architecture, mechanical enforcement, legibility — so agents reliably produce correct results
  • Harness Impermanence — Author agent scaffolding as depreciating capital — design for low cost of removal so native model capability can replace it cleanly
  • Fleet Harness Attribution — Pin model and task, swap whole harnesses, and measure pass rate alongside input-token consumption across a model fleet to attribute outcomes to the harness layer rather than the model
  • Isometric Harness Ablation — Pin the model, remove one harness subsystem at a time, measure the score drop — the resulting per-subsystem table ranks investment priorities
  • Lane-Based Execution Queueing — Isolate concurrent agent tasks into named queues with per-lane concurrency limits to prevent output interleaving, race conditions, and deadlocks
  • Managed vs Self-Hosted Agent Harness — Decision framework for choosing between managed agent services and self-hosted harnesses based on compliance, memory ownership, model routing, and ops capacity
  • Multi-Shape BYOK Provider — One BYOK provider that natively speaks Chat Completions, Responses, and Messages — with the API family declared per endpoint — replaces single-shape compatibility adapters that silently down-translate provider-specific capability
  • Per-Model Harness Tuning — Treat the backing model as a first-class harness variable — express prompt, tool, and middleware deltas as declarative model-keyed overrides instead of forcing one configuration to work everywhere
  • Recursive Agent Harnesses (RAH) — A parent agent generates and runs a script that spawns subagent harnesses in parallel — each with its own tools and context — making the recursive unit a full harness rather than a bare model call
  • Scoped Browser DevTools Access for Runtime Diagnosis — Give the coding agent a read-oriented Chrome DevTools Protocol attachment for diagnosing runtime, network, and console errors — but only when the agent's other tools cannot close the lethal trifecta against the imported DOM
  • Temporary Compensatory Mechanisms — Design scaffolding that compensates for current model limitations as removable layers, not load-bearing architecture
  • The Think Tool — A mid-stream reasoning checkpoint that fires between tool calls, giving agents an explicit space to reflect on tool output before deciding the next action
  • VS Code Agents App: Agent-Native Parallel Task Execution — Run multiple agent sessions simultaneously across projects — each session inherits workspace custom instructions and MCP servers, enabling practical fan-out task execution
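
As an illustration of the Most-Restrictive-Wins Fusion entry above, the deny > defer > ask > allow ordering can be sketched as a merge over an ordered enum. This is a minimal sketch, not the API of any SDK: the names `Decision` and `fuse`, and the choice to fall back to `ASK` when no source returns a decision, are assumptions made for the example.

```python
from enum import IntEnum
from typing import Iterable


class Decision(IntEnum):
    """Hypothetical control decisions, ordered so a higher value is more restrictive."""
    ALLOW = 0
    ASK = 1
    DEFER = 2
    DENY = 3


def fuse(decisions: Iterable[Decision], default: Decision = Decision.ASK) -> Decision:
    """Merge parallel control returns into one answer.

    The most restrictive decision wins (deny > defer > ask > allow).
    If no source answered, fall back to `default` (an assumption of this sketch).
    """
    return max(decisions, default=default)


if __name__ == "__main__":
    # One hook allows, a classifier asks, a permission rule denies: deny wins.
    merged = fuse([Decision.ALLOW, Decision.ASK, Decision.DENY])
    print(merged.name)
```

Because the merge is a plain maximum over a total order, it is commutative and associative, so the result does not depend on which parallel source returns first.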