Recursive Agent Harnesses (RAH)

A parent agent runs a script that spawns subagent harnesses in parallel, making the recursive unit a full harness rather than a model call.

When to Use It

Recursive Agent Harness (RAH) is conditional, not a default. Use it only when all three hold (Lumer et al., 2026; Anthropic multi-agent retrospective):

The work decomposes into genuinely independent subtasks — no shared naming, types, or call sites that need reconciliation (see Cohesion-Aware Task Partitioning for the partition-cost formalism).
Each subtask has a cheap verification signal the parent can use to accept or reject the subagent's result (passing tests, lint, schema check).
The task value justifies a ~15× token multiplier over a single chat interaction (Anthropic, 2025).

If any of the three fails, prefer a single-threaded linear agent with a compression sub-LLM (Cognition, 2025).
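The three-way gate above can be written down as a small decision function. This is a minimal sketch: the function name `chooseStrategy` and the boolean field names are hypothetical, and in practice each flag would come from a human or automated assessment rather than a literal.

```javascript
// Hypothetical decision gate for the three preconditions.
// The field names on `task` are illustrative assumptions.
function chooseStrategy(task) {
  const failed = [];
  if (!task.subtasksIndependent) failed.push("independence");
  if (!task.cheapLeafVerification) failed.push("verification");
  if (!task.valueJustifiesTokenCost) failed.push("value");

  // All three must hold; any single failure falls back to the linear agent.
  return failed.length === 0
    ? { strategy: "recursive-agent-harness", failed }
    : { strategy: "single-threaded-agent", failed };
}

// Example: independent and verifiable, but too low-value for the token cost.
console.log(chooseStrategy({
  subtasksIndependent: true,
  cheapLeafVerification: true,
  valueJustifiesTokenCost: false,
}));
```

Returning the list of failed conditions alongside the choice keeps the reason for the fallback visible to whoever reviews the decision.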

What Recurses

The pattern names what the recursive unit is. In Recursive Language Models (RLMs), it's a bare model call — the LLM examines a long prompt and calls itself programmatically on segments inside a Python REPL (Zhang, Kraska, Khattab, 2025). In a Recursive Agent Harness, it's a full harness: filesystem tools, code execution, planning, and its own context. The parent agent writes and runs a script that spawns subagent harnesses in parallel for fine-grained workloads, and falls back to structured function calls for minor subtasks (Lumer et al., 2026).

	RLM	RAH
Recursive unit	Model call	Full agent harness
What the unit sees	Text segment	Filesystem, shell, tools
Where intermediate state lives	Outer model's variables	Subagent's context + filesystem
Failure mode	Long-context degradation	Conflicting parallel decisions
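The difference in recursive unit can be illustrated with a toy contrast. Everything here is a stand-in: `stubModel`, `rlmCall`, and the `Harness` class are hypothetical names, not code from either paper, and the "filesystem" is a plain `Map`. The only point is where state lives in each style.

```javascript
// Stub model: stands in for an LLM call and just reports the input size.
const stubModel = (text) => `summary(${text.length} chars)`;

// RLM-style unit: each recursive call receives only a text segment.
// Intermediate results live in the caller's local variables.
function rlmCall(text, model, maxChars = 20) {
  if (text.length <= maxChars) return model(text);
  const mid = Math.ceil(text.length / 2);
  const parts = [text.slice(0, mid), text.slice(mid)]
    .map((segment) => rlmCall(segment, model, maxChars));
  return model(parts.join(" "));
}

// RAH-style unit: each recursive unit is a harness with its own context,
// tools, and workspace. Children never write into the parent's context.
class Harness {
  constructor(name, tools) {
    this.name = name;
    this.tools = tools;
    this.context = [];          // private to this harness
    this.workspace = new Map(); // stand-in for a per-agent filesystem
  }

  spawn(childName) {
    return new Harness(`${this.name}/${childName}`, this.tools);
  }

  run(task) {
    this.context.push(task);
    const output = this.tools.execute(task);
    this.workspace.set("result.txt", output);
    return { agent: this.name, output };
  }
}

const parent = new Harness("parent", { execute: (task) => `done: ${task}` });
const results = ["lint", "test"].map((task) => parent.spawn(task).run(task));
console.log(rlmCall("x".repeat(100), stubModel));
console.log(results, "parent context size:", parent.context.length);
```

After the children run, the parent's own context is still empty: it holds only the returned artefacts, which is the property the table's "where intermediate state lives" row describes.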

Production Instance: Dynamic Workflows

Claude Code Dynamic Workflows ship a working instance (Claude Code docs): the parent agent writes a JavaScript orchestration script that a background runtime executes, coordinating up to 1,000 subagents per run, with at most 16 in flight at a time, and holding results in script variables instead of the orchestrator's context. The parent writes the orchestration as code instead of steering each step itself, each subagent gets its own harness, and the concurrency cap bounds coordination overhead.
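How a concurrency cap keeps many subagents bounded can be sketched with a small worker pool. This is not the Dynamic Workflows scheduler, whose internals are not described here; `runBounded` and `fakeSubagent` are hypothetical names, and the subagents are simulated with timers.

```javascript
// Minimal bounded-concurrency runner: at most `limit` tasks run at once.
// Results are kept in an ordinary array, i.e. in script variables.
async function runBounded(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;     // index of the next task to start
  let inFlight = 0; // tasks currently running
  let peak = 0;     // highest concurrency observed

  async function worker() {
    while (next < tasks.length) {
      const i = next++;
      inFlight++;
      peak = Math.max(peak, inFlight);
      try {
        results[i] = await tasks[i]();
      } finally {
        inFlight--;
      }
    }
  }

  // Start `limit` workers; each pulls tasks until the queue is empty.
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return { results, peak };
}

// Simulated subagent: resolves with a label after a short delay.
const fakeSubagent = (id) => () =>
  new Promise((resolve) => setTimeout(() => resolve(`result-${id}`), 5));

async function main() {
  const tasks = Array.from({ length: 40 }, (_, i) => fakeSubagent(i));
  const { results, peak } = await runBounded(tasks, 16);
  console.log(`completed ${results.length} tasks, peak concurrency ${peak}`);
}

main().catch(console.error);
```

The pool never starts more than `limit` tasks at once, and results are written by index so their order matches the input regardless of completion order.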

Why It Works

When the three preconditions hold, RAH wins for one reason: each subagent inherits a fresh context window plus its own tools, moving work that would have crowded the parent's window into (a) a per-subagent window and (b) executable actions a runtime can verify, instead of prompt tokens the parent must read (Lumer et al., 2026).

The mechanism's strength is bounded by how independent the subtasks really are. When subagents' work conflicts, the recursive structure cannot reconcile it — the parent only sees the returned artefacts and must choose between them without visibility into the reasoning that produced each one (Cognition, 2025).
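The limit described above can be made concrete: from returned artefacts alone, a parent can detect that two subagents disagree, but not why. The sketch below assumes a hypothetical artefact shape in which each subagent reports the symbols it defined and their signatures; `findConflicts` is an illustrative helper, not part of any cited system.

```javascript
// Hypothetical artefact shape: { agent: string, exports: { [name]: signature } }.
// Returns every symbol that different subagents defined with different signatures.
function findConflicts(artefacts) {
  const seen = new Map(); // name -> Map(signature -> [agents])
  for (const { agent, exports } of artefacts) {
    for (const [name, signature] of Object.entries(exports)) {
      if (!seen.has(name)) seen.set(name, new Map());
      const bySignature = seen.get(name);
      if (!bySignature.has(signature)) bySignature.set(signature, []);
      bySignature.get(signature).push(agent);
    }
  }

  const conflicts = [];
  for (const [name, bySignature] of seen) {
    if (bySignature.size > 1) {
      conflicts.push({ name, variants: Object.fromEntries(bySignature) });
    }
  }
  return conflicts;
}

const artefacts = [
  { agent: "worker-1", exports: { parseUser: "(raw: string) => User" } },
  { agent: "worker-2", exports: { parseUser: "(raw: Buffer) => User", formatDate: "(d: Date) => string" } },
  { agent: "worker-3", exports: { formatDate: "(d: Date) => string" } },
];

console.log(JSON.stringify(findConflicts(artefacts), null, 2));
```

The output lists `parseUser` with two competing signatures, yet nothing in it says which worker's choice was better motivated. That missing reasoning is what forces the parent to pick blindly.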

When This Backfires

RAH fails under specific, common conditions.

Coupled coding work. Anthropic's multi-agent retrospective: "most coding tasks involve fewer truly parallelizable tasks than research" (Anthropic, 2025). Parallel subagents working on shared naming, types, or call sites make implicit decisions that conflict on return, and the parent must reconcile them — eating the speedup (Cognition, 2025). See Cohesion-Aware Task Partitioning for the partition-cost mechanism.
Low-value tasks. Multi-agent runs use roughly 15× the tokens of a single chat. A small refactor, doc edit, or simple bug fix cannot justify the multiplier; the recursive structure pays the cost without earning it back. The Agent-Headcount Vanity Metric is the corresponding anti-pattern when the token cost is not paid back.
No leaf-level verification signal. RAH assumes the parent can judge each subagent's output cheaply. Without an objective check per subtask, the parent rationalises weak results rather than rejecting them — the recurring multi-agent failure cluster identified across 1,642 traces (Cemri et al., 2025; see also Multi-Agent SE Design Patterns).
Single-paper provenance. The RAH numbers — 71.75% to 81.36% on Oolong-Synthetic with a Codex baseline, 89.77% with Claude Sonnet 4.5 — come from one paper, one benchmark, 199 samples (Lumer et al., 2026). No independent replication yet.
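The leaf-level verification condition in the list above amounts to an objective accept/reject gate per subagent result. The following is a minimal sketch under assumed names: the finding fields (`endpoint`, `status`, `line`) and the helpers `checkFinding` and `partitionByCheck` are hypothetical, and a real check might instead be a test run or a linter.

```javascript
// Hypothetical leaf check: returns a list of problems; empty means accept.
function checkFinding(finding) {
  if (typeof finding !== "object" || finding === null) return ["not an object"];
  const errors = [];
  if (typeof finding.endpoint !== "string" || finding.endpoint === "") {
    errors.push("endpoint missing");
  }
  if (!["missing-auth", "ok"].includes(finding.status)) {
    errors.push("status invalid");
  }
  if (finding.status === "missing-auth" && !Number.isInteger(finding.line)) {
    errors.push("line number required for a reported issue");
  }
  return errors;
}

// Split subagent results into accepted and rejected using an objective check.
function partitionByCheck(results, check) {
  const accepted = [];
  const rejected = [];
  for (const result of results) {
    const errors = check(result);
    if (errors.length === 0) accepted.push(result);
    else rejected.push({ result, errors });
  }
  return { accepted, rejected };
}

const sample = [
  { endpoint: "src/routes/users.js", status: "missing-auth", line: 12 },
  { endpoint: "src/routes/admin.js", status: "probably fine" },
  "I looked at everything and it seems OK",
];

console.log(partitionByCheck(sample, checkFinding));
```

Because rejection is mechanical, the parent never has to argue itself into accepting a vague result: anything that fails the check can be retried or dropped.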

Cognition's argument is that a single-threaded linear agent with a compression sub-LLM preserves the context-window benefit without the conflicting-decisions risk (Cognition, 2025).

Example

The Lumer et al. paper does not publish its parent-agent script. The closest production realisation is Claude Code's Dynamic Workflows runtime, in which the parent agent writes a JavaScript script that the runtime executes. Given a request such as:

Run a workflow to audit every API endpoint under src/routes/ for missing auth checks

The parent agent produces an orchestration script along these lines:

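// Sketch of a Dynamic Workflows parent script.
// Assumes the runtime provides agent() and parallel(), and that each call
// resolves to structured data (arrays/objects) rather than raw text.

// Step 1: one subagent enumerates the work items.
const endpoints = await agent({
  agentType: "Explore",
  prompt: "List every route handler under src/routes/",
});

// Step 2: fan out one worker harness per endpoint.
const findings = await parallel(endpoints.map(ep => ({
  agentType: "audit-page-worker",
  prompt: `Check ${ep} for missing auth middleware`,
})));

// Step 3: an adversarial pass tries to refute the findings.
const verified = await agent({
  agentType: "skeptic",
  prompt: `Refute each finding: ${JSON.stringify(findings)}`,
});

// Only findings the skeptic could not refute are returned to the parent.
return verified.filter(f => !f.refuted);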

The verified step is what separates this from ordinary fan-out: an adversarial check over the subagents' findings gives the parent a cheap signal for accepting or rejecting each result. Without it, the parent has no objective basis for rejecting weak findings and the pattern falls into the missing-verification failure mode described above.
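To trace the explore, fan-out, and verify flow locally, the same three steps can be run against stubs. The `agent` and `parallel` functions below are stand-ins written for this illustration, not the Dynamic Workflows API; their return shapes, the file names, and the skeptic's rule are all assumptions.

```javascript
// Local stand-ins for the runtime-provided helpers (assumed, not the real API).
async function agent({ agentType, prompt }) {
  if (agentType === "Explore") {
    return ["src/routes/users.js", "src/routes/admin.js", "src/routes/health.js"];
  }
  if (agentType === "skeptic") {
    // Recover the findings array that was serialised into the prompt.
    const findings = JSON.parse(prompt.slice(prompt.indexOf("[")));
    // Pretend the skeptic refutes the finding for the public health route.
    return findings.map((f) => ({ ...f, refuted: f.endpoint.endsWith("health.js") }));
  }
  // Worker stub: reports every endpoint as missing auth.
  const endpoint = prompt.match(/Check (\S+) for/)[1];
  return { endpoint, issue: "missing auth middleware" };
}

// Stub fan-out: run one agent call per spec, all at once.
const parallel = (specs) => Promise.all(specs.map((spec) => agent(spec)));

async function auditWorkflow() {
  const endpoints = await agent({
    agentType: "Explore",
    prompt: "List every route handler under src/routes/",
  });

  const findings = await parallel(endpoints.map((ep) => ({
    agentType: "audit-page-worker",
    prompt: `Check ${ep} for missing auth middleware`,
  })));

  const verified = await agent({
    agentType: "skeptic",
    prompt: `Refute each finding: ${JSON.stringify(findings)}`,
  });

  return verified.filter((f) => !f.refuted);
}

auditWorkflow().then((kept) => console.log(kept)).catch(console.error);
```

With these stubs, three findings go in and the one the skeptic refutes is dropped before anything reaches the parent.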

Key Takeaways

The recursive unit is a full harness (tools, execution, planning), not a model call — that's what distinguishes RAH from RLMs
Use only when subtasks are genuinely independent, leaf verification is cheap, and task value justifies a ~15× token cost
The parent generates and runs a script — intermediate results live in script variables, not the parent's context, which is why the pattern scales
Evidence is one paper, one benchmark; Dynamic Workflows is the most credible production exemplar, but the empirical case is narrow

Related

Claude Code Dynamic Workflows — production runtime for the pattern, with the 1,000-agent cap and script-as-orchestrator model
Recursive Best-of-N Delegation — companion reliability pattern: K candidates per recursion node, judge-selected, to contain the conflicting-decisions risk
Cohesion-Aware Task Partitioning — the partition-cost lens that decides whether subtasks are independent enough to recurse over
Agent Harness: Initializer and Coding Agent — the harness this pattern makes the recursive unit of
Anthropic's Effective Agents Framework — the workflow-vs-agent taxonomy RAH extends with a code-first orchestration script
Deep Agent Runtime — the runtime layer that durably executes the script the parent writes
Orchestrator-Worker Pattern — the structural pattern RAH specialises by making the worker a recursive harness