Recursive Sub-Agent Delegation: Depth Limits and Trade-offs in Nested Hierarchies¶
Nested sub-agent hierarchies trade compounding token cost, latency, and tracing burden for one more isolated context window at each delegation level.
A sub-agent that itself spawns sub-agents produces a delegation tree rather than the star of flat Sub-Agents for Fan-Out or single-level Orchestrator-Worker. The relevant design parameter is depth, and its trade-offs differ from the trade-offs of wider fan-out at one level.
When the Depth Is Worth Buying¶
Apply the pattern only when each level corresponds to a sub-problem cohesive enough to fit one context window, and when cross-level tracing is already instrumented. Recursive nesting earns its keep when:
- The problem tree is genuinely hierarchical — a codebase investigation that opens N subsystems where each opens M modules, or a multi-tool migration with per-tool sub-stacks. A flat dispatch funnels every leaf result through the parent's context window, wasting the isolation sub-agents exist to provide (Anthropic — How we built our multi-agent research system frames sub-agents as "intelligent information filters").
- The deepest leaf performs real work, not another re-dispatch. The opencode #18100 failure documents the opposite: 47 sessions across 20 levels, with depths 2–18 pure explore-to-explore pass-through until depth 19 finally ran a
grep. - Each level has a different model or prompt assignment that materially improves the leaf decision. Cursor's SDK keeps a per-level prompt and model (Cursor changelog, 4 Jun 2026) — the runtime feature that makes depth a useful lever rather than a cost multiplier.
Cross-Tool Depth Posture (June 2026)¶
Major coding assistants ship three different defaults, reflecting three views on where the risk sits:
| Tool | Depth posture | Source |
|---|---|---|
| Claude Code 2.1.172 | Hard cap at 5 levels; leaf sub-agents at depth 5 do not receive the Agent tool | Claude Code changelog |
| Cursor SDK (Jun 4 2026) | No depth limit; nesting works automatically for any agent that defines sub-agents | Cursor changelog |
| OpenAI Codex | Configurable agents.max_depth, defaults to 1 |
Codex subagents docs |
Claude Code's fixed cap protects against runaway recursion at the cost of flattening genuinely deep problems. Codex makes the same default decision with an opt-out. Cursor treats depth as purely a problem-shape question and pushes cost control to the operator. Read your tool's choice as a constraint on which shapes it can address out of the box.
What Each Level Costs¶
graph TD
O[Orchestrator<br/>Level 0] --> L1A[Sub-Agent<br/>Level 1]
O --> L1B[Sub-Agent<br/>Level 1]
L1A --> L2A[Sub-Agent<br/>Level 2]
L1A --> L2B[Sub-Agent<br/>Level 2]
L2A --> L3A[Leaf Worker<br/>Level 3]
L2B --> L3B[Leaf Worker<br/>Level 3]
L1B --> L2C[Sub-Agent<br/>Level 2]
L2C --> L3C[Leaf Worker<br/>Level 3]
Token cost compounds with branching, not just depth. Anthropic's multi-agent research system already uses ~15× chat tokens at one level of orchestration (Anthropic engineering post); the multiplier compounds again at each additional level. Depth 3 with branching 3 produces roughly an order of magnitude more tokens than a single-level fan-out across the same leaves. The opencode failure is the worst case: every intermediate session bills full coordination tokens for zero useful output.
Context loss is per-boundary. Each level summarises its children before returning to its parent, so a leaf's finding is re-compressed once per level on the way up. Nuance that mattered to the leaf can be silently dropped at level 2 because the summariser judged it unimportant — and downstream agents treat upstream output "as trusted input, invisible by default" (Augment — Multi-Agent AI Systems).
Observability degrades faster than depth. A flat fan-out has one parent and N children — a debugger reads two transcripts and reconstructs the run. A depth-3, branching-3 tree has 13 sessions; reconstructing why a leaf failed requires correlation IDs and a nested trace view (MLflow on multi-agent observability recommends OpenTelemetry GenAI conventions to keep the full execution graph visible).
Why It Works¶
The mechanism when it works is per-level context isolation: each sub-agent at level N owns one sub-problem whose entire exploration fits in its own context window, and returns a small distilled artefact to level N-1. The root never sees the file reads, failed greps, or dead ends from level 3 — only the summary of level 1, which absorbed and compressed everything below. Anthropic's framing of sub-agents as "intelligent information filters" (engineering post) generalises across depth precisely when each level corresponds to a real partition of the problem. Per-level prompt and model assignment (Cursor SDK) is what makes depth a lever rather than a cost multiplier — strong models at the root, cheap models at the leaves.
When This Backfires¶
- Leaf re-dispatch loops. A sub-agent whose tool set overlaps its parent's tends to re-delegate rather than execute. opencode #18100 documents 18 layers of pure pass-through before any real work happened. Mitigation: give leaf agents no Agent/Task tool (Claude Code's depth-5 leaves explicitly lose the Agent tool — changelog) and require each level's sub-agent to declare what it returns, not just what it delegates.
- Synthesis context overflow at the root. Even when each level summarises cleanly, the root must reason over N-level-deep distilled outputs from every branch. The same overflow that hits single-level fan-out at 4+ substantive worker outputs (Orchestrator-Worker failure modes) compounds with depth because each level's summary is itself a synthesis of its children's summaries.
- Latency-sensitive workflows. Each level adds a synchronous orchestration step; coordination overhead has been observed to scale super-linearly with chained agents — one analysis of a four-agent pipeline measured ~950ms of coordination overhead against ~500ms of processing (Towards Data Science — The Multi-Agent Trap). For interactive coding sessions, depth 4 adds seconds before any leaf does work.
- Cost-sensitive batch work. The ~15× baseline (Anthropic post) multiplies again at each additional level. A scheduled audit across N repositories using depth 3 with branching 3 burns ~27× the tokens of a flat dispatch covering the same leaves.
- Cross-level tracing not instrumented. Without correlation IDs and a tree-shaped trace view, a failing run is undebuggable — the orchestrator only sees its direct children's summaries, and which grand-child caused the failure is invisible. Treat tracing as a prerequisite, not a follow-up. See Emergent Behavior Sensitivity for the per-prompt fragility that compounds with depth.
Example¶
A real example of recursive delegation earning its keep is Claude Code's depth-5 cap applied to a refactor that touches five subsystems. The root orchestrator at level 0 receives "migrate fetch() to apiClient across the auth, billing, search, notifications, and admin subsystems." It spawns one sub-agent per subsystem at level 1. The auth sub-agent further decomposes its subsystem into "login flow," "session middleware," and "OAuth callbacks" at level 2, and the login-flow sub-agent at level 2 spawns one leaf per file at level 3. Levels 3 and 4 perform the actual edits; level 5 is unreachable in this run.
# Claude Code: leaf migration worker
---
name: leaf-migrator
tools: [Read, Edit, Bash] # no Agent tool — leaves cannot re-delegate
model: haiku
maxTurns: 12
isolation: worktree
---
Apply the fetch -> apiClient migration to the single file path passed in.
Run the targeted test for that file. Return: { path, edits_applied, tests_pass }.
The leaf worker is deliberately tool-restricted: no Agent (or Task) tool, no further nesting. This is the same posture Claude Code enforces at depth 5 (changelog) and the mitigation the opencode issue argues for (#18100). Each higher level summarises its children's structured returns into a parent-shaped report; the root receives five subsystem summaries rather than the ~40 per-file edit logs.
If the same refactor used flat fan-out, the root would directly own ~40 leaf workers and either synthesise 40 structured returns (orchestrator context overflow) or batch them in chunks (eliminating most of the parallelism benefit). Depth here is buying coherent per-subsystem aggregation at the cost of three extra orchestration rounds.
Key Takeaways¶
- Depth is a design lever, distinct from fan-out width — each level buys one more isolated context window and compounds tokens, latency, summarisation loss, and tracing burden.
- Cap depth by tool posture (Claude Code's 5, Codex's default 1) and by problem shape: nest only when sub-problems are themselves cohesively filterable.
- Strip the
Agent/Tasktool from leaf sub-agents to prevent re-dispatch loops — the dominant failure mode of unbounded recursion. - Treat OpenTelemetry-style nested tracing with correlation IDs as a prerequisite for any depth ≥3 hierarchy; without it, root-cause analysis is intractable.
- Default to flat fan-out; reach for depth only when the problem tree has real hierarchical structure and per-level prompt/model assignment is doing useful work.