Skip to content

The Three Loops of Agentic Coding: A Diagnostic Vocabulary

Name three nested loops in an agent session — tool, verification, convergence — so the symptom you see tells you which intervention applies.

The three loops are a diagnostic mental model, not a new architecture. An agent session looks like one undifferentiated stream of activity until you slice it into the inner tool loop (model and tool calls inside a single turn), the verification loop (run tests or build, feed results back, fix), and the outer convergence loop (plan → act → verify repeated across turns until the work converges). Each loop has a different boundary, a different termination condition, and a different way of failing — so the observable symptom tells you which loop is broken and which fix to reach for.

When To Use This Frame

This frame pays off in two specific conditions:

  • The agent has stopped making progress and the cause is not obvious. "It looks stuck" is too vague to act on. Naming the loop where progress stopped narrows the candidate fixes from a dozen to two or three.
  • The session has run long enough that multiple loops are running at once. A 30-second typo fix has one loop. A multi-turn feature implementation with mid-turn tool calls, test runs, and partial reverts has all three running simultaneously, and which one is failing matters.

Skip the frame for single-shot tasks (one-line config edits, version bumps) where there is functionally one loop. Skip it for exploratory or debugging sessions where the goal is to discover what is wrong — the operator move there is read-more, not pick-a-loop.

The Diagnostic Table

Observable symptom Likely failing loop Operator intervention
Same tool call repeating with similar arguments; no progress between turns Inner tool loop Inspect tool definitions and permissions; clip the iteration cap; see Loop Detection
Tests or build keep failing; agent's fix-attempts cycle through the same handful of edits Verification loop Improve the error signal: shorter feedback cycle, more specific test, paste full stack trace; see Failure-Driven Iteration
Each turn produces a fresh-looking attempt; diff oscillates, scope drifts, no version is meaningfully closer to done Outer convergence loop Stop iterating on the artefact; revisit the plan or scope; see Convergence Detection
Many tool calls but tests are green and the diff stabilises Healthy — converging None — let it finish

The signals are distinguishable in the trace. The inner tool loop spinning shows up as visible repeated tool calls inside a single agent turn. A failing verification loop shows up as a sequence of test runs that stay red across turns. A non-converging outer loop shows up as a series of substantially different diffs across turns without any of them reaching a stable state. Modexa's analysis of stuck agent loops — a single turn consuming millions of tokens before hitting a wall — is the canonical inner-loop pathology (Modexa, 2026).

What Each Loop Is

Loop 1: Tool Loop (Inside a Turn)

A user-facing turn is itself a loop: the model emits a response, the harness executes any tool call, the result is appended to the prompt, and the model is re-queried. Termination is "no pending tool calls" (Unrolling the Codex Agent Loop, OpenAI); context grows inside the turn because each tool result is appended. Full treatment in Model a Single Agent Turn as Many Inference and Tool-Call Iterations. The boundary that matters for diagnosis: this loop is entirely inside one turn, and a stuck tool loop appears as repeated tool calls without a final assistant message.

Loop 2: Verification Loop (Across Turns, Tests as Boundary)

The verification loop runs the code, observes failure, passes the error back, verifies the fix — sourced as run / inspect / ask / review-diff in GitHub's Copilot CLI guide (GitHub Blog). Termination is binary and machine-checkable: a test or build runs green. Anthropic's evaluator-optimizer is the same shape with an explicit second model as evaluator (Building Effective Agents, Anthropic). Full treatment in Failure-Driven Iteration. The boundary that matters: the loop terminates only when the verification tool says PASS — "the agent thinks it's done" does not count.

Loop 3: Convergence Loop (Across Turns, Stable State as Boundary)

The outer loop iterates plan → act → verify across many turns. For tasks with a deterministic test harness, the verification loop is sufficient: tests pass, stop. For prose, specs, design documents, and partially-specified code tasks, no machine-checkable gate exists, so the outer loop needs convergence signals — change velocity, output size, content similarity — to decide when further passes yield diminishing returns (Convergence Detection, drawing on Madaan et al., Self-Refine).

The outer loop's failure mode is "progress that doesn't converge" — the diff between consecutive turns stays large, the agent alternates between two trade-offs, or the output grows each turn. Lee et al.'s RefineBench found self-refinement without an external check gains only +1.8 percentage points over five iterations on frontier models and that models routinely halt early under overconfidence (RefineBench, 2025) — so the outer loop needs external checks or convergence signals, not the model's own confidence.

How The Loops Nest

graph TD
    A[User intent] --> B[Outer convergence loop]
    B --> C[Plan]
    C --> D[Agent turn]
    D --> E[Tool loop: inference + tool calls]
    E -->|No pending tool calls| F[Verification loop: run tests/build]
    F -->|Red| D
    F -->|Green| G{Converged?}
    G -->|No| C
    G -->|Yes| H[Done]

A symptom at one layer often masks a failure at another: the agent looks stuck mid-tool-loop but the spec was ambiguous (outer loop); tests stay red but the test encodes a flawed assumption (verification-loop boundary wrong). Naming the loops separately makes "where is the failure" answerable.

Why It Works

Each loop produces a distinct observable signal that maps to a distinct operator intervention. Repeated identical tool calls inside one turn signal a runaway tool loop; the fix is an iteration cap (Modexa, 2026). Persistent red tests across turns signal a verification-loop failure; the fix is a sharper error signal — shorter feedback cycle, narrower test, full stack trace pasted rather than summarised (Claude Code best practices, Anthropic: Effective Harnesses). Oscillating diffs across turns signal a non-converging outer loop; the fix is to stop iterating and revisit the plan, using convergence signals rather than intuition (Convergence Detection). Without the frame, the operator reaches for the nearest intervention; with it, the symptom routes to the matching fix.

How This Differs From Other Tri-Loop Framings

Two other tri-loop framings exist; each cuts at a different axis, so they are additive rather than interchangeable.

Framing Axis Loops named
This page (Tool / Verification / Convergence) Failure-mode diagnostics — which intervention applies Inside one turn / across turns with tests / across turns with no test gate
Kim & Yegge, Vibe Coding (IT Revolution) Lifecycle timeframes — what to invest in at each cadence Inner (within-task) / middle (memory, coordination) / outer (governance, CI/CD)
Kief Morris's why / how loops (Martin Fowler) Human positioning — where the human sits Why loop (human-owned) / how loop (agent-owned, nested feature/story/code) — see Humans and Agents in Software Engineering Loops

Use the diagnostic loops when the question is "why is the agent stuck right now"; use Kim and Yegge's for "what should we invest in this quarter"; use Morris's for "should the human be reviewing diffs or fixing the harness." Mixing labels without naming the axis produces confused conversations.

When This Backfires

The frame adds cognitive overhead — three loops to remember — worth paying only when diagnosis is hard. It backfires in four conditions:

  • Single-shot tasks. A typo fix or version bump runs entirely inside one tool loop. No verification or convergence loop exists to diagnose.
  • Code tasks with a sharp test gate. When pytest -x is the entire stopping criterion, the convergence loop collapses into the verification loop — two loops suffice. This is the case Philipp Schmid argues from (philschmid.de) and it holds for most CRUD-shaped code work.
  • Team already uses Kim & Yegge or Morris's framing. Introducing competing tri-loop vocabulary produces terminology drift, not better diagnosis. Adopt the local framing or accept the rename cost explicitly.
  • Pure exploration or debugging. When the goal is to discover what is wrong, "which loop am I in" is the wrong question. The move is read more, hypothesise — not pick-a-loop.

The signal: if you can name the failing loop in under five seconds, the frame is doing its job. If you find yourself debating which loop a symptom belongs to, the underlying issue is something else (spec ambiguity, missing context) and the label is a distraction.

Example

A developer is implementing a multi-step refactor with Claude Code. After 40 minutes the session is still working. Three symptoms appear in close succession:

Symptom 1: A single turn runs for two minutes without surfacing a response. Inspecting the trace shows the agent has called Read on the same file four times with slight path variations. This is the inner tool loop spinning — the file is in an unexpected location and the agent is searching by retry. Intervention: pause the session, run Glob manually, paste the correct path into the next prompt, lift the iteration cap.

Symptom 2: After the file is located, the agent makes a fix. pytest stays red across three consecutive turns; each turn the agent tries a different small edit but the failing test does not change. This is the verification loop failing — the error message in the test is too generic ("AssertionError") to give the agent useful signal. Intervention: rerun the test with -v --tb=long, paste the full traceback, and ask the agent to fix the root cause rather than the symptom.

Symptom 3: Tests pass. But the next turn's diff looks materially different from the previous turn's — the agent has refactored a function that was already correct, introducing risk. The turn after that, the agent reverts most of those changes. This is the outer convergence loop oscillating — the agent has finished the original task but the session has not stopped. Intervention: end the session; the task converged two turns ago.

The same session showed three failures across three different loops, each with a distinct fix. Without the named-loop frame they all read as "the agent is being weird"; with it, each one routed to a specific intervention page.

Key Takeaways

  • Three nested loops compose an agent session: the inner tool loop (within a turn), the verification loop (tests as boundary), and the outer convergence loop (stable state as boundary).
  • Each loop fails with a distinct observable signature — spinning tool calls, persistent red tests, oscillating diffs — so the symptom routes to the matching intervention.
  • The frame's contribution is diagnostic, not architectural — it does not change how agents are built, only how operators read traces.
  • Single-shot tasks and tasks with a deterministic test gate do not need the frame; the verification loop is sufficient.
  • Two other tri-loop framings exist (Kim & Yegge's lifecycle, Morris's human-positioning); pick the one that matches the question you're asking and label which axis you're cutting on.

Sources

Feedback