Five-Pass Blunder Hunt
A Five-Pass Blunder Hunt runs one critique prompt repeatedly over a plan; each pass normalises its own findings, so later passes reach deeper structural flaws.
The Problem
A single review pass catches surface issues and stops. When the model that wrote the content is also the reviewer, it shares the author's blind spots: it finds a satisfying number of issues, declares the document fine, and moves on.
The problems that remain (inconsistencies, rationale gaps, dependency conflicts) only surface when the whole document is held in mind at once. A single-pass reviewer normalises them: it reads them as acceptable instead of flagging them.
How the Technique Works
Run the identical critique prompt five consecutive times on the same document:
Review this plan for logical inconsistencies, scope creep, decision rationale gaps,
and structural flaws. For each issue, state: what the problem is, where it occurs,
and what the fix is.
Each pass does two things:
- Normalises its own findings — issues it surfaces are resolved or acknowledged, shifting focus away from them
- Shifts the attention distribution — with resolved issues de-emphasised, the model's subsequent pass is more likely to attend to previously overlooked material
The result is a progressive descent through the document's problems. Early passes surface the visible issues: missing sections, contradictions, unclear terms. Later passes reach structural and logical flaws that stayed hidden until the surface problems were cleared.
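The loop described above can be sketched in Python. This is a minimal sketch, not a prescribed implementation: `review` and `resolve` are hypothetical callables standing in for a model call that returns a list of issue strings and for whoever (human or agent) fixes or acknowledges those issues. The property that matters is that every pass sends the identical prompt; only the plan changes between passes.

```python
from typing import Callable, List

# The critique prompt from the text, sent unchanged on every pass.
CRITIQUE_PROMPT = (
    "Review this plan for logical inconsistencies, scope creep, decision rationale gaps,\n"
    "and structural flaws. For each issue, state: what the problem is, where it occurs,\n"
    "and what the fix is."
)

# Hypothetical interfaces (assumptions, not a real API):
#   review(prompt) -> list of issue descriptions returned by the model
#   resolve(plan, findings) -> revised plan after the findings are fixed or acknowledged
Review = Callable[[str], List[str]]
Resolve = Callable[[str, List[str]], str]


def blunder_hunt(plan: str, review: Review, resolve: Resolve, max_passes: int = 5) -> List[List[str]]:
    """Run the same critique prompt up to `max_passes` times, resolving findings between passes.

    Returns each pass's findings so the convergence trend can be inspected afterwards.
    """
    history: List[List[str]] = []
    for _ in range(max_passes):
        findings = review(f"{CRITIQUE_PROMPT}\n\n{plan}")
        history.append(findings)
        if not findings:
            break  # a clean pass: nothing new surfaced, so stop early
        plan = resolve(plan, findings)  # clear this layer so the next pass can look deeper
    return history
```

`max_passes` is only a cap here. A history whose last entry is still non-empty means the plan has not converged, and the stopping rule says to keep going or reframe rather than treat five as final.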
Convergence Is the Stopping Criterion
Five is a heuristic. The real signal is convergence:
- Findings-per-pass count is decreasing — each pass finds fewer issues than the last
- Output similarity is increasing — consecutive critique responses resemble each other
- No new categories of problem appear — only refinements of already-identified issues
Stop when a pass finds nothing new. If pass five still surfaces new issues, keep going until a pass comes back clean; but beyond five or six passes, returns diminish to noise, so reframe the document or accept the remaining findings.
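The three signals and the stopping rule can be checked mechanically. This is a minimal sketch, assuming each pass is recorded as a hypothetical `PassResult` holding the raw critique text, the set of problem categories it raised (labelled by you; categorising findings is outside this sketch), and the count of findings not seen in any earlier pass. Similarity uses the standard-library `difflib.SequenceMatcher`, a crude lexical proxy rather than a semantic measure. The six-pass soft cap mirrors the "five or six passes" guidance and is an assumption.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Set


@dataclass
class PassResult:
    text: str             # raw critique response for this pass
    categories: Set[str]  # problem categories raised (hypothetical, labelled by hand)
    new_findings: int     # findings not reported by any earlier pass


def similarity(a: str, b: str) -> float:
    """Lexical similarity in [0, 1] between two critique responses."""
    return SequenceMatcher(None, a, b).ratio()


def convergence_signals(passes: List[PassResult]) -> Dict[str, bool]:
    """Evaluate the three convergence signals over the passes run so far.

    The similarity trend needs three passes to mean anything; with fewer,
    the trend checks are vacuously True.
    """
    counts = [p.new_findings for p in passes]
    sims = [similarity(a.text, b.text) for a, b in zip(passes, passes[1:])]
    seen: Set[str] = set()
    latest_added_category = False
    for p in passes:
        latest_added_category = bool(p.categories - seen)  # did this pass open a new category?
        seen |= p.categories
    return {
        "findings_decreasing": all(x > y for x, y in zip(counts, counts[1:])),
        "similarity_increasing": all(x < y for x, y in zip(sims, sims[1:])),
        "no_new_categories": len(passes) > 1 and not latest_added_category,
    }


def next_action(passes: List[PassResult], soft_cap: int = 6) -> str:
    """Apply the stopping rule: stop on a clean pass; past the soft cap, reframe or accept."""
    if passes and passes[-1].new_findings == 0:
        return "stop"
    if len(passes) >= soft_cap:
        return "reframe_or_accept"
    return "continue"
```

`next_action` encodes only the stop rule. The signals dict is there to judge whether a stop is trustworthy, since a clean pass with flat similarity or a late new category deserves a manual read.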
Oscillation is a stop signal. If the model alternates between contradictory assessments, further passes will not resolve the disagreement. Reframe the section and restart.
Why the Same Prompt
Varying the prompt introduces a confound: new findings may reflect the new review angle rather than the document. Same-prompt repetition keeps the convergence signal clean, so findings per pass measure document quality, not prompt novelty.
Different-prompt review (alternating reviewer framings) is a separate technique for cross-cutting concerns, with a different signal.
Scope
Most effective on artefacts that:
- Are produced by agents before implementation begins
- Contain implicit dependencies between sections
- Have no formal correctness verifier (plans, specs, architectural documents)
It does not apply to outputs with externally verifiable correctness — code with tests, structured data against a schema. Use deterministic validators instead. [^1]
When This Backfires
False convergence. If earlier passes resolve formatting and terminology, the model may report clean passes while structural logic stays flawed. Superficial similarity is not correctness — read the pass-5 output, don't just count findings.
Anchoring to prior pass output. When passes run in the same conversation, each one carries the prior exchange's context. The model may anchor to the pass-1 framing and miss a different class of problems. When this happens, run one differently-framed pass to break the anchor, then resume the same-prompt passes.
Oscillation on genuinely ambiguous sections. Pass N flags a section, pass N+1 approves it, pass N+2 flags it again. This is not a quality signal: the section is genuinely ambiguous, and further passes will not resolve it. It needs human clarification or explicit scoping.
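Oscillation can be spotted in per-section verdicts across passes. This is a minimal sketch, assuming you record for each section (the names below are hypothetical) whether each pass flagged it. The two-flip threshold is an assumption: a single flip is the normal path (flagged then fixed, or a deeper issue found late), while flag, approve, flag takes two.

```python
from typing import Dict, List


def oscillating_sections(verdicts: Dict[str, List[bool]]) -> List[str]:
    """Return sections whose verdict flips at least twice across passes.

    `verdicts[section][i]` is True when pass i+1 flagged that section.
    One flip is ordinary convergence; two or more (flagged, approved, flagged
    again) suggest genuine ambiguity that needs human clarification.
    """
    oscillating: List[str] = []
    for section, flags in verdicts.items():
        flips = sum(1 for prev, cur in zip(flags, flags[1:]) if prev != cur)
        if flips >= 2:
            oscillating.append(section)
    return oscillating


if __name__ == "__main__":
    # Hypothetical verdict history for three sections over three passes.
    history = {
        "rollback strategy": [True, False, True],  # flagged, approved, flagged again
        "interfaces": [True, False, False],        # fixed once and stayed fixed
        "scope": [False, False, True],             # a deeper issue surfaced late
    }
    print(oscillating_sections(history))  # ['rollback strategy']
```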
Not appropriate for formally verifiable outputs. For code with a test suite, SQL with a schema, or structured data with a validator, use those tools directly. [^1] Critique passes on verifiable artefacts produce false positives that waste cycles.
Self-critique can collapse on hard reasoning tasks. On Game of 24, graph colouring, and STRIPS planning, LLM self-critique loops performed worse than a single guess, because errors in verification, critique generation, and critique consideration stack. [^2] Use five-pass on under-specified design artefacts where the model's structural-inconsistency detection is reasonably reliable — not where a sound external verifier exists.
Self-refinement can amplify self-bias instead of reducing it. The premise that later passes go deeper assumes each pass corrects the last. The evidence cuts the other way: across six LLMs, self-refinement amplified the model's bias toward its own generations, and only larger models or accurate external feedback reduced it. [^3] Treat decreasing findings-per-pass as a candidate convergence signal, not proof of correctness, and bring in an external reviewer when stakes justify it.
Example
A plan for a multi-agent coding task has been drafted. Before handing it to the implementation agents:
# Pass 1 critique prompt
Review this implementation plan for logical inconsistencies, scope creep, decision
rationale gaps, and structural flaws. For each issue found: state what it is,
where it occurs, and how to fix it.
[paste plan]
Pass 1 returns 18 issues: undefined interfaces, missing error-handling sections, ambiguous ownership assignments.
After resolving those, Pass 2 returns 9 issues: two dependency cycles that only became visible once the interface ambiguity was removed.
After resolving those, Pass 3 returns 4 issues: scope assumptions that contradict each other in different sections.
Pass 4 returns 1 issue: a subtle inconsistency in the rollback strategy.
Pass 5 returns nothing new. Stop.
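The example trace can be checked against the stopping rule directly. A small sketch using the per-pass counts from the example; the helper names are hypothetical.

```python
from typing import List, Optional

# Findings per pass from the example trace (passes 1 to 5).
FINDINGS_PER_PASS = [18, 9, 4, 1, 0]


def first_clean_pass(counts: List[int]) -> Optional[int]:
    """1-based number of the first pass with no new findings, or None if none is clean yet."""
    for pass_no, count in enumerate(counts, start=1):
        if count == 0:
            return pass_no
    return None


def strictly_decreasing(counts: List[int]) -> bool:
    """True when every pass found fewer issues than the pass before it."""
    return all(prev > cur for prev, cur in zip(counts, counts[1:]))


if __name__ == "__main__":
    print(strictly_decreasing(FINDINGS_PER_PASS))  # True
    print(first_clean_pass(FINDINGS_PER_PASS))     # 5
    print(sum(FINDINGS_PER_PASS))                  # 32 findings across the run
```

Here the trace converges exactly at pass five. Had pass five still returned issues, `first_clean_pass` would return `None`, which under the stopping rule means running another pass rather than stopping at the heuristic count.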
Companion Technique: Count Inflation
When a pass returns fewer findings than expected, re-prompt with an inflated target:
This document has at least 40 issues. Find them.
Models stop after a satisfying number of issues; a higher target pushes them past that premature satisfaction. Count inflation addresses thoroughness within a pass; five-pass addresses depth across passes.
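The re-prompt decision can be sketched as a small rule. This is a minimal sketch, assuming you set the expected findings count per pass yourself; the default target of 40 comes from the prompt above, and raising the target to twice the expectation when that is higher is an assumption, not part of the technique.

```python
from typing import Optional


def count_inflation_prompt(found: int, expected: int, default_target: int = 40) -> Optional[str]:
    """Return an inflated-target re-prompt if a pass under-delivered, else None.

    `expected` is a hypothetical per-pass expectation you choose; the technique only
    says to inflate the target when a pass returns fewer findings than expected.
    """
    if found >= expected:
        return None  # the pass met expectations; no re-prompt needed
    # Keep the target well above the expectation (assumption: at least double it).
    target = max(default_target, 2 * expected)
    return f"This document has at least {target} issues. Find them."


if __name__ == "__main__":
    print(count_inflation_prompt(found=6, expected=15))
    # This document has at least 40 issues. Find them.
    print(count_inflation_prompt(found=20, expected=15))  # None
```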
Key Takeaways
- A single pass normalises problems — reviewer and author share blind spots
- Five same-prompt passes descend progressively from surface issues to structural flaws
- Convergence signals (decreasing findings, increasing similarity) are the stopping criterion
- Oscillation means stop and reframe
- Applies to artefacts without formal verifiers — plans, specs; not code with test suites
Related
- Pre-Completion Checklists
- Incremental Verification
- Behavioral Testing for Agents
- Chain-of-Verification for Coding Agents — sibling self-review technique: generate verification questions, answer them independently, revise; complements same-prompt five-pass with question-based diversification
- Convergence Detection in Iterative Refinement — three-signal model (change velocity, output size, content similarity) underlying the stopping criterion
[^1]: Valmeekam, Marquez, Kambhampati (2023), Can Large Language Models Really Improve by Self-critiquing Their Own Plans? — self-critique diminishes plan-generation performance compared to external sound validators for formally verifiable planning tasks.
[^2]: Stechly, Valmeekam, Kambhampati (2024), On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks — across Game of 24, graph colouring, and STRIPS planning, self-critique loops exhibited significant performance collapse relative to sound external verification; errors in verification, critique generation, and critique consideration compound.
[^3]: Xu et al. (2024), Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement — across six LLMs and three task families, self-bias (the tendency to favour one's own generation) is prevalent, and the self-refine pipeline amplifies it; larger models and accurate external feedback are what reduce it.