
Closed-Loop Role-Based Refinement

Role-based refinement splits the self-improving agent loop into five specialized roles, adding persistent knowledge and gated persistence to prevent regression.

Beyond the Single-Loop Flywheel

Closed-loop role-based refinement structures the self-improvement cycle as five specialized roles -- Competitor, Analyst, Coach, Architect, and Curator -- each with a bounded contract, feeding output into the next role in sequence.

Single-loop patterns such as the agentic flywheel and the continuous agent improvement workflow treat improvement as one undifferentiated activity. Role-based refinement instead assigns each part of that activity to a separate role.

AutoContext implements this as five collaborating agent roles, with knowledge persisting between runs to avoid cold starts.

Five-Role Decomposition

Each role maps to a stage in the improvement loop, but with explicit contracts that prevent scope bleed:

graph LR
    A[Competitor] -->|results| B[Analyst]
    B -->|explanation| C[Coach]
    C -->|playbook updates| D[Architect]
    D -->|structural changes| E[Curator]
    E -->|approved knowledge| A

| Role | Responsibility | Contract |
|---|---|---|
| Competitor | Propose and execute strategies against the current task | Produces results; does not analyze or persist them |
| Analyst | Explain why strategies succeeded or failed | Produces explanations; does not modify playbooks |
| Coach | Update playbooks and hints based on analysis | Modifies knowledge artifacts; does not propose strategies |
| Architect | Suggest structural changes to the system itself | Proposes tool and pipeline modifications; does not execute tasks |
| Curator | Gate what persists -- approve, reject, or roll back knowledge changes | Controls persistence; does not generate content |

The key constraint: each role's output is the next role's input, and no role exceeds its contract.
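One way to make these contracts checkable is to give each handoff its own type and verify it at every stage boundary. The sketch below is illustrative only: the message classes, `run_cycle`, and the stub handlers are hypothetical names, and the lambdas stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Hypothetical message types, one per handoff in the loop.
@dataclass(frozen=True)
class Results:
    strategy: str
    score: float

@dataclass(frozen=True)
class Explanation:
    summary: str

@dataclass(frozen=True)
class PlaybookUpdate:
    entry: str

@dataclass(frozen=True)
class StructuralChange:
    proposal: str

@dataclass(frozen=True)
class ApprovedKnowledge:
    entries: Tuple[str, ...]

# (role name, accepted input type, promised output type, handler)
Stage = Tuple[str, type, type, Callable[[Any], Any]]

def run_cycle(stages: List[Stage], first_input: Any) -> Any:
    """Pass a value through the stages, enforcing each role's contract."""
    value = first_input
    for name, accepts, produces, handler in stages:
        if not isinstance(value, accepts):
            raise TypeError(f"{name} received {type(value).__name__}, expected {accepts.__name__}")
        value = handler(value)
        if not isinstance(value, produces):
            raise TypeError(f"{name} returned {type(value).__name__}, expected {produces.__name__}")
    return value

# Stub handlers standing in for model calls.
stages: List[Stage] = [
    ("competitor", str, Results, lambda task: Results(f"attempt: {task}", 0.4)),
    ("analyst", Results, Explanation, lambda r: Explanation(f"{r.strategy} scored {r.score}")),
    ("coach", Explanation, PlaybookUpdate, lambda e: PlaybookUpdate(f"avoid: {e.summary}")),
    ("architect", PlaybookUpdate, StructuralChange, lambda u: StructuralChange(f"add a check for '{u.entry}'")),
    ("curator", StructuralChange, ApprovedKnowledge, lambda c: ApprovedKnowledge((c.proposal,))),
]

knowledge = run_cycle(stages, "tune retry backoff")
```

With this shape, an Analyst that returns a `PlaybookUpdate` fails at the boundary where it overstepped, so scope bleed is attributed to the role that caused it.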

Persistent Knowledge Layers

Cold starts waste each session rediscovering context. Role-based refinement counters this with structured knowledge that survives across runs:

| Layer | Contents | Update frequency |
|---|---|---|
| Playbooks | Validated strategies and approaches | Updated by Coach after each analysis cycle |
| Hints | Tactical observations not yet promoted to playbook status | Updated frequently; pruned by Curator |
| Tools | Reusable scripts and utilities discovered during execution | Added by Architect; validated before persistence |
| Reports | Analysis outputs and progress snapshots | Append-only; used for trend detection |

Simpler patterns such as claude-progress.txt or AGENTS.md keep everything in a single file. Here, hints are tentative and playbooks are validated, and the Curator gates promotion between them.
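The four layers and the gated promotion step could be modeled roughly as follows. This is a minimal sketch, not AutoContext's storage format: `KnowledgeStore`, its method names, and the JSON file layout are all assumptions made for illustration, and the toy curator is a stand-in for real validation.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Callable, Dict, List

@dataclass
class KnowledgeStore:
    playbooks: List[str] = field(default_factory=list)   # validated strategies
    hints: List[str] = field(default_factory=list)       # tentative observations
    tools: Dict[str, str] = field(default_factory=dict)  # tool name -> script source
    reports: List[str] = field(default_factory=list)     # append-only log

    def add_hint(self, hint: str) -> None:
        if hint not in self.hints:
            self.hints.append(hint)

    def promote(self, hint: str, curator: Callable[[str], bool]) -> bool:
        """Move a hint into the playbook only if the curator approves it."""
        if hint not in self.hints:
            raise KeyError(hint)
        approved = curator(hint)
        verdict = "approved" if approved else "rejected"
        self.reports.append(f"promote {hint!r}: {verdict}")
        if approved:
            self.hints.remove(hint)
            self.playbooks.append(hint)
        return approved

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def load(cls, path: Path) -> "KnowledgeStore":
        # A missing file means a first run, so start empty.
        return cls(**json.loads(path.read_text())) if path.exists() else cls()

store = KnowledgeStore()
store.add_hint("retry with jitter on HTTP 429")
store.add_hint("maybe raise the timeout")

# Toy curator: rejects hedged wording. A real one would run validation.
def curator(hint: str) -> bool:
    return not hint.startswith("maybe")

store.promote("retry with jitter on HTTP 429", curator)
store.promote("maybe raise the timeout", curator)
```

After these calls the first hint sits in `playbooks`, the second remains a hint, and both decisions are recorded in `reports`. Calling `save` at the end of a run and `load` at the start of the next is what would carry the layers across sessions.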

Staged Validation and Rollback

Not every proposed improvement should persist. The system applies validation gates at multiple stages:

graph TD
    P[Proposed change] --> V1[Preflight check]
    V1 -->|pass| V2[Prevalidation]
    V2 -->|pass| V3[Probe run]
    V3 -->|pass| V4[Staged validation]
    V4 -->|pass| C[Committed to knowledge]
    V1 -->|fail| R[Rolled back]
    V2 -->|fail| R
    V3 -->|fail| R
    V4 -->|fail| R

Weak strategies roll back automatically, preventing regressions where changes pass initial tests but degrade edge cases. Guards include stagnation detection, dead-end management, and rapid gating.
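The gate sequence can be sketched as an ordered list of checks applied to a copy of the knowledge, so that a failure at any stage leaves the original untouched. The gate names follow the diagram, but the checks themselves are placeholders invented for this example, as are `apply_with_gates` and the dictionary layout.

```python
import copy
from typing import Callable, Dict, List, Tuple

Knowledge = Dict[str, List[str]]
Gate = Tuple[str, Callable[[Knowledge], bool]]

def apply_with_gates(
    knowledge: Knowledge,
    change: Callable[[Knowledge], None],
    gates: List[Gate],
) -> Tuple[Knowledge, str]:
    """Apply `change` to a copy and commit it only if every gate passes in order."""
    candidate = copy.deepcopy(knowledge)
    change(candidate)
    for name, check in gates:
        if not check(candidate):
            # Rollback is just returning the untouched original.
            return knowledge, f"rolled back at {name}"
    return candidate, "committed"

# Placeholder checks; real gates would run trials and evaluations.
gates: List[Gate] = [
    ("preflight", lambda k: all(e.strip() for e in k["playbooks"])),              # no empty entries
    ("prevalidation", lambda k: len(set(k["playbooks"])) == len(k["playbooks"])),  # no duplicates
    ("probe run", lambda k: len(k["playbooks"]) <= 3),                             # stand-in for a cheap trial
    ("staged validation", lambda k: True),                                         # stand-in for a full evaluation
]

base: Knowledge = {"playbooks": ["retry with jitter"]}
after_good, status_good = apply_with_gates(
    base, lambda k: k["playbooks"].append("cap retries at 5"), gates
)
after_bad, status_bad = apply_with_gates(
    after_good, lambda k: k["playbooks"].append("retry with jitter"), gates
)
```

Ordering the gates from cheapest to most expensive means most bad changes are rejected before the costly evaluation runs.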

Frontier-to-Local Distillation

A cost-performance pattern: use frontier models (Claude, GPT-4) for exploration in the Competitor and Analyst roles, encode validated strategies in playbooks, then execute with local models (Ollama, vLLM, MLX) on later runs. Frontier models re-engage only on stagnation or novel problems.
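A routing rule for this pattern might look like the sketch below. The stagnation test, the keyword-based coverage check, and the names `is_stagnating` and `choose_tier` are assumptions for illustration; the source does not specify how AutoContext detects stagnation or novelty.

```python
from typing import List, Sequence

def is_stagnating(scores: Sequence[float], window: int = 3, min_gain: float = 0.01) -> bool:
    """True when the best of the last `window` scores fails to beat the earlier best by `min_gain`."""
    if len(scores) <= window:
        return False  # not enough history to judge
    earlier_best = max(scores[:-window])
    recent_best = max(scores[-window:])
    return recent_best - earlier_best < min_gain

def choose_tier(task: str, playbook_topics: List[str], scores: Sequence[float]) -> str:
    """Return 'local' when a playbook covers the task and progress continues, else 'frontier'."""
    covered = any(topic in task.lower() for topic in playbook_topics)
    if not covered:
        return "frontier"  # novel problem: no validated strategy yet
    if is_stagnating(scores):
        return "frontier"  # local runs have stopped improving
    return "local"

improving = choose_tier("Tune retry backoff", ["retry"], [0.5, 0.6, 0.7, 0.8])
stalled = choose_tier("Tune retry backoff", ["retry"], [0.8, 0.8, 0.79, 0.8])
novel = choose_tier("Design cache eviction", ["retry"], [])
```

Here `improving` routes to the local tier, while `stalled` and `novel` both fall back to the frontier tier.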

The ACE framework (arxiv:2510.04618) applies a closely related Generate/Reflect/Curate decomposition. It reports +10.6% on agent benchmarks and +8.6% on finance benchmarks over strong baselines without fine-tuning; on AppWorld it matches the top-ranked production agent overall and surpasses it on the harder test-challenge split. These results suggest that structured role decomposition with persistent context can outperform single-loop patterns.

Applying the Pattern

The five roles map to any multi-agent system without requiring AutoContext's full implementation:

| If you have... | Map the roles to... |
|---|---|
| Claude Code sub-agents | Five sub-agents with role-scoped system prompts |
| A CI/CD pipeline | Five pipeline stages with distinct responsibilities |
| A manual review process | Five review passes, each checking one dimension |
| A single-agent loop | Five phases within the same session, with explicit transitions |

The minimum viable version: separate "generate" from "evaluate" from "persist." The evaluator-optimizer pattern covers the first two; a Curator role to gate persistence is the third step that prevents regression.
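The three-way split can be sketched as a loop that takes the three responsibilities as separate functions. Everything here is a toy: `refine` and its callbacks are hypothetical names, the generator replays fixed values instead of calling a model, and the objective is invented so the example runs deterministically.

```python
from typing import Callable, List, Tuple

def refine(
    generate: Callable[[str], int],
    evaluate: Callable[[int], Tuple[float, str]],
    should_persist: Callable[[int, float, List[Tuple[int, float]]], bool],
    rounds: int,
) -> List[Tuple[int, float]]:
    """Run generate -> evaluate -> persist, keeping only candidates the gate accepts."""
    kept: List[Tuple[int, float]] = []
    feedback = ""
    for _ in range(rounds):
        candidate = generate(feedback)
        score, feedback = evaluate(candidate)
        if should_persist(candidate, score, kept):
            kept.append((candidate, score))
    return kept

attempts = iter([2, 8, 5, 4])

def generate(feedback: str) -> int:
    return next(attempts)  # stand-in for a model proposing a retry limit

def evaluate(candidate: int) -> Tuple[float, str]:
    score = 1.0 - abs(candidate - 5) / 5  # toy objective, best at 5
    return score, f"{candidate} scored {score:.2f}"

def beats_best(candidate: int, score: float, kept: List[Tuple[int, float]]) -> bool:
    # Curator-style gate: persist only strict improvements.
    return all(score > previous for _, previous in kept)

kept = refine(generate, evaluate, beats_best, rounds=4)
```

Only the candidates 2 and 5 are kept: 8 ties the earlier score and 4 scores lower than 5, so neither passes the gate.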

Example

A minimal five-role loop using Claude sub-agents with role-scoped system prompts:

import anthropic

client = anthropic.Anthropic()

ROLES = {
    "competitor": "Propose and execute a strategy for the given task. Return only results.",
    "analyst":    "Explain why the strategy succeeded or failed. Return only analysis.",
    "coach":      "Update the playbook based on this analysis. Return only playbook changes.",
    "architect":  "Suggest structural improvements to the system. Return only proposals.",
    "curator":    "Approve or reject the proposed changes. Return APPROVE or REJECT with reason.",
}

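def role_turn(role, content):
    """Run one role as an isolated call: its system prompt plus a single user message."""
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        system=ROLES[role],
        messages=[{"role": "user", "content": content}],
    )
    # The reply is a list of content blocks; this sketch assumes the first is text.
    return response.content[0].text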

task      = "Optimize the retry logic in our API client."
results   = role_turn("competitor", task)
analysis  = role_turn("analyst",    f"Task: {task}\nResults: {results}")
playbook  = role_turn("coach",      analysis)
proposals = role_turn("architect",  playbook)
decision  = role_turn("curator",    proposals)

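# Normalize before matching so leading whitespace or a lowercase reply still parses.
if decision.strip().upper().startswith("APPROVE"):
    print("Persisting:", decision)
else:
    print("Rolled back:", decision)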

Each role receives only the prior role's output (the Analyst also receives the task statement) -- there is no shared context window. The Curator's decision gates persistence; rejected proposals are discarded without modifying the knowledge store.

When This Backfires

Role decomposition adds coordination overhead that pays off only across many iterations. Three conditions where the pattern is worse than a simpler alternative:

  • Single-session or low-iteration tasks. Persistent knowledge layers add no value if the agent runs once or twice; the five-role handoff just adds latency.
  • Curator as bottleneck. A synchronous Curator gate on the critical path stalls the loop when approval is cautious. Teams needing rapid iteration may find a two-role evaluator-optimizer loop more practical than the full five-role handshake.
  • Fuzzy role contracts. If the Analyst proposes playbook edits or the Coach analyzes results, boundaries collapse and handoff failures become hard to attribute. The pattern needs strict prompt discipline.

A two-role evaluator-optimizer loop is often sufficient when tasks are bounded, the improvement signal is clear, and persistence is not a goal.

Key Takeaways

  • Split the self-improving loop into five role-scoped contracts -- Competitor, Analyst, Coach, Architect, Curator -- so no role exceeds its mandate.
  • Persistent knowledge layers (playbooks, hints, tools, reports) eliminate cold starts; the Curator gates promotion between them.
  • Staged validation with automatic rollback prevents improvements that pass initial tests but regress on edge cases.
  • The overhead pays off only across many iterations -- prefer a two-role evaluator-optimizer loop for bounded, low-iteration tasks.