Web Search Agent Loop¶

A web search agent loop wraps retrieval in a cycle of search, evaluate, refine, and synthesize, letting the agent decide when evidence is sufficient.

Pipeline vs. Control Loop¶

Classic search-augmented generation retrieves once, then generates. The agent loop iterates until a termination condition fires, with loop detection guarding against cycles that never settle.

flowchart LR
    A[Formulate Query] --> B[Search]
    B --> C[Evaluate Results]
    C -->|Gaps found| D[Refine Query]
    D --> B
    C -->|Sufficient| E[Synthesize]

Three decisions per iteration:

Decision	Question
Continue	Are there gaps worth filling?
Pivot	Should the query strategy change?
Stop	Is evidence sufficient to answer?
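
The three decisions can be sketched as a small function. This is a minimal illustration, not a prescribed policy: the `Decision` enum, the `decide` function, and its three integer inputs are all hypothetical names, and a real agent would usually derive them from a model's judgment rather than from counters.

```python
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"
    PIVOT = "pivot"
    STOP = "stop"


def decide(open_gaps: int, new_findings: int, iterations_left: int) -> Decision:
    """Pick one of the three decisions for the current iteration."""
    # Stop when nothing is missing or the budget is exhausted.
    if open_gaps == 0 or iterations_left <= 0:
        return Decision.STOP
    # Gaps remain but the last round added nothing: change strategy.
    if new_findings == 0:
        return Decision.PIVOT
    # Gaps remain and the last round was productive: keep going.
    return Decision.CONTINUE
```

Treating "no new findings" as the pivot trigger is one assumption among many; a stricter variant could pivot whenever the gain falls below a threshold.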

Core Mechanics¶

Query Formulation¶

Decomposition: Split complex questions into independent sub-queries
Plan-then-execute: Generate queries per plan step, passing prior results as context, as in Perplexity Pro Search
Broad-to-narrow: Start broad; narrow on intermediate findings
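
As an illustration of plan-then-execute, the sketch below runs one query per plan step and passes everything found so far into the next query. The function names and the toy `make_query` and `search` stand-ins are assumptions made so the example runs without a model or network; they do not describe how any particular product implements this.

```python
from typing import Callable


def plan_then_execute(
    plan: list[str],
    make_query: Callable[[str, list[str]], str],
    search: Callable[[str], list[str]],
) -> list[str]:
    """Run one query per plan step, feeding earlier results into later steps."""
    context: list[str] = []
    for step in plan:
        query = make_query(step, context)  # prior results shape the query
        context.extend(search(query))
    return context


# Toy stand-ins so the sketch runs without a model or network.
def toy_make_query(step: str, context: list[str]) -> str:
    return f"{step} (given {len(context)} prior results)"


def toy_search(query: str) -> list[str]:
    return [f"result for: {query}"]
```

Calling `plan_then_execute(["define term", "compare options"], toy_make_query, toy_search)` shows the second query being built with one prior result in context.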

Result Evaluation¶

Filter results before they enter context:

Signal	What to check
Relevance	Addresses the query or tangential?
Credibility	Primary source (arXiv paper, official docs) or an aggregator or blog?
Freshness	Current enough for the question?
Redundancy	New information or a duplicate?
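
One way to apply the four signals is as a pass/fail filter before results enter context. The sketch below is deliberately crude: the `Result` type, the `PRIMARY_HOSTS` allow-list, the keyword match for relevance, and the one-year freshness window are all assumptions for illustration. A production system would more likely score and rank than hard-filter.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse


@dataclass
class Result:
    url: str
    snippet: str
    published: date


# Hypothetical allow-list of primary sources; tune per domain.
PRIMARY_HOSTS = {"arxiv.org", "docs.python.org"}


def filter_results(results, query_terms, seen_urls, today, max_age_days=365):
    """Keep results that pass all four signals; adds kept URLs to seen_urls."""
    kept = []
    for r in results:
        host = urlparse(r.url).netloc.lower()
        relevant = any(t.lower() in r.snippet.lower() for t in query_terms)
        credible = host in PRIMARY_HOSTS
        fresh = (today - r.published).days <= max_age_days
        novel = r.url not in seen_urls
        if relevant and credible and fresh and novel:
            kept.append(r)
            seen_urls.add(r.url)
    return kept
```

Passing the same `seen_urls` set across iterations is what makes the redundancy check work over the whole loop rather than within a single search.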

When evaluation exposes gaps, refine the next round of queries:

Gap-driven follow-ups: Target queries at what is still unknown
Context accumulation: Feed earlier results into later iterations
Query reformulation: When results are poor, rephrase or narrow

Synthesis¶

Combine findings with source attribution; flag conflicting evidence.
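
A minimal sketch of attribution and conflict flagging, assuming findings have already been reduced to `(claim_key, value, source)` tuples. That tuple shape and this `synthesize` helper are hypothetical; extracting comparable claims from free text is the hard part and is not shown.

```python
from collections import defaultdict


def synthesize(findings):
    """Group (claim_key, value, source) findings; flag keys whose sources disagree."""
    by_key = defaultdict(lambda: defaultdict(list))
    for key, value, source in findings:
        by_key[key][value].append(source)

    report = []
    for key, values in by_key.items():
        report.append({
            "claim": key,
            # Every value keeps the list of sources that support it.
            "values": {v: sorted(srcs) for v, srcs in values.items()},
            # More than one distinct value means the sources conflict.
            "conflict": len(values) > 1,
        })
    return report
```

Keeping every value with its sources, instead of picking a winner, leaves the final judgment to the reader or to a later adjudication step.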

Termination Strategies¶

Strategy	Mechanism	Tradeoff
Budget cap	Max iterations or tool calls	Simple; may stop early or late
Plan completion	Stop when all planned steps execute	Requires good upfront planning
Evaluator decision	A second LLM judges sufficiency	More accurate; adds cost and latency
Diminishing returns	Track information gain per iteration	Requires a gain metric
Loop detection	Detect repeated queries; terminate or pivot	Prevents wasted cycles

Pair a hard cap with a softer quality signal. Anthropic's multi-agent research system scales effort by query type: 1 agent with 3–10 tool calls for fact-finding, 2–4 subagents with 10–15 calls each for direct comparisons, and more than 10 subagents for complex research.
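
The sketch below combines three of the strategies from the table: a hard budget cap, loop detection, and a diminishing-returns check. The function name, the defaults, and the use of "new facts per round" as the gain metric are assumptions; exact string comparison is also a simplification, since real queries usually need normalization before they can be compared.

```python
def should_stop(iteration, new_facts_per_round, queries_so_far,
                max_iterations=5, min_gain=1, window=2):
    """Return a reason string if the loop should stop, else None."""
    # Hard cap: always enforced, regardless of quality signals.
    if iteration >= max_iterations:
        return "budget cap"
    # Loop detection: the latest query repeats an earlier one.
    if queries_so_far and queries_so_far[-1] in queries_so_far[:-1]:
        return "repeated query"
    # Diminishing returns: the last `window` rounds each added too little.
    recent = new_facts_per_round[-window:]
    if len(recent) == window and all(g < min_gain for g in recent):
        return "diminishing returns"
    return None
```

Checking the cap first means the softer signals can only end the loop earlier, never later.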

Architecture Patterns¶

Two-Tool Separation¶

Claude Code splits web research across two tools (reference):

WebSearch: server-side search returning titles and URLs only
WebFetch: URL plus prompt; a Claude Haiku pass extracts a targeted answer instead of raw HTML

Discovery stays cheap; deep reading is trimmed before reaching context.
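
The same separation can be sketched generically. The `search`, `fetch`, and `pick` callables below are hypothetical interfaces invented for this example, not Claude Code's actual tool signatures. The point is only the shape: discovery returns lightweight hits, and the expensive read happens for a chosen subset.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hit:
    title: str
    url: str


def research_one(
    question: str,
    search: Callable[[str], list[Hit]],
    fetch: Callable[[str, str], str],
    pick: Callable[[list[Hit]], list[Hit]],
) -> list[tuple[str, str]]:
    """Discover cheaply, then deep-read only the hits worth the tokens."""
    hits = search(question)  # titles and URLs only
    chosen = pick(hits)      # decide before paying for a fetch
    # Each fetch returns a short extracted answer, not the raw page.
    return [(h.url, fetch(h.url, question)) for h in chosen]
```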

Orchestrator-Worker¶

An orchestrator spawns workers in parallel:

flowchart TD
    O[Orchestrator] -->|sub-query 1| W1[Worker 1]
    O -->|sub-query 2| W2[Worker 2]
    O -->|sub-query 3| W3[Worker 3]
    W1 -->|findings| O
    W2 -->|findings| O
    W3 -->|findings| O
    O --> E{Enough?}
    E -->|No| O
    E -->|Yes| S[Synthesize + Cite]

Anthropic's research system runs a lead researcher with 3–5 parallel subagents in their own contexts, then a separate citation agent attributes claims to sources.
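
A minimal fan-out sketch using Python's standard thread pool. The `worker` here is a stub that fabricates a finding and an `example.com` URL so the code runs offline, and `orchestrate` and `cite` are hypothetical names. It illustrates the pattern only and is not a description of Anthropic's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def worker(sub_query: str) -> dict:
    """Stand-in worker: a real one would run its own search loop."""
    return {
        "query": sub_query,
        "findings": [f"finding for {sub_query}"],
        "sources": [f"https://example.com/{sub_query.replace(' ', '-')}"],
    }


def orchestrate(sub_queries: list[str], max_workers: int = 5) -> list[dict]:
    """Fan sub-queries out to parallel workers and collect their reports."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so reports line up with sub_queries.
        return list(pool.map(worker, sub_queries))


def cite(reports: list[dict]) -> list[tuple[str, str]]:
    """Separate citation pass: pair each finding with its worker's sources."""
    return [(f, s) for r in reports for f in r["findings"] for s in r["sources"]]
```

Because each worker returns a compact report, the orchestrator never sees the raw pages a worker read, which is what keeps its context small.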

Breadth and Depth Parameters¶

LangChain's Open Deep Research exposes Breadth (parallel queries per iteration) and Depth (refinement cycles) as knobs. A supervisor spawns researchers per breadth and recurses for depth. Termination is deterministic: stop at the breadth, depth, or per-agent cap.
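
To show how breadth and depth bound the work, here is a generic recursive sketch. It is not LangChain's code: `deep_research`, `search`, and `follow_ups` are hypothetical, and halving breadth at each level is an assumption added here to keep the number of searches from growing as breadth to the power of depth.

```python
def deep_research(topic, breadth, depth, search, follow_ups):
    """Run `breadth` queries per level and recurse `depth` levels down."""
    # Deterministic termination: stop when either knob reaches zero.
    if depth == 0 or breadth == 0:
        return []
    findings = []
    for query in follow_ups(topic, breadth):
        findings.extend(search(query))
        # Recurse on each query with one fewer level and half the breadth.
        findings.extend(
            deep_research(query, breadth // 2, depth - 1, search, follow_ups)
        )
    return findings
```

With `breadth=2` and `depth=2` this issues four searches in total: two at the top level and one follow-up under each.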

Why It Works¶

The first query reflects only what the agent knew before searching; each round surfaces evidence that reshapes what is worth asking next. Anthropic reports that its multi-agent system outperformed a single agent by 90.2% on an internal research eval. Two mechanisms drive the gain: parallel subagents widen the explored surface, and each subagent's own context lets findings compound without polluting the lead agent's context. Gap-driven reformulation also avoids "query lock-in," where the agent keeps issuing minor variations of its first query.

When This Backfires¶

Skip the loop and use a single query plus light validation when:

The answer lives on one page: official docs, an RFC, or a README make iteration pure latency and token spend
Fact-finding has a verifiable shape: a short answer with a clear authority does not benefit from iteration
Cost and latency dominate: Anthropic notes multi-agent research uses ~15× the tokens of single-turn chat; unbounded depth/breadth multiplies this
The question is subjective or contested: more sources amplify disagreement and can manufacture false confidence
Breadth beats depth: for trend-spotting, a broad query with strong reranking often beats recursive narrowing
Sequential reasoning dominates: Google Research's 180-configuration scaling study found multi-agent coordination degrades sequential-reasoning tasks by 39–70% while improving parallelizable ones by ~81% — the win is task-shape-specific
Coordination breaks down at scale: CIO (March 2026) reports adding agents amplifies planning paralysis, instruction-ignoring, and redo-loops; chain deterministically rather than letting agents collaborate

Example: Configuring a Research Loop¶

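A minimal research loop in Python-style pseudocode; decompose, web_search, evaluate, identify_gaps, generate_followup_queries, and synthesize are placeholders for model or tool calls:

def research(question, max_iterations=5):
    findings = []
    queries = decompose(question)  # initial sub-queries

    for _ in range(max_iterations):  # hard budget cap
        for q in queries:
            results = web_search(q)
            # Judge results against the question and what is already known.
            relevant = evaluate(results, question, findings)
            findings.extend(relevant)

        gaps = identify_gaps(question, findings)
        if not gaps:  # evidence is sufficient: stop early
            break
        queries = generate_followup_queries(gaps)

    return synthesize(question, findings)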

The key design choices are in evaluate (what counts as relevant), identify_gaps (what is still missing), and the max_iterations budget.

Key Takeaways¶

The research loop is a control loop, not a pipeline — the agent decides when to continue, pivot, or stop
Separate discovery (search) from deep reading (fetch) to keep costs predictable
Always set a hard budget cap even when using quality-based stopping
Gap-driven follow-ups outperform minor variations on the same query
Repeated queries or diminishing result quality signal stagnation

Loop Detection — detecting and breaking repetitive agent behavior
Retrieval-Augmented Agent Workflows — RAG as a foundation for agent-driven retrieval
Sub-Agents and Fan-Out — parallel worker coordination pattern
Browser Automation as a Research Tool — fallback when HTTP fetch is blocked
Lexical-First Retrieval for Agentic Search — when a strong loop lets a tuned BM25 index match dense retrieval on deep-research benchmarks
Evaluator-Optimizer — iterative generate-evaluate loop pattern
Orchestrator-Worker — multi-agent coordination architecture
LLM-as-Judge Evaluation with Human Spot-Checking — using an LLM judge to evaluate agent outputs at scale