The Advisor Strategy: Frontier Model as Strategic Advisor¶
Pair a cost-effective executor model with a frontier advisor that provides strategic guidance on hard decisions — within a single API call, no orchestration required.
The Pattern¶
Most agent turns are mechanical — reading files, running commands, writing code. A few need strategic reasoning: choosing an architecture, recovering from a dead end, verifying completeness. A frontier model on every turn wastes compute; a cheap model alone misses the critical decisions.
The advisor strategy separates these at the API level. A cost-effective executor (Sonnet or Haiku) handles tool use; on hard decisions it consults a frontier advisor (Opus) that reads the full transcript and returns strategic guidance. Anthropic's advisor_20260301 tool implements this server-side in a single /v1/messages request — no decomposition logic, no extra round-trips.
sequenceDiagram
participant U as User
participant E as Executor (Sonnet/Haiku)
participant A as Advisor (Opus)
U->>E: Task
E->>E: Tool calls (read, search, write)
E->>A: Consult on hard decision
A-->>E: Strategic guidance (400-700 tokens)
E->>E: Continue execution with advice
E->>U: Result
How It Works¶
The executor decides when to call the advisor. The server runs a separate inference pass with the executor's full transcript. The advisor returns text guidance — thinking blocks are dropped, no tool calls, no user-facing output. The executor resumes, informed by the advice.
API Integration¶
Add the advisor to tools alongside your existing tools. The beta header advisor-tool-2026-03-01 is required (API docs):
response = client.beta.messages.create(
model="claude-sonnet-4-6", # executor
max_tokens=4096,
betas=["advisor-tool-2026-03-01"],
tools=[
{
"type": "advisor_20260301",
"name": "advisor",
"model": "claude-opus-4-7", # advisor
"max_uses": 3, # per-request cap
},
# ... your other tools
],
messages=[...],
)
| Parameter | Type | Default | Purpose |
|---|---|---|---|
type |
string | required | Must be "advisor_20260301" |
model |
string | required | Advisor model ID — billed at this model's rates |
max_uses |
integer | unlimited | Per-request cap on advisor calls |
caching |
object | off | Advisor-side prompt caching; breaks even at ~3 calls per conversation |
The advisor must be at least as capable as the executor; check the API docs for the current supported pairs.
Benchmark Results¶
From Anthropic's announcement:
| Configuration | Benchmark | Result | Cost Impact |
|---|---|---|---|
| Haiku + Opus advisor | BrowseComp | 41.2% vs 19.7% solo (+109%) | 85% cheaper than Sonnet alone |
| Sonnet + Opus advisor | SWE-bench Multilingual | +2.7pp over Sonnet solo | -11.9% cost per agentic task |
When to Consult the Advisor¶
The advisor pays off on decisions with high downstream cost if wrong. Anthropic's recommended timing for coding:
- After initial exploration — once the executor understands the problem, consult before committing to an approach.
- When stuck — errors recurring, approach not converging.
- Before declaring done — make the deliverable durable first (write the file, commit the change), then consult for a final review.
Cost Controls¶
Advisor tokens bill at Opus rates; executor tokens at executor rates. Savings come from the advisor producing only short guidance, not the full output (API docs).
- Per-request cap: set
max_usesto limit advisor calls per request. - Conversation-level cap: track client-side. At the ceiling, remove the advisor from
toolsand stripadvisor_tool_resultblocks from history. - Output compression: a per-message instruction (e.g., "keep guidance under 80 words") shortens output; Anthropic recommends asking for ~80% of your true ceiling since the advisor occasionally exceeds it.
- Effort pairing: Sonnet at medium effort + Opus advisor matches Sonnet at default effort.
When This Backfires¶
Each consultation is a second inference pass at Opus rates. A single strong model is better when (API docs):
- The executor consults often. Frequent calls shift the token mix toward Opus rates and can exceed Opus solo cost.
- Every turn needs frontier capability. Uniformly hard tasks offer no mechanical turns to offload.
- Single-turn Q&A or pass-through routing. No plan to form.
- Latency budgets are tight. Each call pauses the executor stream while Opus runs.
- Priority Tier is only on the executor. It does not cascade to the advisor, which rate-limits independently.
Relationship to General Patterns¶
An API-native implementation of established patterns:
- Cognitive reasoning vs execution separation — advisor as reasoning layer, executor as execution layer, boundary enforced server-side.
- Cost-aware agent design — model routing by complexity without manual cascade logic.
- Reasoning budget allocation — the reasoning sandwich via selective advisor calls rather than per-phase model switching.
Key Takeaways¶
- The advisor strategy pairs a cost-effective executor with a frontier advisor consulted only on hard decisions — no orchestration code required.
- A single API call handles the full flow: the executor invokes the advisor like any other tool, and the server manages context routing.
- Haiku + Opus advisor more than doubles standalone BrowseComp performance at 85% less cost than Sonnet alone.
- Cap advisor calls with
max_uses(per-request) and client-side tracking (per-conversation) to control spend. - Call the advisor after exploration, when stuck, and before declaring done — skip it on mechanical turns.