Prototype Before Optimizing

Prototype with generous budgets to establish a quality baseline before applying optimization pressure — otherwise compression hides regressions and locks in suboptimal architectures.

Related lesson: Monolith to Sub-Agents — this concept features in a hands-on lesson with quizzes.

The problem with early optimization

Teams often apply token budgets and prompt compression at the start of development, before they understand what high-quality behavior requires. The compressed workflow may look faster while silently degrading quality, and with no baseline to compare against, the regression goes undetected.

The nibzard/awesome-agentic-patterns catalog identifies the root cause: "Teams often optimize token spend too early, forcing prompts and context windows into tight constraints before they understand what high-quality behavior looks like." This hides failure modes and can entrench architectures that only appear functional under constrained conditions.

See also: Token Preservation Backfire — the failure mode where efficiency instructions create a competing objective that overrides the agent's actual task.

The temporal dimension

Existing budget allocation patterns address structure: what to load and how much reasoning to allocate per phase. This pattern adds a temporal dimension: when in the development lifecycle to apply optimization pressure.

graph LR
    A[Prototype phase<br/>generous budgets] --> B[Baseline established<br/>quality measured]
    B --> C[Optimization phase<br/>A/B vs baseline]
    C --> D{Quality maintained?}
    D -->|Yes| E[Production deployment]
    D -->|No| C

Two separate stages with a hard gate between them:

Phase	Goal	Budget constraint
Prototype	Discover what quality looks like	Minimal — remove limits that hide failure modes
Production	Deliver quality efficiently	Enforce — but only against a measured baseline
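One way to make the hard gate concrete is to have the workflow configuration refuse any per-call ceiling until a baseline has been recorded. The sketch below is illustrative only: `Baseline`, `WorkflowConfig`, and `enforce_budget` are hypothetical names, not part of any library or of the pattern's source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Baseline:
    """Hypothetical record of a measured quality baseline."""
    eval_suite: str
    quality_score: float          # e.g. mean eval score in the range 0..1
    mean_tokens_per_task: float


@dataclass
class WorkflowConfig:
    """Hypothetical workflow settings; None means no per-call ceiling."""
    max_tokens_per_call: Optional[int] = None
    baseline: Optional[Baseline] = None

    def enforce_budget(self, max_tokens_per_call: int) -> None:
        # The gate: constraints are rejected until a baseline exists.
        if self.baseline is None:
            raise RuntimeError(
                "record a quality baseline before enforcing token budgets"
            )
        self.max_tokens_per_call = max_tokens_per_call
```

In the prototype phase `max_tokens_per_call` stays `None`; only after `baseline` is set can a ceiling be applied, which mirrors the two rows of the table above.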

How to prototype without hiding failure modes

During prototyping, the objective is learning, not efficiency. Constraints that make the workflow look fast before failure modes surface create false confidence.

Remove hard token ceilings per call. Let reasoning run until the model is done, not until a budget is exhausted. If the model hits a limit and produces a truncated result, you learn nothing about the actual failure boundary.

Enable multiple reasoning passes. Self-consistency and self-reflection loops improve reasoning quality but need generous budgets. Compressing these before you understand them removes the signal that reveals where the workflow actually breaks.

Set temporary spending ceilings per experiment, not per call. Bound the total cost of a discovery run, not individual responses within it. This caps spending without distorting individual outputs.
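A per-experiment ceiling can be sketched as a counter that is checked between calls rather than inside them. The class and exception names below are hypothetical, and the token counts are assumed to come from whatever usage metadata your model client reports.

```python
class ExperimentBudgetExceeded(Exception):
    """Raised when a discovery run has used up its total token allowance."""


class ExperimentBudget:
    """Bounds total tokens for one experiment; applies no per-call limit."""

    def __init__(self, max_total_tokens: int) -> None:
        self.max_total_tokens = max_total_tokens
        self.used_tokens = 0

    def check(self) -> None:
        # Call this before starting the next model call.
        if self.used_tokens >= self.max_total_tokens:
            raise ExperimentBudgetExceeded(
                f"used {self.used_tokens} of {self.max_total_tokens} tokens"
            )

    def record_call(self, tokens_used: int) -> None:
        # Record a finished call, however large it was.
        self.used_tokens += tokens_used
```

Because the check happens only between calls, no individual response is truncated. The cost of that choice is that a run can overshoot the ceiling by at most one call's worth of tokens.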

Track quality and token consumption together from the start. Without parallel measurement, you have no basis for the optimization phase.
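As a minimal sketch of parallel measurement, each task run can be stored as one record that carries both its quality score and its token count, so the two are never separated. `RunRecord` and `summarize` are hypothetical names, and the `quality` field assumes a 0..1 score from whatever evaluator you use.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class RunRecord:
    """One task run: quality and cost are captured side by side."""
    task_id: str
    quality: float     # assumed 0..1 score from your evaluator
    tokens: int        # total tokens consumed by the run
    completed: bool    # whether the task finished


def summarize(records: List[RunRecord]) -> Dict[str, float]:
    """Aggregate a set of runs into the metrics a baseline needs."""
    if not records:
        raise ValueError("no runs recorded")
    return {
        "mean_quality": mean([r.quality for r in records]),
        "completion_rate": sum(r.completed for r in records) / len(records),
        "mean_tokens": mean([r.tokens for r in records]),
    }
```

A summary taken under generous budgets can then serve as the reference point for later optimization runs.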

What "generous" does not mean

Unlimited budgets in every phase are not the goal. LangChain's deep agent research found that running maximum reasoning compute continuously across all phases scored lower (53.9% completion) than structured allocation (66.5%) because of agent timeouts: the model spent resources on reasoning that did not improve execution steps (LangChain: harness engineering for deep agents).

"Generous" means: don't apply constraints that prevent failure modes from surfacing. It does not mean maximum compute everywhere regardless of phase.

The optimization gate

The shift from prototype to optimization needs a baseline — a documented, reproducible quality measurement. Without it, you cannot tell compression that degrades quality from compression that is safe.

The optimization phase runs as an A/B comparison:

1. Define your eval suite: tasks representative of real production use.
2. Record baseline metrics: quality scores, completion rates, and error rates under unconstrained conditions.
3. Apply one optimization at a time: token budget reduction, prompt compression, or context pruning.
4. Compare against the baseline: if any quality metric falls below a threshold fixed in advance, the optimization is unsafe.
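The comparison step can be sketched as a function that lists every metric on which a candidate is worse than the baseline by more than an agreed tolerance. The metric names, the `HIGHER_IS_BETTER` table, and the tolerance values are assumptions chosen for illustration; use whatever your eval suite actually reports.

```python
from typing import Dict, List

# Assumed metric names; True means a higher value is better.
HIGHER_IS_BETTER = {
    "quality": True,
    "completion_rate": True,
    "error_rate": False,
}


def regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: Dict[str, float],
) -> List[str]:
    """Return the metrics where the candidate is worse than tolerated."""
    failed = []
    for name, higher_is_better in HIGHER_IS_BETTER.items():
        delta = candidate[name] - baseline[name]
        # Express every change as a loss so one comparison covers both cases.
        loss = -delta if higher_is_better else delta
        if loss > tolerance[name]:
            failed.append(name)
    return failed
```

An optimization would be accepted only when `regressions(...)` returns an empty list. Testing one change per comparison keeps any failure attributable to that change.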

Eval-Driven Development for tool building covers the prototype-evaluate-analyze-iterate loop that makes this systematic.

Trade-offs

	Prototype-first	Optimize-first
Upfront cost	Higher inference spend	Lower immediate spend
Risk	Budget overrun during discovery	Locking in suboptimal architecture
Baseline for regression testing	Established	Absent
Failure mode visibility	High — limits don't mask errors	Low — compression hides degradation

The trade-off is real: higher upfront inference cost for faster baseline discovery and fewer premature architectural choices. Teams on tight budgets can bound total experiment cost per discovery run while keeping per-call limits off.

Key Takeaways

Apply token constraints after establishing a quality baseline, not before — you need something to regress against
During prototyping, remove limits that prevent failure modes from surfacing; set spending ceilings per experiment, not per call
The optimization phase is an A/B comparison against the baseline, one change at a time
"Generous budget" means no hard per-call limits during discovery, not maximum compute everywhere; continuous maximum reasoning can lower completion rates through timeouts
Track quality and token consumption together from the start; without parallel measurement, optimization targets nothing

Context Budget Allocation: Every Token Has a Cost — structural allocation: what to load and how much
Reasoning Budget Allocation: The Reasoning Sandwich — phase-level allocation: max compute for planning/verification, reduced for execution
Eval-Driven Development: Write Evals Before Building Agent Features — defining success criteria before building, plus the prototype-evaluate-analyze-iterate loop for tool building
Token Preservation Backfire — the failure mode when efficiency instructions override task completion
The Velocity-Quality Asymmetry — why compounding quality debt reverses velocity gains
Prompt Compression: Maximizing Signal Per Token — how to compress safely once a baseline exists