Context Engineering¶
The discipline of designing what information enters a model's context window, how it is structured, and what is excluded — to maximise the quality and reliability of agent output.
Fundamentals¶
Core concepts that define context engineering as a practice and establish the structural patterns every other technique builds on.
- Context Engineering: The Discipline of Designing Agent Context — Context engineering is the practice of designing what information enters a model's context window, how it is structured, and what is excluded
- Context Priming — Load relevant context before asking an agent to act; the order information enters the context window shapes the quality of everything that follows
- Layered Context Architecture — Ground agents in multiple distinct context sources — schema, code, institutional knowledge, and persistent memory — rather than relying on any single signal
- Context Budget Allocation — Context is a finite budget; every token preloaded into the context window displaces a token available for reasoning, tool results, and implementation
- Discoverable vs Non-Discoverable Context — Only put non-discoverable information in agent instruction files; if the agent can find it in the codebase, let it find it
- Instruction-Guided Code Completion — Functional correctness and instruction adherence are independent capabilities; explicit implementation constraints and model selection close the gap
Attention & Positioning¶
Models do not attend uniformly across the context window. These pages cover where attention concentrates, where it drops off, and how to structure content accordingly.
- Attention Sinks — Transformer models disproportionately attend to initial tokens regardless of their semantic content; position determines attention weight, not importance
- Lost in the Middle — Model attention is strongest at the start and end of a context window; content in the middle receives significantly less focus regardless of its importance
- Context Window Dumb Zone — Output quality degrades as context fills, but the onset depends on task type; retrieval, reasoning, and code generation hit different thresholds
- Manual Compaction as Dumb Zone Mitigation — Auto-compaction fires at ~95% context fill, long after reasoning quality has degraded; manual compaction reframes context management as reasoning quality preservation
- Observation Masking — Strip intermediate tool results from conversation history once they have served their purpose to keep active context lean without losing the work product
- Context Window Anxiety — Advanced models exhibit behavioral shortcuts as context limits approach; strategic buffers, counter-prompting, and token budget transparency counteract premature task closure
- Turn-Level Context Decisions — Every completed turn is a branching point with five options: continue, rewind, clear, compact, or delegate to a subagent; choosing well is the core skill of context management
Compression & Caching¶
Strategies for fitting more useful content into less space, and for making repeated prefixes cheaper through provider caching mechanisms.
- Context-Window Diagnostic Tooling — Surface which tool calls are inflating the context window so you can optimize specific culprits rather than prune blindly
- Context Compression Strategies — Long-running agents accumulate context that eventually fills the window; tiered compression — offloading large payloads and summarising history — lets agents continue working without losing task continuity
- Selective Rewind Summarization — A user-chosen cut point compresses earlier turns to a summary while the recent turns stay verbatim — a targeted alternative to whole-session compaction
- Elastic Context Orchestration — A per-turn vocabulary of context operations — Skip, Compress, Snippet, Rollback, Delete — that lets long-horizon search agents tier retention by current task relevance instead of accumulating raw trajectory
- Prompt Compression — Write instructions that convey the same guidance in fewer words; shorter, denser instructions improve agent compliance and reduce token cost
- Prompt Caching: Architectural Discipline for Agents — Treat prompt caching as a structural constraint on prompt composition, with cross-provider economics and extended-TTL guidance folded in
- Static Content First for Cache Hits — Place static content at the beginning of the prompt and variable content at the end to maximize prompt cache hits and keep inference costs linear
- Stateful Iteration State-Carry — Carry typed persistent state across long agent loops through a state-read tool instead of replaying the full transcript each turn; converts O(n²) total token cost to O(n) when loops are long and observations are large
- Exclude Dynamic System Prompt Sections for Cross-Machine Cache Sharing — Move per-machine context (cwd, OS, shell, memory paths) out of the Claude Code system prompt so identical fleet configurations share one prompt-cache entry across users and machines
- KV Cache Invalidation in Local Inference — When Claude Code prepends an attribution header to prompts sent to local models, it invalidates the KV cache on every request and causes ~90% slower inference
- Semantic Density Optimization — Maximize task-relevant tokens in a codebase by eliminating zero-information ceremony while preserving naming, documentation, and commit context that agents cannot reconstruct without inference cost
- Validating Token-Optimized Formats Inside Agentic Loops — Switching tool schemas from JSON to TOON or TRON saves up to 27% tokens but regresses accuracy by 9-14 percentage points in end-to-end agentic loops; input-side and output-side compression carry different risk
- Source Code Minification for State-in-Context Agents — Stripping comments, whitespace, and shortening identifiers cuts input tokens 42% but drops SWE-bench Verified resolution rate from 50% to 38% — apply only when measured savings beat the accuracy cost
- Cross-Lingual Prompt Preprocessing (Local-LLM Token Arbitrage) — A local small model translates non-English prompts to English and rewrites them into compact task-oriented form before send; cuts input tokens 34–47% only when latency, accuracy, and fidelity costs do not erase the savings
Assembly & Composition¶
How to build, layer, and route context to the right agent at the right time rather than dumping everything into a single prompt.
- Dynamic System Prompt Composition — Build system prompts from modular, priority-ordered sections rather than monolithic static text, enabling mode-specific variants and efficient API caching
- Narrative Problem Reformulation for Code Generation — Rewriting a fragmented coding problem as a coherent three-part narrative measurably shifts which algorithms a code LLM selects, with reported 18.7% zero-shot pass@10 gains concentrated on harder competitive-programming tasks
- Phase-Specific Context Assembly — Optimise the orchestration layer that prepares each agent per phase; planners get summaries, workers get targeted file excerpts and validation commands
- Prompt Chaining — Decompose a complex task into a sequence of LLM calls where each step processes the output of the previous one, enabling verification and gate-checking at each stage
- Prompt Layering — Agent instructions arrive from multiple sources simultaneously; understanding the precedence order and conflict resolution prevents unpredictable behavior
- Filter and Aggregate in the Execution Environment — Run data processing logic inside the code execution sandbox before surfacing results to the model, so only the relevant subset of data enters context
- Evolving Playbooks — Replace monolithic prompt rewrites with structured delta entries that accumulate, refine, and organize agent strategies without losing domain knowledge
Loading & Retrieval¶
Techniques for getting the right context into an agent on demand, whether from code repositories, APIs, or structured knowledge bases.
- Context Hub — Fetch current, versioned API documentation into agent context at generation time so agents write against the live spec rather than stale training-data snapshots
- Retrieval-Augmented Agent Workflows — Pull context into the agent at the moment it is needed rather than preloading it at session start
- Live Browser as Agent Context Channel — Subscribe an agent to the developer's running browser tabs as live context — lower friction than copy-paste, but the developer's logged-in session enters the indirect-injection blast radius
- App-Window Snapshot as Agent Context — Bind one hotkey to send the active app window — rendered screenshot plus accessibility-tree text — as a single context unit; the richer payload changes which cross-app handoffs are plausible to delegate
- Repository Map Pattern — Parse source files with tree-sitter to extract structural symbols, rank them by graph importance, then binary-search fit the most relevant entries into the agent's available token budget
- Deterministic Anchoring — Inject call-graph, inheritance, and config-dependency facts as plain-text comments so code-agent navigation converges run-to-run; the win is reproducibility, not capability
- Semantic Context Loading — Query codebases through Language Server Protocol semantics — symbol lookup, reference finding, type navigation — rather than reading raw files
- Seeding Agent Context — Strategically place files, comments, and markers that agents discover during exploration and use to shape their behaviour
- Grounding Agents in Code the Model Has Never Seen — When the model has no training signal for a proprietary SDK or custom framework, it generates against the closest public API in training; provisioning must displace that prior, not just supplement it
- Environment Specification as Context — Feed dependency versions, lock files, and runtime constraints into agent context to prevent the 50–70% accuracy drop caused by environment-blind code generation
- Repository-Level Retrieval for Code Generation — AI coding agents that retrieve cross-file context from dependency graphs, ASTs, and semantic embeddings generate more accurate code than those limited to local file context
- AOCI: Symbolic-Semantic Repository Indexing — A persistent, query-independent blueprint pairing architectural coordinates with semantic content — read whole before any task, distinct from on-demand retrieval and token-fitted repo maps
- Structured Domain Retrieval — Combine hierarchical knowledge graphs with coverage-driven case selection to retrieve domain-specific context that flat vector search misses
- Schema-Guided Graph Retrieval — Use one shared domain schema across graph construction, query decomposition, and typed retrieval to improve multi-hop reasoning precision over private knowledge bases
- Chunking Strategy for RAG-Based Code Completion — Function-based chunking is dominated by every other strategy on line-level code completion; Sliding Window and cAST sit on the Pareto frontier, and doubling cross-file context length matters more than chunking choice
- LLM-Driven Logical Retrieval — When the agent LLM is frontier-capable, letting it emit AND/OR/NOT Boolean queries against an inverted index matches an agentic hybrid baseline at 41× lower indexing cost — under specific lexical-overlap conditions
- Compositional Skill Routing — Decompose a query into atomic sub-tasks, retrieve one skill per sub-task, then compose the plan — earns its cost only above hundreds of skills, where decomposition quality caps the system
Error Handling & Drift Prevention¶
Keeping agents on track across long sessions by preserving failure signals and reinforcing goals.
- Context-Injected Error Recovery — When a tool call fails, inject structured error context into the next inference call to prevent retry loops before they form
- Error Preservation in Context — Keep failed actions and error traces visible in the agent's context window; error history acts as negative examples that shift model behavior
- Goal Recitation — Periodically rewrite objectives, to-do lists, and status summaries at the tail of context to exploit recency bias and prevent goal drift in long-running sessions