Layered Context Architecture

Ground agents in multiple distinct context sources — schema, code, institutional knowledge, and persistent memory — rather than relying on any single signal.

Also known as

Agent Memory Patterns, Multi-Layer Context Grounding

Why Schema Alone Is Insufficient

Schema is necessary but not sufficient. Tables that look similar may differ in critical ways that only the pipeline code producing them clarifies — for example, whether a table includes first-party-only traffic or all traffic.

OpenAI's data agent demonstrates this. For a corpus of 70,000 datasets, schema metadata alone could not distinguish tables with similar names but different inclusion criteria. The difference lived in the transformation code.

The Six-Layer Model

OpenAI's data agent uses six context layers, aggregated offline and retrieved at runtime:

| Layer | What It Provides |
| --- | --- |
| Table usage and lineage | Which queries use this table, what it feeds downstream |
| Human annotations | Notes, warnings, and clarifications added by data owners |
| Code-derived enrichment | Column meanings inferred from the pipeline code that produces them |
| Institutional knowledge | Launches, incidents, canonical metric definitions from wikis and Slack |
| Persistent memory | Corrections and constraints accumulated from prior agent interactions |
| Live runtime queries | Fresh values queried at request time for volatile data |

Each layer addresses blind spots in the others. Code enrichment fills the gap schema leaves. Institutional knowledge explains anomalies neither schema nor code captures. Memory surfaces corrections documented nowhere else.
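
One way to picture the aggregated output is a flat list of records tagged by layer. The sketch below is illustrative only: the `Layer` names, the `ContextRecord` shape, the `contextFor` helper, and the table names `events_daily` and `events_all` are assumptions made for this example, not details of OpenAI's system.

```typescript
// Hypothetical data model; the source does not describe the agent's actual schema.
type Layer =
  | "usage_lineage"
  | "human_annotation"
  | "code_enrichment"
  | "institutional_knowledge"
  | "persistent_memory"
  | "live_query";

interface ContextRecord {
  layer: Layer;
  subject: string; // the table the record describes
  text: string;    // the content handed to the model
}

// Group everything known about one table by layer, so missing layers are visible.
function contextFor(records: ContextRecord[], subject: string): Map<Layer, string[]> {
  const byLayer = new Map<Layer, string[]>();
  for (const r of records) {
    if (r.subject !== subject) continue;
    const texts = byLayer.get(r.layer) ?? [];
    texts.push(r.text);
    byLayer.set(r.layer, texts);
  }
  return byLayer;
}

// Made-up sample records for two similarly named tables.
const sample: ContextRecord[] = [
  { layer: "code_enrichment", subject: "events_daily", text: "Pipeline keeps first-party traffic only." },
  { layer: "human_annotation", subject: "events_daily", text: "Check the traffic scope before comparing with events_all." },
  { layer: "code_enrichment", subject: "events_all", text: "Pipeline includes all traffic." },
];

const eventsDaily = contextFor(sample, "events_daily");
```

Grouping by layer makes it easy to see which layers have nothing to say about a table, which is where the blind spots described above would show up.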

The Coding Agent Analogue

For a coding agent, the layers map to:

| Layer | Coding agent equivalent |
| --- | --- |
| File structure | Directory tree, module boundaries |
| Language server symbols | Types, interfaces, function signatures, references |
| Repository history | git log, commit messages, PR descriptions |
| ADRs and RFCs | Architecture decision records, design documents |
| Memory | Per-repo conventions the agent has learned from corrections |
| Live queries | Current build status, open issues, recent test results |

No single layer is complete. Types express intent but not rationale; git history records what changed but often not why; ADRs record decisions but not the implementing code.

Offline Pipeline, Runtime RAG

Loading all six layers per request is impractical — volume exceeds any context window. The architecture separates concerns:

  • Offline: aggregate all layers into normalized embeddings, refreshed on a schedule
  • Runtime: retrieve the most relevant subset for the query via retrieval-augmented generation (RAG)

Latency stays predictable regardless of corpus size. The agent receives the context relevant to its task, not everything that might be.
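
The offline/runtime split can be sketched as two functions: one that embeds every chunk ahead of time, and one that embeds only the query and ranks the stored chunks. This is a minimal sketch under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory array stands in for a vector store, and all names and sample texts are hypothetical.

```typescript
// Toy term-count vector; a real system would use dense embeddings from a model.
type TermVector = Map<string, number>;

interface IndexedChunk {
  layer: string;
  text: string;
  vector: TermVector;
}

function embed(text: string): TermVector {
  const v: TermVector = new Map();
  for (const token of text.toLowerCase().split(/[^a-z0-9]+/)) {
    if (token) v.set(token, (v.get(token) ?? 0) + 1);
  }
  return v;
}

function cosine(a: TermVector, b: TermVector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((w, token) => {
    dot += w * (b.get(token) ?? 0);
    normA += w * w;
  });
  b.forEach((w) => {
    normB += w * w;
  });
  const denom = Math.sqrt(normA * normB);
  return denom === 0 ? 0 : dot / denom;
}

// Offline: embed every chunk from every layer once, on a schedule.
function buildIndex(chunks: { layer: string; text: string }[]): IndexedChunk[] {
  return chunks.map((c) => ({ layer: c.layer, text: c.text, vector: embed(c.text) }));
}

// Runtime: embed only the query and return the k most similar chunks.
function retrieve(index: IndexedChunk[], query: string, k: number): IndexedChunk[] {
  const q = embed(query);
  return index
    .map((chunk) => ({ chunk, score: cosine(q, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

const chunkIndex = buildIndex([
  { layer: "code_enrichment", text: "events_daily keeps first-party traffic only" },
  { layer: "institutional_knowledge", text: "billing dashboard launched after the migration" },
  { layer: "human_annotation", text: "events_all includes all traffic sources" },
]);
const topChunks = retrieve(chunkIndex, "which table has first-party traffic", 2);
```

The linear scan in `retrieve` is only for illustration and grows with corpus size. The predictable latency described above would depend on a vector store with an approximate nearest-neighbor index, which this sketch does not implement.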

A survey of Agentic RAG architectures reports that production systems combine heterogeneous sources (structured queries, semantic search, graph knowledge bases, and tool APIs), with specialized agents handling each source in parallel.

Priority of Layers

Layers are not equal. When a human annotation contradicts what the pipeline code suggests, the resolution order must be explicit. Human annotations typically take priority over code-derived enrichment, which takes priority over schema inference. Persistent memory corrections outrank general institutional knowledge.

Document the resolution order. An agent that silently favors code over an annotation is wrong in exactly the cases the annotation exists to correct.
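
A documented resolution order can be as small as an ordered list plus a function that applies it. The sketch below is one possible shape, not the source's implementation. The text above gives two partial orders (annotation over code enrichment over schema inference, and persistent memory over institutional knowledge); how the two interleave in `PRIORITY` is an assumption made here, as are all type and key names.

```typescript
// Earlier entries win. The relative placement of the two chains is an assumption.
const PRIORITY = [
  "human_annotation",
  "persistent_memory",
  "code_enrichment",
  "institutional_knowledge",
  "schema_inference",
] as const;

type Source = (typeof PRIORITY)[number];

interface Claim {
  source: Source;
  key: string;   // what the claim is about (hypothetical naming scheme)
  value: string;
}

interface Resolution {
  winner: Claim;
  overridden: Claim[]; // kept so overrides can be logged instead of applied silently
}

function resolve(claims: Claim[]): Map<string, Resolution> {
  const rank = (s: Source): number => PRIORITY.indexOf(s);
  const resolved = new Map<string, Resolution>();
  for (const claim of claims) {
    const current = resolved.get(claim.key);
    if (!current) {
      resolved.set(claim.key, { winner: claim, overridden: [] });
    } else if (rank(claim.source) < rank(current.winner.source)) {
      // Higher-priority source: it becomes the winner, the old winner is recorded.
      resolved.set(claim.key, {
        winner: claim,
        overridden: [...current.overridden, current.winner],
      });
    } else {
      current.overridden.push(claim);
    }
  }
  return resolved;
}

const resolution = resolve([
  { source: "code_enrichment", key: "events_daily.traffic_scope", value: "all traffic" },
  { source: "human_annotation", key: "events_daily.traffic_scope", value: "first-party only" },
]);
```

Keeping the overridden claims alongside the winner gives a trail to inspect when an answer looks wrong, rather than leaving the conflict invisible.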

Retrieval Noise Is Real

More layers do not monotonically improve accuracy. An arXiv analysis of RAG as noisy in-context learning derives bounds showing that retrieval gains shrink with more examples and can reverse, hurting performance, past a threshold. Practitioner reports on RAG at scale describe precision drops beyond ~10,000 documents and collapse past ~50,000. Before adding a layer, confirm that the blind spot it closes causes real production errors, not a theoretical gap.
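
One way to run that check is a simple ablation: score the agent on cases taken from real failures, with and without the candidate layer. The harness below is a hypothetical sketch. `Answerer` stands in for a call to the agent, and the stub agent, questions, and layer names are invented for the demonstration.

```typescript
interface EvalCase {
  question: string;
  expected: string;
}

// Stand-in for invoking the agent with a given set of context layers.
type Answerer = (question: string, layers: string[]) => string;

function accuracy(cases: EvalCase[], answer: Answerer, layers: string[]): number {
  if (cases.length === 0) return 0;
  const correct = cases.filter((c) => answer(c.question, layers) === c.expected).length;
  return correct / cases.length;
}

// Positive: the candidate layer fixed cases the baseline got wrong.
// Zero or negative: it changed nothing or added noise on this set.
function layerGain(
  cases: EvalCase[],
  answer: Answerer,
  baseline: string[],
  candidate: string,
): number {
  return accuracy(cases, answer, [...baseline, candidate]) - accuracy(cases, answer, baseline);
}

// Stub agent: it only answers the traffic-scope question correctly
// when code enrichment is among its layers.
const stubAgent: Answerer = (question, layers) => {
  if (question === "row count source?") return "events_all";
  return layers.includes("code_enrichment") ? "first-party only" : "unknown";
};

const evalCases: EvalCase[] = [
  { question: "row count source?", expected: "events_all" },
  { question: "traffic scope of events_daily?", expected: "first-party only" },
];

const gain = layerGain(evalCases, stubAgent, ["schema"], "code_enrichment");
```

The result is only as meaningful as the evaluation set. Cases drawn from logged production errors test the claim above, while synthetic cases only test the theoretical gap.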

Example

The following TypeScript snippet shows a coding agent that retrieves context from multiple layers at runtime before answering a question about a function. Each layer fills a blind spot the previous one leaves.

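// runtime RAG: assemble context from multiple layers before calling the model

// Placeholder declarations for the agent's own tooling, so the snippet type-checks.
declare function getDirectoryTree(root: string): Promise<string>;
declare function lspHover(root: string, symbol: string): Promise<string>;
declare function lspReferences(root: string, symbol: string): Promise<string[]>;
declare function execGit(args: string): Promise<string>;
declare function searchDocs(dir: string, query: string): Promise<string | null>;
declare function readMemory(path: string): Promise<string | null>;

async function buildContext(symbolName: string): Promise<string[]> {
  const chunks: string[] = [];

  // Layer 1: file structure and module boundaries (always available)
  const fileTree = await getDirectoryTree("src/");
  chunks.push(`File structure:\n${fileTree}`);

  // Layer 2: language server, type signature and references
  const signature = await lspHover("src/", symbolName);
  const refs = await lspReferences("src/", symbolName);
  chunks.push(`Type signature:\n${signature}`);
  chunks.push(`Referenced in: ${refs.join(", ")}`);

  // Layer 3: git history, what changed recently.
  // -S (pickaxe) keeps only commits that added or removed the symbol, which
  // avoids piping through grep (a pipe is not a valid git argument).
  const log = await execGit(`log --oneline -10 -S${symbolName} -- src/`);
  chunks.push(`Recent commits:\n${log}`);

  // Layer 4: ADR / design docs, rationale
  const adr = await searchDocs("docs/decisions/", symbolName);
  if (adr) chunks.push(`Architecture note:\n${adr}`);

  // Layer 5: persistent memory, corrections from prior sessions
  const memory = await readMemory(`corrections/${symbolName}.md`);
  if (memory) chunks.push(`Prior correction:\n${memory}`);

  return chunks;
}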

Each step adds a layer; the snippet covers five of the six, leaving out live queries. The type signature tells the agent what the function accepts; the git log tells it what recently changed; the ADR captures the design rationale; the memory entry surfaces a correction that isn't recorded anywhere else. No single layer would be sufficient: the type signature says nothing about the rationale, and the ADR says nothing about the current signature.

When This Backfires

The six-layer model is optimized for large, complex corpora. It carries real engineering overhead.

  • Small corpora — a codebase that fits in a context window gains nothing from RAG latency. Loading directly is simpler and faster.
  • Infrastructure cost — aggregation pipelines, embedding refresh, and vector stores add operational surface. For teams without existing data infrastructure, maintenance can outweigh accuracy gain.
  • Layer staleness — when offline pipelines and live queries diverge (e.g., an un-propagated schema change), the agent acts on contradictory context.
  • Priority rule complexity — as layers multiply, explicit priority rules get harder to maintain. An undocumented exception silently produces wrong answers that are difficult to trace.
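
Layer staleness in particular can be caught with a cheap comparison before the agent trusts indexed context. The following is a minimal sketch, assuming the offline pipeline stores a column list per table and that a live query can return the current one. `SchemaSnapshot`, `schemaDrift`, `isStale`, and the sample columns are hypothetical.

```typescript
// A snapshot taken when the offline index was built, or fetched live at request time.
interface SchemaSnapshot {
  table: string;
  columns: string[];
}

interface Drift {
  added: string[];   // present live, missing from the index
  removed: string[]; // present in the index, gone from the live schema
}

function schemaDrift(indexed: SchemaSnapshot, live: SchemaSnapshot): Drift {
  return {
    added: live.columns.filter((c) => !indexed.columns.includes(c)),
    removed: indexed.columns.filter((c) => !live.columns.includes(c)),
  };
}

function isStale(drift: Drift): boolean {
  return drift.added.length > 0 || drift.removed.length > 0;
}

// Example: a column was renamed after the last offline refresh.
const drift = schemaDrift(
  { table: "events_daily", columns: ["user_id", "country", "ts"] },
  { table: "events_daily", columns: ["user_id", "region", "ts"] },
);
const stale = isStale(drift);
```

What to do on drift is a policy choice: the agent might prefer the live schema, warn the user, or trigger a refresh of the affected table's indexed context.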

A two-layer approach (schema + live queries) suffices for many agents. Add layers only when each source closes a production error, not a theoretical gap.

Key Takeaways

  • Schema or file structure alone cannot ground an agent in the meaning of a dataset or codebase.
  • Six context layers — usage/lineage, annotations, code enrichment, institutional knowledge, persistent memory, live queries — provide coverage no single source can match.
  • Use an offline aggregation pipeline and runtime RAG to keep latency predictable across large corpora.
  • Define explicit priority when layers conflict; human annotations typically override inferred context.