Tiered Memory Architecture¶
A two-tier memory store whose pipeline promotes episodic facts into a semantic tier on re-use — improving long-window retrieval only for long, recurring sessions.
The Architecture¶
A flat-file memory store grows monotonically: every episode lands in the same JSONL or vector index, and signal dilutes as the corpus expands. Tiered architectures separate raw episodes from generalised facts and run a promotion pipeline between them.
MEMTIER (Sidik & Rokach, 2026) defines five components:
- Episodic JSONL store — observations, tool calls, and outcomes appended as structured records
- Five-signal weighted retrieval — relevance, recency, outcome, frequency, and structural compatibility scored per query
- Attention-attributed cognitive weight loop — entry weights updated from how the model actually attended to retrieved entries
- Asynchronous consolidation daemon — promotes episodic entries into a semantic tier when re-use crosses a threshold
- PPO-based retrieval policy — adapts the per-tier weights from feedback rather than hand-tuned constants
graph LR
A[Agent turn] --> B[Episodic JSONL<br>append-only]
B -.->|consolidation<br>daemon| C[Semantic tier<br>generalised facts]
A --> D[Weighted retrieval]
B --> D
C --> D
D --> E[PPO policy<br>tier weights]
E -.->|update| D
style C fill:#2d5a2d,stroke:#4a4a4a,color:#e0e0e0
style E fill:#2d4a5a,stroke:#4a4a4a,color:#e0e0e0
The semantic tier is not pre-populated. It accumulates only entries the daemon observes retrieved across multiple unrelated triggers — the test for whether a fact has generalised beyond its original episode.
What Promotion Buys You¶
Flat-store retrieval re-ranks every entry on every call, so a high-value generalised fact competes against thousands of low-value raw observations. Promoting it into a smaller, separately-scored tier evaluates that fact against fewer competitors and lets the tiers carry different weights (MEMTIER §3). Broader memory surveys treat consolidation between memory forms as a generalisation step, not a storage optimisation (Memory in the Age of AI Agents, arxiv:2512.13564).
Reported numbers are conditional. MEMTIER claims 38.2% accuracy on LongMemEval-S with Qwen2.5-7B — +33 percentage points over a full-context baseline at 5% on a 6GB consumer GPU (Sidik & Rokach, 2026). The baseline is weak; a tuned RAG-over-JSONL system operates well above 5%, so the margin over a non-tiered RAG store is smaller than the headline suggests. The paper is preprint-only and unreplicated. Treat the architecture as defensible, not the absolute numbers as load-bearing.
When Tiering Pays Off¶
The overhead — a consolidation daemon, attention-attribution loop, and PPO policy network — is amortised across long windows and recurring tasks. It is not free.
Tiering pays off when:
- Operation windows exceed a day or two. The 14-percentage-point degradation over 72 hours that motivates the design (MEMTIER abstract) is the regime where consolidation fires often enough to matter.
- Task structure is recurring. The PPO policy learns from outcome feedback; without recurring task signatures it never converges and tier-aware retrieval underperforms a static recency-weighted baseline.
- Retrieval is dilution-bound, not relevance-bound. Below a few thousand entries the embedding model dominates; tier separation contributes little.
- Cross-tenant isolation is required. A separate semantic tier with controlled promotion is the natural place for provenance and pruning policies once stored episodes become an attack surface (Memory Poisoning and Secure Multi-Agent Systems, arxiv:2603.20357).
When a Flat Store Is the Right Answer¶
Tiering adds two new failure surfaces — incorrect promotion (an episodic fact generalised into a wrong semantic rule) and policy drift (the PPO retrieval policy learning to over- or under-fetch from the wrong tier). Both compound with agent lifetime, the regime where memory is supposed to help (arxiv:2512.13564).
Skip tiering when:
- Sessions are short (sub-day) — promotion never fires often enough to amortise the daemon.
- Latency dominates accuracy — per-turn cost from consolidation, attention attribution, and a PPO policy inflates inner-loop time.
- Single-developer, single-tenant — tier isolation costs are not justified.
- A simpler design already meets the bar. A flat JSONL store with an embedding index (a flat RAG store), a recency multiplier, and periodic LLM-summarised compaction captures most of the value. Site patterns — episodic memory retrieval, memory synthesis from execution logs, Memory Retrieval as a Control Decision — cover the same ground at lower operational complexity.
Risks Specific to Tier Promotion¶
- Wrong-direction generalisation — frequency does not distinguish "valid across contexts" from "the same incident kept recurring in one context." Stack- or environment-specific entries get promoted then misapplied.
- Stale semantic facts persist longer — promoted entries weighted higher decay slower. Facts invalidated by a refactor outlive their originating episodes — the staleness mode in agent memory patterns, amplified by tier weighting.
- Policy drift on heterogeneous workloads — a PPO policy trained on one distribution silently retrieves from the wrong tier when the workload shifts.
Mitigate by gating promotion on a confidence signal, reviewer pass, or semantic-tier expiry — not on frequency alone.
Key Takeaways¶
- A two-tier store with consolidation is one design point on the agent-memory spectrum, not a default — it pays off for long operation windows and recurring task structure
- Promotion should be conditional on observed cross-context re-use, not raw frequency, to avoid generalising single-context facts
- Reported accuracy gains are against a weak full-context baseline; the margin over a well-tuned flat RAG store is smaller and unreplicated
- Tiering adds incorrect-promotion and policy-drift failure modes that scale with agent lifetime — audit the promotion step explicitly
Related¶
- Episodic Memory Retrieval — episode-keyed recall without explicit tier promotion
- Agent Memory Patterns: Learning Across Conversations — scope-based memory architecture for cross-session learning
- Memory Retrieval as a Control Decision — controller deciding whether to inject retrieved memory
- Memory Synthesis from Execution Logs — extracting causal lessons from execution traces into persistent knowledge
- Subtask-Level Memory for Software Engineering Agents — granularity choice in memory retrieval
- Memory Retrieval as a Control Decision — utility-score updates for stored memories from outcome feedback
- Generative Agents Memory Stream — three-layer architecture for long-running agents with high observation density
- Component-Isolated Memory Stress Testing — stress-tests the summarisation, storage, and retrieval stages of this pipeline so a regression attributes to one tier