
Decentralized Memory for Self-Evolving Multi-Agent Systems

Per-agent private memory replaces a shared central store so each agent specialises on its own task distribution — useful only when the agent count is large enough to make central-store contention real, workloads are heterogeneous enough that specialisation has signal, deployments run long enough to amortise per-agent machinery, and writers are trusted.

Decentralized memory in a multi-agent system gives each agent its own persistent local store rather than a shared central repository. Improvement becomes a federated process in which each agent accumulates role-specific expertise without coordinating writes. The trade: the design sheds write contention and central-store staleness, and in exchange accepts divergence between agents and the loss of the shared-signal benefit a central store provides.

When This Pattern Applies

The architecture is qualified — verify all four preconditions before adopting:

  1. Large enough agent population — at single-digit agent counts, central-store contention is not a real cost; the dual-pool machinery is pure overhead.
  2. Heterogeneous-enough workloads — per-agent specialisation assumes each agent sees a consistent task distribution; under uniform workloads, agents redundantly relearn the same lessons.
  3. Long-enough deployments — the regret bound is asymptotic in T (Hao, Long, Zhao 2026, §3); short deployments never amortise the bandit machinery.
  4. Trusted writers — N independent stores multiply the memory-poisoning surface (Memory Poisoning in MAS, arxiv 2603.20357).

If any precondition fails, prefer a shared store or a single-agent design — see agent memory patterns or tiered memory architecture.
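The four-way check above can be written down as a small gate, which is sometimes useful in a design review. This is a minimal sketch: the `Deployment` fields, the heterogeneity score, and every threshold except the "single-digit agent count" cut-off are hypothetical placeholders, not values from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    agent_count: int
    workload_heterogeneity: float  # hypothetical score: 0.0 = uniform, 1.0 = disjoint
    expected_steps: int            # planned horizon T, in agent turns
    writers_trusted: bool


MIN_AGENTS = 10            # "single-digit" counts fail precondition 1
MIN_HETEROGENEITY = 0.5    # assumed threshold, tune per workload
MIN_STEPS = 10_000         # assumed threshold, tune per deployment


def failed_preconditions(d: Deployment) -> list[str]:
    """Return the names of the preconditions this deployment does not meet."""
    failures = []
    if d.agent_count < MIN_AGENTS:
        failures.append("agent population")
    if d.workload_heterogeneity < MIN_HETEROGENEITY:
        failures.append("workload heterogeneity")
    if d.expected_steps < MIN_STEPS:
        failures.append("deployment horizon")
    if not d.writers_trusted:
        failures.append("trusted writers")
    return failures
```

An empty list means all four preconditions hold under the assumed thresholds; any non-empty result points toward a shared store or a single-agent design.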

Architecture

Each agent maintains a dual-pool memory that the agent updates without coordination with peers (Hao, Long, Zhao 2026):

  • Exploitation pool — consolidated past trajectories for solutions the agent has verified
  • Exploration pool — LLM-generated candidates for novel contexts the exploitation pool does not cover
  • Stage-wise reweighting — an LLM-as-judge scores recent stages and adjusts the relative weight of each pool from feedback
graph LR
    A[Agent turn] --> B{Pool selector}
    B -->|exploit| C[Exploitation pool<br>verified trajectories]
    B -->|explore| D[Exploration pool<br>LLM candidates]
    C --> E[Action]
    D --> E
    E --> F[LLM-as-judge<br>stage-wise feedback]
    F -.->|reweight| B

    style C fill:#2d5a2d,stroke:#4a4a4a,color:#e0e0e0
    style D fill:#2d4a5a,stroke:#4a4a4a,color:#e0e0e0

Other agents in the system run the same loop against their own pools. Writes never cross agents.
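The loop above can be sketched as a per-agent data structure. This is an illustration under stated assumptions, not the paper's implementation: the class name `DualPoolMemory`, the single scalar `exploit_weight`, the additive update rule, and the clamping bounds are all hypothetical, and the judge is reduced to a score in [0, 1] supplied by the caller.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DualPoolMemory:
    exploit_weight: float = 0.5   # probability of preferring the exploitation pool
    learning_rate: float = 0.1    # assumed step size for reweighting
    exploitation: dict = field(default_factory=dict)  # context -> verified solution
    exploration: dict = field(default_factory=dict)   # context -> list of candidates

    def select(self, context, rng):
        """Pool selector: return (pool_name, entry) for this turn."""
        if context in self.exploitation and rng.random() < self.exploit_weight:
            return "exploit", self.exploitation[context]
        candidates = self.exploration.get(context, [])
        return "explore", (rng.choice(candidates) if candidates else None)

    def reweight(self, pool, judge_score):
        """Shift weight toward the pool just used if the judge scored it above 0.5."""
        direction = 1.0 if pool == "exploit" else -1.0
        shifted = self.exploit_weight + direction * self.learning_rate * (judge_score - 0.5)
        # Clamp so neither pool is ever starved completely (assumed bounds).
        self.exploit_weight = min(0.95, max(0.05, shifted))

    def consolidate(self, context, solution):
        """Promote a verified solution into the exploitation pool."""
        self.exploitation[context] = solution


memory = DualPoolMemory()
memory.exploration["parse dates"] = ["try dateutil", "try a regex"]
pool, entry = memory.select("parse dates", random.Random(0))  # nothing verified yet
memory.reweight(pool, judge_score=0.9)
memory.consolidate("parse dates", entry)
```

Each agent would own one such object, and nothing in it references a peer, which is the property the pattern depends on.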

Why It Works

Decentralized memory works because it separates write contention from retrieval competition and lets each agent's exploitation pool anchor on its own task distribution rather than diluting against unrelated peers' episodes. This is the same dilution argument that motivates tiered memory architectures at the single-agent level. The exploration pool adds a stochastic-bandit term bounded at O(log T) cumulative regret, giving each agent a controlled rate of trying novel candidates against accumulated solutions (Hao, Long, Zhao 2026, §3). Independent results from G-Memory and Trainable Graph Memory reach comparable improvements via explicit relational structure rather than per-agent isolation, which suggests that the operative variable is separating retrieval competition from write contention, not isolation per se. Tiering and graph-structuring are alternative levers on the same trade.
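To make the O(log T) claim concrete: the text above does not say which bandit algorithm the paper uses, so the sketch below substitutes UCB1, a standard stochastic-bandit algorithm with logarithmic cumulative regret for rewards in [0, 1]. Treating the two pools as two arms is an assumption made for illustration only.

```python
import math


class UCB1:
    """Textbook UCB1 over a fixed number of arms with rewards in [0, 1]."""

    def __init__(self, n_arms: int) -> None:
        self.counts = [0] * n_arms     # pulls per arm
        self.means = [0.0] * n_arms    # running mean reward per arm

    def select(self) -> int:
        # Play every arm once before the confidence bound is defined.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        return max(
            range(len(self.counts)),
            key=lambda a: self.means[a] + math.sqrt(2 * math.log(total) / self.counts[a]),
        )

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        # Incremental mean avoids storing the reward history.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Arm 0 stands in for the exploitation pool, arm 1 for the exploration pool.
bandit = UCB1(n_arms=2)
for _ in range(200):
    arm = bandit.select()
    bandit.update(arm, reward=1.0 if arm == 0 else 0.0)  # toy deterministic rewards
```

The confidence term shrinks as an arm is pulled more often, so the worse arm is revisited only occasionally. That slow, bounded rate of revisiting is the "controlled rate of trying novel candidates" in bandit terms.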

Reported Numbers

DecentMem reports up to +23.8% accuracy over the strongest centralized-memory baseline, +52.5% over no-memory systems, and 49% token reduction across AutoGen, DyLAN, and AgentNet on Qwen3 (4B/8B/14B) and Gemma4 (E2B/E4B) backbones across five math, code, QA, and embodied benchmarks (Hao, Long, Zhao 2026). These are preprint numbers, unreplicated — treat the architecture as defensible, not the numbers as load-bearing.

What Coordination Actually Remains

The system is more accurately described as locally-decentralized, globally-coordinated. The published design retains a task router, a shared LLM backbone (de facto alignment through identical weights), the LLM-as-judge that reweights pools (a shared evaluator with cross-agent influence), and shared benchmark definitions. Central-store contention is only one of several centralised dependencies — account for the rest when sizing the gain.

When This Backfires

Beyond the precondition failures above, two additional failure modes are worth naming:

  • Tasks requiring global coherence — when agents must produce mutually consistent artifacts (shared schemas, joined outputs), per-agent divergent memory produces locally-correct but globally-inconsistent decisions, the canonical decentralised-topology failure mode (Multi-Agent Topology Taxonomy).
  • Faithfulness gaps — agents with private memory frequently regress, acknowledge mistakes then repeat them, and apply learned strategies inconsistently (arxiv 2601.22436). Private memory alone does not produce reliable self-improvement.

A poisoned LLM-as-judge is a particular concern even when the "trusted writers" precondition holds: the judge is shared across the supposedly independent agents, so it propagates incorrect reweighting to every agent simultaneously.

Key Takeaways

  • Decentralized memory is one design point on the multi-agent memory spectrum, not a default — preconditions on agent count, workload heterogeneity, deployment horizon, and writer trust must hold
  • Per-agent dual-pool architecture (exploitation + exploration) with LLM-as-judge reweighting eliminates central-store contention but loses shared-signal benefits
  • Reported gains (+23.8% over centralized, +52.5% over no-memory) come from a paper that retains a router, shared backbone, and shared judge — call the system locally-decentralized, not fully decentralized
  • The operative mechanism — separating write contention from retrieval competition — is also achieved by tiered architectures and graph-structured memory at lower architectural cost in many regimes