Attention Latch: When Agents Stay Anchored to Stale Instructions¶

Cumulative historical context in decoder-only Transformers can over-squash mid-task updates, leaving multi-turn agents anchored to obsolete constraints despite explicit contradictory instructions.

The Failure Mode¶

An agent receives an instruction that contradicts an earlier one mid-session — and keeps acting on the earlier one. Shehata and Li (2026) name this the Attention Latch: cumulative probabilistic weight of historical context overrides mid-task updates, anchoring the agent to obsolete constraints despite explicit contradictory instructions (Shehata & Li, 2026).

The latch is the behavioural face of Information Over-squashing in decoder-only autoregressive Transformers (Barbero et al., 2024) — distinct input sequences collapse to near-identical final-token representations as history grows, so a late instruction cannot move the representation far enough to change behaviour.

Why Over-Squashing Causes the Latch¶

Decoder-only attention is causal: information from earlier tokens contributes additively to the final-token representation through every subsequent layer. The unidirectional flow converging at the final token loses sensitivity to specific tokens, exacerbated by low-precision floating-point formats (Barbero et al., 2024). The longer the history, the smaller the marginal influence any single new instruction exerts.

This compounds with the U-shaped attention curve: a contradictory instruction inserted mid-session lands in the low-attention middle zone, where positional bias and over-squashing combine to suppress it (Liu et al., 2023).

graph TD
    H[Long multi-turn history] -->|cumulative weight| F[Final-token representation]
    U[Mid-task update] -->|low marginal influence| F
    F --> B[Behaviour anchored to old context]
    H -->|positional bias| M[Update in low-attention middle]
    M --> B

How to Recognise It¶

Distinct from instruction-following failure on a fresh prompt. Diagnostic signals:

The agent acknowledged the new instruction earlier in the turn but then acted on the old one.
Resetting the conversation and reissuing the same instruction produces compliance.
Compliance returns when the contradicting prefix is removed.

If all three hold, the cause is structural over-squashing rather than ambiguous wording.

Where It Triggers¶

Shehata and Li (2026) located the Attention Stability Boundary empirically across 9K trajectories on MultiWOZ 2.2. On the hardest tier — a semantic-hijacked 3-hop multi-fact synthesis task — vanilla ReAct on GPT-5.4 collapsed to 0.1% success (Shehata & Li, 2026). The boundary is reached when:

Histories are long enough for cumulative weight to dominate.
Mid-task updates contradict, rather than extend, prior constraints.
Retrieval results inject content that semantically resembles the contradicted instruction.

Independent work confirms the broader pattern: 100K+ token sequences exhibit goal drift across model families, predominantly through inaction (Arike et al., 2025); models deprioritise initial instructions as history grows even when they remain in context (Bui, 2026 §3.2).

Mitigations on a Spectrum¶

Match the mitigation cost to how often the latch fires in your workload. Lightweight options first.

1. Recency anchoring (lightweight)¶

Push current objectives into the high-attention tail at every step. Goal recitation rewrites the objective and to-do list after each tool call; event-driven system reminders inject the contradicting instruction as a fresh user-role message at the relevant decision point. These do not eliminate over-squashing — they place the new instruction where attention is strongest.

2. History reset (medium)¶

Bound cumulative history before it dominates. The Ralph Wiggum Loop restarts each iteration from a fresh context, re-reading the specification from disk; post-compaction re-read protocols restore foundational instructions after summarisation. These attack the latch at its root.

3. Architect/Executive separation (heavy)¶

Run high-level planning in one context (the Architect) and turn-by-turn execution in a separate, scoped context (the Executive) per turn — Shehata and Li's SSRP framework (Shehata & Li, 2026). Structural variants already covered on this site:

Cognitive Reasoning vs Execution Separation — typed-tool-interface seam between layers.
Discrete Phase Separation — conversation-boundary version, with each phase in its own conversation.

Choose this tier when lighter mitigations have been measured and found insufficient — the split adds an extra LLM call per turn, schema-versioning churn, and orchestration overhead, and most workloads do not cross the boundary (Microsoft Azure Architecture Center).

The Grounding Paradox¶

Heavy mitigations can overshoot. Shehata and Li (2026) report a Procedural Integrity audit at 98.8% adherence revealing a Grounding Paradox: high-stability models fail by refusing to generate output under retrieval-reasoning contamination — the agent holds its ground so firmly it stops responding to legitimate updates (Shehata & Li, 2026). Verify the failure has been removed, not relocated.

Where the Latch Does Not Fire¶

Short single-objective tasks. Cumulative history stays small relative to the latest turn.
Append-only updates. Extensions of prior context do not require overcoming over-squashing.
Aggressive harness-level resets. Frequent compaction or Ralph Wiggum-style restarts keep histories below the boundary.
Single-turn flows. The boundary is a multi-turn phenomenon.

Key Takeaways¶

The Attention Latch is the behavioural face of decoder-only over-squashing — a structural property, not a prompt bug.
It triggers when long histories collide with contradicting mid-task updates, especially in the U-shaped middle zone.
Mitigate on a spectrum: recency anchoring first, history reset next, architectural split only when measured drift justifies the overhead.
Heavy mitigations introduce the Grounding Paradox — verify the failure is removed, not relocated.

Lost in the Middle: The U-Shaped Attention Curve — the positional-bias half of the same problem
Goal Recitation: Countering Drift in Long Sessions — recency-anchoring mitigation
Event-Driven System Reminders — harness-injected reminders at decision points
Post-Compaction Re-read Protocol — restoring foundational instructions after summarisation
Objective Drift: When Agents Lose the Thread — the post-compaction sibling failure mode
The Ralph Wiggum Loop — bounded-history restarts that keep cumulative weight low
Cognitive Reasoning vs Execution Separation — typed-interface variant of the architectural split
Discrete Phase Separation — conversation-boundary variant of the architectural split