Skip to content

Manual Compaction as Dumb Zone Mitigation

Auto-compaction fires at ~95% context fill — long after reasoning quality has degraded. Manual compaction reframes context management from memory cleanup to reasoning quality preservation.

The Gap

Claude Code's auto-compaction triggers at approximately 95% of the context window. Benchmark research shows LLMs effectively leverage only 10-20% of a long context window for multi-step reasoning tasks, and code bug fixing collapses from 29% accuracy at 32K to 3% at 256K per LongCodeBench. By the time auto-compaction fires, the agent has been in the dumb zone for most of the session.

graph LR
    A["0%"] --> B["10-20%<br/>Reasoning degrades"]
    B --> C["50%<br/>Community threshold"]
    C --> D["85%<br/>LangChain trigger"]
    D --> E["95%<br/>Auto-compaction"]
    style B fill:#e74c3c,color:#fff
    style C fill:#f39c12,color:#fff
    style D fill:#3498db,color:#fff
    style E fill:#95a5a6,color:#fff

The gap between degradation onset and auto-compaction is where quality silently erodes.

When to Compact Manually

Use /compact at these transition points:

Trigger Example
Before reasoning-intensive work Architectural decisions, multi-step debugging
After large file reads no longer needed Read a 500-line file, extracted the three relevant functions
At task-type transitions Finished searching codebase, now planning refactor
When you notice quality degradation Agent starts repeating itself, missing obvious patterns
After completing a subtask Finished implementing feature A, moving to feature B

When Not to Compact

Compaction is lossy. Anthropic acknowledges that "overly aggressive compaction can result in the loss of subtle but critical context whose importance only becomes apparent later."

Avoid compacting when:

  • The agent is mid-reasoning and needs accumulated context to complete a chain of thought
  • Reference material (schemas, specs, API contracts) will be needed repeatedly
  • You are iterating on a single file where the full edit history informs the next change

In these cases, prefer /clear between unrelated tasks or use observation masking for selective cleanup.

Directing Compaction

Claude Code supports focused compaction:

Inline focus:

/compact Focus on the API changes and the test failures

Persistent focus via CLAUDE.md:

# Compact instructions

When compacting, always preserve:
- Current task objective and acceptance criteria
- File paths modified in this session
- Unresolved test failures and error messages
- Architectural decisions and their rationale

Custom compaction instructions are a first-class feature.

Why It Works

Transformer attention is computed across all tokens in the context window, creating n² pairwise relationships for n tokens. As the context grows, this attention budget spreads thin — the model's capacity to attend to any specific piece of information decreases while irrelevant tokens compete for the same fixed attention capacity. Compaction works by replacing the accumulated token mass with a dense summary, giving the model a focused context where relevant information receives proportionally more attention. Compacting early, before the window is saturated, avoids the window ever reaching the state where useful signal is crowded out by accumulated noise.

Lowering the Auto-Compaction Threshold

The CLAUDE_AUTOCOMPACT_PCT_OVERRIDE environment variable accepts values 1-100 and overrides the default trigger point:

# Set auto-compaction to 60% for a reasoning-heavy session
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=60 claude
Session type Suggested threshold Rationale
Reasoning-heavy (architecture, debugging) 50-60% Preserves quality before significant degradation
Mixed retrieval and reasoning 70-80% Balances context availability with quality
Retrieval-heavy (search, lookup) 95% (default) Retrieval tolerates larger context loads

Monitoring Context Usage

Claude Code exposes context_window.used_percentage as a status line field:

{
  "statusline": "context: {context_window.used_percentage}%"
}

Partial Summarization

Claude Code supports partial summarization via the message selector ("Summarize from here"). This preserves recent context at full fidelity while compressing older turns — useful when exploration history can be discarded but recent implementation work cannot.

How Other Systems Handle This

System Trigger Approach
Claude Code (default) 95% Single binary compaction
Claude Code (override) Configurable 1-100% Same mechanism, earlier trigger
LangChain Deep Agents 85% Compression + 20K-token tool offloading
OPENDEV (ACC) 70/80/85/90/99% Five graduated stages
Manus N/A File system as external memory; avoids aggressive compaction entirely

Example

A developer is debugging a failing integration test in Claude Code. The session so far: reading 4 test files, grepping through 12 source modules, and reviewing CI logs. Context is at ~55%.

> /compact Focus on the three failing test assertions in test_payment_flow.py
>   and the PaymentService.process() method. Discard CI log output and
>   unrelated source files.

After compaction, context drops to ~15%. The developer then asks Claude to reason about the root cause — with a clean context window, the agent identifies a race condition it had previously overlooked.

For the next session, the developer sets an earlier auto-compaction trigger:

CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=55 claude

Key Takeaways

  • Manual compaction preserves reasoning quality — auto-compaction at 95% fires long after degradation begins.
  • Compact at task-type transitions, after bulk reads, or when output quality declines.
  • Use a focus directive or CLAUDE.md to control what survives summarization.
  • Set CLAUDE_AUTOCOMPACT_PCT_OVERRIDE to 50-70% for reasoning-heavy sessions.
Feedback