Context-Usage Attribution: Per-Source Breakdown of Agent Context

Break the context window into rules, skills, MCP returns, subagent transcripts, and conversation — so operators prune the source actually responsible, not the wrong one.

Two Cuts of the Same Telemetry

A single "78% of the context window" indicator names the symptom, not the cause. Two attribution cuts close that gap:

  • Per-tool attribution — which tool calls dumped the most tokens. Claude Code's /context command is the developer-facing example (Claude Code changelog).
  • Per-source attribution — which configuration source (rules, skills, MCP servers, subagents, conversation) is consuming the budget, regardless of which call put it there.

Cursor shipped per-source attribution on 2026-05-06: "You can now see a breakdown of your agent's context usage" (Cursor changelog). The categories — rules, skills, MCPs, subagents — match the units an operator can act on: unload a skill, disable an MCP server, prune a rule file, kill a subagent.

graph LR
    C[Context window<br/>78% full] --> T[Per-tool cut]
    C --> S[Per-source cut]
    T -->|grep dumped 8k| TR[Truncate / filter the call]
    S -->|skills 22%, MCP 31%| SR[Unload skill / disable server]
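
As a minimal sketch of the two cuts, the snippet below groups one set of context entries first by source and then by tool. The `ContextEntry` record, its field names, and the token figures are hypothetical and chosen only for illustration; neither Claude Code nor Cursor is known to expose data in this shape.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextEntry:
    """One item occupying the context window (hypothetical record shape)."""
    source: str  # configuration source, e.g. "rules", "skills", "mcp"
    tool: str    # tool call that produced the entry, or "" if none did
    tokens: int


def cut_by(entries, key):
    """Sum tokens per distinct value of `key` ("source" or "tool")."""
    totals = defaultdict(int)
    for entry in entries:
        totals[getattr(entry, key) or "(no tool)"] += entry.tokens
    return dict(totals)


# Illustrative numbers only.
entries = [
    ContextEntry("rules", "", 4_000),
    ContextEntry("skills", "", 11_000),
    ContextEntry("mcp", "search_issues", 9_500),
    ContextEntry("mcp", "get_file", 6_000),
    ContextEntry("tool_output", "grep", 8_000),
]

per_source = cut_by(entries, "source")  # points at what to unload or disable
per_tool = cut_by(entries, "tool")      # points at which call to truncate

print(per_source)
print(per_tool)
```

Both dictionaries sum to the same total, which is the point: the cuts differ in how they group the tokens, not in what they count.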

Categories the Breakdown Should Expose

Each category maps to a distinct remediation surface. A breakdown that collapses two of them — "static prompt: 36%" — leaves the operator unable to choose between unloading a skill and pruning a rule.

| Category | Why it's separate | Remediation |
| --- | --- | --- |
| Rules / instruction files | Loaded at session start, persistent | Prune CLAUDE.md / AGENTS.md against the rule budget |
| Skill definitions | Descriptions always-on; full body loads on use | Mark low-value skills name-only or off via skill overrides |
| MCP tool returns | Grow with each call; cumulative | Drop server, narrow tool selection, audit tool-output token cost |
| Subagent transcripts | Forwarded back to parent on completion | Tighten subagent output schema, summarise instead of forward |
| Tool outputs (non-MCP) | File reads, grep, build logs | Truncate at the call site; apply observation masking |
| Conversation history | Compounds with turns | Compact, or split into a fresh session |
| Cache prefix | Read-only; cheap but counts against window | Stable across turns; flag only when the prefix bloats |
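
The mapping from category to remediation can be sketched as a lookup, which also shows why a collapsed slice is not actionable. The `Source` enum, the `REMEDIATION` table, and `largest_actionable` are hypothetical names introduced for illustration, not part of any harness API.

```python
from enum import Enum


class Source(Enum):
    RULES = "rules"
    SKILLS = "skills"
    MCP_RETURNS = "mcp_returns"
    SUBAGENT_TRANSCRIPTS = "subagent_transcripts"
    TOOL_OUTPUTS = "tool_outputs"
    CONVERSATION = "conversation"
    CACHE_PREFIX = "cache_prefix"


# One remediation per category, paraphrased from the categories listed above.
REMEDIATION = {
    Source.RULES: "prune instruction files against the rule budget",
    Source.SKILLS: "mark low-value skills name-only or off",
    Source.MCP_RETURNS: "drop the server or narrow tool selection",
    Source.SUBAGENT_TRANSCRIPTS: "tighten the output schema; summarise",
    Source.TOOL_OUTPUTS: "truncate at the call site",
    Source.CONVERSATION: "compact or start a fresh session",
    Source.CACHE_PREFIX: "flag only when the prefix bloats",
}


def largest_actionable(breakdown):
    """Return the largest slice and the remediation it maps to."""
    source = max(breakdown, key=breakdown.get)
    return source, REMEDIATION[source]


# "static prompt: 36%" split back into its two components (illustrative).
split = {Source.RULES: 8.0, Source.SKILLS: 28.0}
source, action = largest_actionable(split)
print(source.value, "->", action)
```

With only the collapsed 36% figure there is no key to look up; once the slice is split, the lookup resolves to a single action.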

OTel Path: The Same Cut, Exported

Claude Code's OTel exporter ships the attributes that make per-source attribution computable from telemetry rather than UI inspection. The claude_code.token.usage metric carries (Claude Code monitoring reference):

  • type: "input", "output", "cacheRead", "cacheCreation"
  • query_source: "main", "subagent", "auxiliary", or compaction/auxiliary thread names
  • model, effort, request-id correlation

Grouping by query_source produces the subagent-vs-main split; grouping by type separates active-input from cached-prefix tokens. The UI breakdown and the OTel export consume the same counts — Cursor's panel is the always-on surface, an OTel collector the post-hoc audit path. See agent observability via OTel for export wiring.
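
A minimal sketch of that grouping, assuming the metric's data points have already been exported and flattened into dictionaries with an `attributes` map and a numeric `value`. That flattened shape, the `group_tokens` helper, and the sample numbers are assumptions for illustration; a real OTLP payload nests data points more deeply and depends on the collector configuration.

```python
from collections import Counter


def group_tokens(points, attribute):
    """Sum token counts per value of one metric attribute."""
    totals = Counter()
    for point in points:
        key = point["attributes"].get(attribute, "unknown")
        totals[key] += point["value"]
    return dict(totals)


# Hypothetical, already-flattened data points for claude_code.token.usage.
points = [
    {"attributes": {"type": "input", "query_source": "main"}, "value": 12_000},
    {"attributes": {"type": "cacheRead", "query_source": "main"}, "value": 40_000},
    {"attributes": {"type": "input", "query_source": "subagent"}, "value": 7_000},
    {"attributes": {"type": "output", "query_source": "subagent"}, "value": 1_500},
]

by_source = group_tokens(points, "query_source")  # subagent-vs-main split
by_type = group_tokens(points, "type")            # active input vs cached prefix

print(by_source)
print(by_type)
```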

Action Signals

A breakdown without thresholds is just a chart. Useful signals:

  • MCP returns > 30% with rising trend — at least one server's outputs are unbounded. Drill into per-server tool-output token cost to find the offender.
  • Skills > 20% on a session that didn't invoke them — descriptions are too verbose; move low-priority skills to name-only.
  • Subagent transcripts > 15% — handoff schemas are missing; agents forward raw transcripts.
  • Cache prefix > 50% with active < 30% — the harness pays full attention cost on cached tokens. Confirm cache hit rate via OTel cacheRead.
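
The thresholds above can be sketched as a small rule function. The category keys, the signal names, and the reading of "rising trend" as strictly increasing samples are all assumptions made for this example; the percentages are the ones listed above.

```python
def action_signals(shares, mcp_history=(), skills_invoked=False):
    """Return the names of the signals that fire for one session.

    shares: percent of the context window per category (hypothetical keys).
    mcp_history: earlier MCP-return percentages, oldest first.
    """
    signals = []

    # "Rising" is interpreted as strictly increasing samples (an assumption).
    rising = len(mcp_history) >= 2 and all(
        later > earlier for earlier, later in zip(mcp_history, mcp_history[1:])
    )
    if shares.get("mcp_returns", 0) > 30 and rising:
        signals.append("mcp_unbounded_output")

    if shares.get("skills", 0) > 20 and not skills_invoked:
        signals.append("skill_descriptions_too_verbose")

    if shares.get("subagent_transcripts", 0) > 15:
        signals.append("subagent_raw_transcripts")

    # A missing "active" share defaults to 100 so the rule stays silent.
    if shares.get("cache_prefix", 0) > 50 and shares.get("active", 100) < 30:
        signals.append("cache_prefix_dominates")

    return signals


fired = action_signals(
    {"mcp_returns": 34, "skills": 28, "subagent_transcripts": 6},
    mcp_history=[22, 29, 34],
    skills_invoked=False,
)
print(fired)
```

Keeping the thresholds as plain comparisons makes them easy to tune per harness; the specific cut-offs are starting points rather than fixed constants.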

When the Cut Is Wrong

Per-source attribution is the right axis when configuration sources are non-trivial. It misleads when:

  • Tool calls dominate the session. A long agentic run buckets file reads and grep output into one giant "tools"/"MCP" slice that points at no specific call. Switch to per-tool attribution (/context).
  • Single-shot deterministic prompts. No compounding, no point in attribution.
  • Tightly-pruned harnesses. When rules, skills, and MCPs are already minimal, the breakdown reports rounding noise.
  • The harness can't act on the cut. Without per-skill or per-MCP unload commands, knowing skills consume 22% offers no remediation path beyond restarting the session.
  • The headline counts only a subset of token types. A breakdown is trustworthy only when it sums input, output, and cache tokens; counting input alone undercounts the budget — Claude Code's percentage showed ~20% while the session was at its limit (claude-code#28167, #17959). Confirm the denominator covers every type before trusting a slice.
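
The denominator problem in the last bullet can be sketched with a little arithmetic. The token counts and the 200,000-token window below are invented for illustration and are not taken from the linked issues; `window_share` is a hypothetical helper.

```python
TOKEN_TYPES = ("input", "output", "cacheRead", "cacheCreation")


def window_share(counts, window, types=TOKEN_TYPES):
    """Percent of the window used, counting only the given token types."""
    used = sum(counts.get(token_type, 0) for token_type in types)
    return 100 * used / window


# Illustrative counts for a session that is actually at its limit.
counts = {
    "input": 40_000,
    "output": 10_000,
    "cacheRead": 140_000,
    "cacheCreation": 10_000,
}
window = 200_000

input_only = window_share(counts, window, types=("input",))
all_types = window_share(counts, window)

print(f"input only: {input_only:.0f}%  all types: {all_types:.0f}%")
```

Under these assumed counts the input-only headline reads 20% while the full sum reads 100%, which is the shape of the mismatch described above.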

The two cuts are complementary — exposing both lets operators pick the axis matching the suspected cause. The Infinite Context anti-pattern is the failure both work against; per-source attribution is the cheaper always-on signal, pointing at slow-growing static sources before an emergency compaction.

Example

A session is at 82% full after twelve turns. Without attribution, the operator's options are: compact, restart, or guess.

The per-source breakdown shows: rules 8%, skills 28%, MCP returns 34%, subagent transcripts 6%, conversation 6%. The skills slice is the surprise: the session never explicitly invoked a skill. Listing loaded skills shows fourteen descriptions in the always-on context, each averaging 1,200 characters. The operator marks ten of them "name-only" in skillOverrides (Claude Code skills reference). At the same point in the next session, the window is 64% full.

The breakdown made the difference between "prune skills" (right answer) and "compact the conversation" (the default reflex when only a single percentage is visible).
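
As a rough consistency check on the numbers in this example, the sketch below sums the slices and estimates a floor for the window after the skill change. It assumes the skills slice is spread evenly across the fourteen descriptions and that a name-only skill costs nothing; both are simplifications, so the result is a lower bound rather than a prediction.

```python
before = {
    "rules": 8,
    "skills": 28,
    "mcp_returns": 34,
    "subagent_transcripts": 6,
    "conversation": 6,
}
total_before = sum(before.values())  # should match the headline figure

skills_total = 14
skills_demoted = 10

# Assumption: each description holds an equal share of the skills slice,
# and a name-only skill contributes zero tokens.
skills_after = before["skills"] * (skills_total - skills_demoted) / skills_total
total_after_floor = total_before - before["skills"] + skills_after

print(total_before, skills_after, total_after_floor)
```

The floor lands a couple of points below the 64% reported above, which is plausible given that skill names still occupy some tokens.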

Key Takeaways

  • A single "X% full" indicator names the symptom, not the cause. Attribution cuts the same number into a remediation surface.
  • Per-source and per-tool attribution are complementary — different cuts of the same telemetry, each pointing at a different class of remediation.
  • The categories must match the remediation primitives: rules, skills, MCPs, subagents, tool outputs, conversation, cache prefix. Collapsing two of them defeats the cut.
  • Claude Code's OTel exporter already carries the attributes (type, query_source) needed to compute the breakdown from telemetry; the UI surface is one consumer, an OTel collector is another.
  • When configuration sources are minimal and tool calls dominate, per-tool attribution is the more actionable cut — pick the axis matching the suspected cause.