Context-Usage Attribution: Per-Source Breakdown of Agent Context¶

Break the context window into rules, skills, MCP returns, subagent transcripts, and conversation — so operators prune the source actually responsible, not the wrong one.

Two Cuts of the Same Telemetry¶

A single "78% of the context window" indicator names the symptom, not the cause. Two attribution cuts close it:

Per-tool attribution — which tool calls dumped the most tokens. Claude Code's /context command is the developer-facing example (Claude Code changelog).
Per-source attribution — which configuration source (rules, skills, MCP servers, subagents, conversation) is consuming the budget, regardless of which call put it there.

Cursor shipped per-source attribution on 2026-05-06: "You can now see a breakdown of your agent's context usage" (Cursor changelog). The categories — rules, skills, MCPs, subagents — match the units an operator can act on: unload a skill, disable an MCP server, prune a rule file, kill a subagent.

graph LR
    C[Context window<br/>78% full] --> T[Per-tool cut]
    C --> S[Per-source cut]
    T -->|grep dumped 8k| TR[Truncate / filter the call]
    S -->|skills 22%, MCP 31%| SR[Unload skill / disable server]

Categories the Breakdown Should Expose¶

Each category maps to a distinct remediation surface. A breakdown that collapses two of them — "static prompt: 36%" — leaves the operator unable to choose between unloading a skill and pruning a rule.

Category	Why it's separate	Remediation
Rules / instruction files	Loaded at session start, persistent	Prune CLAUDE.md / AGENTS.md against the rule budget
Skill definitions	Descriptions always-on; full body loads on use	Mark low-value skills `name-only` or `off` via skill overrides
MCP tool returns	Grow with each call; cumulative	Drop server, narrow tool selection, audit tool-output token cost
Subagent transcripts	Forwarded back to parent on completion	Tighten subagent output schema, summarise instead of forward
Tool outputs (non-MCP)	File reads, grep, build logs	Truncate at the call site; apply observation masking
Conversation history	Compounds with turns	Compact, or split into a fresh session
Cache prefix	Read-only; cheap but counts against window	Stable across turns — flag only when prefix bloats

OTel Path: The Same Cut, Exported¶

Claude Code's OTel exporter ships the attributes that make per-source attribution computable from telemetry rather than UI inspection. The claude_code.token.usage metric carries (Claude Code monitoring reference):

type — "input", "output", "cacheRead", "cacheCreation"
query_source — "main", "subagent", "auxiliary", or compaction/auxiliary thread names
model, effort, request-id correlation

Grouping by query_source produces the subagent-vs-main split; grouping by type separates active-input from cached-prefix tokens. The UI breakdown and the OTel export consume the same counts — Cursor's panel is the always-on surface, an OTel collector the post-hoc audit path. See agent observability via OTel for export wiring.

Action Signals¶

A breakdown without thresholds is just a chart. Useful signals:

MCP returns > 30% with rising trend — at least one server's outputs are unbounded. Drill into per-server tool-output token cost to find the offender.
Skills > 20% on a session that didn't invoke them — descriptions are too verbose; move low-priority skills to name-only.
Subagent transcripts > 15% — handoff schemas are missing; agents forward raw transcripts.
Cache prefix > 50% with active < 30% — the harness pays full attention cost on cached tokens. Confirm cache hit rate via OTel cacheRead.

When the Cut Is Wrong¶

Per-source attribution is the right axis when configuration sources are non-trivial. It misleads when:

Tool calls dominate the session. A long agentic run buckets file reads and grep output into one giant "tools"/"MCP" slice that points at no specific call. Switch to per-tool attribution (/context).
Single-shot deterministic prompts. No compounding, no point in attribution.
Tightly-pruned harnesses. When rules, skills, and MCPs are already minimal, the breakdown reports rounding noise.
The harness can't act on the cut. Without per-skill or per-MCP unload commands, knowing skills consume 22% offers no remediation path beyond restarting the session.
The headline counts only a subset of token types. A breakdown is trustworthy only when it sums input, output, and cache tokens; counting input alone undercounts the budget — Claude Code's percentage showed ~20% while the session was at its limit (claude-code#28167, #17959). Confirm the denominator covers every type before trusting a slice.

The two cuts are complementary — exposing both lets operators pick the axis matching the suspected cause. The Infinite Context anti-pattern is the failure both work against; per-source attribution is the cheaper always-on signal, pointing at slow-growing static sources before an emergency compaction.

Example¶

A session is at 82% full after twelve turns. Without attribution, the operator's options are: compact, restart, or guess.

The per-source breakdown shows: rules 8%, skills 28%, MCP returns 34%, subagent transcripts 6%, conversation 6%. The skills slice is the surprise — the session never explicitly invoked a skill. Listing loaded skills shows fourteen descriptions in the always-on context, each averaging 1,200 characters. The operator marks ten of them "name-only" in skillOverrides (Claude Code skills reference). The next session at the same point loads at 64%.

The breakdown made the difference between "prune skills" (right answer) and "compact the conversation" (the default reflex when only a single percentage is visible).

Key Takeaways¶

A single "X% full" indicator names the symptom, not the cause. Attribution cuts the same number into a remediation surface.
Per-source and per-tool attribution are complementary — different cuts of the same telemetry, each pointing at a different class of remediation.
The categories must match the remediation primitives: rules, skills, MCPs, subagents, tool outputs, conversation, cache prefix. Collapsing two of them defeats the cut.
Claude Code's OTel exporter already carries the attributes (type, query_source) needed to compute the breakdown from telemetry; the UI surface is one consumer, an OTel collector is another.
When configuration sources are minimal and tool calls dominate, per-tool attribution is the more actionable cut — pick the axis matching the suspected cause.

Context-Window Diagnostic Tooling: Identifying Context-Heavy Tools — per-tool attribution, the complementary cut
Context Budget Allocation: Every Token Has a Cost — the budget the breakdown serves
The Infinite Context anti-pattern — the failure mode attribution prevents
Agent Observability: OTel, Cost Tracking, Trajectory Logs — the export path for attribution telemetry
Agent Debug Log Panel — the adjacent always-on surface for events rather than tokens
Per-Plugin Token-Cost Attribution via claude plugin details — the same attribution axis at plugin granularity