Cost-Aware Skill Rewriting: Preserve Operational Anchors, Not Skill Tokens
Rewriting a skill is an economic trade-off — stripping sparse operational anchors makes the agent explore and retry, raising total cost despite a shorter document.
When This Applies
Apply the anchor-preservation framing only when both conditions hold:
- The skill contains sparse operational anchors: single-line items the base model cannot reconstruct from training, such as an API constructor, a CLI flag, a validation threshold, a file-format convention, a formula, or a recovery rule (Xing et al., 2026).
- The agent can recover when those anchors are missing — a validator, a retry, or an exploration loop runs when the first attempt fails.
For pure-prose skills (style guides, taxonomies) or fire-and-forget workflows, generic Prompt Compression is the right strategy.
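The two conditions above amount to a simple gate. A minimal sketch, assuming a hypothetical `SkillContext` record filled in by whoever reviews the skill (neither the type nor the function name comes from the paper):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillContext:
    """Hypothetical description of a skill and the loop it runs in."""
    has_sparse_anchors: bool  # flags, thresholds, constructors the model cannot guess
    has_recovery_loop: bool   # a validator, retry, or exploration loop runs on failure


def choose_rewrite_approach(ctx: SkillContext) -> str:
    """Pick anchor preservation only when BOTH conditions hold."""
    if ctx.has_sparse_anchors and ctx.has_recovery_loop:
        return "anchor_preservation"
    # Pure-prose skills and fire-and-forget workflows fall through to here.
    return "prompt_compression"
```

A style guide (`has_sparse_anchors=False`) or a fire-and-forget job (`has_recovery_loop=False`) both route to generic compression under this gate.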
The Three Anchor Classes
Different skills hold different operational anchors. The Xing et al. paper identifies three rewriting strategies, each preserving a different class, and finds "no universally dominant template" — the right strategy depends on the task family (arxiv:2606.09421 §3):
| Anchor class | What it preserves | Wins on |
|---|---|---|
| API/code anchoring | Imports, API calls, object construction, commands, code snippets | Implementation-heavy tasks where one wrong constructor triggers a debugging cycle |
| Workflow guarding | Ordered steps, validation checks, constraints, named pitfalls | Procedurally-drifting tasks where step reordering produces silent failures |
| Rule/formula anchoring | Definitions, formulas, thresholds, schemas, conventions | Scientific and rule-governed tasks where one mis-stated threshold invalidates the output |
A skill profile (token count, code ratio, validator markers, API call frequency) determines which strategy fits. A learned task-conditioned policy that picks the strategy from these structural features reduces total agent cost by 7.0% and downstream agent-token cost by 6.0% on a 20-task held-out evaluation; in cross-model transfer across 86 tasks, the corresponding reductions average 14.7% and 13.7% (arxiv:2606.09421 §4).
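To make the profile concrete, here is a rough sketch of how the four structural features might be extracted from a skill's text. The marker words, the call regex, and the thresholds in `pick_strategy` are all illustrative assumptions; in particular, `pick_strategy` is a hand-written stand-in and not the learned policy the paper evaluates.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # markdown code-fence marker
# ASSUMPTION: illustrative marker words; a real profiler would tune these.
VALIDATOR_MARKERS = ("validate", "verify", "check", "assert", "must")
# Rough proxy for an API call: an identifier (optionally dotted) followed by "(".
CALL_PATTERN = re.compile(r"\b[A-Za-z_][\w.]*\(")


@dataclass(frozen=True)
class SkillProfile:
    token_count: int        # whitespace tokens, a crude stand-in for model tokens
    code_ratio: float       # share of lines that sit inside fenced code blocks
    validator_markers: int  # occurrences of validator-like words
    api_calls: int          # matches of CALL_PATTERN


def profile_skill(text: str) -> SkillProfile:
    lines = text.splitlines()
    in_fence = False
    code_lines = 0
    for line in lines:
        if line.strip().startswith(FENCE):
            in_fence = not in_fence  # fence lines themselves are not counted as code
            continue
        if in_fence:
            code_lines += 1
    lowered = text.lower()
    return SkillProfile(
        token_count=len(text.split()),
        code_ratio=code_lines / len(lines) if lines else 0.0,
        validator_markers=sum(lowered.count(m) for m in VALIDATOR_MARKERS),
        api_calls=len(CALL_PATTERN.findall(text)),
    )


def pick_strategy(p: SkillProfile) -> str:
    """Hand-written thresholds (assumed), NOT the paper's learned policy."""
    if p.code_ratio >= 0.3 or p.api_calls >= 3:
        return "api_code_anchoring"
    if p.validator_markers >= 3:
        return "workflow_guarding"
    return "rule_formula_anchoring"  # fallthrough for everything else
```

The point of the sketch is only that the features are cheap to compute from the text alone; which thresholds or model turn them into a strategy choice is an empirical question.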
Why It Works
When an anchor is present, the agent commits to the correct decision in one inference step. When it is absent, the agent must enumerate candidates, run them, observe failure, and retry. Each exploration step costs a full LLM inference, and output tokens dominate that cost at roughly 3× the per-token price of input (Compression Method Matters, arxiv:2603.23527). Deleting an anchor saves a few hundred input tokens, which are often served from the KV cache anyway; each exploration step adds thousands of output tokens, which are never cached. The mechanism is this asymmetry between cached input savings and uncached output overhead. A randomized production trial of generic prompt compression documented the same effect: aggressive compression (r≈0.2) increased total cost by 1.8% through output-token explosion despite reducing input by ~80%.
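The asymmetry is easy to see with back-of-the-envelope arithmetic. The sketch below uses the ~3× output multiplier from the text; the 10% cached-input rate and the token counts in the usage note are assumptions chosen for illustration, not measured values.

```python
def net_cost_change(
    input_tokens_saved: int,
    extra_output_tokens: int,
    input_price: float = 1.0,        # arbitrary price unit per input token
    output_multiplier: float = 3.0,  # from the text: output costs ~3x input per token
    cache_discount: float = 0.1,     # ASSUMPTION: cached input billed at 10% of full price
    cached: bool = True,
) -> float:
    """Return added cost minus saved cost; a positive value means the rewrite cost more."""
    effective_input_price = input_price * (cache_discount if cached else 1.0)
    saved = input_tokens_saved * effective_input_price
    added = extra_output_tokens * input_price * output_multiplier
    return added - saved
```

With these defaults, deleting 300 cached input tokens and triggering a single 2,000-token exploration step comes out roughly 5,970 price units more expensive; only when the deletion triggers no exploration at all (`extra_output_tokens=0`) does the result go negative.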
The Diagnostic
Before rewriting a skill, walk the body line-by-line and answer for each candidate cut:
- Is this an operational anchor? A single-line piece of operational knowledge the base model cannot guess — a specific flag, threshold, constructor argument, file path, or recovery rule.
- Would the agent fail or retry without it? If yes, the anchor pays for itself in saved exploration cost. Keep.
- Is this surrounding explanation that contextualises the anchor? Cut. Keep the anchor; drop the framing.
- Is this an example that implicitly defines the anchor? Cut only if the anchor is named explicitly elsewhere. SkillReducer found that moving examples that implicitly defined expected behaviour to reference modules was the dominant compression failure mode (arxiv:2603.29919).
This inverts the test used in generic prompt compression ("can I remove a word without losing meaning?"). The cost-aware test is: can I remove this without forcing the agent to discover it by trial?
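The four questions can be read as a small decision function. This is one possible encoding, assuming each line has already been labeled by a reviewer or an upstream classifier; the `Candidate` fields are hypothetical, and treating an anchor the agent would not miss as cuttable is an interpretation of the second question rather than something the source states outright.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    KEEP = "keep"
    CUT = "cut"


@dataclass(frozen=True)
class Candidate:
    """One line of a skill body with hypothetical reviewer-supplied labels."""
    text: str
    is_anchor: bool = False               # flag, threshold, constructor arg, path, recovery rule
    fails_without_it: bool = False        # would the agent fail or retry if it were gone?
    is_defining_example: bool = False     # example that implicitly defines an anchor
    anchor_named_elsewhere: bool = False  # is that anchor stated explicitly somewhere else?


def diagnose(c: Candidate) -> Verdict:
    # Questions 1 and 2: an anchor the agent would otherwise have to rediscover by trial.
    if c.is_anchor and c.fails_without_it:
        return Verdict.KEEP
    # Question 4: an example is cuttable only if the anchor it defines is named elsewhere.
    if c.is_defining_example and not c.anchor_named_elsewhere:
        return Verdict.KEEP
    # Question 3: surrounding explanation and everything else is framing.
    return Verdict.CUT
```

For instance, `diagnose(Candidate("pass --canary 10", is_anchor=True, fails_without_it=True))` keeps the line, while a sentence that only explains why the flag exists is cut.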
Example
A verbose skill section before rewriting — implementation-heavy, three real anchors buried in prose:
```markdown
## Deploying to staging
When you want to deploy to the staging environment, you should use the
`stage-deploy` CLI. It is important that you pass the `--canary 10` flag,
which limits the initial rollout to 10% of staging traffic and prevents
a full-fleet bad-deploy. The deploy script will print a deploy ID once
the canary is healthy; you must capture this ID because the rollback
command requires it. If the canary fails health checks within five
minutes, you should not retry: the failure is logged in
/var/log/stage-deploy.log and the correct response is to investigate
the log entry, not re-run the deploy.
```
API/code anchoring rewrite — drops the explanation, keeps the three anchors (the flag, the deploy-ID capture rule, the no-retry recovery rule):
```markdown
## Deploying to staging
- Deploy: `stage-deploy --canary 10`
- Capture the printed deploy ID (required for rollback)
- On canary failure within 5 minutes: read `/var/log/stage-deploy.log`. Do not retry.
```
Same operational content, ~70% fewer tokens, and the three anchors that prevent exploration cycles are intact. A naive compression that also dropped "do not retry" would cut input tokens slightly more but risk a multi-thousand-token retry loop — net cost up, not down.
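A rewrite like this can be guarded with a cheap retention check before it ships. The sketch below assumes the anchors have been listed by hand and uses verbatim substring matching, which is a crude proxy (a paraphrased anchor would be reported as missing); `missing_anchors` is a hypothetical helper, and `REWRITTEN` is a copy of the rewritten section from the example.

```python
from __future__ import annotations


def missing_anchors(rewritten: str, anchors: list[str]) -> list[str]:
    """Return the anchors that no longer appear verbatim in the rewritten skill."""
    return [a for a in anchors if a not in rewritten]


# Copy of the rewritten staging section from the example.
REWRITTEN = """## Deploying to staging
- Deploy: `stage-deploy --canary 10`
- Capture the printed deploy ID (required for rollback)
- On canary failure within 5 minutes: read `/var/log/stage-deploy.log`. Do not retry.
"""

# Hand-listed anchors: the flag, the deploy-ID rule, the log path, the no-retry rule.
ANCHORS = ["--canary 10", "deploy ID", "/var/log/stage-deploy.log", "Do not retry"]

if __name__ == "__main__":
    print(missing_anchors(REWRITTEN, ANCHORS))
```

Running the check against a more aggressive rewrite that drops the final sentence would report `"Do not retry"` as missing, which is the signal to stop compressing.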
When This Backfires
The anchor-preservation framing has real costs that the 7.0% headline win must justify:
- Pure-prose skills with no sparse anchors — style guides, taxonomies, convention lists. Nothing operational to preserve; the profiler-rewriter-evaluator overhead pays for nothing. Use Prompt Compression directly.
- Selection-bottlenecked libraries — when descriptions get truncated to fit a character budget and the agent picks the wrong skill, in-skill anchors are unreachable. Fix description craft first (Anthropic best practices).
- Rapidly-changing APIs — preserved anchors that go stale within weeks actively misdirect the agent. A stale anchor is worse than an absent one; pair preservation with an explicit update cadence, or accept exploration cost. See Skill Library Technical Debt.
- Fire-and-forget workflows — without a validator, retry, or exploration loop, anchors aimed at preventing debugging cycles cost tokens for no payoff.
- The "less-is-more" baseline — SkillReducer reports 86% functional retention and 25.3% of skills actively improving under naive compression, attributed to reduced context-window distraction. For libraries of bloated human-written skills, the simpler "compress hard, keep code-fenced blocks intact" heuristic captures most of the value. Anchor-preservation wins only where exploration overhead exceeds compression-distraction overhead.
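For the rapidly-changing-API case in the list above, an update cadence can be enforced mechanically if each preserved anchor carries a verification date. A minimal sketch, assuming a hypothetical `Anchor` record and a 30-day window; both the metadata and the window length are assumptions to be adapted to how fast the underlying API moves.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Anchor:
    text: str
    last_verified: date  # hypothetical metadata: when the anchor was last confirmed correct


def stale_anchors(anchors: list[Anchor], today: date, max_age_days: int = 30) -> list[Anchor]:
    """Return anchors whose last verification is older than the cadence window."""
    cutoff = today - timedelta(days=max_age_days)
    return [a for a in anchors if a.last_verified < cutoff]
```

Anchors returned by `stale_anchors` are candidates for re-verification or removal, since a stale anchor misdirects the agent more than a missing one.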
Key Takeaways
- A rewritten skill is cheaper only when it preserves the sparse operational anchors that prevent exploration, debugging, and retry — not because it has fewer tokens.
- Three anchor classes cover most skills: API/code anchoring (constructors, flags), workflow guarding (ordered steps, validators), rule/formula anchoring (thresholds, schemas). No template wins universally — match the strategy to the skill's structural profile.
- The mechanism is the asymmetry between cached input savings and uncached output overhead — output tokens cost ~3× input per token and never cache.
- The diagnostic test is "can I remove this without forcing the agent to discover it by trial?" — the inverse of generic compression's "without losing meaning?" test.
- For pure-prose skills, fire-and-forget workflows, or libraries bottlenecked at selection rather than execution, fall back to Prompt Compression.
Related
- Prompt Compression: Maximizing Signal Per Token — the lexical-density technique the cost-aware approach explicitly contrasts with
- Skill Authoring Patterns: Description to Deployment — sibling guidance on description craft, Gotchas, and skill composition
- Skill as Knowledge Pattern — what a skill should contain; this page covers what to remove
- Cost-Aware Tracing for Skill Distillation — the trace-side instrumentation that makes anchor-preservation decisions measurable
- Skill Library Technical Debt — library-level debt patterns including stale anchors and missing validators