Skill Supply-Chain Poisoning
Malicious skills injected into public registries exploit agent in-context learning to execute payloads hidden inside documentation examples — bypassing alignment that blocks explicit instruction injection.
The Mechanism
Coding agents extend behavior by retrieving skills at runtime. A skill is a Markdown document (a SKILL.md or equivalent) encoding workflows, API patterns, and conventions; agents treat it as an authoritative reference when synthesizing code.
Document-Driven Implicit Payload Execution (DDIPE) weaponizes that trust. Explicit commands like "exfiltrate credentials" are refused by alignment (a 0% success rate under strong defenses), so DDIPE instead embeds malicious logic inside code examples and configuration templates in otherwise legitimate-looking skills. The agent reproduces those examples during normal task execution, and the payload runs without any explicit instruction.
Research across four agent frameworks (Claude Code, OpenHands, Codex, Gemini CLI) and five models found DDIPE achieves 11.6%–33.5% bypass rates where explicit injection achieves 0%. Of 1,070 adversarial skills across 15 MITRE ATT&CK categories, 2.5% evaded both static detection and model alignment (arxiv 2604.03081).
Why Skills Are a Distinct Attack Surface
Skill supply-chain poisoning differs structurally from MCP tool poisoning attacks:
| | MCP Tool Poisoning | Skill DDIPE |
|---|---|---|
| Vector | Tool description / return value | Code examples in documentation |
| Trigger | Tool invocation | In-context pattern reproduction |
| Blocked by | Framework guardrails | Bypasses alignment (treats payloads as code, not instructions) |
| Detection | Tool schema inspection | Requires semantic code analysis |
SKILL.md files are dual-purpose: semantic documentation for the agent and installation instructions for the human operator. Threat actors weaponize "Prerequisites" sections to coax operators into installing malicious components, stacking a social-engineering layer onto the model-level attack (Snyk ToxicSkills study).
Real-World Scale
Snyk's February 2026 audit of 3,984 skills across ClawHub and skills.sh found:
- 36.82% (1,467 skills) contain at least one security flaw
- 13.4% (534 skills) contain critical issues including malware distribution and credential theft
- 100% of confirmed malicious skills combined code payloads with prompt injection — attacking both the code execution layer and the natural language instruction layer simultaneously
The ClawHavoc campaign compromised 1,184+ skills across the ClawHub registry, with five of the top seven most-downloaded skills at peak infection confirmed as malware delivering Atomic Stealer (AMOS) to macOS users (Snyk ToxicSkills study).
Responsible disclosure from the DDIPE research produced 4 confirmed CVEs and 2 deployed fixes across production frameworks (arxiv 2604.03081).
Defense Stack
Defense requires multiple independent layers. No single control is sufficient.
graph TD
A[Public Skill Registry] -->|Blocked| B[Internal Mirror]
B --> C[Intake Gate]
C --> D{Scan Pass?}
D -->|No| E[Quarantine]
D -->|Yes| F[Pin + Attest]
F --> G[Runtime Agent]
G --> H{Multi-Model Verify}
H -->|Consensus| I[Execute]
H -->|Divergence| J[Block + Alert]
1. Never Pull Directly from Public Registries
Block runtime fetches from open marketplaces. Agents load only from an internal mirror of vetted skills — public registries are never an execution-time dependency.
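A minimal sketch of the mirror-only rule, assuming vetted skills are stored as directories under a local mirror path. The SKILL_MIRROR location and the resolve_skill helper are hypothetical names introduced for illustration, not part of any agent framework:

```shell
# Hypothetical mirror location; override through the environment.
SKILL_MIRROR="${SKILL_MIRROR:-/srv/skill-mirror}"

# resolve_skill NAME -> prints the path of the mirrored SKILL.md, or fails.
# Only plain names are accepted, so URLs and path traversal are rejected
# before the filesystem is consulted.
resolve_skill() (
  name=$1
  case $name in
    ''|*[!A-Za-z0-9._-]*|.*)
      echo "refused: '$name' is not a plain skill name" >&2
      exit 2 ;;
  esac
  path="$SKILL_MIRROR/$name/SKILL.md"
  if [ ! -f "$path" ]; then
    echo "refused: '$name' is not in the internal mirror" >&2
    exit 1
  fi
  printf '%s\n' "$path"
)
```

With this shape, a request for a mirrored name prints its local path, while a request that looks like a URL or a relative path is refused outright rather than fetched.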
2. Intake Gates Before Registration
Before a skill enters the mirror, run layered intake checks:
- Static analysis: pattern matching for shell escapes, credential access, exfiltration calls
- Semantic scanning: LLM analysis of code examples for disguised payloads; Cisco AI Defense skill-scanner combines static, behavioral dataflow, and LLM checks (--use-behavioral --use-llm)
- Sandbox execution: render the skill in a sandbox and observe tool calls
- Human review: for skills with broad internal use
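The static-analysis layer can be sketched as a plain pattern scan over the skill file. The static_scan helper and its pattern list are illustrative assumptions; the list is deliberately short and would need tuning, and this layer on its own is the one DDIPE payloads are most likely to slip past:

```shell
# static_scan FILE -> exit 0 if no pattern matches, 1 if a suspicious
# line is found, 2 if the file cannot be read.
static_scan() (
  file=$1
  [ -r "$file" ] || { echo "cannot read $file" >&2; exit 2; }
  # Illustrative patterns: pipe-to-shell installs, base64 decoding,
  # SSH and AWS credential paths, bash reverse-shell device.
  patterns='(curl|wget)[^|]*\|[[:space:]]*(ba)?sh|base64[[:space:]]+(-d|--decode)|\.ssh/id_|\.aws/credentials|/dev/tcp/'
  if hits=$(grep -nE "$patterns" "$file"); then
    printf 'suspicious lines in %s:\n%s\n' "$file" "$hits" >&2
    exit 1
  fi
  exit 0
)
```

A match should route the skill to quarantine for the semantic and human-review layers rather than being treated as a final verdict, since simple patterns produce both false positives and false negatives.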
3. Version Pinning and Content Hashing
Lock skills to specific, audited versions with immutable content hashes — never version ranges. Skills can mutate from safe to malicious after vetting ("rug pull"). Pin at the hash, not the tag.
SchemaPin provides ECDSA P-256 signatures with DNS-anchored trust and TOFU key pinning for skill schemas — the open-source equivalent of package signing.
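One way to pin a whole skill rather than a single file is a digest over every file in the skill directory; hashing only SKILL.md would leave bundled scripts and templates unpinned. The sketch below assumes skills ship as directories and that sha256sum (GNU coreutils) is available; skill_digest and verify_pin are hypothetical helper names:

```shell
# skill_digest DIR -> prints one SHA-256 covering the path and content
# of every regular file under DIR, in a stable order.
# Assumes file names contain no newlines.
skill_digest() (
  cd "$1" || exit 2
  find . -type f | LC_ALL=C sort | while IFS= read -r f; do
    sha256sum "$f"
  done | sha256sum | cut -d' ' -f1
)

# verify_pin DIR EXPECTED -> succeeds only if the digest still matches.
verify_pin() {
  [ "$(skill_digest "$1")" = "$2" ]
}
```

The recorded digest should be stored outside the skill directory (for example in the mirror's own lockfile), otherwise writing the pin would change the digest it records.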
4. Multi-Model Verification at Runtime
Only 1.6% of adversarial payloads bypass all tested models simultaneously, and removing architectural guardrails amplifies one model's execution rate by 11.3× while leaving another unchanged (arxiv 2604.03081). Run skill execution through two independent models and require consensus on tool call patterns; defense layers interact asymmetrically, so test the specific combination before relying on it.
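The consensus check can be sketched as a set comparison of the tool calls each model proposes. The input format here is an assumption: each model's proposed calls are written to a file, one call per line, by whatever harness drives the models; normalize and consensus are hypothetical helpers:

```shell
# normalize FILE -> the set of proposed tool calls, sorted and de-duplicated.
normalize() {
  LC_ALL=C sort -u "$1"
}

# consensus FILE_A FILE_B -> succeeds only if both models proposed the
# same set of tool calls; otherwise prints the calls unique to each side.
consensus() (
  a=$(mktemp) || exit 2
  b=$(mktemp) || exit 2
  normalize "$1" > "$a"
  normalize "$2" > "$b"
  if cmp -s "$a" "$b"; then
    status=0
  else
    echo "divergent tool calls:" >&2
    LC_ALL=C comm -3 "$a" "$b" >&2
    status=1
  fi
  rm -f "$a" "$b"
  exit "$status"
)
```

Exact set equality is the strictest possible rule and will flag harmless differences in ordering of arguments or paths; a real deployment would likely compare normalized patterns (tool name plus target class) instead of raw strings.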
5. Least-Privilege Execution
Run skill-loading agents with a dedicated user, scoped filesystem access, and deny-by-default network egress with an allowlisted domain set. See Blast Radius Containment and Dual-Boundary Sandboxing.
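The deny-by-default egress decision can be illustrated with an exact-match allowlist check. This is only a sketch of the policy logic: the domains in EGRESS_ALLOWLIST are hypothetical, the URL parsing is simplified (no IPv6 literals, no case normalization), and real enforcement belongs in a proxy or firewall outside the agent's control:

```shell
# Hypothetical allowlist of internal hosts, space separated.
EGRESS_ALLOWLIST="mirror.internal.example api.internal.example"

# url_host URL -> prints the host with scheme, path, query, fragment,
# userinfo and port removed.
url_host() (
  rest=${1#*://}      # drop scheme
  rest=${rest%%/*}    # drop path
  rest=${rest%%\?*}   # drop query
  rest=${rest%%\#*}   # drop fragment
  rest=${rest##*@}    # drop userinfo, so user@host resolves to host
  printf '%s\n' "${rest%%:*}"   # drop port
)

# egress_allowed URL -> succeeds only when the host is on the allowlist.
egress_allowed() (
  host=$(url_host "$1")
  for allowed in $EGRESS_ALLOWLIST; do
    if [ "$host" = "$allowed" ]; then exit 0; fi
  done
  echo "egress denied: $host" >&2
  exit 1
)
```

Exact matching is chosen over suffix or substring matching so that look-alike hosts and userinfo tricks resolve to a denial.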
Example
A skill intake gate using skill-scanner and hash pinning before internal registry entry:
Block direct pulls from public registries (agent config in .claude/settings.json, denying runtime skill fetches from external registries):
{
  "permissions": {
    "deny": [
      "Bash(curl:https://clawhub.io/*)",
      "Bash(curl:https://skills.sh/*)"
    ]
  }
}
Intake gate before adding to internal mirror:
# 1. Scan the candidate skill (all engines, fail on high/critical).
#    A non-zero exit from skill-scanner signals failure; log and stop.
if ! skill-scanner scan ./candidate-skill/ \
    --use-behavioral --use-llm --enable-meta \
    --fail-on-severity high --format json > scan-report.json; then
  echo "BLOCKED: skill-scanner found high/critical issues" >&2
  exit 1
fi

# 2. Pin by content hash before registering to the internal mirror
sha256sum candidate-skill/SKILL.md > candidate-skill/SKILL.md.sha256

# 3. Verify the hash at agent load time; catches post-approval mutations
sha256sum --check candidate-skill/SKILL.md.sha256 || {
  echo "Skill content mismatch: rug pull detected" >&2
  exit 1
}
The agent config blocks runtime pulls from public registries. skill-scanner catches malicious patterns before any skill reaches the mirror, and the hash pin detects post-approval mutations ("rug pulls").
When This Backfires
The full stack carries operational cost, so partial adoption is common, but each omitted layer leaves residual exposure:
- Scanner false positives: LLM-based semantic scanners misclassify legitimate security tooling, pen-test utilities, and obfuscated-but-valid config as malicious. Fail-on-high without review capacity blocks productive skills; lowering the threshold loses detection of real payloads.
- Pinning vs. patch velocity: Hash pinning prevents rug-pull mutations but also blocks legitimate security patches. Without a re-vetting workflow, pinning creates a patching backlog.
- Multi-model latency: Consensus roughly doubles per-invocation inference time. Latency-sensitive workflows disable the check to meet SLAs. Restrict it to first-use or high-privilege calls rather than every invocation.
- Mirror governance drift: Without a clear owner, the internal mirror becomes a rubber stamp and skills bypass the intake gate informally.
The full stack is most justified when agents load third-party skills at runtime with broad filesystem or network access. For internal skill sets authored by the same team, hash pinning plus code review may suffice.
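Restricting the consensus check to first-use or high-privilege calls could look like the sketch below. It assumes each skill is identified by its pinned content digest and that the caller labels each invocation with a privilege level; VERIFIED_LOG, needs_consensus, and mark_verified are hypothetical names:

```shell
# Hypothetical state file listing digests that already passed consensus.
VERIFIED_LOG="${VERIFIED_LOG:-$HOME/.cache/skill-verified.log}"

# needs_consensus DIGEST PRIVILEGE -> succeeds when the multi-model check
# should run: always for "high" privilege, otherwise only the first time
# a given digest is seen.
needs_consensus() (
  digest=$1
  privilege=$2
  if [ "$privilege" = "high" ]; then exit 0; fi
  if [ -f "$VERIFIED_LOG" ] && grep -qxF "$digest" "$VERIFIED_LOG"; then
    exit 1
  fi
  exit 0
)

# mark_verified DIGEST -> records that this digest passed consensus.
mark_verified() {
  printf '%s\n' "$1" >> "$VERIFIED_LOG"
}
```

Keying the log on the content digest rather than the skill name means any change to the skill, including a legitimate patch, is treated as a first use and verified again.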
Key Takeaways
- DDIPE hides payloads inside skill documentation code examples; in-context learning causes agents to reproduce them without explicit instruction, bypassing alignment that holds direct injection to a 0% success rate under the same conditions
- 36.82% of publicly available skills have security flaws; 2.5% of adversarial skills evade both static detection and model alignment
- Treat skill registries with the same supply-chain rigor as npm or PyPI — vetting, pinning, and continuous monitoring, not one-time review
- Only 1.6% of adversarial payloads bypassed every tested model simultaneously, so multi-model verification narrows the gap considerably; no single model's alignment should be the sole runtime defense
- Internal mirrors with intake gates are the minimum viable posture: never allow agents to pull from public registries at execution time