Skill Supply-Chain Poisoning

Malicious skills injected into public registries exploit agent in-context learning to execute payloads hidden inside documentation examples — bypassing alignment that blocks explicit instruction injection.

The Mechanism

Coding agents extend behavior by retrieving skills at runtime. A skill is a Markdown document (a SKILL.md or equivalent) encoding workflows, API patterns, and conventions; agents treat it as authoritative reference when synthesizing code.

Document-Driven Implicit Payload Execution (DDIPE) weaponizes that trust. Explicit commands such as "exfiltrate credentials" are blocked by alignment (a 0% bypass rate under strong defenses), so DDIPE instead embeds malicious logic inside code examples and configuration templates in otherwise legitimate-looking skills. The agent reproduces those examples during normal task execution, so the payload runs without any explicit instruction.

Research across four agent frameworks (Claude Code, OpenHands, Codex, Gemini CLI) and five models found DDIPE achieves 11.6%–33.5% bypass rates where explicit injection achieves 0%. Of 1,070 adversarial skills across 15 MITRE ATT&CK categories, 2.5% evaded both static detection and model alignment (arxiv 2604.03081).

Why Skills Are a Distinct Attack Surface

Skill supply-chain poisoning differs structurally from MCP tool poisoning:

| | MCP Tool Poisoning | Skill DDIPE |
| --- | --- | --- |
| Vector | Tool description / return value | Code examples in documentation |
| Trigger | Tool invocation | In-context pattern reproduction |
| Blocked by | Framework guardrails | Bypasses alignment (treats payloads as code, not instructions) |
| Detection | Tool schema inspection | Requires semantic code analysis |

SKILL.md files are dual-purpose: semantic documentation for the agent and installation instructions for the human operator. Threat actors weaponize "Prerequisites" sections to coax operators into installing malicious components, stacking a social-engineering layer onto the model-level attack (Snyk ToxicSkills study).

Real-World Scale

Snyk's February 2026 audit of 3,984 skills across ClawHub and skills.sh found:

  • 36.82% (1,467 skills) contain at least one security flaw
  • 13.4% (534 skills) contain critical issues including malware distribution and credential theft
  • 100% of confirmed malicious skills combined code payloads with prompt injection — attacking both the code execution layer and the natural language instruction layer simultaneously

The ClawHavoc campaign compromised 1,184+ skills across the ClawHub registry, with five of the top seven most-downloaded skills at peak infection confirmed as malware delivering Atomic Stealer (AMOS) to macOS users (Snyk ToxicSkills study).

Responsible disclosure from the DDIPE research produced 4 confirmed CVEs and 2 deployed fixes across production frameworks (arxiv 2604.03081).

Defense Stack

Defense requires multiple independent layers. No single control is sufficient.

graph TD
    A[Public Skill Registry] --> C[Intake Gate]
    A -.->|Direct pull blocked| G[Runtime Agent]
    C --> D{Scan Pass?}
    D -->|No| E[Quarantine]
    D -->|Yes| F[Pin + Attest]
    F --> B[Internal Mirror]
    B --> G
    G --> H{Multi-Model Verify}
    H -->|Consensus| I[Execute]
    H -->|Divergence| J[Block + Alert]

1. Never Pull Directly from Public Registries

Block runtime fetches from open marketplaces. Agents load only from an internal mirror of vetted skills — public registries are never an execution-time dependency.
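
The rule above can also be enforced inside the skill loader itself. The following is a minimal sketch, not part of any agent framework: `skills.internal.example` is a hypothetical mirror host, and `require_mirror_source` and `UntrustedSkillSource` are illustrative names.

```python
from urllib.parse import urlsplit

# Assumption: the internal mirror is served from this hypothetical host.
MIRROR_HOSTS = frozenset({"skills.internal.example"})


class UntrustedSkillSource(Exception):
    """Raised when a skill URL points anywhere other than the internal mirror."""


def require_mirror_source(url: str, mirror_hosts=MIRROR_HOSTS) -> str:
    """Return the URL unchanged if it targets the mirror over HTTPS, else raise."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Exact host match only: no suffix matching, no plain HTTP.
    if parts.scheme != "https" or host not in mirror_hosts:
        raise UntrustedSkillSource(f"refusing to load skill from {url!r}")
    return url
```

Parsing the URL rather than matching on a string prefix means a lookalike such as `https://skills.internal.example@clawhub.io/x` is rejected, because its actual host is `clawhub.io`.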

2. Intake Gates Before Registration

Before a skill enters the mirror, run layered intake checks:

  • Static analysis — pattern matching for shell escapes, credential access, exfiltration calls
  • Semantic scanning — LLM analysis of code examples for disguised payloads; Cisco AI Defense skill-scanner combines static, behavioral dataflow, and LLM checks (--use-behavioral --use-llm)
  • Sandbox execution — render the skill in a sandbox and observe tool calls
  • Human review — for skills with broad internal use
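
As a rough illustration of the static-analysis layer, the sketch below pulls fenced code examples out of a skill's Markdown and flags a few well-known risky shell idioms. The pattern set and the `scan_skill_markdown` name are assumptions made for this example; heuristics like these are easy to evade and only complement the semantic and sandbox checks.

```python
import re

# Matches fenced code blocks; `{3} stands for a run of three backticks.
FENCE = re.compile(r"`{3}[^\n]*\n(.*?)`{3}", re.DOTALL)

# Illustrative heuristics only, not a complete or authoritative rule set.
SUSPICIOUS = {
    "pipe-to-shell": re.compile(r"\b(curl|wget)\b[^\n|]*\|\s*(sudo\s+)?(ba|z)?sh\b"),
    "credential-path": re.compile(r"(\.ssh/id_|\.aws/credentials|\.netrc)"),
    "encoded-exec": re.compile(r"base64\s+(-d|--decode)[^\n]*\|\s*(ba|z)?sh\b"),
}


def scan_skill_markdown(text: str) -> list[tuple[str, str]]:
    """Return (rule name, offending line) pairs found inside code examples."""
    findings = []
    for block in FENCE.findall(text):
        for line in block.splitlines():
            for name, pattern in SUSPICIOUS.items():
                if pattern.search(line):
                    findings.append((name, line.strip()))
    return findings
```

Only code examples are scanned because that is where DDIPE hides its payloads. A gate built on this idea would quarantine any skill with findings and pass the rest on to the slower checks.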

3. Version Pinning and Content Hashing

Lock skills to specific, audited versions with immutable content hashes — never version ranges. Skills can mutate from safe to malicious after vetting ("rug pull"). Pin at the hash, not the tag.

SchemaPin provides ECDSA P-256 signatures with DNS-anchored trust and TOFU key pinning for skill schemas — the open-source equivalent of package signing.
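
Hash pinning can cover the whole skill directory rather than a single file, so that a changed helper script is caught as well. A minimal sketch, assuming the pinned digest is stored somewhere the skill itself cannot write to; `hash_skill_dir` and `verify_pin` are illustrative names, not an existing tool's API.

```python
import hashlib
import hmac
from pathlib import Path


def hash_skill_dir(skill_dir: Path) -> str:
    """SHA-256 over every file's relative path and contents, in a stable order."""
    digest = hashlib.sha256()
    files = (p for p in skill_dir.rglob("*") if p.is_file())
    for path in sorted(files, key=lambda p: p.relative_to(skill_dir).as_posix()):
        rel = path.relative_to(skill_dir).as_posix().encode()
        data = path.read_bytes()
        # Length-prefix each field so bytes cannot shift between path and content.
        digest.update(len(rel).to_bytes(8, "big") + rel)
        digest.update(len(data).to_bytes(8, "big") + data)
    return digest.hexdigest()


def verify_pin(skill_dir: Path, pinned_digest: str) -> bool:
    """True only if the directory still matches the digest recorded at vetting."""
    return hmac.compare_digest(hash_skill_dir(skill_dir), pinned_digest)
```

The digest would be recorded once at intake and checked on every load; any added, removed, or edited file changes it.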

4. Multi-Model Verification at Runtime

Only 1.6% of adversarial payloads bypass all tested models simultaneously, yet models respond differently to the same change: removing architectural guardrails amplifies one model's execution rate by 11.3× while leaving another's unchanged (arxiv 2604.03081). Run skill execution through two independent models and require consensus on tool call patterns. Because defense layers interact asymmetrically, test the specific combination before relying on it.
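
One possible shape for the consensus check is sketched below. It assumes each model can be wrapped as a function that returns its proposed tool calls in an already normalized form; that wrapper, the `ToolCall` shape, and `consensus_gate` are hypothetical, and the normalization step is where most of the real effort would go.

```python
from typing import Callable, Iterable

ToolCall = tuple[str, str]  # (tool name, normalized argument string)
Planner = Callable[[str], Iterable[ToolCall]]  # hypothetical per-model wrapper


def consensus_gate(task: str, planners: list[Planner]) -> tuple[bool, set[ToolCall]]:
    """Approve only when every model proposes the same set of tool calls.

    Returns (approved, disputed), where disputed holds every call that at
    least one model proposed and at least one model did not.
    """
    if len(planners) < 2:
        raise ValueError("consensus needs at least two independent models")
    plans = [frozenset(planner(task)) for planner in planners]
    agreed = frozenset.intersection(*plans)
    proposed = frozenset.union(*plans)
    disputed = set(proposed - agreed)
    return (not disputed, disputed)
```

A non-empty `disputed` set maps to the "Block + Alert" branch: the disputed calls are exactly what a reviewer needs to see.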

5. Least-Privilege Execution

Run skill-loading agents with a dedicated user, scoped filesystem access, and deny-by-default network egress with an allowlisted domain set. See Blast Radius Containment and Dual-Boundary Sandboxing.
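
The same deny-by-default idea can be mirrored in the agent's tool layer as a second check. The sketch below is an in-process illustration only and does not replace OS-level controls such as a dedicated user or firewall rules; `SandboxPolicy` and the domain used in the usage note are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SandboxPolicy:
    allowed_roots: tuple[Path, ...]   # directories the agent may touch
    allowed_domains: frozenset[str]   # egress allowlist; everything else is denied

    def can_access(self, path: str) -> bool:
        """Allow only paths that resolve inside one of the allowed roots."""
        target = Path(path).resolve()  # collapses ".." and follows existing symlinks
        return any(target.is_relative_to(root.resolve()) for root in self.allowed_roots)

    def can_connect(self, host: str) -> bool:
        """Allow an allowlisted domain or any of its subdomains."""
        host = host.lower().rstrip(".")
        return any(host == d or host.endswith("." + d) for d in self.allowed_domains)
```

With a policy such as `SandboxPolicy((Path("/srv/agent/work"),), frozenset({"skills.internal.example"}))`, a tool wrapper would call `can_access` or `can_connect` before each filesystem or network operation and refuse on `False`.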

Example

A skill intake gate using skill-scanner and hash pinning before internal registry entry:

Block direct pull from public registry (agent config):

.claude/settings.json (deny runtime skill fetches from external registries):

{
  "permissions": {
    "deny": [
      "Bash(curl:https://clawhub.io/*)",
      "Bash(curl:https://skills.sh/*)"
    ]
  }
}

Intake gate before adding to internal mirror:

# 1. Scan the candidate skill (all engines, fail on high/critical).
#    A non-zero exit from skill-scanner signals failure: log and stop.
if ! skill-scanner scan ./candidate-skill/ \
    --use-behavioral --use-llm --enable-meta \
    --fail-on-severity high --format json > scan-report.json; then
  echo "BLOCKED: skill-scanner found high/critical issues" >&2
  exit 1
fi

# 2. Pin by content hash before registering to the internal mirror
sha256sum candidate-skill/SKILL.md > candidate-skill/SKILL.md.sha256

# 3. Verify the hash at agent load time to catch post-approval mutations
#    (run from the same directory the hash was recorded in)
sha256sum --check candidate-skill/SKILL.md.sha256 || {
  echo "Skill content mismatch: possible rug pull" >&2
  exit 1
}

The agent config blocks runtime pulls from public registries. skill-scanner catches malicious patterns before any skill reaches the mirror, and the hash pin detects post-approval mutations ("rug pulls").

When This Backfires

The full stack carries operational cost. Partial adoption is common but leaves residual exposure:

  • Scanner false positives: LLM-based semantic scanners misclassify legitimate security tooling, pen-test utilities, and obfuscated-but-valid config as malicious. Failing on high severity without review capacity blocks productive skills; relaxing the threshold lets real payloads through.
  • Pinning vs. patch velocity: Hash pinning prevents rug-pull mutations but also blocks legitimate security patches. Without a re-vetting workflow, pinning creates a patching backlog.
  • Multi-model latency: Consensus roughly doubles per-invocation inference time. Latency-sensitive workflows disable the check to meet SLAs. Restrict it to first-use or high-privilege calls rather than every invocation.
  • Mirror governance drift: Without a clear owner, the internal mirror becomes a rubber stamp and skills bypass the intake gate informally.
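
The latency trade-off above suggests gating the consensus check rather than running it on every call. A small sketch of that decision follows; the tool names in `HIGH_PRIVILEGE_TOOLS` and the `needs_consensus` helper are assumptions for illustration, and the digest stands for whatever content hash the skill is pinned to.

```python
# Hypothetical tool names; substitute whatever the agent framework exposes.
HIGH_PRIVILEGE_TOOLS = frozenset({"bash", "write_file", "http_request"})


def needs_consensus(skill_digest: str, tool: str, verified: set[str],
                    high_privilege: frozenset[str] = HIGH_PRIVILEGE_TOOLS) -> bool:
    """Decide whether this invocation should go through multi-model verification."""
    if tool in high_privilege:
        return True                      # always verify privileged calls
    return skill_digest not in verified  # otherwise verify only on first use


def mark_verified(skill_digest: str, verified: set[str]) -> None:
    """Record that this exact skill content has passed verification once."""
    verified.add(skill_digest)
```

Keying the first-use cache on the content digest rather than the skill name means a mutated skill is treated as new and verified again.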

The full stack is most justified when agents load third-party skills at runtime with broad filesystem or network access. For internal skill sets authored by the same team, hash pinning plus code review may suffice.

Key Takeaways

  • DDIPE hides payloads inside skill documentation code examples; in-context learning causes agents to reproduce them without explicit instruction, bypassing alignment that holds explicit injection to a 0% success rate under the same conditions
  • 36.82% of the 3,984 public skills Snyk audited have security flaws; 2.5% of the 1,070 adversarial skills in the DDIPE study evade both static detection and model alignment
  • Treat skill registries with the same supply-chain rigor as npm or PyPI — vetting, pinning, and continuous monitoring, not one-time review
  • Multi-model verification reduces adversarial bypass to 1.6% of payloads — no single model's alignment should be the sole runtime defense
  • Internal mirrors with intake gates are the minimum viable posture: never allow agents to pull from public registries at execution time