Swarm Skills: Multi-Agent Extension of the Agent Skills Standard¶

Swarm Skills is a 2026 proposal extending Agent Skills with multi-agent roles, a workflow layout, and a self-evolution lifecycle for portable coordination protocols.

What the Spec Adds to Agent Skills¶

The base Agent Skills standard packages single-agent task knowledge in a SKILL.md. Swarm Skills layers on multi-agent semantics without replacing that contract (Swarm Skills paper):

Field added	Purpose
`kind: swarm-skill`	Discriminator that lets a single-agent host gracefully ignore the skill via `additionalProperties: true`
`teammate_mode`	Interaction paradigm — e.g., `build_mode` for autonomous execution, `plan_mode` for approval workflows
`roles[]`	Participants; each entry carries an `id`, required skills and tools, and an optional target model

A host that does not know the swarm-skill kind keeps loading the file as a regular skill and skips the unknown fields.

File Layout¶

A Swarm Skill is a directory, not a single file (Swarm Skills paper):

my-swarm-skill/
  SKILL.md          # frontmatter + natural-language body
  roles/            # one persona file per teammate id
    leader.md
    researcher.md
    reviewer.md
  workflow.md       # task dependency graph (sequential, parallel, fan-out/fan-in)
  bind.md           # operational boundaries: turns, token budgets, quality gates
  dependencies      # other Swarm Skills this one depends on
  evolutions.json   # runtime artifact — appended by the host, not hand-edited

The Host Agent loads only SKILL.md frontmatter to route a task. Role files, workflow.md, and bind.md are pulled on demand via the host's native read_file — no new DSL or runtime plugin. This extends Anthropic Skills' progressive-disclosure pattern to multi-agent metadata.

The CREATE / USE / PATCH Lifecycle¶

graph TD
    A[Multi-agent run] --> B{>=2 distinct roles<br/>and cross-agent deps?}
    B -- yes --> C[CREATE: distill trace<br/>into candidate Swarm Skill]
    B -- no --> A
    C --> D[USE: Host injects skill<br/>description into Leader prompt]
    D --> E[Run produces new trace]
    E --> F[PATCH: scan for friction<br/>append Evolution Record]
    F -->|>=10 records| G[Governance:<br/>SIMPLIFY / REBUILD / ROLLBACK]
    F --> D
    G --> D

CREATE — trajectory distillation. A trace with ≥2 distinct sub-agent roles and cross-agent dependencies is synthesized by an LLM into a candidate Swarm Skill.
USE — progressive disclosure. The Host reads description from frontmatter into the Leader prompt; full role and workflow content is loaded only after selection. Existing Evolution Records are appended to the relevant instructions.
PATCH — friction-driven optimisation. Post-execution analysis scans traces for circular dependencies, redundant loops, and premature termination, then appends a new Evolution Record (Context, Change Directive, Scoring Metrics) to evolutions.json.

Scoring Formula¶

Each Evolution Record carries a composite score that decides whether it stays, gets pruned, or triggers a rewrite (Swarm Skills paper):

S_i = w_E * E + w_U * U + w_F * F

Component	What it measures
Effectiveness (E in [0,1])	Qualitative impact of the patch, stabilised with Bayesian smoothing using a Beta(1,1) prior so early evaluations do not dominate
Utilization (U in [0,1])	Adoption rate — if the Leader consistently ignores an appended instruction, U decays
Freshness (F in [0,1])	Time-decay factor with an exponential half-life (e.g., 90 days) so stale optimisations gradually drop out

At ≥10 records, three governance actions unlock: SIMPLIFY prunes via LLM categorisation (delete, merge, refine, retain); REBUILD rewrites the spec, archives the prior version, and clears evolutions.json; ROLLBACK reverts to any archived state.

Reference Implementation¶

JiuwenSwarm is the reference Host Agent on openJiuwen.com; community swarm skills are intended to live at swarmskills.openjiuwen.com. The algorithm "strictly interacts with the schema defined by the Swarm Skills specification" — the authors claim this makes it portable to other multi-agent runtimes without framework-specific plugins (Swarm Skills paper).

What Is Not Yet Proven¶

The paper is a specification proposal with measurement support, not a benchmarked system. The authors call out the gaps explicitly (Swarm Skills paper):

No quantitative benchmarks of the self-evolution algorithm. The empirical work is a measurement study of 33 queries across 9 Anthropic Skills repositories crawled in April 2026 — it shows multi-role skills are being authored (e.g., engineering-team with 14 roles), but does not measure whether the scoring mechanism improves outcomes.
No conformance testing across diverse Host Agents. Effectiveness on hosts lacking native recursive read_file or dynamic sub-agent spawning "remains to be empirically evaluated."
First-run lock-in. A severely suboptimal initial workflow tends to accumulate patches rather than trigger REBUILD, entrenching bad decisions in an opaque chain of Evolution Records.
Adoption beyond the authors is unverified. The community Swarm Skills hub is currently inaccessible to anonymous fetch.

For projects already invested in framework-native multi-agent code, the portability benefit is theoretical until a second Host Agent ships conformant support. A related critique that some "LLM swarm" framings overstate genuine swarm behaviour applies (LLM-Powered Swarms: A New Frontier or a Conceptual Stretch?).

Key Takeaways¶

Swarm Skills extends Agent Skills with three frontmatter fields (kind, teammate_mode, roles[]) plus a roles/ + workflow.md + bind.md + evolutions.json layout — no new DSL.
A CREATE / USE / PATCH lifecycle distills multi-agent traces into reusable skills and appends Evolution Records on friction signals.
Scoring is S = w_E*E + w_U*U + w_F*F with Bayesian-smoothed effectiveness, decaying utilisation, and exponential-half-life freshness; governance fires SIMPLIFY, REBUILD, or ROLLBACK above 10 records.
The spec is a May 2026 proposal — no independent benchmarks of the self-evolution loop, no conformance tests on hosts other than JiuwenSwarm, and the authors flag "first-run lock-in" as an open failure mode.