
Backlog Triage as a Named Agent Skill

A triage skill encodes a state machine into issue labels: each item carries one category and one state, and the skill hands off a durable agent brief.

A named triage skill sits between human intent and AFK agent execution. It takes whatever lands on the issue tracker — half-written bug reports, customer notes, idea fragments, design-doc snippets — and decides whether the item is ready for an agent, a human, more information, or rejection. Matt Pocock's open-source /triage skill is the worked example below. The pattern is portable.

State Machine in Labels

Two category roles describe the work:

  • bug — something is broken
  • enhancement — new feature or improvement

Five state roles describe lifecycle position:

State             Meaning
needs-triage      maintainer needs to evaluate
needs-info        waiting on reporter for more information
ready-for-agent   fully specified, ready for an AFK agent
ready-for-human   needs human implementation
wontfix           will not be actioned

Every triaged issue carries exactly one category role and one state role. Conflicting states are flagged before action (SKILL.md).
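
The one-category, one-state rule can be checked mechanically before the skill acts. The following is a minimal sketch, not the skill's own implementation: the label names come from the lists above, while `check_labels` and its message wording are hypothetical.

```python
# Label sets taken from the lists above; everything else is illustrative.
CATEGORY_LABELS = frozenset({"bug", "enhancement"})
STATE_LABELS = frozenset({
    "needs-triage", "needs-info", "ready-for-agent", "ready-for-human", "wontfix",
})


def check_labels(labels):
    """Return a list of problems; an empty list means the labels are consistent."""
    present = set(labels)
    categories = sorted(present & CATEGORY_LABELS)
    states = sorted(present & STATE_LABELS)
    problems = []
    if len(categories) != 1:
        problems.append(f"expected exactly one category, found {categories}")
    if len(states) != 1:
        problems.append(f"expected exactly one state, found {states}")
    return problems
```

Labels outside the two sets (for example a hypothetical `docs` label) are ignored, so the check only flags missing or conflicting triage roles.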

graph TD
    A[Unlabeled] --> B[needs-triage]
    B --> C[needs-info]
    B --> D[ready-for-agent]
    B --> E[ready-for-human]
    B --> F[wontfix]
    C -->|reporter replies| B
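
The transitions in the graph above can be written as a lookup table. This is a sketch under the assumption that the graph is complete; `TRANSITIONS` and `can_transition` are hypothetical names, and because the maintainer can override transitions, a real skill would likely treat the table as a default rather than a hard gate.

```python
# Edges copied from the graph above; "unlabeled" stands for an issue with no state label.
TRANSITIONS = {
    "unlabeled": {"needs-triage"},
    "needs-triage": {"needs-info", "ready-for-agent", "ready-for-human", "wontfix"},
    "needs-info": {"needs-triage"},  # taken when the reporter replies
}


def can_transition(current, target):
    """True if the graph has an edge from `current` to `target`."""
    return target in TRANSITIONS.get(current, set())
```

States with no outgoing edges in the graph (`ready-for-agent`, `ready-for-human`, `wontfix`) are simply absent from the table, so any move out of them returns False.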

The labels are the prompt. The five states and two categories collapse the open-ended question "what should happen with this issue?" into one of ten cells (state × category). Each cell has a fixed output shape (agent brief, needs-info template, out-of-scope record, or polite close), so the skill has no slot for an unstructured comment. This is the same constraint-as-prompt mechanism that gives structured tool use its reliability: the model picks a slot, not a format.

Per-Issue Process

For one issue, the skill runs six steps (SKILL.md):

  1. Gather context — read the body, comments, prior triage notes, ADRs in the affected area, and .out-of-scope/*.md. Surface any prior rejection that resembles this issue.
  2. Recommend category and state with reasoning. Wait for direction.
  3. Reproduce (bugs only) — trace the relevant code, run tests or commands, report repro / no-repro / insufficient-detail (SKILL.md). A confirmed repro produces a stronger brief.
  4. Grill if the issue needs fleshing out — run an interview-style refinement session.
  5. Apply the outcome — assign labels, post the corresponding template comment, close if wontfix.
  6. Disclaim provenance — every comment posted by the skill begins with > *This was generated by AI during triage.*.
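
Steps 5 and 6 can be sketched as one pure function that decides what to post without calling any tracker API. This is an illustration only: the disclaimer text is the one quoted in step 6, while `Outcome`, `apply_outcome`, and the blank line between disclaimer and body are assumptions.

```python
from dataclasses import dataclass

# Disclaimer line quoted in step 6 above.
DISCLAIMER = "> *This was generated by AI during triage.*"


@dataclass(frozen=True)
class Outcome:
    labels: tuple   # (category, state) to set on the issue
    comment: str    # comment body, always starting with the disclaimer
    close: bool     # True only for wontfix


def apply_outcome(category, state, template_body):
    """Build the labels, comment, and close flag for one triaged issue."""
    comment = f"{DISCLAIMER}\n\n{template_body.strip()}"
    return Outcome(labels=(category, state), comment=comment, close=(state == "wontfix"))
```

Keeping the function free of side effects means the maintainer can review the `Outcome` before anything is written to the tracker.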

The maintainer can override transitions at any time. Given an instruction such as "Move #42 to ready-for-agent", the skill trusts the maintainer and skips grilling, but it still asks whether to write an agent brief before promoting (SKILL.md).

The Agent Brief Contract

ready-for-agent triggers an agent brief — a structured comment that becomes the authoritative spec for the downstream executor. The original issue body is context; the brief is the contract (AGENT-BRIEF.md).

Four rules govern brief writing:

  • Durability over precision — describe interfaces, types, and behavioral contracts. Never reference file paths or line numbers; the issue may sit for days while the codebase moves.
  • Behavioral, not procedural — describe what the system should do, not how to implement it. The downstream executor will explore the codebase fresh, reconstructing structure at run time (issue requirements preprocessing).
  • Complete acceptance criteria — every brief lists concrete, testable criteria. Each criterion is independently verifiable.
  • Explicit scope boundaries — state what is out of scope to prevent gold-plating.
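
The four rules lend themselves to a rough lint pass over a draft brief. The sketch below is hypothetical throughout: the `AgentBrief` fields, the `lint_brief` name, and the regular expressions are assumptions, and the patterns are heuristics that will miss some paths and flag some innocent text (a time such as 10:30 would match the line-number pattern).

```python
import re
from dataclasses import dataclass, field

# Heuristic patterns (assumptions): a token ending in a common source extension,
# and "line 42" or ":42" style line references.
PATH_PATTERN = re.compile(r"\b[\w./-]+\.(?:py|ts|tsx|js|go|rs|java|md)\b")
LINE_PATTERN = re.compile(r"\blines?\s+\d+\b|:\d+\b", re.IGNORECASE)


@dataclass
class AgentBrief:
    summary: str
    behavior: str
    acceptance_criteria: list = field(default_factory=list)
    out_of_scope: list = field(default_factory=list)


def lint_brief(brief):
    """Return rule violations for a draft brief; an empty list means it passes."""
    problems = []
    text = "\n".join([brief.summary, brief.behavior, *brief.acceptance_criteria])
    if PATH_PATTERN.search(text):
        problems.append("references a file path")
    if LINE_PATTERN.search(text):
        problems.append("references a line number")
    if not brief.acceptance_criteria:
        problems.append("no acceptance criteria")
    if not brief.out_of_scope:
        problems.append("no explicit scope boundary")
    return problems
```

A lint like this can only catch the mechanical rules; whether a brief is behavioral rather than procedural still needs a reader's judgment.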

This is the upstream complement to issue requirements preprocessing: the brief is the structured input the executor receives, written before the executor opens its first context window. The REAgent paper measures a 17.40% lift in resolution rate when the executor reconstructs structured requirements at run time (Kuang et al., 2026) — a triage-skill brief moves that work upstream and out of the executor's context budget.

Out-of-Scope as Institutional Memory

Rejected enhancements are written to .out-of-scope/<concept>.md — one file per concept, not per issue. The file captures the decision, the reasoning, and a "Prior requests" list of every issue that asked for the feature (OUT-OF-SCOPE.md). During context gathering on every new issue, the skill reads this directory and surfaces matches by concept similarity ("night theme" matches dark-mode.md).
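
One way to picture the per-concept file is as a small record that renders to markdown. This is a sketch, not the layout OUT-OF-SCOPE.md prescribes: the three parts (decision, reasoning, prior requests) come from the paragraph above, but the heading names, the `OutOfScopeRecord` class, and the issue numbers in the usage note are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class OutOfScopeRecord:
    concept: str      # e.g. "dark-mode"; becomes the file name
    decision: str
    reasoning: str
    prior_requests: list = field(default_factory=list)  # issue numbers

    def add_request(self, issue_number):
        """Record another issue that asked for this concept, without duplicates."""
        if issue_number not in self.prior_requests:
            self.prior_requests.append(issue_number)

    def path(self, root=Path(".out-of-scope")):
        """One file per concept, not per issue."""
        return root / f"{self.concept}.md"

    def render(self):
        """Render the record as markdown (heading names are an assumption)."""
        requests = "\n".join(f"- #{n}" for n in self.prior_requests)
        return (
            f"# {self.concept}\n\n"
            f"## Decision\n\n{self.decision}\n\n"
            f"## Reasoning\n\n{self.reasoning}\n\n"
            f"## Prior requests\n\n{requests}\n"
        )
```

For example, a `dark-mode` record that has seen issues 12 and 40 would render both under "Prior requests" and live at `.out-of-scope/dark-mode.md`. The concept-similarity match itself ("night theme" to dark-mode.md) is left to the model and is not sketched here.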

The mechanism is durable institutional memory: because each decision is written down, the skill does not have to re-litigate decided questions on each invocation, much as agent memory patterns preserve state across sessions.

When the Pattern Earns Its Cost

The triage skill is human-invoked and deliberately keeps the maintainer in the loop. That makes it the right model when:

  • The repo has an established codebase glossary or ADR set the agent can ground in
  • Issue volume justifies bookkeeping but doesn't warrant lights-out automation
  • The downstream executor is an AFK agent that benefits from a durable, structured brief
  • The team needs explicit institutional memory for rejected requests

It is the wrong model — a different shape fits — when:

  • High-volume bot issues (Dependabot, security scanners) need silence or auto-close, not classification. Run a different filter such as continuous triage; the state machine assumes human-authored intent.
  • Lights-out triage on every event is the goal. Use continuous triage on GitHub Actions with safe-outputs: [add-label, add-comment] (GitHub Agentic Workflows) — no maintainer in the loop.
  • Batch intake from a single source (a QA session, a customer interview transcript) needs deduplication and codebase investigation across many candidates at once. Use the QA session to issues pipeline.
  • Rigid issue-tracker workflow states (some Jira / ServiceNow configurations) cannot host the label-as-state-machine pattern without admin changes.
  • The repo has no codebase glossary or ADRs. Step 1 of the per-issue process depends on grounding the agent in them; without that grounding, recommendations are shallow or hallucinated.

Failure Modes

  • Hallucinated agent briefs. If the maintainer skips reproduction and lets the skill auto-promote to ready-for-agent based on the reporter's description alone, the brief inherits any factual errors and the downstream agent burns context on a wrong-premise task. The skill's own guidance — confirm a repro before producing a brief — exists because this is the dominant failure mode (SKILL.md).
  • Brief drift. Briefs that reference file paths or line numbers go stale within days. The durability-over-precision rule is a hedge against this — interfaces and contracts survive renames; line numbers do not.
  • Out-of-scope concept matching. Matching is by concept similarity, not keyword. Failed matches cause the skill to re-evaluate already-rejected requests; over-eager matches close legitimately distinct issues. The maintainer confirms or rejects the match each time.
  • Out-of-scope files written for deferrals. A reason like "we're too busy right now" is a deferral, not a rejection, and recording it produces an out-of-scope file the maintainer later regrets. Substantive reasons reference scope, technical constraints, or strategic decisions (OUT-OF-SCOPE.md).

Example

Pocock's walkthrough on the Sandcastle repo shows the skill against a real backlog (AI Hero post).

/triage
Just give me all of the open issues that I haven't triaged yet.

The agent finds nine untriaged issues and labels each with a category and needs-triage. The maintainer then narrows: "Could you start with 477 for me?" The agent reads issue #477, finds existing triage notes with a stack trace, and recommends ready-for-agent. The maintainer asks the agent to reproduce the bug rather than trust the reporter:

Could you diagnose this yourself?

The agent traces task-ID variables, locates the literal-vs-placeholder mismatch, writes a regression test, applies the fix, pushes to main, and closes the issue. Context use sits at 46.5k of 100k tokens — the brief is short because the codebase grounding did the work upstream of the executor.

Key Takeaways

  • A named triage skill encodes a state machine into labels — every triaged issue carries one category and one state, with no ambiguous middle ground
  • The agent brief is the hand-off contract: durable, behavioral, with testable acceptance criteria and explicit scope boundaries
  • .out-of-scope/<concept>.md files give the skill institutional memory; the skill checks them on every new issue and surfaces matches before triaging
  • The pattern is human-in-the-loop by design — choose continuous triage for lights-out classification or the QA session pipeline for batch intake
  • Reproducing bugs before producing a brief is the most reliable defence against hallucinated specs that waste downstream agent context