Encoding Values in AGENTS.md: Why Prose Without Verification Fails

Values written as AGENTS.md prose rarely change agent behavior; pair each one with a verification command or move it to a lower enforcement layer.

The Empirical Gap

Two recent corpus studies measured what developers encode in context files. Functional content dominates; values content is sparse.

| Category | Wei et al. (2,303 files) | Liu et al. (466 OSS repos) |
|---|---|---|
| Implementation details | 69.9% | Top category |
| Architecture | 67.7% | 47 instances |
| Build / run commands | 62.3% | 40 instances |
| Error handling / debugging | 24.4% | |
| Security | 14.5% | 6 instances |
| Performance | 14.5% | |
| Accessibility, fairness, sustainability, tone | Not measured (rare) | None found |

Liu et al. classified instructions by writing style — descriptive, prescriptive, prohibitive, explanatory, conditional — and reported no explicit ethical, accessibility, fairness, or tone instructions across the analyzed AGENTS.md files (Liu et al., 2025). Wei et al. note the same gap: developers "provide few guardrails to ensure that agent-written code is secure or performant" (Wei et al., 2025).

A later vision paper tempers how absolute that gap is: Treude et al., 2026 report that developers are already embedding fairness, accessibility, sustainability, tone, and privacy guidance, framing AGENTS.md as a "developer-authored governance layer." But the authors explicitly defer the question that matters here — whether agents reliably adhere to those values — to future work. Presence of values prose is not evidence it changes behavior, which is the gap this page addresses.

Why Values-as-Prose Fails

graph TD
    A[Values written as prose<br>in AGENTS.md] --> B[Compliance ceiling<br>~68% on long rule sets]
    A --> C[Primacy bias<br>earlier rules win attention]
    B --> E[Values omitted on individual turns]
    C --> E
    E --> F[No verification step<br>catches the omission]
    F --> G[Documented value,<br>unchanged behavior]

Frontier models top out at roughly 68% accuracy at 500 simultaneous instructions, and earlier instructions are satisfied more reliably than later ones — primacy effects peak around 150–200 instructions (Jaroslawicz et al. — How Many Instructions Can LLMs Follow at Once?). A "be accessible" sentence in a 500-line AGENTS.md inherits both penalties. Gloaguen et al. add a direct cost: verbose AGENTS.md files reduce task success and add ~20% inference cost on SWE-bench Lite and AGENTbench (Gloaguen et al., 2026).

Verification, Not Prose

Pair every value with a mechanical check the agent runs; reduce the AGENTS.md line to a pointer.

| Value | Prose-only (low signal) | Verification-paired (high signal) |
|---|---|---|
| Accessibility | "Write accessible UIs." | "After UI changes, run pnpm test:a11y (axe-core); fix violations before commit." |
| Licensing | "Respect open-source licenses." | "Run pnpm licenses:check; only MIT, Apache-2.0, BSD-* allowed." |
| Fairness in data | "Avoid biased datasets." | "Run scripts/dataset-audit.py on every new dataset; CI fails on parity-check failure." |
| Security | "Write secure code." | "Run gitleaks detect and npm audit --omit=dev before commit." |

This matches the broader finding that guardrails beat guidance — tool-specific commands are the only AGENTS.md content with reliable behavioral effect (Wei et al., 2025).
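To make the licensing row concrete, here is a minimal sketch of the kind of logic a check such as pnpm licenses:check might wrap. It is not the behavior of any real license scanner: the function name, the input shape (package name mapped to an SPDX identifier), and the sample packages are assumptions for illustration. Only the allowlist itself comes from the table.

```python
from fnmatch import fnmatchcase

# Allowlist taken from the table; "BSD-*" is treated as a glob pattern.
ALLOWED_LICENSES = ("MIT", "Apache-2.0", "BSD-*")


def find_license_violations(dependencies: dict[str, str]) -> dict[str, str]:
    """Return the packages whose license matches no allowlist pattern.

    `dependencies` maps a package name to its SPDX license identifier.
    This input shape is hypothetical; a real scanner would derive it
    from lockfiles or package metadata.
    """
    return {
        name: license_id
        for name, license_id in dependencies.items()
        if not any(fnmatchcase(license_id, pattern) for pattern in ALLOWED_LICENSES)
    }


if __name__ == "__main__":
    # Made-up packages, for illustration only.
    sample = {
        "left-widget": "MIT",
        "fast-parser": "BSD-3-Clause",
        "net-toolkit": "GPL-3.0-only",
    }
    for name, license_id in find_license_violations(sample).items():
        print(f"{name}: {license_id} is not on the allowlist")
```

In a real pipeline the script would exit non-zero when the returned dict is non-empty, since the exit code is what lets CI or the agent treat the value as violated.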

Where Values Actually Belong

If the goal is enforced values, AGENTS.md is rarely the right layer; each value usually has a lower-level mechanism:

  • Permissions / sandboxes — deny rules enforce "do not exfiltrate data" without prose
  • CI checks — accessibility linters, license scanners, dataset audits, dependency scans
  • Pre-commit hooks — secret scanning, formatting, deny-rule enforcement
  • Branch protection — "do not commit to main" becomes a server-side rule

AGENTS.md then references the mechanism: "Run make check-a11y. If it fails, do not propose merging." That works because the agent can verify the outcome. Prose still earns space when it is short and points to a mechanism; a long ethics preamble with no follow-through does not.
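A minimal sketch of what "the agent can verify the outcome" amounts to: run each named check as a subprocess and gate on the exit code. The helper run_checks is hypothetical, and the demo uses stand-in commands so it runs anywhere; in a repository the list would hold whatever AGENTS.md names, such as make check-a11y.

```python
import subprocess
import sys
from typing import Sequence


def run_checks(commands: Sequence[Sequence[str]]) -> list[str]:
    """Run each command and return the ones that failed, as display strings.

    A check passes only if its process exits with status 0. A command
    that cannot be started at all (for example, a missing executable)
    also counts as failed.
    """
    failed = []
    for argv in commands:
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
            ok = result.returncode == 0
        except OSError:
            ok = False
        if not ok:
            failed.append(" ".join(argv))
    return failed


if __name__ == "__main__":
    # Stand-in checks: one that succeeds and one that fails.
    demo = [
        [sys.executable, "-c", "raise SystemExit(0)"],
        [sys.executable, "-c", "raise SystemExit(1)"],
    ]
    failed = run_checks(demo)
    if failed:
        print("Do not propose merging; failed checks:", failed)
    else:
        print("All checks passed.")
```

The point of the design is that pass or fail is decided by the process exit status, not by the model's reading of a sentence.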

When This Backfires

Verification-pairing is the right default, but it has failure conditions:

  • Not every value is mechanizable. "Use inclusive tone" has no clean linter; forcing one yields a brittle matcher that misfires — worse than honest prose plus human review.
  • Over-mechanization breeds checkbox theater. Reduce a value to "CI is green" and teams optimize the check, not the value: a passing dataset-audit.py can certify data that is fair on the measured axis and biased on an unmeasured one.
  • Premature mechanisms misdirect. Wiring a check before the value is understood freezes a wrong proxy into CI; some values are better left as reviewed prose until a faithful check exists.

Where no faithful, cheap check exists, prose pointing at human review beats a misleading green check.
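The checkbox-theater risk can be shown with a toy parity check. This is not the scripts/dataset-audit.py named earlier, whose contents this page does not show; the function, the column names, the 0.1 threshold, and the rows are all invented for illustration. The sketch computes the gap in positive-label rate between groups, one attribute at a time.

```python
from collections import defaultdict


def positive_rate_gap(rows, attribute, label="label"):
    """Largest difference in positive-label rate between groups of `attribute`."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        group = row[attribute]
        totals[group] += 1
        if row[label]:
            positives[group] += 1
    rates = [positives[group] / totals[group] for group in totals]
    return max(rates) - min(rates)


# Invented rows: balanced by age band, completely split by region.
ROWS = [
    {"age_band": "under_40", "region": "north", "label": 1},
    {"age_band": "under_40", "region": "south", "label": 0},
    {"age_band": "over_40", "region": "north", "label": 1},
    {"age_band": "over_40", "region": "south", "label": 0},
]

MAX_GAP = 0.1  # hypothetical threshold

if __name__ == "__main__":
    for attribute in ("age_band", "region"):
        gap = positive_rate_gap(ROWS, attribute)
        verdict = "pass" if gap <= MAX_GAP else "FAIL"
        print(f"{attribute}: gap={gap:.2f} -> {verdict}")
```

An audit wired only to age_band would report green on this data while the region split goes unmeasured, which is the failure mode described above.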

Example

A real "before" pattern, rewritten for verification:

Before — values as advisory prose:

# AGENTS.md

## Our values

We care deeply about accessibility, sustainability, and inclusive language.
Please write code that respects these values.

## Build

pnpm install
pnpm build

After — values pinned to mechanisms:

# AGENTS.md

## Build & verify

- pnpm install
- pnpm build
- pnpm test:a11y      # accessibility (axe-core); CI fails on violations
- pnpm licenses:check # sustainability/licensing (MIT/Apache-2.0/BSD-* only)
- pnpm test           # unit + integration

Run every command above before proposing a commit. Do not propose merging if any fail.

See docs/a11y.md and docs/licensing.md for the underlying policies.

The "after" version keeps the values commitments that a command can check. The difference is that each one now points to a command the agent runs and whose result the team can audit.
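One way to keep an AGENTS.md from drifting back toward advisory prose is a small lint that flags lines naming a value without naming a command. The sketch below is a heuristic, not an established tool: the keyword stems, the list of command prefixes, and the function name are assumptions chosen to fit the example on this page.

```python
import re

# Hypothetical keyword stems for "values" language.
VALUE_KEYWORDS = ("accessib", "sustainab", "inclusive", "fairness")
# Hypothetical set of command prefixes that count as a verification step.
COMMAND_PATTERN = re.compile(r"\b(pnpm|npm|make|python|gitleaks)\s+\S+")


def unverified_value_lines(agents_md: str) -> list[str]:
    """Return lines that mention a value but name no command to run."""
    flagged = []
    for line in agents_md.splitlines():
        lowered = line.lower()
        names_value = any(keyword in lowered for keyword in VALUE_KEYWORDS)
        if names_value and not COMMAND_PATTERN.search(line):
            flagged.append(line.strip())
    return flagged


BEFORE = """\
## Our values

We care deeply about accessibility, sustainability, and inclusive language.
Please write code that respects these values.
"""

AFTER = """\
## Build & verify

- pnpm test:a11y      # accessibility (axe-core); CI fails on violations
- pnpm licenses:check # sustainability/licensing (MIT/Apache-2.0/BSD-* only)

See docs/a11y.md and docs/licensing.md for the underlying policies.
"""

if __name__ == "__main__":
    print("before:", unverified_value_lines(BEFORE))
    print("after:", unverified_value_lines(AFTER))
```

A flagged line is a prompt for review rather than an automatic failure: some values are best left as short prose that points at human review.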

Key Takeaways

  • Corpus studies show developers rarely encode fairness, accessibility, sustainability, or tone in AGENTS.md; functional context dominates (Wei et al., Liu et al.)
  • Values-as-prose inherits the compliance ceiling and primacy bias — read by the model, applied unreliably, never verified
  • Verbose AGENTS.md actively reduces task success and raises cost ~20% (Gloaguen et al.); adding values prose has a real cost
  • Pair every value with a verification command, or move it to a lower-layer mechanism (permissions, CI, hooks, branch protection)
  • Keep AGENTS.md as a pointer: short rule, named command, link to policy

Sources
