Encoding Values in AGENTS.md: Why Prose Without Verification Fails¶

Values written as AGENTS.md prose rarely change agent behavior; pair each one with a verification command or move it to a lower enforcement layer.

The Empirical Gap¶

Two recent corpus studies measured what developers encode in context files. Functional content dominates; values content is sparse.

Category	Wei et al. (2,303 files)	Liu et al. (466 OSS repos)
Implementation details	69.9%	Top category
Architecture	67.7%	47 instances
Build / run commands	62.3%	40 instances
Error handling / debugging	24.4%	—
Security	14.5%	6 instances
Performance	14.5%	—
Accessibility, fairness, sustainability, tone	Not measured (rare)	None found

Liu et al. classified instructions by writing style — descriptive, prescriptive, prohibitive, explanatory, conditional — and reported no explicit ethical, accessibility, fairness, or tone instructions across the analyzed AGENTS.md files (Liu et al., 2025). Wei et al. note the same gap: developers "provide few guardrails to ensure that agent-written code is secure or performant" (Wei et al., 2025).

A later vision paper tempers how absolute that gap is: Treude et al., 2026 report that developers are already embedding fairness, accessibility, sustainability, tone, and privacy guidance, framing AGENTS.md as a "developer-authored governance layer." But the authors explicitly defer the question that matters here — whether agents reliably adhere to those values — to future work. Presence of values prose is not evidence it changes behavior, which is the gap this page addresses.

Why Values-as-Prose Fails¶

graph TD
    A[Values written as prose<br>in AGENTS.md] --> B[Compliance ceiling<br>~68% on long rule sets]
    A --> C[Primacy bias<br>earlier rules win attention]
    B --> E[Values omitted on individual turns]
    C --> E
    E --> F[No verification step<br>catches the omission]
    F --> G[Documented value,<br>unchanged behavior]

Frontier models top out at roughly 68% accuracy at 500 simultaneous instructions, and earlier instructions are satisfied more reliably than later ones — primacy effects peak around 150–200 instructions (Jaroslawicz et al. — How Many Instructions Can LLMs Follow at Once?). A "be accessible" sentence in a 500-line AGENTS.md inherits both penalties. Gloaguen et al. add a direct cost: verbose AGENTS.md files reduce task success and add ~20% inference cost on SWE-bench Lite and AGENTbench (Gloaguen et al., 2026).

Verification, Not Prose¶

Pair every value with a mechanical check the agent runs; reduce the AGENTS.md line to a pointer.

Value	Prose-only (low signal)	Verification-paired (high signal)
Accessibility	"Write accessible UIs."	"After UI changes, run `pnpm test:a11y` (axe-core); fix violations before commit."
Licensing	"Respect open-source licenses."	"Run `pnpm licenses:check`; only `MIT`, `Apache-2.0`, `BSD-*` allowed."
Fairness in data	"Avoid biased datasets."	"Run `scripts/dataset-audit.py` on every new dataset; CI fails on parity-check failure."
Security	"Write secure code."	"Run `gitleaks detect` and `npm audit --omit=dev` before commit."

This matches the broader finding that guardrails beat guidance — tool-specific commands are the only AGENTS.md content with reliable behavioral effect (Wei et al., 2025).

Where Values Actually Belong¶

If the goal is enforced values, AGENTS.md is rarely the right layer; each value usually has a lower-level mechanism:

Permissions / sandboxes — deny rules enforce "do not exfiltrate data" without prose
CI checks — accessibility linters, license scanners, dataset audits, dependency scans
Pre-commit hooks — secret scanning, formatting, deny-rule enforcement
Branch protection — "do not commit to main" becomes a server-side rule

AGENTS.md then references the mechanism: "Run make check-a11y. If it fails, do not propose merging." That works because the agent can verify the outcome. Prose still earns space when it is short and points to a mechanism; a long ethics preamble with no follow-through does not.

When This Backfires¶

Verification-pairing is the right default, but it has failure conditions:

Not every value is mechanizable. "Use inclusive tone" has no clean linter; forcing one yields a brittle matcher that misfires — worse than honest prose plus human review.
Over-mechanization breeds checkbox theater. Reduce a value to "CI is green" and teams optimize the check, not the value: a passing dataset-audit.py can certify data that is fair on the measured axis and biased on an unmeasured one.
Premature mechanisms misdirect. Wiring a check before the value is understood freezes a wrong proxy into CI; some values are better left as reviewed prose until a faithful check exists.

Where no faithful, cheap check exists, prose pointing at human review beats a misleading green check.

Example¶

A real "before" pattern, rewritten for verification:

Before — values as advisory prose:

# AGENTS.md

## Our values

We care deeply about accessibility, sustainability, and inclusive language.
Please write code that respects these values.

## Build

pnpm install
pnpm build

After — values pinned to mechanisms:

# AGENTS.md

## Build & verify

- pnpm install
- pnpm build
- pnpm test:a11y      # accessibility (axe-core); CI fails on violations
- pnpm licenses:check # sustainability/licensing (MIT/Apache-2.0/BSD-* only)
- pnpm test           # unit + integration

Run all four before proposing a commit. Do not propose merging if any fail.

See docs/a11y.md and docs/licensing.md for the underlying policies.

The "after" version contains the same values commitments. The difference is that every value points to a command the agent runs, the result of which the team can audit.

Key Takeaways¶

Corpus studies show developers rarely encode fairness, accessibility, sustainability, or tone in AGENTS.md; functional context dominates (Wei et al., Liu et al.)
Values-as-prose inherits the compliance ceiling and primacy bias — read by the model, applied unreliably, never verified
Verbose AGENTS.md actively reduces task success and raises cost ~20% (Gloaguen et al.); adding values prose has a real cost
Pair every value with a verification command, or move it to a lower-layer mechanism (permissions, CI, hooks, branch protection)
Keep AGENTS.md as a pointer: short rule, named command, link to policy

Sources¶

Wei et al. — Agent READMEs: An Empirical Study of Context Files for Agentic Coding — 2,303 context files; functional categories dominate, security/performance under 15%
Liu et al. — Context Engineering for AI Agents in Open-Source Software — 466 OSS repos; five writing styles; no explicit ethical, accessibility, fairness, or tone instructions found
Gloaguen et al. — Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? — verbose context files reduce success and add ~20% cost
Jaroslawicz et al. — How Many Instructions Can LLMs Follow at Once? — frontier models top out at 68% at 500 instructions; primacy bias peaks around 150–200 instructions
Zhang et al. — Do Agent Rules Shape or Distort? Guardrails Beat Guidance in Coding Agents — negative constraints help, positive directives hurt; ground for the verification-not-prose recommendation
Treude et al. — Operationalizing Ethics for AI Agents: How Developers Encode Values into Repository Context Files — vision paper; finds developers already embed fairness/accessibility/sustainability/tone/privacy guidance, but defers whether agents adhere to it