Skip to content

Intent-Centric Engineering: Oversight Over Authorship

When code generation is cheap, engineering leverage moves from authorship to specifying intent and governing humans, agents, tools, and evidence gates.

Intent-centric engineering is the operating model where the engineer's primary work is specifying what the system should do, designing the evidence gates that prove it does, and governing the socio-technical system — humans plus agents plus tools — that produces and verifies the code. Authorship is delegated; intent and oversight are not. The framing comes from De La Cruz's reflexive thematic analysis of GenAI and agentic software engineering: GenAI's paradoxical effect is to raise the value of intent specification, context curation, architectural judgment, and verification as it lowers the cost of code production (De La Cruz, arXiv:2605.11027).

It is a destination, not a default. The conditions are specific and the failure modes are sharp — read When This Backfires before adopting it as a generic prescription.

When the Posture Pays Back

Three conditions make the intent-centric posture economically defensible:

  • Repeated or fanned-out generation. A single one-shot task does not recoup the cost of authoring a precise intent specification plus an evidence-gate harness. The investment pays off when agents iterate or fan out across many runs — the same boundary Spec Complexity Displacement identifies for spec-driven development.
  • Verification capacity exists. The team must have, or be willing to build, mechanical evidence gates — tests, schemas, linters, security scans, automated review — that catch bug classes rather than relying on individual review judgment. Without that scaffold, "oversight" becomes ceremonial.
  • Reviewers can evaluate generated output. Junior teams without the experience to assess agent output against an intent specification produce a rubber-stamp checkpoint — the failure mode already named for the merge button (Empowerment Over Automation).

When any of these conditions fails, the team is not yet ready to relocate effort upward; investing in deterministic harnesses and verification capacity first is the correct sequencing.

The Mechanism

Code generation accelerates production faster than human review scales. Faros AI data from teams with high AI adoption shows 98% more PRs merged but with 91% longer review times — code generation roughly doubled, review capacity did not (Osmani: The 80% Problem). Because the engineer cannot match generation throughput line-by-line, leverage migrates upstream to the gates that compress decision volume:

  • A precise intent specification compresses many possible implementations into one acceptable region.
  • A constraint-bearing harness compresses many possible code states into a verifiable subset.
  • Evidence gates make verification mechanical rather than judgment-bound.

This is the same mechanism Martin Fowler named "rigor relocation" — engineering discipline does not vanish, it moves to constraint design, verification systems, and intent specification (see the project-internal treatment in Rigor Relocation). Intent-centric engineering names where the rigor relocates: the layer above authorship.

Why It Works

The causal reason this shift is more than relabeling is enforcement locality. A precise intent specification fixes the acceptance region at the point an agent generates output; an evidence gate fires at the moment of decision, not after the output has propagated through review. LangChain demonstrated the magnitude of the effect empirically — a coding agent improved from Terminal Bench 2.0 rank 30 to rank 5 with no model change, only harness investment in pre-completion checklists, loop detection, and structured verification (LangChain). The same engineering instinct previously applied to code is now applied to the layer that produces and verifies code — the surface changes, the discipline does not.

GitHub's data ratifies the framing from the operational side. The merge button "still needs (and, in our view, always will need) a developer fingerprint" because three categories of work remain "stubbornly human": architecture trade-offs, mentorship and culture, and ethical decisions about whether to build something (GitHub: Why Developers Will Always Own the Merge Button). Intent and oversight are precisely the work the merge button represents.

What Relocates

The skills that gain weight relative to authorship are the ones that compress decisions or make verification mechanical:

Skill that gains weight What it does
Intent specification Compresses many possible implementations into one acceptance region
Context curation Determines what the agent can access — the context-engineering discipline
Architectural judgment Sets boundaries agents cannot reliably reason about
Verification design Builds evidence gates that catch bug classes, not individual bugs
Security and provenance Tracks what was generated, by what, against what intent
Governance Allocates accountability across the human-plus-agent system
Accountable judgment Owns the merge decision when the evidence gates pass

The list is not new disciplines. It is the redistribution of weight away from authorship toward practices that already existed but were secondary when code-writing was the bottleneck.

When This Backfires

The intent-centric posture has real failure modes. Adopting it as a generic prescription without the conditions above produces worse outcomes than continuing to write code.

  • Spec-as-code displacement. Specifications precise enough to drive reliable generation accumulate schemas, pseudocode, and constraints until they become code-adjacent. Scott Logic found Spec Kit produced 2,000+ lines of Markdown per feature and still introduced bugs, while iterative prompting produced working code ~10× faster (Scott Logic). Addy Osmani names the upper-bound failure the "curse of instructions": as a spec accumulates detail, model adherence to individual instructions degrades (Osmani, O'Reilly). The "intent" surface can grow until it carries the same complexity authorship used to — see Spec Complexity Displacement.
  • Skill atrophy compounds. Engineers who only specify and supervise lose the capability to evaluate what they supervise. The METR study found developers using AI estimated they were 20% faster while actually running 19% slower — a 39-point perception gap (METR). Atrophy is self-concealing; oversight without retained coding capacity becomes ceremonial. See Skill Atrophy.
  • Vendor ToS undercut accountability. Treude's analysis of AI development-tool Terms of Service finds "a consistent tendency to shift responsibility for correctness, safety, and legal compliance onto users" and concludes current governance frameworks are "poorly aligned with increasingly agent-mediated and autonomous software development workflows" (Treude, arXiv:2605.04532). Intent-centric engineering without contractual accountability becomes a unilateral burden, not a partnership.
  • Bottleneck migration without capacity investment. The Faros AI 98%/91% asymmetry is the warning, not the prescription: code volume grows faster than review bandwidth (Osmani: The 80% Problem). Teams that adopt the intent-centric posture without simultaneously investing in mechanical evidence gates find their accountability surface grows faster than their oversight capacity.

The honest framing: do not relocate rigor upward as a posture. Invest in mechanical evidence gates and harness constraints first, then treat the intent-centric model as the operating mode the harness investment unlocks.

Example

A platform team running an agentic refactor across a large repo writes the change as an intent specification — the invariants the refactor must preserve, the types it must not change, the test cases that must continue to pass — rather than as a sequence of code edits. The harness enforces the intent mechanically: a type checker, the existing test suite, and a custom linter that fails on banned API patterns. The team's senior engineers spend their time on the intent specification and the harness rules; the agent does the authorship; the evidence gates produce the proof that the change is acceptable. Review focuses on the architectural decisions encoded in the intent spec and the structural changes the agent proposes — not on line-by-line code style. This is the pattern GitHub's four-stage maturity model labels "Strategist" — orchestrating agents as "creative director of code" rather than implementing line by line (GitHub Octoverse: New Identity of a Developer).

The same team rejects a proposal to apply the model to a one-off prototype. The intent-spec-plus-harness overhead does not recoup for a single agent run; the spec-first investment only pays off when agents iterate or fan out.

Key Takeaways

  • Intent-centric engineering pays back only under specific conditions: repeated or fanned-out generation, existing verification capacity, and reviewers who can evaluate agent output. Outside those conditions, build the harness first.
  • The mechanism is bottleneck migration plus enforcement locality — review cannot scale with code generation, so leverage moves to upstream gates that compress decision volume at the moment of decision.
  • The skills that gain weight are intent specification, context curation, architectural judgment, verification design, security and provenance, governance, and accountable judgment. The discipline is not new; the weighting is.
  • The sharpest failure modes are spec-as-code displacement, self-concealing skill atrophy, vendor ToS that shift liability onto operators, and adopting the posture without the mechanical evidence gates that make oversight tractable.
Feedback