Coding-Agent Reversibility: Platform Choice as a Two-Way Door¶

Coding agents make a platform choice reversible only in proportion to how well executable tests capture behaviour — not by any property of the agent.

Coding-agent reversibility is the decision-economics shift in which a platform or language choice — once a multi-year, one-way commitment — becomes a multi-day agent run if the codebase meets specific conditions. Mitchell Hashimoto framed it as: "Programming languages used to be LOCK IN, and they're increasingly not so" (Simon Willison, 2026-05-14). The conditions decide whether that holds or is a trap that ships the wrong stack.

When reversibility holds¶

Treat a platform decision as reversible only when all hold:

Condition	Why it matters
High-coverage behavioral tests	Agents converge against tests, so missing coverage lets semantic drift go unnoticed. FreshBrew gates migrations on test pass and coverage within 5 pp (arxiv 2510.04852).
Stable external contracts	Public APIs, data formats, and wire protocols anchor the translation. Internal refactors lack this anchor.
Shallow platform-API surface	The fewer `HealthKit`, `ARKit`, `CoreML`, or kernel-level calls, the more of the codebase is portable logic rather than platform glue.
Functional requirements dominate	Agents preserve logic, not latency distributions, memory profiles, or security postures.

If any is missing, the cost saving is the small half of the bill.

Why it works (when it does)¶

The agent's translate, compile, test, correct loop converges quickly when the suite is dense enough to anchor every behavioral decision, and slowly or wrongly when it is not (FreshBrew §3). The test suite is the portability substrate, and the agent amortizes it into a one-time port cost (Simon Willison, 2026-05-14).

The empirical anchor is Bun PR #30412: 1,009,257 lines of Rust replacing Zig in six days at a 99.8% pass rate, produced by Claude agents in a four-phase translate, error-correct, and verify loop (byteiota analysis). The pre-existing dense suite was the precondition.

When this backfires¶

Five situations break the framing:

Anemic test coverage. FreshBrew — same-language JDK 8 → JDK 17 upgrades with mandatory ≥50% baseline coverage — caps the top model (Gemini 2.5 Flash) at 52.3% project success across 228 real Java projects (arxiv 2510.04852). Cross-language ports against weaker suites do worse.
Tests pass but production does not. A logistics company migrating Java to Node.js passed every functional test, then failed under realistic load — the agent translated logic without preserving performance (eleks: Code Migration with AI).
The headline language change is cosmetic. Bun's Rust port shipped with 13,000+ unsafe blocks versus 73 in uv, a comparable-size project (byteiota). The 99.8% pass validates behavior at the public API, not that the unsafe blocks uphold memory invariants — a migration that nominally bought "memory safety" delivered something softer.
Deep platform-API integration. React Native ports of native apps still hit framework limits on heavy GPU work, AR, real-time video, and design-system fidelity — "feature parity becomes a budgeting problem instead of a technical one" (leanware, 2026). The agent ports your code, not platform capabilities.
Network effects are the real lock-in. Proprietary file formats, package registries, design-system libraries, and certification regimes (SOC2, HIPAA, PCI) survive a rewrite. The Bun port was partly forced by the Zig project's April 2026 ban on LLM-authored contributions (byteiota) — less voluntary than the framing implies.

The opposing posture — committing irreversibly — is documented in Burn the Boats. The two are not contradictions; reversibility is a property of the decision you've structured for it.

What to invest in instead¶

The prerequisites the agent cannot manufacture are what a healthy codebase already wants:

Behavioral tests over unit tests — property-based and end-to-end coverage against the contract surface.
Contract isolation — push platform-API calls behind a thin adapter so the retranslation surface stays small.
Performance baselines as artifacts — latency and memory profiles captured as benchmarks the post-migration build must match.

Reversibility is a side effect of these investments, not their goal.

Decision checklist¶

Score each before treating a choice as reversible:

Does the test suite cover behaviour, not just lines?
Are external contracts stable and documented?
Is the platform-API surface < 20% of the codebase?
Are non-functional requirements captured as benchmarks?
Is the lock-in in source code, or in data formats, certifications, and ecosystem?

A "no" on 1 or 2 means the migration is human-led with agent help, not agent-led.

Key Takeaways¶

Coding-agent reversibility is real but conditional; the binding constraint is behavioural test coverage, not agent capability.
The Bun Zig→Rust port (6 days, 1M lines, 99.8% tests passing, 13K unsafe blocks) shows both the speed dividend and the quality fine print.
FreshBrew caps best-in-class agents at 52.3% success on same-language JDK upgrades — cross-language ports against weaker tests do worse.
Performance, security posture, platform-API behaviour, and ecosystem lock-in survive an agent migration; only behaviourally-tested logic ports cleanly.
Invest in behavioural tests, contract isolation, and performance baselines if you want to keep the reversibility option open.

Burn the Boats — Commitment-Forcing Deprecation — the opposing posture: structured irreversibility as a forcing function.
LLM Agent Bug Fix Taxonomy — empirical bug patterns in agent-edited code; relevant when validating post-migration behaviour.
Portable Agent Definitions — adjacent reversibility shift for agent configuration, not application code.
Progressive Autonomy with Model Evolution — how trust in agent output scales with demonstrated reliability, which gates how confidently you can run large migrations.
Documentation-Guided Legacy Migration — the workflow side of agent-driven migration, focused on capturing behaviour before translation.