Rollback-First Design: Every Agent Action Should Be Reversible¶

Before choosing how an agent performs an action, choose how you will undo it — if recovery costs more than one command, reconsider the approach.

The Premise¶

Agents produce bad output. The question is not whether an agent will make a mistake, but what the recovery cost is when it does.

Rollback-first design treats recovery cost as a first-class constraint. For every agent action, ask "how hard is this to undo?" If the answer is "very," choose an approach that produces a reversible result.

The Undo Cost Spectrum¶

graph LR
    A[Instant<br/>delete branch] --> B[Easy<br/>close PR, revert commit]
    B --> C[Hard<br/>manual rollback, data restore]
    C --> D[Impossible<br/>sent email, charged card]

Design agent workflows to stay in the left half. When a step must land in the right half, add a human gate before it — see Human-in-the-Loop Placement.

Reversible Primitives¶

Git branches. Agent work on a branch is reversible: delete the branch, nothing on main is affected. Git's branching model makes creation and deletion nearly free, which is what makes it viable as a per-task primitive.

Draft PRs. A draft PR is visible and reviewable but not merged. Closing it discards the changes. Use draft PRs instead of direct pushes to main.

Labels over status. Changing a label is reversible. Closing an issue muddies history; deleting is irreversible on most platforms.

Comments over edits. Appending a comment is reversible (delete it). Editing a GitHub issue or PR body overwrites the original. Prefer comments for agent-generated observations; reserve body edits for structured fields.

Checkpoints. Claude Code checkpoints capture file state before each user prompt. Restoration is selective — code only, conversation only, or both.

Staging environments. Agent output affecting live systems should pass through staging. A bad draft in staging costs nothing to discard; a bad production deployment costs recovery time.

Transactional boundaries. IBM Research's STRATUS system uses a "transactional-no-regression" rule: mitigation agents may only take reversible actions within a transaction, and commands per transaction are capped to keep rollbacks tractable (IBM Research, 2025). The same applies to coding agents — bound each turn to changes that can be undone in one step.

What Cannot Be Made Reversible¶

Some actions have inherent irreversibility:

Sending external notifications (email, Slack, webhooks)
Charging or refunding payment instruments
Deleting external resources without snapshots
Pushing to a CDN or cache that propagates globally

For these, apply human gates before the action, not after. There is no rollback; the gate is the only defense.

Designing for Reversibility¶

Checklist for each agent action:

Can this be done on a branch instead of main?
Can the artifact be a draft before it becomes final?
Is there a checkpoint before this action?
If this action fails or is wrong, what is the one-command undo?

If the one-command undo does not exist, redesign the step before shipping the workflow.

Example¶

An agent refactors a module across 40 files. Halfway through, it makes a wrong assumption about the interface and produces broken code in 15 files.

Without rollback-first design:

The agent pushed directly to main
Recovery requires reverting individual commits or manually fixing 15 files
CI is broken; other developers are blocked

With rollback-first design:

All changes happened on a branch (agent/refactor-module-xyz)
A draft PR was opened for human review before any merge
Recovery is one command: git branch -D agent/refactor-module-xyz
Main is untouched; CI is unaffected; no one is blocked

The design choice was made before execution: work on a branch, open a draft PR, require human approval before merge. Each step has a one-command undo at the point it was created.

When This Backfires¶

Rollback-first design is not free. Conditions where it is worse than the alternative:

Reversibility hides root cause. When rollback is trivial, teams lean on "undo and retry" instead of fixing the underlying bug. A reliable undo path can delay diagnosis until the failure surfaces somewhere harder to reverse.
Gate latency dominates. In high-frequency loops (inner-loop edits, self-review cycles), forcing every action through a draft/approval gate adds human-scale delay to machine-scale work. The recovery saved is smaller than the throughput lost.
The action is already cheap to redo. For idempotent operations where re-running is faster than engineering a rollback primitive, the reversibility machinery is overhead. Prefer idempotent design when the natural answer is "just run it again."
Ephemeral environments. If the environment is throwaway (container spun up per task, test database that resets on each run), branch-level isolation is redundant — the environment itself is the rollback primitive.

The pattern scales with stakes. Apply it fully for agents touching shared codebases, production systems, or customer-facing data. Relax it when the blast radius is small and recovery is already trivial.

Key Takeaways¶

Treat undo cost as a design constraint, not an afterthought
Git branches, draft PRs, and checkpoints are the core reversible primitives
Comments are reversible; body edits less so — prefer comments for agent-generated content
External side effects (emails, webhooks, payments) cannot be made reversible — gate them instead
If recovery requires more than one command, reconsider the approach