Action-Audit Divergence: A Four-Mode Taxonomy for Runtime Hardening¶

A runtime action-audit divergence takes four forms — gate-bypass, audit-forgery, silent host failure, wrong-target — each a coverage question for existing controls.

What the Runtime Must Guarantee¶

An agentic runtime issues tool calls and actuates devices for an LLM. Its load-bearing safety property is that the audit record matches what actually happened. Metere (arXiv:2605.01740) formalises this as four divergence modes:

Mode	Name	What the audit lies about
F1	Gate-bypass	Authorisation said no; the action ran
F2	Audit-forgery	Action ran; log shows a different action
F3	Silent host failure	Log says action ran; host did nothing
F4	Wrong-target	Log names target X; action hit target Y

The taxonomy is a navigation aid, not a defense — it converts "is this runtime hardened?" into four closed questions mapped to existing controls.

Formally the property is a multiset equality: the intended (capability, target) pairs must equal those executed after every action (Metere, arXiv:2605.01740). A biconditional checker — logging denials, not just allows — fails closed on any diff, detecting divergence the per-mode controls below fail to prevent.

Mapping Each Mode to Existing Controls¶

graph TD
    F1[F1 Gate-Bypass] --> C1[Action-Selector + Admission Gate]
    F2[F2 Audit-Forgery] --> C2[Hash-Chained Tamper-Evident Log]
    F3[F3 Silent Host Failure] --> C3[Bootstrap Seal + Module Signing]
    F4[F4 Wrong-Target] --> C4[Egress Policy + URL/Target Validation]

    style F1 fill:#fce8e6,stroke:#d93025
    style F2 fill:#fef3e0,stroke:#e8a100
    style F3 fill:#e8f4fd,stroke:#1a73e8
    style F4 fill:#e6f4ea,stroke:#1e8e3e

F1 — Gate-bypass. Authorisation rejected the request; the action ran anyway. The control is a single chokepoint every tool call must pass. The action-selector pattern restricts the LLM to a fixed catalog so unsanctioned actions are unrepresentable; the MCP runtime control plane intercepts every MCP call at one policy point. Logging denials, not just allows, closes the asymmetry attackers exploit when only allow-paths are observable.

F2 — Audit-forgery. The action ran and was logged, but the log was modified to claim a different action ran. Tamper-evident hash chains defeat this by construction: each entry includes the hash of the previous, so any modification breaks the chain on verification (AuditableLLM, MDPI 2026). The site's Cryptographic Governance Audit Trail covers the implementation with ML-DSA-65 receipt signing.

F3 — Silent host failure. The log records "action X executed" but the host did nothing — process crashed, error swallowed, container killed mid-call. The signal must come from outside the runtime: bootstrap seals verify a known-good start state, module signing verifies executing code matches audited code, and post-execution probes confirm the side effect landed. Without these, F3 looks identical to drift.

F4 — Wrong-target. Log says "emailed alice@" but the message went to attacker@. The control is target validation at the egress boundary, not at argument generation. The agent network egress policy restricts reachable domains; the URL exfiltration guard validates targets independently of LLM intent.

Using the Taxonomy as a Review Checklist¶

Walk F1-F4 against any runtime or harness:

F1 — name the chokepoint. Where does every tool call pass authorisation? "The LLM checks" is not a chokepoint — the LLM is what is being authorised.
F2 — name the integrity mechanism. Append-only is not enough; the log must be tamper-evident under an attacker on the host. Hash chains, Merkle trees, or external receipt sinks (nono.sh on tamper-evident agent audit) close the gap.
F3 — name the liveness probe. What confirms the action actually ran? Side-effect verification, downstream acks, or out-of-band telemetry beat "the call returned 200".
F4 — name the target validator. What checks the file path, hostname, recipient, or endpoint is the intended one, independent of LLM-generated arguments? HashiCorp's write-up frames this as unifying infrastructure telemetry with identity logs.

A control may cover multiple modes (a hash-chained log with policy receipts covers F1 and F2), and a mode may need several controls. The taxonomy does not prescribe — it names the question each control answers.

Where the Framing Backfires¶

The decomposition assumes there is an audit worth defending. Three conditions where it adds cost without value:

Single-user local runtimes with no compliance obligation. F1-F4 each motivate non-trivial architecture; capability minimisation and rollback-first design deliver more safety per unit of complexity.
Pure-text agents. Without tool calls, there is no action to diverge from an audit.
Reversible-state systems. When every action is rolled back on detection of badness, post-hoc tamper-evidence is less load-bearing than detection latency.

It complements the four-layer threat taxonomy: that model groups threats by attack surface; this one groups runtime safety properties by failure mode. One places controls on a grid, the other audits whether the grid is load-bearing.

Key Takeaways¶

An agent runtime's load-bearing safety property is that the audit record matches what actually happened.
Four divergence modes — F1 gate-bypass, F2 audit-forgery, F3 silent host failure, F4 wrong-target — name the specific ways the audit can lie.
Each mode maps to existing site coverage: action-selector and MCP control plane for F1, hash-chained audit trail for F2, bootstrap and module signing for F3, egress and URL validation for F4.
Use the taxonomy as a review checklist, not a defense — name the chokepoint, integrity mechanism, liveness probe, and target validator for any runtime under review.
The framing assumes an audit worth defending; for single-user local runtimes, pure-text agents, and reversible-state systems, capability minimisation often beats divergence detection.

Four-Layer Taxonomy of Agent Security Risks — companion threat-surface layering; pair with this divergence-mode model
Cryptographic Governance Audit Trail — F2 control: hash-chained tamper-evident logs with ML-DSA receipts
Action-Selector Pattern — F1 control: deterministic execution from a fixed action catalog
MCP Runtime Control Plane — F1 control: single chokepoint for tool-call policy evaluation
Agent Network Egress Policy — F4 control: target validation at the network boundary
Tool Signing and Signature Verification — F3 control: module-level integrity for executing code