Skip to content

Silent-Failure Mechanism Taxonomy in Production Agent Runtimes

In unattended multi-component agent runtimes, classify silent failures by mechanism — not by location — so one defense covers every job at once.

A silent failure is one whose error signal never reaches a human in actionable form. An eight-week field study of one production personal-assistant runtime — 40 scheduled jobs, 8 providers, a tool-governance proxy, a memory plane, 4,286 unit tests, 827 governance checks — documented 22 incidents containing at least 28 silent-failure instances and proposed a five-mechanism cut: environment and platform quirks (A), design-assumption mismatch (B), error swallowing and dilution (C), chained hallucination and fabrication (D), operational omission and forensic blind spots (E) (Wu, arxiv 2606.14589).

When This Applies

The taxonomy is load-bearing only under three conditions:

  • Unattended runs. Silence spans of 13 hours to 60 days (Wu §Fig 4) are for scheduled jobs and memory-mediated chains, not interactive sessions where the next user utterance bounds the silence.
  • Multi-component runtime with seams. The longest-lived failures lived "in the seams between components, where no test runs" (Wu) — scheduler, memory store, governance proxy, providers. A monolithic harness has fewer seams.
  • A trace store and intervention path exist. Without telemetry sufficient to attribute a failure to a mechanism, the classes are unactionable; ship a two-axis run-vs-task dashboard first.

Deterministic CI, short interactive sessions, and single-agent harnesses without persistence don't pay off the overhead — the pre-completion checklist and loop detection primitives already cover them.

The Five Mechanisms

Class Mechanism Representative example
A Environment / platform quirk macOS TCC sandbox silently blocked an SSD backup; the 60-day-latency end of the distribution (Wu)
B Design-assumption mismatch Positional parsing of LLM output recurred across unrelated jobs; one key-based-parsing rule with a repo-wide scanner closed every instance (Wu §3.3)
C Error swallowing / dilution Errors captured into a log cache or summarised by an intermediate component before reaching any alert path (Wu)
D Chained hallucination / "fail-plausible" A Unicode-surrogate error was captured into a log cache; the downstream LLM composed a confident "Hugging Face platform crisis" analysis and pushed it to the user as routine analysis (Wu §D1)
E Operational omission / forensic blind spot A reserved-file mute in a logging path; no record existed for postmortem to consult (Wu)

Class D is the qualitative novelty. The other four are silent; in D "the LLM transforms it into fluent, plausible narrative delivered to the user" (Wu) — fluent misinformation instead of silence, a worse mode than no signal at all. A second logged example: a system alert persisted into chat history; hours later the model instructed the user to grant Full Disk Access to a cron binary in macOS System Preferences as fabricated remediation (Wu §D2).

Why It Works

Silent-failure mechanisms recur across unrelated jobs because they exploit generic agent-runtime invariants — LLM string output re-parsed downstream, error frames re-serialised through the model, governance checks gating the wrong layer. A mechanism-layer defense (a repo-wide key-based-parsing rule with a scanner; an explicit task-status artifact; an input-trust boundary around log-cache content) immunizes every location because every location traverses the same invariant. Location-axis attribution loses by construction: location is downstream of mechanism, so fixing one location leaves the same mechanism live elsewhere (Wu §3.3). Class D is acute — no "location" to fix, because the LLM constructs the plausible narrative from any contaminated input; the defense sits at the input-trust boundary, the discipline Anthropic names in building effective agents.

When This Backfires

  • Single-case-study generalisation. The study is n=1: "one system, one host OS, one operator pair, eight weeks" (Wu §8). The 5-class shape is plausible; the frequencies are not population estimates and the latency distribution is right-censored — failures silent at study end are absent by construction. Treat the five classes as a working enumeration to force attribution through, not a closed schema.
  • Operator-as-annotator confirmation bias. Wu reports "classification was performed by the system's two operators without independent annotation; we report no κ and acknowledge confirmation-bias risk". Independent annotation on a different runtime may yield a different cut.
  • Mechanism proliferation. A neighbouring entropy-principle paper argues silent failure is governed by a unified physical law (S(t) = S₀·e^(αt) across 22 intrinsic properties in 6 lifecycle layers), not a discrete mechanism set (Liu, arxiv 2606.08162). Stacking taxonomies — Wu's five classes, Li's six signals (Li et al., arxiv 2606.01365; cf. failure-aware observability), the entropy lens — multiplies vocabularies without adding defensive power. Pick the lens that maps to the next defense you can ship.
  • Detection-channel asymmetry undermines automation. ~70% of silent failures in Wu's study were caught by human observation, not by the 4,286 unit tests or 827 governance checks (Wu). A team that reads "5-class taxonomy" as "add 5 alert classes" and walks away has not ported the load-bearing finding. Retrospective audit showed 0% preventable ex-ante but 87% blockable as regressions — the win is mechanism-level regression scanners, not mechanism-level alerts.

Example

Take Class D1 above as the case. Two responses:

  • Location-axis. Fix the synthesis prompt to be sceptical of "platform crisis" claims. The next surrogate error in a different job reproduces the same mechanism as a different fabricated narrative — fabricated remediation, false software release, or fabricated success metric (Wu).
  • Mechanism-axis. Name the invariant: any error frame routed through an LLM context window without an explicit error-marker becomes raw narrative material. Tag error frames at the input boundary (structured envelope, not raw string capture) and add a repo-wide scanner that flags command-substitution captures of stderr into LLM-readable caches. One defense; every job covered.

Key Takeaways

  • Silent failures cluster into five mechanism classes — environment quirks, design-assumption mismatch, error swallowing, chained hallucination ("fail-plausible"), operational omission — drawn from an eight-week field study of a production runtime (Wu, arxiv 2606.14589).
  • Mechanism-axis attribution outperforms location-axis attribution under unattended multi-component runtimes: one defense at the invariant layer immunizes every location at once (Wu §3.3).
  • Class D — chained hallucination — is the qualitative novelty: the user receives fluent misinformation, not silence; the defense sits at the input-trust boundary, not in the output filter.
  • The study is n=1, operator-self-annotated, right-censored on latency — treat the five classes as a working enumeration to force attribution through, not a closed schema.
  • ~70% of silent failures were caught by human observation; 87% of incidents were retrospectively blockable as regressions, but 0% were preventable ex-ante. The win is mechanism-level regression scanners, not mechanism-level alerts.
  • Skip the overhead for deterministic CI, short interactive sessions, and single-agent harnesses without persistence — the existing pre-completion checklist and loop detection primitives already cover them.
Feedback