Lethal Trifecta Threat Model

Risk emerges when an agent has all three: access to private data, exposure to untrusted content, and the ability to communicate externally. Remove at least one from every execution path.

The Three Legs

The lethal trifecta (Willison, 2025) names three capabilities that together create an exploitable surface:

graph TD
    PD["1. Private Data Access"]
    UI["2. Untrusted Input"]
    EC["3. External Communication"]

    PD --- RISK["Exploitable<br/>Attack Surface"]
    UI --- RISK
    EC --- RISK

    style RISK fill:#b60205,color:#fff,stroke:#b60205

| Leg | What it means | Examples |
| --- | --- | --- |
| Private data | Secrets, credentials, PII, or proprietary code | .env files, DB connections, internal repos |
| Untrusted input | Content the agent did not author and cannot fully trust | PR comments, GitHub issues, fetched pages, dependencies |
| External communication | Ability to send data outside the sandbox | HTTP tools, MCP servers with outbound calls |

LLMs cannot reliably distinguish trusted instructions from injected ones: once untrusted input enters context, it can influence any subsequent tool call. The trifecta model therefore shifts defense from prompt-level mitigations to architecture.

Remove a Leg

Ensure no execution path has all three legs. Which leg to remove depends on the task:

Remove egress (most common for coding agents)

Default-deny outbound network. Most coding tasks need no network.

# Docker-based sandbox — no network
docker run --network none agent-image

Vendors ship this as a first-class deterministic control: OpenAI's Lockdown Mode limits outbound requests with no AI evaluation in the loop, removing the egress leg without relying on the model to police itself (Willison, 2026).
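
Where a task does need a few outbound destinations, default-deny can be paired with an explicit allowlist enforced outside the model. The following is a minimal sketch of such a check, not a complete egress proxy; `ALLOWED_HOSTS` and `is_egress_allowed` are hypothetical names, and the hosts listed are placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would load this from
# operator-controlled configuration that the agent cannot modify.
ALLOWED_HOSTS = frozenset({"pypi.org", "files.pythonhosted.org"})


def is_egress_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose exact host is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URLs are denied, not guessed at
    if parts.scheme != "https":
        return False  # default-deny: plain HTTP and other schemes are rejected
    host = (parts.hostname or "").lower()
    # Exact match only, so "pypi.org.evil.example" does not pass as "pypi.org".
    return host in allowed_hosts
```

The check is deterministic and compares the parsed hostname rather than a substring of the URL, so userinfo tricks such as `https://pypi.org@evil.example/` are rejected.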

Remove private data access

Strip sensitive data before it enters context.

  • PII tokenization: replace real values with opaque tokens resolved in a trusted executor
  • Scoped credentials: short-lived, minimal-permission tokens injected at runtime
  • File exclusion: .env, credentials, and key files excluded from agent-accessible paths
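
PII tokenization can be illustrated with a small sketch. It assumes email addresses are the only PII of interest and that a hypothetical `TokenVault` object lives in the trusted executor, outside anything the agent can read; the token format is also an assumption.

```python
import re
import secrets

# Deliberately simple email pattern for illustration; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class TokenVault:
    """Holds the token -> real value mapping; kept only in the trusted executor."""

    def __init__(self) -> None:
        self._by_token: dict = {}
        self._by_value: dict = {}

    def tokenize(self, text: str) -> str:
        """Replace each email address with a stable opaque token."""

        def _swap(match: re.Match) -> str:
            value = match.group(0)
            token = self._by_value.get(value)
            if token is None:
                token = f"<PII_{secrets.token_hex(4)}>"
                self._by_value[value] = token
                self._by_token[token] = value
            return token

        return EMAIL_RE.sub(_swap, text)

    def resolve(self, text: str) -> str:
        """Swap tokens back to real values; call only outside the agent's context."""
        for token, value in self._by_token.items():
            text = text.replace(token, value)
        return text
```

The agent only ever sees the output of `tokenize`; `resolve` runs after the agent has finished, for example when the trusted executor sends a drafted message.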

Remove untrusted input

Restrict the agent to operator-controlled content only — viable for internal automation but not for external code or user-generated content.

Design Patterns for Trifecta Mitigation

Six patterns (Beurer-Kellner et al., 2025) map to leg removal:

| Pattern | Leg removed | Mechanism |
| --- | --- | --- |
| Dual LLM | Untrusted input | Privileged LLM decides; quarantined LLM handles untrusted content |
| Action-Selector | Untrusted input | LLM picks from a fixed action set; injected instructions can't add new actions |
| Plan-Then-Execute | Untrusted input | Plan formed before untrusted content is seen; execution is deterministic |
| Context-Minimization | Untrusted input | Only minimum necessary untrusted content enters context |
| Code-Then-Execute | Untrusted input | LLM generates code; sandboxed runtime executes without LLM re-evaluation |
| LLM Map-Reduce | Private data | Each instance sees only a partition; no single instance has full data access |
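
As one illustration, the Action-Selector pattern can be sketched as a dispatcher that accepts only names from a fixed set. This is a minimal sketch under stated assumptions: the action names and handlers in `ACTIONS` are hypothetical stand-ins for real tools, and the model's output is assumed to arrive as a plain string.

```python
from typing import Callable, Dict

# Hypothetical fixed action set; handlers take no model-supplied arguments.
ACTIONS: Dict[str, Callable[[], str]] = {
    "run_tests": lambda: "tests executed",
    "format_code": lambda: "code formatted",
    "summarize_diff": lambda: "diff summarized",
}


def dispatch(model_choice: str) -> str:
    """Run the chosen action only if it is in the fixed set."""
    name = model_choice.strip()
    handler = ACTIONS.get(name)
    if handler is None:
        # Anything outside the set is rejected, including injected commands.
        raise ValueError(f"action not permitted: {name!r}")
    return handler()
```

Because the handlers take no arguments from the model, an injected instruction can at most select a different permitted action; it cannot introduce a new one.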

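CaMeL (Debenedetti et al., 2025) enforces trifecta separation by extracting control flow and data flow from the trusted query, so untrusted data cannot alter the program; the authors report 77% task completion with provable security on the AgentDojo benchmark.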

Attack Chains

Poisoned dependency (Lynch / NVIDIA, 2025): Agent reads a GitHub issue referencing a malicious pip package, installs it (egress), package exfiltrates env vars (private data). Fix: remove egress.

Cross-agent privilege escalation (Embrace The Red, 2025): Compromised agent rewrites another agent's config to remove sandbox constraints, giving it all three legs. Fix: protect config from agent writes.

MCP tool exfiltration (Invariant Labs, 2025): Malicious MCP server shadows trusted tools, intercepts calls to access private context, forwards to an external endpoint. Fix: restrict MCP server egress.

Trifecta Audit Checklist

Map each execution path against the legs:

| Execution path | Private data? | Untrusted input? | Egress? | Safe? |
| --- | --- | --- | --- | --- |
| Code review agent | Yes | Yes (PR content) | No | Yes |
| Research agent | No | Yes (web) | Yes | Yes |
| Deployment agent with env vars | Yes | Yes (repo config) | Yes | No |
| Internal codegen | Yes | No | Yes | Yes |

Any path with three "Yes" values in the leg columns requires architectural mitigation.
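
The audit can be automated with a few lines of code. The sketch below encodes the four rows above as data; the `ExecutionPath` type and `unsafe_paths` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExecutionPath:
    name: str
    private_data: bool
    untrusted_input: bool
    egress: bool

    @property
    def has_trifecta(self) -> bool:
        """True when all three legs are present on this path."""
        return self.private_data and self.untrusted_input and self.egress


# The four example rows from the checklist.
PATHS = [
    ExecutionPath("Code review agent", True, True, False),
    ExecutionPath("Research agent", False, True, True),
    ExecutionPath("Deployment agent with env vars", True, True, True),
    ExecutionPath("Internal codegen", True, False, True),
]


def unsafe_paths(paths: List[ExecutionPath]) -> List[str]:
    """Return the names of paths that need architectural mitigation."""
    return [p.name for p in paths if p.has_trifecta]


print(unsafe_paths(PATHS))
```

A check like this could run in CI against a declared inventory of agent configurations, though the boolean fields only capture legs that are fully present or fully absent.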

Mandatory Sandbox Controls

Baseline controls for any agent sandbox (Harang, 2025):

  • Network egress — default-deny with explicit allowlists
  • File system — block writes outside the workspace
  • Config protection — prevent modification of .cursorrules, CLAUDE.md, MCP configs
  • Secret injection — short-lived, minimal-permission tokens
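
The file system and config protection controls can be combined into a single write guard. The sketch below is illustrative only: `is_write_allowed` is a hypothetical helper, `mcp.json` is an assumed filename for an MCP config, and a production sandbox would enforce this at the OS or container level, not in application code.

```python
from pathlib import Path

# Lowercased so the comparison also holds on case-insensitive file systems.
# "mcp.json" is an assumed MCP config filename, used here for illustration.
PROTECTED_NAMES = frozenset({".cursorrules", "claude.md", "mcp.json"})


def is_write_allowed(target: str, workspace: str) -> bool:
    """Allow writes only inside the workspace and never to protected config files."""
    root = Path(workspace).resolve()
    # resolve() collapses ".." segments; an absolute target replaces the root.
    path = (root / target).resolve()
    if not path.is_relative_to(root):  # requires Python 3.9+
        return False
    return path.name.lower() not in PROTECTED_NAMES
```

Resolving the path before the containment check is what defeats `../` traversal, and matching on the file name blocks protected configs in any subdirectory of the workspace.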

When This Backfires

The trifecta model is a structural heuristic, not a guarantee:

  1. Leg removal is not always feasible. A research agent fetching live web content, holding API keys, and posting to external endpoints has all three legs by design. For unavoidable trifectas, add compensating controls — output scanning, rate-limiting, egress anomaly detection.

  2. Partial-leg states are underspecified. "Read-only egress" and "tokenized private data" sit between leg-present and leg-absent. Binary Yes/No columns produce false confidence when a leg is partially present.

  3. Leg removal migrates risk. Tokenizing PII shifts the attack to the token resolver; sandboxing egress shifts it to sandbox-escape. Each removal creates a new high-value target that must itself be hardened.
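
For unavoidable trifectas, output scanning is one of the compensating controls named above. A minimal sketch follows, assuming outbound payloads are available as text before they leave the sandbox; the three patterns in `SECRET_PATTERNS` are illustrative and far from exhaustive.

```python
import re
from typing import List

# Illustrative patterns only; a real scanner needs a broader, tuned rule set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "env_assignment": re.compile(
        r"(?i)\b[A-Z0-9_]*(?:SECRET|TOKEN|PASSWORD)[A-Z0-9_]*\s*=\s*\S+"
    ),
}


def scan_outbound(payload: str) -> List[str]:
    """Return the sorted names of patterns that match an outbound payload."""
    return sorted(
        name for name, pattern in SECRET_PATTERNS.items() if pattern.search(payload)
    )
```

A non-empty result would typically block or quarantine the request. Pattern matching of this kind is easy to evade with encoding or chunking, so it supplements leg removal and does not replace it.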
