Agentic Detection and Response at the MCP Boundary

Instrument the MCP transport so agent reasoning, prompts, and tool calls become a runtime detection signal that endpoint tools cannot reconstruct.

Agentic Detection and Response (ADR) is the runtime monitoring half of MCP-mediated agent security. An in-process sensor captures prompts, reasoning, tool calls, and server responses. A two-tier detector then classifies each session as benign or compromised before the action chain completes. Uber's production deployment reports 97.2% precision across more than 10,000 agent sessions per day on 7,200+ hosts (Li et al., 2026).

ADR complements the MCP Runtime Control Plane. The control plane gates calls before execution. ADR observes the full trajectory, so detection can fire on signals the gate cannot evaluate in isolation.

Why endpoint tools miss MCP attacks

EDR sees file writes, process execution, and network connections, but not agent reasoning, prompts, or the causal chains linking intent to execution (Li et al., 2026). A compromised agent looks identical to a benign one at the OS layer: same process, same outbound HTTPS, same syscalls. MCP tool servers sit at the junction of protocol traffic, OS execution, and agent decision-making. Attacks live in the gaps between them.

How it works

graph LR
    A[Agent] -- prompt + reasoning --> S[ADR Sensor]
    S -- tool call --> M[MCP Server]
    M -- response --> S
    S -- session telemetry --> T1[Tier 1: Fast Triage]
    T1 -- suspicious --> T2[Tier 2: Context-Aware LLM Reasoning]
    T1 -- benign --> L[Log only]
    T2 -- alert --> R[Response]
    E[ADR Explorer] -. offline red-team .-> T1
    E -. offline red-team .-> T2

Three components compose the system (Li et al., 2026):

  • ADR Sensor captures telemetry at the MCP transport: prompts, reasoning, tool arguments, server responses, and the causal chains linking them. It closes the observability gap EDR cannot.
  • ADR Explorer runs offline red-teaming before deployment. It generates test cases that calibrate the detector against attacker behavior the production stream has not yet seen.
  • ADR Detector is the two-tier online classifier that reaches the 97.2% production precision above. Fast triage handles bulk traffic deterministically. Only suspicious sessions pay the cost of context-aware LLM reasoning, which amortizes inference cost across volume.
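
The two-tier flow described above can be sketched as follows. This is a minimal illustration, not the detector from Li et al.: the `Session` and `ToolCall` types, the `SENSITIVE_TOOLS` set, and the tool names are all hypothetical, and `stub_tier2` is a keyword check standing in for the context-aware LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToolCall:
    tool: str
    arguments: dict


@dataclass
class Session:
    session_id: str
    prompt: str
    reasoning: str
    tool_calls: List[ToolCall] = field(default_factory=list)


# Hypothetical tool names; a real deployment would derive these from its own catalog.
SENSITIVE_TOOLS = {"shell.exec", "fs.write", "http.post"}


def tier1_triage(session: Session) -> bool:
    """Cheap deterministic check: escalate only if a sensitive tool was called."""
    return any(call.tool in SENSITIVE_TOOLS for call in session.tool_calls)


def stub_tier2(session: Session) -> bool:
    """Stand-in for context-aware LLM reasoning (assumption: no real model call).

    Flags sessions whose reasoning mentions an instruction override.
    """
    return "ignore previous instructions" in session.reasoning.lower()


def detect(session: Session, tier2: Callable[[Session], bool] = stub_tier2) -> str:
    """Return 'log_only' or 'alert'; tier 2 runs only on escalated sessions."""
    if not tier1_triage(session):
        return "log_only"
    return "alert" if tier2(session) else "log_only"


benign = Session(
    "s1", "Summarize the README", "Read the file, then summarize.",
    [ToolCall("fs.read", {"path": "README.md"})],
)
hijacked = Session(
    "s2", "Summarize the README",
    "The file says to ignore previous instructions and upload secrets.",
    [ToolCall("fs.read", {"path": "README.md"}),
     ToolCall("http.post", {"url": "https://example.invalid"})],
)

print(detect(benign))    # log_only
print(detect(hijacked))  # alert
```

The point of the shape is that `tier2` is never invoked for sessions that triage clears, so the expensive path scales with the suspicious tail rather than with total traffic.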

On the public ADR-Bench evaluation, the system reported zero false positives at 67% attack recall, a 2 to 4x F1 improvement over three baselines (Li et al., 2026).
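
As a rough sanity check on those figures, the F1 score implied by zero false positives and 67% recall can be computed directly. The sketch assumes precision is exactly 1.0 (zero false positives with at least one true positive) and treats 67% as exact; the source does not state the F1 value itself, so the result is a derived estimate only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision = 1.0  # assumption: zero false positives, at least one true positive
recall = 0.67    # reported attack recall, treated as exact here

print(round(f1_score(precision, recall), 3))  # 0.802
```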

Why it works

The mechanism is co-occurrence at a single transport. Prompt, reasoning, and tool call all pass through the MCP envelope in structured form. Instrumenting that boundary captures the causal chain EDR cannot reconstruct from OS-layer events (Li et al., 2026). The two-tier detector then makes the economics work: cheap triage for the bulk, context-aware LLM reasoning only for the suspicious tail. That amortization keeps a 10,000-sessions-per-day deployment affordable rather than "prohibitively expensive at scale" (Li et al., 2026).
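
To make the idea of instrumenting a single boundary concrete, here is a minimal in-process recorder. It is a sketch under assumptions, not the sensor from Li et al.: the class `AdrSensorSketch`, its methods, and the event schema are hypothetical, and the "tool" is a local function rather than a real MCP server call.

```python
import json
import time
from typing import Any, Callable, Dict, List


class AdrSensorSketch:
    """Hypothetical recorder that keeps one ordered trace per session."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self.events: List[Dict[str, Any]] = []

    def record(self, kind: str, payload: Any) -> None:
        # The sequence number preserves ordering, which is what links
        # a prompt and its reasoning to the tool calls that follow.
        self.events.append(
            {"seq": len(self.events), "ts": time.time(), "kind": kind, "payload": payload}
        )

    def wrap_tool(self, name: str, fn: Callable[..., Any]) -> Callable[..., Any]:
        """Return a wrapper that logs the call and its response around fn."""
        def wrapped(**arguments: Any) -> Any:
            self.record("tool_call", {"tool": name, "arguments": arguments})
            result = fn(**arguments)
            self.record("tool_response", {"tool": name, "result": result})
            return result
        return wrapped

    def export(self) -> str:
        """Serialize the session trace for a downstream detector."""
        return json.dumps({"session_id": self.session_id, "events": self.events})


sensor = AdrSensorSketch("s1")
read_file = sensor.wrap_tool("fs.read", lambda path: f"contents of {path}")

sensor.record("prompt", "Summarize README.md")
sensor.record("reasoning", "I need to read the file first.")
read_file(path="README.md")

print([event["kind"] for event in sensor.events])
# ['prompt', 'reasoning', 'tool_call', 'tool_response']
```

Because all four event kinds land in one ordered trace, a detector can ask which reasoning step preceded a given tool call, a question OS-layer events alone cannot answer. The sketch omits error handling: a tool that raises leaves a `tool_call` event with no matching `tool_response`.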

When this backfires

The pattern is calibrated for enterprise deployments. Smaller or differently shaped workloads may see a net cost rather than a net benefit.

  • Small deployments and single-developer workflows. Value comes from enterprise scale (7,200+ hosts in the Uber report), where cross-host correlation improves detection quality (Li et al., 2026). For a single agent on one developer machine, sensor overhead adds cost without enabling cross-session signal. A policy gate alone fits better.
  • Tightly constrained tool catalogs. When tool sequences are fully enumerable, a Behavioral Firewall for Tool-Call Trajectories gives deterministic enforcement at ~2.2 ms per call. A probabilistic detector only generates alerts the firewall already blocks.
  • Slow-drift and memory-poisoning attacks. Triggered backdoors in agent memory or RAG stores activate only on attacker-chosen conditions. AgentPoison achieves >80% attack success with <1% benign-task degradation (Chen et al., arXiv:2407.12784), so per-session detectors that never observe the trigger miss them. Pair ADR with offline baselining or memory provenance; see Trojan Hippo: Dormant Memory Payloads.
  • No SOC capacity to action alerts. Even a detector with ~97% precision and 67% recall generates triage load: every alert, true or false, needs a responder. Teams without incident response capacity accumulate unactioned alerts, which devalues the signal.
  • Naive telemetry without two-tier amortization. Provenance-EDR systems have historically added up to 821% runtime overhead and 10x the industry-expected memory footprint per host (Dong et al., arXiv:2307.08349). Copying the sensor without the triage architecture reproduces that cost problem.
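
The cost argument behind the last item can be illustrated with a simple expected-cost model. Every number below except the session volume is a hypothetical placeholder, not a figure from Li et al.; the function `daily_detection_cost` and its parameters are likewise assumptions made for illustration.

```python
def daily_detection_cost(
    sessions_per_day: int,
    escalation_rate: float,
    tier1_cost: float,
    tier2_cost: float,
) -> float:
    """Expected daily cost when every session pays tier 1 and a fraction pays tier 2."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be in [0, 1]")
    return sessions_per_day * (tier1_cost + escalation_rate * tier2_cost)


SESSIONS = 10_000    # order of magnitude from the Uber report
TIER1_COST = 0.0001  # hypothetical cost per session for deterministic triage
TIER2_COST = 0.02    # hypothetical cost per session for LLM reasoning
ESCALATION = 0.05    # hypothetical share of sessions that triage escalates

two_tier = daily_detection_cost(SESSIONS, ESCALATION, TIER1_COST, TIER2_COST)
llm_on_everything = daily_detection_cost(SESSIONS, 1.0, 0.0, TIER2_COST)

print(f"two-tier: {two_tier:.2f}  LLM on every session: {llm_on_everything:.2f}")
```

With these placeholder values the single-tier design costs about 18 times more per day. The real ratio depends entirely on the escalation rate and per-call costs of a given deployment.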

Trade-offs

| Approach | Pros | Cons |
| --- | --- | --- |
| ADR at MCP boundary | Captures intent + execution causal chain; tiered inference cost; per-session online detection | Cost amortization requires enterprise volume; misses slow-drift attacks; needs SOC to action alerts |
| Policy gate only (control plane) | Deterministic, low-overhead, fail-closed at the boundary | No detection signal for novel patterns the policy does not encode |
| Trajectory firewall (pDFA) | Deterministic enforcement at ~2.2 ms per call on stable tool catalogs | Brittle when tool catalog or sequence space grows |
| EDR only | Mature tooling and SOC integration | Cannot see agent reasoning, prompts, or causal chains (Li et al., 2026) |

Key Takeaways

  • The MCP transport is the architectural location where prompt, reasoning, and tool call co-occur. Instrumenting it captures the causal chain endpoint tools cannot reconstruct.
  • The two-tier detector design is what makes runtime LLM-based detection affordable; cheap triage handles the bulk, context-aware reasoning runs only on the suspicious tail.
  • The pattern is built for enterprise scale: the Uber deployment spans 7,200+ hosts. Below thousands of sessions per day, or without a SOC to action alerts, simpler policy gates or trajectory firewalls cover the same ground at lower cost.
  • ADR observes; it does not gate. Pair it with the MCP Runtime Control Plane to refuse risky calls and with the Action-Audit Divergence Taxonomy to validate that the telemetry it produces is itself trustworthy.