Hybrid Deterministic + Semantic Authorization for Agent Tool Calls
Deterministic checks at the agent-tool layer cover structural attacks; a semantic task-to-tool matcher covers intent drift. The two attack classes are orthogonal.
Two Orthogonal Attack Classes
A compromised agent attacks tool calls along two independent dimensions:
- Structural attacks: the call's form is wrong, for example a tampered description, a name swap (get_balance → transfer_amount), a parameter mutation, or a falsified return.
- Semantic attacks: the form is correct but the purpose is not, for example a correctly typed call to a tool unrelated to the user's task, such as delete_repository during a documentation read.
Each layer lets the other's attack class through, so neither is sufficient alone. The CASA framework (El Helou et al., 2026) combines both at a zero-trust layer between agent and MCP server.
The Five Deterministic Checks
Each check is a binary comparison enforced before the call leaves the layer (El Helou et al., 2026, §III-A):
| # | Check | What it compares | Attack it blocks |
|---|---|---|---|
| 1 | Tool Definition Integrity | Cached MCP tool definition vs. description served to the LLM | Description-injection rewriting tool semantics in-flight |
| 2 | Request Authorization | Whether the LLM response actually contained the tool call being executed | Autonomous calls fabricated by the runtime outside LLM reasoning |
| 3 | Action Alignment | Function name in LLM output vs. outgoing MCP call | Name swap (get_balance → transfer_amount) |
| 4 | Parameter Integrity | Parameter names and values in LLM output vs. outgoing call | Recipient/amount mutation between LLM and tool |
| 5 | Data Fidelity | Raw MCP response vs. content relayed back to the LLM | Result falsification or hidden-instruction injection on return |
Checks 1, 3, 4, and 5 are byte-level comparisons; check 2 is a set-membership test. No LLM is consulted at decision time, so a failed check always rejects the call.
graph LR
LLM[LLM] -->|tool call| INT[Interception Layer]
INT -->|checks 1-4 pass| MCP[MCP Server]
MCP -->|response| INT
INT -->|check 5 pass| LLM
INT -.->|any check fails| BLOCK[Reject + audit log]
style BLOCK fill:#b60205,color:#fff
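The five checks can be sketched as plain comparisons. This is a minimal illustration, not the CASA implementation: the `ToolCall` type, the `call_id` field used to pair an outgoing call with the LLM's emitted call, and the choice to compare SHA-256 digests of canonical JSON (rather than raw bytes) are all assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass


def digest(obj) -> str:
    """SHA-256 over canonical JSON (sorted keys, compact separators)."""
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


@dataclass(frozen=True)
class ToolCall:
    call_id: str   # hypothetical identifier linking LLM output to the outgoing call
    name: str
    params: dict


def pre_call_checks(cached_def: dict, served_def: dict,
                    llm_calls: list[ToolCall], outgoing: ToolCall) -> list[str]:
    """Run checks 1 to 4; an empty list means the call may leave the layer."""
    failures = []
    # Check 1: the definition served to the LLM must equal the cached MCP definition.
    if digest(cached_def) != digest(served_def):
        failures.append("1:tool_definition_integrity")
    # Check 2: the outgoing call must be one the LLM response contained (set membership).
    emitted = {c.call_id: c for c in llm_calls}.get(outgoing.call_id)
    if emitted is None:
        failures.append("2:request_authorization")
        return failures  # nothing to compare checks 3 and 4 against
    # Check 3: function name in the LLM output vs. the outgoing call.
    if emitted.name != outgoing.name:
        failures.append("3:action_alignment")
    # Check 4: parameter names and values in the LLM output vs. the outgoing call.
    if digest(emitted.params) != digest(outgoing.params):
        failures.append("4:parameter_integrity")
    return failures


def post_call_check(raw_response: bytes, relayed: bytes) -> list[str]:
    """Check 5: the content relayed to the LLM must equal the raw MCP response."""
    return [] if raw_response == relayed else ["5:data_fidelity"]


definition = {"name": "get_balance", "description": "Return the account balance."}
emitted = ToolCall("call-1", "get_balance", {"account": "A-17"})
swapped = ToolCall("call-1", "transfer_amount", {"account": "A-17"})

print(pre_call_checks(definition, definition, [emitted], emitted))  # []
print(pre_call_checks(definition, definition, [emitted], swapped))  # ['3:action_alignment']
```

Every branch is an equality or membership test, which is why the outcome for a given input never varies. A production layer would also need to write the audit log entry on rejection, which the sketch omits.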
The Semantic Layer: Task-to-Tool Matching
Structural checks let a well-formed call to an unrelated tool through. Two stages add intent matching (El Helou et al., 2026, §III-B):
- Task extraction — emit a concise natural-language description of the user's objective from the conversation. Separating extraction from matching keeps raw conversation off the authorization server.
- Task-tool matching: classify, as a binary decision, whether each requested tool aligns with the task. Mismatches are denied; matches proceed to the deterministic checks.
This extends Task-Based Access Control (TBAC) (El Helou et al., 2025) to multi-turn conversations, handling "clarification exchanges, tangential information, and conversational drifts" without re-prompting.
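The separation between the two stages can be sketched as follows. In the source both stages are model calls; here they are injected as plain callables so the example runs without a model. `SemanticGate`, the canned extractor, and the keyword-overlap matcher are hypothetical stand-ins and far cruder than an LLM classifier. The point is only the data flow: the matcher receives the task summary and tool descriptions, never the raw conversation.

```python
from dataclasses import dataclass
from typing import Callable

Extractor = Callable[[list[str]], str]   # conversation turns -> task summary
Matcher = Callable[[str, str], bool]     # (task summary, tool description) -> aligned?


@dataclass
class SemanticGate:
    extract_task: Extractor
    matches: Matcher

    def authorize(self, conversation: list[str], requested: dict[str, str]) -> dict[str, bool]:
        """Return an allow/deny decision for each requested tool name."""
        # Stage 1: only the concise summary leaves this step.
        task = self.extract_task(conversation)
        # Stage 2: the matcher sees the summary and tool descriptions only.
        return {name: self.matches(task, desc) for name, desc in requested.items()}


STOPWORDS = frozenset({"a", "the", "from", "of", "to", "in"})


def _keywords(text: str) -> set[str]:
    return {w.strip(".,").lower() for w in text.split()} - STOPWORDS


def keyword_matcher(task: str, tool_description: str) -> bool:
    """Stand-in for the LLM classifier: aligned if at least two keywords overlap."""
    return len(_keywords(task) & _keywords(tool_description)) >= 2


gate = SemanticGate(
    # Canned summary standing in for the LLM extraction step.
    extract_task=lambda turns: "Read the project documentation.",
    matches=keyword_matcher,
)

decisions = gate.authorize(
    ["Where is the setup guide?", "Summarize the install steps from the docs."],
    {
        "read_file": "Read a documentation file from the project.",
        "delete_repository": "Permanently delete a repository.",
    },
)
print(decisions)  # {'read_file': True, 'delete_repository': False}
```

Injecting the two stages as callables also makes it straightforward to run extraction and matching on different models or hosts.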
What the Empirical Numbers Buy You
Accuracy on the extended ASTRA benchmark for Claude Sonnet 4.6 (§V):
| Setting | Accuracy | F1 | FPR | FNR |
|---|---|---|---|---|
| Single-turn (ASTRA) | 0.98 | 0.98 | 2% | 2% |
| Multi-turn (augmented) | 0.97 | 0.97 | 2% | 4% |
| Multi-turn (multi-agent) | 0.92 | 0.92 | 10% | 7% |
Single-turn matching is accurate but not deterministic, with a 2% error rate in each direction. Multi-turn degrades, most sharply on multi-agent traffic, where a 7% FNR denies roughly 1 in 14 legitimate calls and a 10% FPR allows roughly 1 in 10 misaligned calls. The paper concludes semantic checks remain "insufficient for high-stakes, long-horizon autonomous tool use" (§VII), so design fallbacks.
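To see how per-call error rates compound over a session, the multi-agent row can be turned into session-level odds. The sketch below assumes decisions are independent across calls, which the source does not claim, and a session length of 10 calls chosen purely for illustration.

```python
def one_in(rate: float) -> float:
    """Express a per-call error rate as 'one in N calls'."""
    return 1.0 / rate


def p_at_least_one(rate: float, calls: int) -> float:
    """Probability of at least one error in `calls` independent decisions."""
    return 1.0 - (1.0 - rate) ** calls


FNR = 0.07  # multi-agent row: legitimate calls wrongly denied
FPR = 0.10  # multi-agent row: misaligned calls wrongly allowed

print(round(one_in(FNR), 1))              # 14.3
print(round(p_at_least_one(FNR, 10), 3))  # 0.516
print(round(p_at_least_one(FPR, 10), 3))  # 0.651
```

Under these assumptions, a session of 10 legitimate calls has about an even chance of hitting at least one false denial, and 10 misaligned attempts have about a 65% chance that at least one is allowed.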
When Hybrid Beats Deterministic-Only
The deterministic-only alternative pairs a Scoped Credentials Proxy with the Action-Selector pattern: pre-declared (URL, method) tuples and a fixed action set. For small action spaces this beats hybrid on latency, predictability, and FPR.
Hybrid earns its complexity only when both conditions hold:
- Tool catalogue is large and dynamic — pre-declaring every (task, tool) pair is impractical.
- Conversations are multi-turn with drifting per-turn tasks (interactive coding, research, support).
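For contrast, the deterministic-only alternative reduces to a lookup against pre-declared (URL, method) tuples. The sketch below is an assumption about what such an allowlist could look like, not the referenced patterns' implementation. The host `api.example.test`, the paths, and the decision to ignore query strings during matching are all illustrative choices.

```python
from urllib.parse import urlsplit

# Hypothetical pre-declared action set: every permitted (URL, method) pair.
ALLOWED = frozenset({
    ("https://api.example.test/v1/runs", "GET"),
    ("https://api.example.test/v1/issues", "GET"),
})


def is_allowed(url: str, method: str, allowed: frozenset = ALLOWED) -> bool:
    """Return True only if the normalized (URL, method) pair was pre-declared."""
    parts = urlsplit(url)
    # Compare scheme, host, and path; query strings and fragments are ignored here.
    normalized = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path}"
    return (normalized, method.upper()) in allowed


print(is_allowed("https://api.example.test/v1/runs?status=failure", "get"))  # True
print(is_allowed("https://api.example.test/v1/runs", "DELETE"))              # False
print(is_allowed("https://api.example.test/v1/users", "GET"))                # False
```

The lookup is constant-time and has no false positives by construction, but every new tool or endpoint requires a new entry, which is the maintenance cost that a large, dynamic catalogue makes impractical.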
Where It Sits in the Defense Stack
The Lethal Trifecta Threat Model and Task Scope Security Boundary define the contracts; deterministic checks catch in-flight violations; the semantic matcher enforces task scope at runtime; the MCP Runtime Control Plane is the architectural slot.
Example
A coding agent has read access to github and db-readonly MCP servers. The user asks: "Summarize the last week's failing tests in the auth-service repo." The interception layer extracts: "Read CI test results from the auth-service repository."
The agent emits github.list_workflow_runs(repo="auth-service", status="failure"). Checks 1–4 pass; semantic match aligns; check 5 verifies the relayed content matches the raw response.
Now an injected instruction in a fetched issue body makes the LLM emit db-readonly.export_full_users_table(). Checks 1–4 still pass — structurally clean. The semantic matcher rejects: export_full_users_table does not align with "read CI test results". Only the semantic layer sees the drift.
When This Backfires
- High-frequency tool use. Each decision adds an LLM round-trip for task extraction. AgentSpec-style declarative predicates run with millisecond-level overhead (Wang et al., 2025) vs. hundreds of ms per LLM call. Cache per turn or fall back to deterministic allowlists for hot paths.
- Shared-failure mode. Same model class for policy and agent creates correlated weakness — a jailbreak misleading one may mislead both (§V). Use a different family for policy.
- High FNR on critical paths. The 7% FNR measured in the multi-turn, multi-agent setting is unacceptable for utility-critical workflows without a fallback (operator review or an allowlist for known-good pairs).
- PII in tasks. Summaries may carry PII to the auth server — encrypt at rest, minimise retention.
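The per-turn caching mitigation for high-frequency tool use can be sketched as a one-entry memo keyed by turn. `PerTurnTaskCache` and the integer `turn_id` are hypothetical. The sketch assumes the user's task does not change within a turn, so tool outputs arriving mid-turn do not trigger re-extraction.

```python
from typing import Callable


class PerTurnTaskCache:
    """Run the (expensive) task extractor at most once per conversation turn."""

    def __init__(self, extract: Callable[[list[str]], str]):
        self._extract = extract
        self._turn_id = None
        self._task = None
        self.extractions = 0  # counts how often the extractor actually ran

    def task_for(self, turn_id: int, conversation: list[str]) -> str:
        # Re-extract only when a new turn starts; reuse the summary otherwise.
        if turn_id != self._turn_id:
            self._task = self._extract(conversation)
            self._turn_id = turn_id
            self.extractions += 1
        return self._task


# Stand-in extractor; in practice this would be the LLM round-trip.
cache = PerTurnTaskCache(lambda turns: f"task derived from {len(turns)} turn(s)")

conversation = ["Summarize last week's failing tests."]
for _ in range(3):                      # three tool calls within turn 1
    cache.task_for(1, conversation)
print(cache.extractions)                # 1

conversation.append("Now open an issue for the flaky one.")
cache.task_for(2, conversation)         # a new turn triggers one more extraction
print(cache.extractions)                # 2
```

This removes the extraction round-trip from every tool call after the first in a turn. The matching step still runs per tool, so hot paths may still need the deterministic allowlist fallback.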
Key Takeaways
- Structural and semantic attacks are orthogonal; covering only one leaves the other open.
- The five deterministic checks are four byte-level comparisons plus one set-membership test; they block tampering without LLM overhead.
- Semantic matching extends TBAC to multi-turn conversations by separating task extraction from task-tool matching.
- Multi-turn matching is meaningfully less reliable than single-turn (10% FPR / 7% FNR on multi-agent traffic) — design fallbacks.
- Hybrid earns its complexity only when the tool catalogue is large/dynamic and conversations are multi-turn; for static action sets, deterministic allowlists are cheaper and tighter.
Related
- Treat Task Scope as a Security Boundary
- Action-Selector Pattern: LLM as Intent Decoder with Deterministic Execution
- Tool-Invocation Attack Surface
- MCP Runtime Control Plane: Policy Evaluation Between Agent and Tool
- Scoped Credentials via Proxy Outside the Agent Sandbox
- Lethal Trifecta Threat Model
- Mid-Trajectory Guardrail Selection for Multi-Step Tool Calls
- Defense-in-Depth Agent Safety