Hybrid Deterministic + Semantic Authorization for Agent Tool Calls

Deterministic checks at the agent-tool layer cover structural attacks; a semantic task-to-tool matcher covers intent drift. The two attack classes are orthogonal.

Two Orthogonal Attack Classes

A compromised agent attacks tool calls along two independent dimensions:

Structural attacks — the call's form is wrong: tampered description, name swap (get_balance → transfer_amount), parameter mutation, falsified return.
Semantic attacks — the form is correct but the purpose is not: a typed call to a tool unrelated to the user's task — delete_repository during a documentation read.

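Each layer, used alone, lets the other's attack class through: deterministic checks accept a well-formed call to a tool unrelated to the task, and a task-to-tool matcher accepts an aligned tool whose parameters were mutated in transit. The CASA framework (El Helou et al., 2026) combines both at a zero-trust layer between the agent and the MCP server.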

The Five Deterministic Checks

Each check is a binary comparison enforced before the call leaves the layer (El Helou et al., 2026, §III-A):

#	Check	What it compares	Attack it blocks
1	Tool Definition Integrity	Cached MCP tool definition vs. description served to the LLM	Description-injection rewriting tool semantics in-flight
2	Request Authorization	Whether the LLM response actually contained the tool call being executed	Autonomous calls fabricated by the runtime outside LLM reasoning
3	Action Alignment	Function name in LLM output vs. outgoing MCP call	Name swap (`get_balance` → `transfer_amount`)
4	Parameter Integrity	Parameter names and values in LLM output vs. outgoing call	Recipient/amount mutation between LLM and tool
5	Data Fidelity	Raw MCP response vs. content relayed back to the LLM	Result falsification or hidden-instruction injection on return

Checks 1, 3, 4, and 5 are byte-level comparisons; check 2 is a set-membership test. No LLM is involved at decision time, so any failure rejects the call deterministically.
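The five checks reduce to plain equality and membership tests. The following is a minimal sketch, assuming tool definitions, calls, and MCP payloads are JSON-serializable and that "byte-level" comparison is done on a canonical JSON encoding (sorted keys, no whitespace). `ToolCall`, `DeterministicChecks`, `canonical`, `digest`, and the call-id bookkeeping are hypothetical names, not CASA's API.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Set


def canonical(obj: Any) -> bytes:
    """Canonical JSON bytes (assumption) so structurally equal payloads are byte-equal."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def digest(obj: Any) -> str:
    """SHA-256 of the canonical bytes, used to pin cached tool definitions."""
    return hashlib.sha256(canonical(obj)).hexdigest()


@dataclass
class ToolCall:
    call_id: str
    name: str
    arguments: Dict[str, Any]


@dataclass
class DeterministicChecks:
    # Hypothetical: digests recorded when each MCP tool definition was first cached.
    cached_definitions: Dict[str, str]
    # Hypothetical: ids of the tool calls present in the LLM's latest response.
    llm_call_ids: Set[str] = field(default_factory=set)

    def tool_definition_integrity(self, tool_name: str, served_definition: Dict[str, Any]) -> bool:
        # Check 1: the definition served to the LLM must hash to the cached digest.
        return self.cached_definitions.get(tool_name) == digest(served_definition)

    def request_authorization(self, outgoing: ToolCall) -> bool:
        # Check 2: set membership; only calls the LLM actually emitted may execute.
        return outgoing.call_id in self.llm_call_ids

    @staticmethod
    def action_alignment(llm_call: ToolCall, outgoing: ToolCall) -> bool:
        # Check 3: exact string comparison of the function name.
        return llm_call.name == outgoing.name

    @staticmethod
    def parameter_integrity(llm_call: ToolCall, outgoing: ToolCall) -> bool:
        # Check 4: parameter names and values must be byte-identical after canonicalization.
        return canonical(llm_call.arguments) == canonical(outgoing.arguments)

    @staticmethod
    def data_fidelity(raw_response: Any, relayed_response: Any) -> bool:
        # Check 5: what is relayed to the LLM must equal what the MCP server returned.
        return canonical(raw_response) == canonical(relayed_response)
```

Each method returns `False` on a mismatch and the caller rejects and logs. None of them consults a model, which is what makes the outcome reproducible.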

```mermaid
graph LR
    LLM[LLM] -->|tool call| INT[Interception Layer]
    INT -->|checks 1-4 pass| MCP[MCP Server]
    MCP -->|response| INT
    INT -->|check 5 pass| LLM
    INT -.->|any check fails| BLOCK[Reject + audit log]
    style BLOCK fill:#b60205,color:#fff
```
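The flow in the diagram amounts to a fail-closed wrapper around the MCP call. A minimal sketch, assuming the checks are supplied as named predicates and the audit log is a standard `logging` logger; `mediate`, `Outcome`, and the predicate shapes are hypothetical.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

audit_log = logging.getLogger("interception.audit")

# Hypothetical predicate shapes: pre-checks see the outgoing call (checks 1-4);
# post-checks compare the raw MCP response with what would be relayed (check 5).
PreCheck = Callable[[Dict[str, Any]], bool]
PostCheck = Callable[[Any, Any], bool]


@dataclass
class Outcome:
    allowed: bool
    failed_check: Optional[str] = None
    relayed: Any = None


def mediate(call: Dict[str, Any],
            pre_checks: Dict[str, PreCheck],
            send_to_mcp: Callable[[Dict[str, Any]], Any],
            relay: Callable[[Any], Any],
            post_checks: Dict[str, PostCheck]) -> Outcome:
    # Outbound leg: nothing reaches the MCP server unless every pre-check passes.
    for name, check in pre_checks.items():
        if not check(call):
            audit_log.warning("reject call=%r failed=%s", call, name)
            return Outcome(allowed=False, failed_check=name)

    raw = send_to_mcp(call)
    relayed = relay(raw)

    # Return leg: the LLM only sees the response if it matches what the server sent.
    for name, check in post_checks.items():
        if not check(raw, relayed):
            audit_log.warning("reject response call=%r failed=%s", call, name)
            return Outcome(allowed=False, failed_check=name)

    audit_log.info("allow call=%r", call)
    return Outcome(allowed=True, relayed=relayed)
```

Returning an `Outcome` rather than raising keeps the rejection path explicit and records which check failed, which is what the audit log needs.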

The Semantic Layer: Task-to-Tool Matching

Structural checks let a well-formed call to an unrelated tool through. Two stages add intent matching (El Helou et al., 2026, §III-B):

Task extraction — emit a concise natural-language description of the user's objective from the conversation. Separating extraction from matching keeps raw conversation off the authorization server.
Task-tool matching — make a binary decision on whether each requested tool aligns with the extracted task. Mismatches are denied; matches proceed to the deterministic checks.

This extends Task-Based Access Control (TBAC) (El Helou et al., 2025) to multi-turn conversations, handling "clarification exchanges, tangential information, and conversational drifts" without re-prompting.
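The two-stage split can be sketched with the model calls abstracted as plain callables. A minimal sketch, assuming the conversation is a list of strings; `extract`, `match`, `semantic_gate`, and `Decision` are hypothetical stand-ins, and in a deployment both callables would wrap LLM calls. What it illustrates is the data boundary: only the extracted task string reaches the matcher.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces; in a deployment both would wrap LLM calls.
Extractor = Callable[[List[str]], str]   # conversation turns -> concise task description
Matcher = Callable[[str, str], bool]     # (task, tool name) -> aligned?


@dataclass(frozen=True)
class Decision:
    allowed: bool
    task: str
    reason: str


def semantic_gate(conversation: List[str], tool_name: str,
                  extract: Extractor, match: Matcher) -> Decision:
    # Stage 1: reduce the whole multi-turn conversation to one task description.
    task = extract(conversation)
    # Stage 2: binary task-tool classification; the matcher never sees the raw turns.
    if match(task, tool_name):
        return Decision(True, task, "aligned: continue to deterministic checks")
    return Decision(False, task, "task-tool mismatch: deny")
```

Because the matcher's signature only accepts `(task, tool_name)`, keeping raw conversation off the authorization server is enforced by the interface rather than by convention.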

What the Empirical Numbers Buy You

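Task-tool matching results on the extended ASTRA benchmark for Claude Sonnet 4.6 (§V), where FPR is the share of misaligned calls allowed and FNR the share of legitimate calls denied: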

Setting	Accuracy	F1	FPR	FNR
Single-turn (ASTRA)	0.98	0.98	2%	2%
Multi-turn (augmented)	0.97	0.97	2%	4%
Multi-turn (multi-agent)	0.92	0.92	10%	7%

Single-turn matching is reliable (2% error in each direction). Multi-turn degrades: on multi-agent traffic the 7% FNR denies about 1 in 14 legitimate calls, and the 10% FPR allows 1 in 10 misaligned calls. The paper concludes that semantic checks remain "insufficient for high-stakes, long-horizon autonomous tool use" (§VII), so design fallbacks.
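The rates translate into expected error counts. A small worked calculation, assuming the positive class is "aligned, allow" (so FNR counts legitimate calls denied and FPR counts misaligned calls allowed); the daily traffic volumes are hypothetical. If the evaluation set is balanced between aligned and misaligned calls, accuracy equals 1 - (FPR + FNR) / 2, which agrees with each row of the table within rounding.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MatcherRates:
    fpr: float  # share of misaligned calls the matcher allows
    fnr: float  # share of legitimate calls the matcher denies


# Rates from the table (§V).
SINGLE_TURN = MatcherRates(fpr=0.02, fnr=0.02)
MULTI_AGENT = MatcherRates(fpr=0.10, fnr=0.07)


def expected_errors(rates: MatcherRates, legitimate: int, misaligned: int) -> Tuple[float, float]:
    """Expected (wrongly denied, wrongly allowed) calls for a given traffic mix."""
    return rates.fnr * legitimate, rates.fpr * misaligned


def balanced_accuracy(rates: MatcherRates) -> float:
    """Accuracy on a 50/50 aligned/misaligned mix: 1 - (FPR + FNR) / 2."""
    return 1 - (rates.fpr + rates.fnr) / 2


# Hypothetical volume: 1,000 legitimate and 1,000 misaligned calls per day.
denied, allowed = expected_errors(MULTI_AGENT, legitimate=1000, misaligned=1000)
print(f"wrongly denied ~{denied:.0f}, wrongly allowed ~{allowed:.0f}")
print(f"about 1 in {1 / MULTI_AGENT.fnr:.1f} legitimate calls is denied")
print(f"balanced accuracy {balanced_accuracy(MULTI_AGENT):.3f}")
```

With these inputs the script reports about 70 wrongly denied and 100 wrongly allowed calls per day, which is the volume a fallback path has to absorb.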

When Hybrid Beats Deterministic-Only

The deterministic-only alternative pairs a Scoped Credentials Proxy with the Action-Selector pattern: pre-declared (URL, method) tuples and a fixed action set. For small action spaces this beats hybrid on latency, predictability, and FPR.
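For contrast, the deterministic-only alternative can be sketched as two lookups. A minimal sketch under stated assumptions: the action name, the `acme` owner, and the single GitHub REST endpoint are hypothetical, and the proxy matches (URL, method) pairs exactly. A real Scoped Credentials Proxy would also attach the scoped credential to permitted requests.

```python
from typing import Dict, FrozenSet, Tuple

Endpoint = Tuple[str, str]  # (URL, method), as pre-declared for the proxy

# Action-Selector: the LLM may only choose one of these keys; it never writes a URL.
# Hypothetical action set for illustration.
ACTIONS: Dict[str, Endpoint] = {
    "list_failing_runs": (
        "https://api.github.com/repos/acme/auth-service/actions/runs?status=failure",
        "GET",
    ),
}

# Scoped Credentials Proxy: the only (URL, method) pairs it will forward.
ALLOWED_ENDPOINTS: FrozenSet[Endpoint] = frozenset(ACTIONS.values())


def select_action(action: str) -> Endpoint:
    """Resolve an LLM-chosen action name to its fixed endpoint, or refuse."""
    if action not in ACTIONS:
        raise PermissionError(f"unknown action: {action!r}")
    return ACTIONS[action]


def proxy_permits(url: str, method: str) -> bool:
    """Exact-match check; no model involved, so the decision is reproducible."""
    return (url, method.upper()) in ALLOWED_ENDPOINTS
```

Every decision is a dictionary or set lookup, which is why this path wins on latency and has no classifier error; the cost is that each new tool or task needs a new entry.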

Hybrid earns its complexity only when both conditions hold:

Tool catalogue is large and dynamic — pre-declaring every (task, tool) pair is impractical.
Conversations are multi-turn with drifting per-turn tasks (interactive coding, research, support).

Where It Sits in the Defense Stack

The Lethal Trifecta Threat Model and Task Scope Security Boundary define the contracts; deterministic checks catch in-flight violations; the semantic matcher enforces task scope at runtime; the MCP Runtime Control Plane is the architectural slot.

Example

A coding agent has read access to github and db-readonly MCP servers. The user asks: "Summarize the last week's failing tests in the auth-service repo." The interception layer extracts: "Read CI test results from the auth-service repository."

The agent emits github.list_workflow_runs(repo="auth-service", status="failure"). The semantic matcher finds the tool aligned with the task, checks 1–4 pass, and check 5 verifies that the relayed content matches the raw response.

Now an injected instruction in a fetched issue body makes the LLM emit db-readonly.export_full_users_table(). Checks 1–4 would still pass, because the call is structurally clean. The semantic matcher rejects it: export_full_users_table does not align with "read CI test results". Only the semantic layer sees the drift.
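The example can be replayed in a few lines to show why only the semantic layer catches the injected call. This is an illustration, not a classifier: the matcher's verdicts are hard-coded to mirror the outcome described above, checks 1, 2, and 5 are assumed to pass, and checks 3 and 4 are reduced to an equality test. `Call`, `STUB_VERDICTS`, and `authorize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

TASK = "Read CI test results from the auth-service repository."


@dataclass(frozen=True)
class Call:
    tool: str                          # qualified as "server.tool"
    args: Tuple[Tuple[str, str], ...]  # sorted (name, value) pairs


# Stand-in for the LLM task-tool matcher; a real deployment would call a model here.
STUB_VERDICTS: Dict[Tuple[str, str], bool] = {
    (TASK, "github.list_workflow_runs"): True,
    (TASK, "db-readonly.export_full_users_table"): False,
}


def semantic_match(task: str, call: Call) -> bool:
    # Unknown (task, tool) pairs fail closed.
    return STUB_VERDICTS.get((task, call.tool), False)


def structural_match(llm_emitted: Call, outgoing: Call) -> bool:
    # Checks 3 and 4: same tool name and identical parameters.
    return llm_emitted == outgoing


def authorize(task: str, llm_emitted: Call, outgoing: Call) -> Dict[str, bool]:
    # Both layers are evaluated so the report shows which one would block the call.
    structural = structural_match(llm_emitted, outgoing)
    semantic = semantic_match(task, outgoing)
    return {"structural": structural, "semantic": semantic, "allowed": structural and semantic}


benign = Call("github.list_workflow_runs", (("repo", "auth-service"), ("status", "failure")))
injected = Call("db-readonly.export_full_users_table", ())

# The LLM itself emitted both calls, so the outgoing call equals the emitted one.
for call in (benign, injected):
    print(call.tool, authorize(TASK, call, call))
```

The injected call reports `structural: True` and `semantic: False`: nothing was tampered with in transit, so only the intent check can refuse it.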

When This Backfires

High-frequency tool use. Each authorization decision adds LLM round-trips for task extraction and task-tool matching. AgentSpec-style declarative predicates run with millisecond-level overhead (Wang et al., 2025), versus hundreds of milliseconds per LLM call. Cache the extracted task per turn, or fall back to deterministic allowlists for hot paths.
Shared-failure mode. Same model class for policy and agent creates correlated weakness — a jailbreak misleading one may mislead both (§V). Use a different family for policy.
High FNR on critical paths. 7% multi-turn FNR is unacceptable for utility-critical workflows without a fallback (operator review or allowlist for known-good pairs).
PII in tasks. Extracted task summaries may carry PII to the authorization server; encrypt them at rest and minimise retention.
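Per-turn caching and the fallbacks above can be sketched together. A minimal sketch, assuming LLM-backed `extract` and `match` callables (ideally from a different model family than the agent) and hypothetical configuration: `hot_path_tools` skip the semantic layer and rely on a deterministic allowlist, `known_good` holds pre-approved (task, tool) pairs, and matcher denials on `critical_tools` go to operator review instead of a hard deny. The deterministic checks still run on every call afterwards and are outside this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Set, Tuple


@dataclass
class SemanticGateWithFallbacks:
    extract: Callable[[List[str]], str]   # hypothetical LLM-backed task extractor
    match: Callable[[str, str], bool]     # hypothetical LLM-backed task-tool matcher
    hot_path_tools: FrozenSet[str] = frozenset()                     # allowlist only, no LLM
    known_good: Set[Tuple[str, str]] = field(default_factory=set)    # pre-approved (task, tool)
    critical_tools: FrozenSet[str] = frozenset()                     # denials go to a human
    review_queue: List[Tuple[str, str, str]] = field(default_factory=list, init=False)
    _task_cache: Dict[str, str] = field(default_factory=dict, init=False, repr=False)

    def task_for(self, turn_id: str, conversation: List[str]) -> str:
        # One extraction round-trip per turn, reused for every tool call in that turn.
        if turn_id not in self._task_cache:
            self._task_cache[turn_id] = self.extract(conversation)
        return self._task_cache[turn_id]

    def decide(self, turn_id: str, conversation: List[str], tool: str) -> str:
        if tool in self.hot_path_tools:
            return "allow"  # no LLM call on hot paths
        task = self.task_for(turn_id, conversation)
        # Known-good pairs skip the matcher round-trip entirely.
        if (task, tool) in self.known_good or self.match(task, tool):
            return "allow"
        if tool in self.critical_tools:
            # Absorb matcher false negatives on utility-critical tools via operator review.
            self.review_queue.append((turn_id, task, tool))
            return "review"
        return "deny"
```

Keying the cache by turn matches the multi-turn setting: a new user turn may shift the task, so extraction reruns once per turn rather than once per call.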

Key Takeaways

Structural and semantic attacks are orthogonal; covering only one leaves the other open.
The five deterministic checks are exact comparisons (four byte-level, one set-membership) that block tampering without LLM overhead.
Semantic matching extends single-task TBAC to multi-turn conversations by separating extraction from matching.
Multi-turn matching is meaningfully less reliable than single-turn (10% FPR / 7% FNR on multi-agent traffic) — design fallbacks.
Hybrid earns its complexity only when the tool catalogue is large/dynamic and conversations are multi-turn; for static action sets, deterministic allowlists are cheaper and tighter.