Prompt-Only Tool Access Control

A system-prompt "do not call this tool" cuts unauthorized invocation by only 11–18 percentage points; removing the tool from the model's context and re-checking every call drives it to 0%.

Prompt-only tool access control restricts which tools an agent may invoke by adding instructions to the system prompt — "do not call delete_repo", "only use the read-only API" — while the full tool catalog stays visible in the model's context. Across 150 adversarial tasks on Qwen 2.5 7B, Llama 3.1 8B, and Claude Haiku 3.5, this cut the Unauthorized Invocation Rate (UIR) by only 11–18 percentage points; a governed MCP proxy doing ABAC at discovery and invocation drove UIR to 0% with under 50 ms latency (Uppala 2026).
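The distinction between percentage points and a bypass percentage matters when reading these figures. The sketch below is a minimal illustration and makes two assumptions that the source does not state: that UIR is the fraction of adversarial tasks containing at least one unauthorized call, and that a task log has the hypothetical `TaskResult` shape. The records are toy data, not results from Uppala 2026.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """Outcome of one adversarial task (hypothetical record shape)."""
    task_id: str
    called_tools: tuple[str, ...]


def uir(results: list[TaskResult], allowed: frozenset[str]) -> float:
    """Assumed definition: share of tasks with at least one disallowed call."""
    if not results:
        raise ValueError("no task results")
    violations = sum(
        1 for r in results if any(t not in allowed for t in r.called_tools)
    )
    return violations / len(results)


def reduction_pp(baseline: float, treated: float) -> float:
    """Absolute reduction in percentage points, not a relative percentage."""
    return (baseline - treated) * 100


# Toy records for illustration only; these are not the paper's data.
ALLOWED = frozenset({"read_file", "list_files", "post_comment"})
no_restriction = [
    TaskResult("t1", ("read_file", "delete_repo")),
    TaskResult("t2", ("transfer_ownership",)),
    TaskResult("t3", ("list_files",)),
    TaskResult("t4", ("post_comment", "exfiltrate_secrets")),
]
prompt_restriction = [
    TaskResult("t1", ("read_file", "delete_repo")),
    TaskResult("t2", ("read_file",)),
    TaskResult("t3", ("list_files",)),
    TaskResult("t4", ("post_comment", "exfiltrate_secrets")),
]

baseline = uir(no_restriction, ALLOWED)      # 3 of 4 tasks violate
treated = uir(prompt_restriction, ALLOWED)   # 2 of 4 tasks violate
print(reduction_pp(baseline, treated))       # 25.0
```

In this toy run the restriction removes 25 percentage points, yet half of the tasks still end in an unauthorized call. A reduction figure alone says nothing about the residual rate unless the baseline is also known.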

Why it fails

The system prompt is data, not enforcement. Models "cannot distinguish between instructions of different privilege levels" (Willison 2025), so a developer's "do not pick this token" competes with every other signal in context, including instructions injected through fetched documents. Microsoft's Agent Governance Toolkit measures the gap: a 26.67% policy-violation rate under prompt-only controls versus 0.00% under deterministic application-layer enforcement (agent-governance-toolkit). CaMeL points the same way: moving control and data flow into a deterministic policy layer gives provable security on 77% of AgentDojo tasks, where the undefended baseline gives none (Debenedetti et al. 2025).

Why it works (the architectural fix)

The fix removes the choice rather than asking the model to refuse it. A governed proxy enforces ABAC at two points:

Discovery — unauthorized tools are filtered out of the list the model receives. There is no token to select (Uppala 2026, §3).
Invocation — every outgoing call is re-checked against the same policy and rejected before reaching the MCP server.

Causality runs policy → enforcement, not instruction → model compliance → enforcement; the second chain is the one that adversarial context breaks.

graph LR
    P[Policy] --> D[Discovery filter]
    P --> I[Invocation check]
    D -->|filtered tool list| LLM
    LLM -->|tool call| I
    I -->|allow| T[MCP Server]
    I -.->|deny| X[Reject + audit]
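The two enforcement points can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Uppala's implementation and not the API of any MCP SDK: `Tool`, `Principal`, `is_allowed`, `GovernedProxy`, and the `backend` callable are all hypothetical names, and the attribute rule is a toy stand-in for a real ABAC policy.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    name: str
    attrs: frozenset[str]  # hypothetical tool attributes, e.g. {"destructive"}


@dataclass(frozen=True)
class Principal:
    name: str
    attrs: frozenset[str]  # hypothetical principal attributes, e.g. {"reviewer"}


def is_allowed(principal: Principal, tool: Tool) -> bool:
    """Toy ABAC rule: destructive tools require the 'admin' attribute."""
    if "destructive" in tool.attrs:
        return "admin" in principal.attrs
    return True


@dataclass
class GovernedProxy:
    catalog: dict[str, Tool]
    backend: Callable[[str, dict[str, Any]], Any]  # stands in for the MCP server
    audit_log: list[dict[str, str]] = field(default_factory=list)

    def list_tools(self, principal: Principal) -> list[str]:
        """Discovery: return only the tools the policy permits."""
        return [n for n, t in self.catalog.items() if is_allowed(principal, t)]

    def invoke(self, principal: Principal, name: str, args: dict[str, Any]) -> Any:
        """Invocation: re-check the same policy before forwarding the call."""
        tool = self.catalog.get(name)
        if tool is None or not is_allowed(principal, tool):
            self.audit_log.append(
                {"principal": principal.name, "tool": name, "decision": "deny"}
            )
            raise PermissionError(f"{principal.name} may not call {name}")
        return self.backend(name, args)


catalog = {
    "read_file": Tool("read_file", frozenset({"read_only"})),
    "delete_repo": Tool("delete_repo", frozenset({"destructive"})),
}
proxy = GovernedProxy(catalog, backend=lambda name, args: f"ran {name}")
agent = Principal("code-review-agent", frozenset({"reviewer"}))

print(proxy.list_tools(agent))  # ['read_file']
```

Both methods consult the same `is_allowed` function, so the discovery filter and the invocation check cannot drift apart. A call to `proxy.invoke(agent, "delete_repo", {})` raises `PermissionError` and leaves a deny record in `audit_log`, whatever the model was persuaded to emit.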

When this backfires

The architectural fix is not always necessary or sufficient.

Tiny, fixed action set. A chatbot with three read-only tools wired through an action-selector pattern can match the proxy's UIR with a small system prompt, so a gateway is over-engineering.
Latency-critical hot paths. Full-featured gateways add 100–300 ms per call; at 20 sequential calls per workflow that adds up to 2–6 seconds (Composio 2026).
Off-protocol calls. A proxy enforces only what traverses it. Shell, raw HTTP, and non-MCP channels bypass it entirely (Security Boulevard 2026).
Single point of failure. One broker concentrates outage and compromise surface, so replicate it and keep credentials out of the proxy.
Scope of the 11–18 pp figure. Uppala tested "explicitly instructed otherwise" restrictions; constitutional schemas and tool-call output validation are different mechanisms and were not ablated.
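The latency trade-off above is simple arithmetic, worked through here as a rough sketch. It assumes the calls run sequentially so that per-call overhead adds linearly; parallel calls would not sum this way. The helper name `added_latency_s` is hypothetical, and the inputs are the per-call figures quoted in this page.

```python
def added_latency_s(calls: int, overhead_ms: float) -> float:
    """Total added overhead in seconds for sequential calls."""
    return calls * overhead_ms / 1000


CALLS_PER_WORKFLOW = 20

gateway_low = added_latency_s(CALLS_PER_WORKFLOW, 100)   # 2.0 s
gateway_high = added_latency_s(CALLS_PER_WORKFLOW, 300)  # 6.0 s
# Upper bound for a proxy that stays under 50 ms per call.
proxy_bound = added_latency_s(CALLS_PER_WORKFLOW, 50)    # 1.0 s

print(gateway_low, gateway_high, proxy_bound)
```

Under these assumptions, the cost depends more on which proxy is deployed and how many calls a workflow makes than on the enforcement pattern itself.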

Example

Before — prompt-only restriction (leaks under adversarial context):

# system prompt
You are a code review agent. You have access to: read_file, list_files,
post_comment, delete_repo, transfer_ownership, exfiltrate_secrets.

IMPORTANT: NEVER call delete_repo. NEVER call transfer_ownership.
NEVER call exfiltrate_secrets. These are not for your use.

The model sees all six tool definitions. Against Uppala's adversarial cases, which include indirect prompt injection in a fetched PR description, a restriction like this lowers UIR by only 11–18 percentage points, depending on the model (Uppala 2026, §5).

After — architectural enforcement at the proxy:

# proxy policy (ABAC)
principal: code-review-agent
allowed_tools: [read_file, list_files, post_comment]
# discovery filter: only the three allowed tools are sent to the model
# invocation check: any call to delete_repo / transfer_ownership /
#                   exfiltrate_secrets is rejected before reaching the MCP server

The model never sees the dangerous tools at discovery. If an injection convinces it to fabricate the call anyway, the invocation check rejects it deterministically.

Key Takeaways

"Do not call this tool" reduces unauthorized invocation by only 11–18 percentage points across three models; a governed proxy enforcing the same restriction drives it to 0%.
The mechanism is removing the choice (discovery filter) and verifying the call (invocation check) — not improving the instruction.
Prompt-only enforcement and adversarial context share a failure mode: the model treats both as data of equal privilege.
A tiny enumerable action space can make a proxy unnecessary; a large dynamic catalog or multi-tenant tool surface makes it unavoidable.
A proxy enforces only what traverses it — pair with sandboxing and egress policy to cover shell, HTTP, and non-MCP channels.