Scope Sandbox Rules to Harness-Owned Tools, Not Third-Party
Define sandbox rules only for tools your harness controls, and document explicitly that external tools enforce their own guardrails.
The Boundary Problem
As agents gain access to more tools from multiple sources — built-in shell tools, MCP servers, user-provided tools — the temptation is to write a single blanket sandbox policy covering all of them. This creates a false security assumption: the blanket policy implies the harness enforces restrictions on tools it doesn't actually control.
Codex draws a clear boundary: the sandbox developer message describes restrictions only for the Codex-provided shell execution tool. MCP servers and user-provided tools are explicitly excluded from harness-level sandboxing. [Source: Unlocking the Codex Harness]
Why the Separation Matters
A harness can only enforce restrictions on tools it controls at the API level. A harness-owned shell tool wrapping docker sbx, for example, sits inside the sandbox boundary the harness defines; an MCP server invoked from the same agent does not. When an MCP server receives a call, the harness has already handed off execution. The MCP server processes the request according to its own logic and returns a result — the harness cannot intercept or modify this behavior at the sandbox layer.
If harness sandbox rules are written as if they apply to MCP tools, two problems follow:
- The model may believe the MCP tool is restricted by the sandbox rules, altering its behavior in ways the MCP tool doesn't expect
- Developers reviewing the harness may believe MCP tool behavior is sandboxed when it is not — a false sense of security
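A minimal sketch of that boundary, assuming a hypothetical `ToolCall` record and a harness that owns exactly one tool named `shell`. The names, the `/workspace` root, and the three verdict strings are illustrative and not taken from Codex. The point is the third verdict: for a tool the harness does not own, the honest answer is "out of scope", not "allow".

```python
import posixpath
from dataclasses import dataclass
from typing import Optional

WORKSPACE = "/workspace"   # assumed sandbox root, mirroring the example rules
HARNESS_OWNED = {"shell"}  # hypothetical: tools the harness executes itself


@dataclass(frozen=True)
class ToolCall:
    tool: str                         # e.g. "shell" or "mcp:files.write"
    write_path: Optional[str] = None  # absolute path the call would write to, if any


def sandbox_verdict(call: ToolCall) -> str:
    """Return 'allow', 'deny', or 'out_of_scope' for a tool call."""
    if call.tool not in HARNESS_OWNED:
        # The harness only forwards this call, so the sandbox has no say.
        return "out_of_scope"
    if call.write_path is not None:
        # Lexical normalization only; a real sandbox must also handle symlinks.
        target = posixpath.normpath(call.write_path)
        if target != WORKSPACE and not target.startswith(WORKSPACE + "/"):
            return "deny"
    return "allow"
```

Returning a distinct `out_of_scope` value keeps logs and reviews from reading a forwarded MCP call as something the sandbox approved.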
Implementation Pattern
Scope sandbox rules explicitly in the developer message:
    SANDBOX RULES (applies to shell tool only):
    - No network access from shell commands
    - No writes outside /workspace
    - No access to /etc, /home, or system directories

    Note: MCP servers and user-provided tools operate under their own
    authorization policies and are not subject to these sandbox rules.
The explicit note is auditable: any reviewer can see that MCP tools are excluded from harness-level sandboxing and know to check each MCP server's own guardrails separately. [Source: Unlocking the Codex Harness]
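One way to keep the scope statement honest is to generate it from the tool registry instead of writing it by hand. The sketch below assumes a hypothetical `Tool` record with a `source` label of `"harness"`, `"mcp"`, or `"user"`; the function name and message wording are illustrative, not the Codex implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    source: str  # hypothetical labels: "harness", "mcp", or "user"


SANDBOX_RULES = [
    "No network access from shell commands",
    "No writes outside /workspace",
    "No access to /etc, /home, or system directories",
]


def build_sandbox_message(tools: list[Tool]) -> str:
    """Render the developer message, naming covered and excluded tools."""
    covered = sorted(t.name for t in tools if t.source == "harness")
    excluded = sorted(t.name for t in tools if t.source != "harness")
    scope = ", ".join(covered) or "no tools"
    lines = [f"SANDBOX RULES (applies only to: {scope}):"]
    lines += [f"- {rule}" for rule in SANDBOX_RULES]
    if excluded:
        lines.append("")
        lines.append(
            "Note: these tools operate under their own authorization policies "
            "and are not subject to these sandbox rules: " + ", ".join(excluded) + "."
        )
    return "\n".join(lines)
```

For example, `build_sandbox_message([Tool("shell", "harness"), Tool("github", "mcp")])` names `shell` in the header and lists `github` in the exclusion note.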
Per-Source Trust Boundaries
When building harnesses that compose heterogeneous tools, define trust boundaries per tool source. MCP server deployments span distinct trust contexts shaped by where the code originates (first-party, open source, third-party), where it executes, and which resources it can access. [Source: MCP Security — CoSAI OASIS]
| Tool Source | Who Enforces Guardrails |
|---|---|
| Harness-owned shell tool | Harness sandbox rules |
| First-party MCP server | MCP server's own policies |
| Third-party MCP server | Third-party's policies (audit separately) |
| User-provided tools | User's responsibility; document this explicitly |
This makes accountability visible at design time rather than discovered during an incident.
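The table can be carried into tool registration so that every tool is recorded with a guardrail owner. This is a sketch under assumptions: the `ToolSource` enum, the `register_tool` helper, and the plain-dict registry are all hypothetical names, and the enforcer strings simply restate the table.

```python
from enum import Enum


class ToolSource(Enum):
    HARNESS_SHELL = "harness_shell"
    FIRST_PARTY_MCP = "first_party_mcp"
    THIRD_PARTY_MCP = "third_party_mcp"
    USER_PROVIDED = "user_provided"


# Who enforces guardrails for each source, restating the table above.
ENFORCER = {
    ToolSource.HARNESS_SHELL: "harness sandbox rules",
    ToolSource.FIRST_PARTY_MCP: "MCP server's own policies",
    ToolSource.THIRD_PARTY_MCP: "third party's policies (audit separately)",
    ToolSource.USER_PROVIDED: "user's responsibility",
}


def register_tool(registry: dict, name: str, source: str) -> None:
    """Record a tool with its guardrail owner; reject undeclared sources."""
    try:
        src = ToolSource(source)
    except ValueError:
        raise ValueError(f"tool {name!r} has undeclared source {source!r}") from None
    registry[name] = {"source": src, "enforced_by": ENFORCER[src]}
```

Rejecting an undeclared source at registration means no tool enters the harness without a named owner for its guardrails.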
Auditing Third-Party MCP Tools
Third-party MCP servers require separate security review. The harness's own controls are not evidence that a third-party server can be trusted. Questions to answer before deploying:
- What actions can this MCP server take?
- Does it have its own access controls and audit logging?
- What data does it access and what does it retain?
- What happens if prompt injection causes the model to call this tool with crafted inputs, anywhere on its tool-invocation attack surface?
Document the answers. The harness sandbox policy is not a substitute for this review.
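The answers can be stored as a structured record that blocks deployment while any question is still open. This is an illustrative sketch only: `McpAuditRecord`, its field names, and `may_deploy` are hypothetical, and the four fields simply mirror the four questions above.

```python
from dataclasses import dataclass, fields


@dataclass
class McpAuditRecord:
    server: str
    allowed_actions: str = ""              # what actions the server can take
    access_controls_and_logging: str = ""  # its own access controls and audit logs
    data_access_and_retention: str = ""    # what data it reads and what it keeps
    injection_impact: str = ""             # effect of crafted inputs from an injected model

    def unanswered(self) -> list[str]:
        """Names of the questions that still have no written answer."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


def may_deploy(record: McpAuditRecord) -> bool:
    """Allow deployment only when every question has an answer on file."""
    return not record.unanswered()
```

The check only confirms that answers exist. Judging whether they are acceptable remains a human review step.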
When This Backfires
Explicit scoping is not a cure-all. Specific failure conditions:
- Exclusion confusion: Stating "sandbox rules apply to shell tool only" can leave the model unsure whether MCP tools carry any restrictions at all, so it may invoke them in situations that a policy it cannot see would have ruled out. Pair the exclusion with a brief statement of what governs MCP calls (e.g., "MCP tools enforce their own authorization").
- False audit comfort: A visibly scoped sandbox policy can create the impression that security has been addressed because the boundary is documented. Reviewers may skip auditing each MCP server's guardrails, assuming the explicit exclusion signals that MCP security was considered. Documentation of a gap is not closure of it.
- Drift across tool upgrades: A harness-owned tool can be reimplemented as an MCP server (or vice versa) without updating the sandbox rules. The explicit scoping then misdescribes the current surface. Treat the "which tools the sandbox covers" list as part of tool registration, not a one-time doc edit.
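The drift case in particular lends itself to an automated check at registration time. The sketch below assumes the registry is a plain mapping from tool name to a source label, and that the set of tools named in the sandbox policy is available as data; both representations and the function name are hypothetical.

```python
def coverage_drift(documented_covered: set[str], registry: dict[str, str]) -> dict[str, set[str]]:
    """Compare the tools the policy claims to cover with the tools the harness owns."""
    actual = {name for name, source in registry.items() if source == "harness"}
    return {
        # Policy claims coverage the harness no longer provides.
        "stale_claims": documented_covered - actual,
        # Harness-owned tools the policy fails to mention.
        "undocumented": actual - documented_covered,
    }
```

If a `search` tool moves from the harness to an MCP server, `coverage_drift({"shell", "search"}, {"shell": "harness", "search": "mcp"})` reports `search` under `stale_claims`, which can fail a registration or CI step until the policy is updated.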
Counterpoint: Gateway-Enforced Uniform Policy
The claim that a harness "cannot intercept" MCP calls holds for an in-process sandbox — once execution hands off to an MCP client, there is no interposition point. It does not hold for every architecture. A dedicated MCP gateway or proxy is a separate interposition point that all agent-to-server traffic crosses, evaluating each tools/call against a uniform policy and blocking violations before they reach the upstream server, without changes to the servers themselves. [Source: MCP and Zero Trust — Cerbos]
This relocates a trust boundary rather than removing it. A gateway enforces coarse uniform rules (which tools are callable, rate limits, taint tracking) while each MCP server still owns the fine-grained authorization the gateway cannot see. The page's warning stands: writing in-process shell-sandbox rules as if they cover MCP tools is still a mistake. The correction is that "harness-level" need not mean "shell-sandbox-level" — a proxy boundary is a legitimate place to enforce some cross-tool policy, as long as it is documented as a distinct boundary, not conflated with the shell sandbox.
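A coarse gateway check might look like the sketch below. It assumes requests arrive as parsed JSON-RPC 2.0 dictionaries in which a `tools/call` request carries the tool name in `params.name`. The allowlist contents, the function name, and the choice of error code `-32000` (from the JSON-RPC implementation-defined server error range) are assumptions, not the behavior of any particular gateway product.

```python
from typing import Any, Optional

ALLOWED_TOOLS = {"search_issues", "read_file"}  # hypothetical allowlist


def gateway_decision(request: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return None to forward the request upstream, or an error response to block it."""
    if request.get("method") != "tools/call":
        return None  # this sketch only polices tool invocations
    tool = request.get("params", {}).get("name")
    if tool in ALLOWED_TOOLS:
        return None
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        "error": {"code": -32000, "message": f"tool {tool!r} blocked by gateway policy"},
    }
```

This check sees only the tool name. Whether a permitted tool may act on a specific resource is still decided by the upstream server, which is the fine-grained authorization the gateway cannot see.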
Key Takeaways
- Harness sandbox rules only control tools the harness owns; MCP tools execute under their own policies
- Writing sandbox rules that imply coverage of MCP tools creates false security assumptions
- Explicitly document in the sandbox policy which tools are and are not covered
- Define trust boundaries per tool source and audit each source separately
- Third-party MCP servers require independent security review that harness policies cannot substitute for
Related
- Sandbox Runtime Comparison
- Tool-Invocation Attack Surface in Coding Agents
- Dual-Boundary Sandboxing
- Lethal Trifecta Threat Model for AI Agent Development
- Blast Radius Containment: Least Privilege for AI Agents
- Scoped Credentials via Proxy Outside the Agent Sandbox
- Action-Selector Pattern: LLM as Intent Decoder with Deterministic Execution
- Prompt Injection Threat Model