OpenAI Agents SDK Sandboxes Harness and Memory¶
The April 2026 OpenAI Agents SDK update ships three primitives — controlled sandboxes, an inspectable harness, and configurable memory — in one Python library.
What Shipped¶
OpenAI released the Agents SDK update on 2026-04-15, consolidating three primitives teams previously assembled themselves:
- A model-native harness — the control plane around the model
- Native sandbox execution — a compute plane for model-directed work
- Configurable memory — two systems (session and sandbox)
Python-first; TypeScript is planned.
Harness / Compute Separation¶
The SDK separates a persistent, trusted harness from an ephemeral, untrusted compute environment (concepts guide):
| Plane | Owns |
|---|---|
| Harness | Agent loop, model calls, tool routing, handoffs, approvals, tracing, recovery, run state |
| Compute | File reads/writes, command execution, dependency installs, mounted storage, exposed ports, state snapshots |
Colocation would let model-generated shell commands read loop credentials. Separation contains blast radius and enables snapshot/rehydrate: when a sandbox fails or expires, the SDK restores state in a fresh container from the last checkpoint.
graph LR
Loop[Harness] -->|Routes tool calls| Sandbox[Sandbox]
Sandbox -->|Results, snapshots| Loop
Loop -.->|Trusted state:<br/>credentials, approvals| Loop
Sandbox -.->|Untrusted execution:<br/>model-generated code| Sandbox
Sandbox Primitives¶
Sandbox execution is authored through SandboxAgent, Runner.run, and RunConfig. SandboxAgent keeps the standard agent surface (instructions, tools, handoffs, mcp_servers, guardrails, hooks) and adds a Manifest plus LocalDir mounts declaring workspace file access.
Sandbox clients are pluggable (reference):
UnixLocalSandboxClient— local filesystem, dev-only- Docker — stronger isolation, production parity
- Hosted providers — OpenAI partners with Cloudflare, Vercel, E2B, and Modal for container-based execution
The provider lives in RunConfig, not the agent — swap clients per environment while the agent, manifest, and capabilities stay stable.
Isolation caveat: partners ship containers (Modal uses gVisor). For cross-tenant threat models, container isolation is weaker than Firecracker microVMs — see Subprocess and PID-namespace sandboxing.
Harness Primitives¶
The harness standardises primitives previously bespoke per-agent (Help Net Security):
- Tool use via MCP
- Progressive disclosure via skills
- Custom instructions via
AGENTS.md - Code execution via a
shelltool - File edits via an
apply_patchtool - Compaction for long-running runs
Loop customisation is coarse. Runner manages turns, tools, guardrails, handoffs, and sessions — teams that want full loop control call the Responses API directly.
Memory: Two Systems¶
The SDK exposes two memory systems with distinct lifecycles. Confusing them is the most common mistake.
Session Memory¶
Conversation history with an explicit API (sessions guide):
add_items()— append messagesget_items()— retrieve historypop_item()— remove most recentclear_session()— wipe
After a non-streaming run, add_items() persists user input plus model outputs from the latest turn. Backends ship as first-class extras:
| Backend | Purpose |
|---|---|
SQLiteSession / AsyncSQLiteSession |
Local dev, single-server |
SQLAlchemySession |
Production — Postgres, MySQL, SQLite |
RedisSession |
Shared cache-backed session |
AdvancedSQLiteSession |
Branching, analytics, structured queries |
EncryptedSession |
At-rest encryption wrapper |
Sandbox Memory¶
Filesystem artifacts distilled from prior runs (agent memory guide). The workspace stores:
MEMORY.md— concise summary injected into later runsmemories/memory_summary.md— longer distilled lessonsraw_memories/— unprocessed notesworkspace/sessions/<rollout-id>.jsonl— rollout transcripts
The agent searches MEMORY.md for keywords and opens deeper rollout summaries only when needed — progressive disclosure inside the workspace.
Neither system replaces a dedicated long-term vector or graph store for cross-agent knowledge — pair with agent memory patterns for scope beyond a workspace.
When to Pick the SDK¶
Pick the SDK when:
- Python stack, no existing harness or sandbox investments
- Container isolation meets your threat model
- You accept an opinionated loop (
Runner) and memory schema (MEMORY.md, rollout summaries) - You want durable execution without writing it yourself
Skip the SDK when:
- You need TypeScript today
- You require microVM isolation for cross-tenant blast radius
- You need custom turn scheduling, non-standard handoffs, or heterogeneous model routing — call the Responses API directly
- You already run a self-hosted harness with verification or replay
Example¶
A SandboxAgent run with a SQLAlchemySession for conversation history and a Docker sandbox for execution. The harness routes the tool call; the sandbox runs shell and apply_patch against a manifested workspace.
from agents import Runner, RunConfig
from agents.sandbox import SandboxAgent, Manifest, LocalDir
from agents.extensions.memory import SQLAlchemySession
session = SQLAlchemySession.from_url(
"user-123",
url="postgresql+asyncpg://app:pw@db/agents",
create_tables=True,
)
agent = SandboxAgent(
name="refactor-bot",
instructions="Refactor the target module. Run tests after each change.",
manifest=Manifest(mounts=[LocalDir("./target", read_write=True)]),
# tools: shell + apply_patch are wired by the harness
)
result = await Runner.run(
agent,
input="Extract the auth middleware into its own module.",
session=session,
run_config=RunConfig(sandbox_client="docker"),
)
Swap sandbox_client="docker" for "unix_local" in dev or a hosted provider in production. The agent, manifest, and session stay stable.
Key Takeaways¶
- Three primitives in one Python SDK: model-native harness, native sandbox execution, configurable memory — shipped 2026-04-15
- Harness owns the trusted loop; sandbox owns untrusted execution — snapshot/rehydrate recovers from sandbox failure
- Two memory systems:
Sessionfor conversation history (SQLAlchemy, SQLite, Redis, encrypted), sandbox memory for filesystem-distilled lessons across runs - Harness primitives are opinionated (
shell,apply_patch,AGENTS.md, MCP, skills, compaction) — bypassRunnerfor custom loops - Container-level isolation via partner providers (Cloudflare, Vercel, E2B, Modal) — insufficient for threat models requiring microVMs
Related¶
- Sandbox Runtime Comparison — selection rubric across the OpenAI sandbox clients,
docker sbx, bubblewrap, and Seatbelt - Sandbox rules and harness tools
- Harness engineering
- Managed vs self-hosted harness
- Agent memory patterns
- Session harness sandbox separation
- Claude Agent SDK
- Copilot SDK