In-Process WebAssembly Sandboxes for Agent-Generated Code
Run untrusted agent-generated code inside a WebAssembly runtime embedded in the host process — deny-by-default I/O, fuel-bounded CPU, capped memory, host-function interop.
When This Pattern Fits
WebAssembly fills the in-process slot in the sandbox-runtime trade-space (containers vs microVMs vs OS-level isolators): no container daemon, no hypervisor, no separate process. The sandbox is a library that the host application instantiates. It applies when a Python or JavaScript application needs to execute agent- or LLM-generated code without a container runtime, on targets that cannot or should not run Docker or a hypervisor (developer laptops, edge nodes, hosted SaaS workers), and where the threat model is "buggy or compromised guest code", not a hostile tenant with hardware access (see When This Backfires).
The substrate works because WebAssembly was designed for executing untrusted code: bytecode has no syscalls, all I/O is routed through host-supplied imports, and the type system is verified at load time. The runtime ships as a maintained PyPI package (wasmtime) with binary wheels — no toolchain build required.
The Four Controls
A usable WASM sandbox composes four runtime-enforced controls. The names below match the wasmtime API; engines like wasmer expose similar controls.
CPU Bound: Fuel or Epochs
Wasmtime offers two interruption mechanisms (Interrupting Wasm Execution):
- Fuel: config.consume_fuel(true) plus store.set_fuel(N). Each instruction consumes fuel; exhaustion raises Trap::OutOfFuel. Deterministic: the same program with the same fuel traps at the same instruction. Trade-off: higher per-instruction overhead than epochs, and fuel units do not translate cleanly to wall-clock time (Simon Willison's notes: "the units are hard to reason about").
- Epochs: config.epoch_interruption(true) plus a host-side timer that increments the epoch. Around 10% guest slowdown per the docs. Non-deterministic: the deadline is wall-time based, so the same input may trap at different points across runs.
Use fuel for reproducibility (eval suites, replays) and epochs for production wall-clock budgets.
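The determinism claim for fuel can be illustrated with a toy model in plain Python. This is a sketch, not the wasmtime API: FuelMeter, run_countdown, and trap_point are hypothetical names, and the model charges one unit per loop iteration, whereas a real runtime charges per WebAssembly instruction.

```python
from dataclasses import dataclass
from typing import Optional


class OutOfFuel(Exception):
    """Stands in for the trap a runtime raises when fuel is exhausted."""


@dataclass
class FuelMeter:
    remaining: int
    executed: int = 0  # steps completed before any trap

    def charge(self, units: int = 1) -> None:
        # Refuse the step if the budget cannot cover it.
        if self.remaining < units:
            raise OutOfFuel(f"out of fuel after {self.executed} steps")
        self.remaining -= units
        self.executed += 1


def run_countdown(start: int, meter: FuelMeter) -> int:
    """Toy guest: count down to zero, charging fuel on every iteration."""
    n = start
    while n > 0:
        meter.charge()
        n -= 1
    return n


def trap_point(start: int, fuel: int) -> Optional[int]:
    """Return the step count at which the guest trapped, or None if it finished."""
    meter = FuelMeter(remaining=fuel)
    try:
        run_countdown(start, meter)
    except OutOfFuel:
        return meter.executed
    return None
```

Calling trap_point twice with the same arguments yields the same step count, which is the property that makes fuel attractive for replays. An epoch-style model would replace the counter with a deadline that a host timer advances, so the trap point would depend on scheduling rather than on the program alone.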
Memory Cap
Wasmtime exposes per-store memory limits as a first-class option. Without an explicit cap a loop like s = ""; while True: s += "longer" exhausts host memory; with a cap, the guest traps inside the runtime (reproduced in Willison's CLI demo).
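The effect of a cap on the runaway string loop can be sketched with a toy linear memory that tracks only its size. This is a plain-Python model under stated assumptions, not wasmtime code: CappedMemory, GuestTrap, and run_growing_guest are hypothetical names. The two details taken from WebAssembly itself are the 64 KiB page size and the convention that a refused memory.grow returns -1.

```python
PAGE_SIZE = 64 * 1024  # WebAssembly linear memory grows in 64 KiB pages


class GuestTrap(Exception):
    """Stands in for the failure surfaced inside the guest."""


class CappedMemory:
    """Toy linear memory that tracks its size, not its contents."""

    def __init__(self, initial_pages: int, max_bytes: int) -> None:
        self.pages = initial_pages
        self.max_pages = max_bytes // PAGE_SIZE

    def grow(self, delta_pages: int) -> int:
        """Mirror memory.grow: return the old page count, or -1 if refused."""
        if self.pages + delta_pages > self.max_pages:
            return -1
        previous = self.pages
        self.pages += delta_pages
        return previous


def run_growing_guest(memory: CappedMemory, chunk: str = "longer") -> None:
    """Toy equivalent of: s = ""; while True: s += "longer"."""
    used = 0
    while True:
        used += len(chunk)
        # Ask for more pages whenever the string outgrows current memory.
        while used > memory.pages * PAGE_SIZE:
            if memory.grow(1) == -1:
                raise GuestTrap(f"allocation failed at {used} bytes")
```

In this model the loop ends with GuestTrap once the cap is reached, and the host only ever accounts for max_pages pages. Without the max_pages check the loop would keep growing until the host ran out of memory.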
Filesystem and Network: Deny by Default
WebAssembly has no ambient I/O. WASI capabilities (file descriptors, network sockets) must be explicitly handed to the instance at construction. The default posture is no filesystem and no network — a stronger default than dual-boundary sandboxing, where both boundaries must be configured to be enforced.
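The deny-by-default posture can be modelled as a grant table that starts empty. The sketch below is plain Python, not a WASI implementation: SandboxContext and allows are hypothetical names, and preopen_dir is named only by analogy with the preopened-directory idea in WASI. Network access is left out of the model entirely, which mirrors the default.

```python
import posixpath


class SandboxContext:
    """Toy capability table: nothing is reachable until it is granted."""

    def __init__(self) -> None:
        self._grants = {}  # normalized guest directory -> set of rights

    def preopen_dir(self, guest_path: str, *, write: bool = False) -> None:
        """Grant read access (and optionally write access) under one directory."""
        rights = {"read", "write"} if write else {"read"}
        self._grants[posixpath.normpath(guest_path)] = rights

    def allows(self, path: str, right: str) -> bool:
        """Check a guest path against the grants after collapsing '..' segments."""
        target = posixpath.normpath(path)
        for root, rights in self._grants.items():
            prefix = root.rstrip("/") + "/"
            inside = target == root or target.startswith(prefix)
            if inside and right in rights:
                return True
        return False
```

A fresh SandboxContext answers False to every query. Access appears only after an explicit preopen_dir call, and a path such as /data/../etc/passwd is normalized before the check so it cannot escape the granted directory in this model.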
Host-Function Interop
The escape hatch is explicit host imports. Each function the guest can call is registered on the embedder side; everything else is unreachable. Per the Wasmtime security contract: "Use of a WASI resource without having been given the associated WASI capability" is a security vulnerability — the embedder, not the guest, decides what gets exposed.
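The "everything else is unreachable" property comes from import resolution: a guest module declares the imports it needs, and instantiation fails if the embedder has not defined one of them. The sketch below models that step in plain Python. HostImports, define, instantiate, and LinkError are hypothetical names chosen for illustration, not the wasmtime API.

```python
from typing import Callable, Dict, Iterable, Tuple

ImportKey = Tuple[str, str]  # (module name, function name)


class LinkError(Exception):
    """Raised when the guest asks for an import the host never defined."""


class HostImports:
    """Toy registry of the functions an embedder chooses to expose."""

    def __init__(self) -> None:
        self._functions: Dict[ImportKey, Callable] = {}

    def define(self, module: str, name: str, func: Callable) -> None:
        self._functions[(module, name)] = func

    def instantiate(self, required: Iterable[ImportKey]) -> Dict[ImportKey, Callable]:
        """Resolve every import the guest declares, or fail before it runs."""
        resolved = {}
        for key in required:
            if key not in self._functions:
                raise LinkError(f"unknown import {key[0]}.{key[1]}")
            resolved[key] = self._functions[key]
        return resolved
```

Because resolution happens before any guest code executes, the exposed surface is exactly the set of define calls. That set is the list to review when auditing what an agent can reach.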
Persistent Interpreter State
A naive WASM-Python embedding starts a fresh interpreter per call, losing variables and imports. The persistent-session technique runs the guest interpreter in a host thread that blocks on a __session_next__() host function, feeds successive code blocks through eval(), and returns results via __session_result__(). Variables stay resident across calls without restarting the WASM module (implementation walkthrough). The same shape applies to QuickJS-in-WASM where a code-execution tool wants REPL-like persistence inside an isolated session.
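The control flow of the session protocol can be sketched with a thread and two queues. This is a plain-Python model of the shape described above, not the referenced implementation: ToySession is a hypothetical name, the methods _session_next and _session_result mirror the __session_next__() and __session_result__() host functions, and Python's own exec() stands in for the guest interpreter. exec() provides no isolation, so the sketch shows only how state persists, not how it is sandboxed.

```python
import functools
import io
import queue
import threading

_STOP = object()  # sentinel telling the guest loop to exit


class ToySession:
    def __init__(self) -> None:
        self._requests = queue.Queue()
        self._results = queue.Queue()
        self._thread = threading.Thread(target=self._guest_main, daemon=True)
        self._thread.start()

    def _session_next(self):
        """Host function: block until the host supplies the next code block."""
        return self._requests.get()

    def _session_result(self, output, error) -> None:
        """Host function: hand captured output and any error back to the host."""
        self._results.put((output, error))

    def _guest_main(self) -> None:
        namespace = {}  # survives across calls for the life of the thread
        while True:
            code = self._session_next()
            if code is _STOP:
                return
            buffer = io.StringIO()
            # Route the guest's print() into a per-call buffer.
            namespace["print"] = functools.partial(print, file=buffer)
            error = None
            try:
                exec(code, namespace)  # stand-in for the guest interpreter
            except Exception as exc:
                error = repr(exc)
            self._session_result(buffer.getvalue(), error)

    def run(self, code: str):
        """Send one code block and wait for its (output, error) pair."""
        self._requests.put(code)
        return self._results.get()

    def close(self) -> None:
        self._requests.put(_STOP)
        self._thread.join()
```

The namespace dictionary is created once inside the guest loop, so a variable assigned in one run() call is still visible in the next. An exception in one block is reported through the result channel and does not end the session.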
Why It Works
WebAssembly was designed as an abstract machine for untrusted code: no syscalls in the bytecode, I/O routed through host-supplied imports, and a verified-at-load-time type system that prevents memory corruption inside the guest. Per the Wasmtime security policy: "Anything that undermines the Wasm execution sandbox is a security vulnerability" — including denial-of-service via uninterruptible loops and user-controlled memory exhaustion. That contract makes the runtime owners, not the embedding application, responsible for closing escape and exhaustion paths.
The substrate beats V8-in-Python alternatives because browser JS engines, while battle-tested, are not designed for easy embedding; most V8-in-Python projects "are infrequently maintained and come with warnings not to use them with completely untrusted code" (Willison, 2026).
When This Backfires
- Hostile-multi-tenant production on shared hardware. A wasmtime CVE in the VMM, compiler backend, or WASI layer becomes a same-host escape across tenants. GHSA-2r75-cxrj-cmph (May 2026), a WASI path_open(TRUNCATE) bypass of FilePerms::WRITE, is the reminder. Hostile-tenant workloads warrant a hypervisor boundary (Firecracker microVMs) rather than in-process isolation.
- Workloads needing the full CPython ecosystem. MicroPython runs a small subset of the standard library. Pyodide handles more of CPython but is documented as supporting "browser or Node.js" hosts only (Pyodide guidance, Oct 2024), so embedders running outside a browser or Node.js cannot use it today.
- Alpha reference implementations. The motivating package — micropython-wasm — is self-described as alpha and "vibe-coded"; the author "deliberately slapped an alpha release version on it" and is "not ready to recommend it to anyone who isn't willing to take a significant risk." Adopt the pattern, not this specific package, for anything load-bearing. Wait for an audited implementation or accept the maturity risk explicitly.
- Wall-clock CPU budgets without measurement. Fuel is deterministic per program but not per wall-time. Teams needing wall-clock bounds layer epoch interruption on top of fuel, or run calibration sweeps before fixing a budget.
- Agents reasoning around the substrate. No runtime stops a capable agent from finding alternative execution paths through host functions you expose (the sandbox illusion). Audit the host-function surface as carefully as the sandbox itself.
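For the wall-clock bullet above, a calibration sweep reduces to simple arithmetic once fuel consumption and elapsed time have been measured for representative workloads. The sketch below assumes measurements arrive as (fuel_consumed, elapsed_seconds) pairs. The function names are hypothetical, and any sample values passed in would come from the embedder's own runs, not from this document.

```python
from typing import Iterable, Tuple

Sample = Tuple[int, float]  # (fuel consumed, elapsed seconds)


def slowest_fuel_rate(samples: Iterable[Sample]) -> float:
    """Lowest observed fuel-per-second rate across the calibration runs."""
    rates = [fuel / seconds for fuel, seconds in samples if seconds > 0]
    if not rates:
        raise ValueError("need at least one sample with positive duration")
    return min(rates)


def fuel_budget(samples: Iterable[Sample], target_seconds: float) -> int:
    """Fuel budget that the slowest observed workload burns within the target time."""
    return int(slowest_fuel_rate(samples) * target_seconds)
```

Using the slowest rate is a deliberate choice in this sketch: the workload that burns fuel most slowly takes the longest to exhaust a given budget, so sizing against it keeps the other measured workloads under the target as well. Workloads outside the calibration set can still exceed it, which is why layering epoch interruption on top remains the stronger bound.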
Example
A Python application embeds MicroPython-in-WASM via micropython-wasm (alpha — illustrative of the API shape, not a production recommendation):
from micropython_wasm import MicroPythonSession

with MicroPythonSession() as session:
    print(session.run("x = 10\nprint(x)").stdout)
    print(session.run("x += 5\nprint(x)").stdout)
    print(session.run("print(x * 2)").stdout)
Each session.run() executes inside the WASM sandbox; variables persist across calls via the host-function session protocol. A runaway loop traps out instead of hanging the host:
$ uvx micropython-wasm -c 's = ""; while True: s += "longer"'
micropython-wasm: guest exited with code 1
The host sees a clean error; the host process is unaffected. The same pattern applies to a JavaScript-guest sandbox (QuickJS-in-WASM) embedded in a Node.js agent harness.
Key Takeaways
- WASM is the in-process substrate slot in the sandbox-runtime trade-space — distinct from containers, microVMs, and OS-level isolators
- Four runtime-enforced controls compose the sandbox: CPU (fuel or epochs), memory cap, deny-by-default WASI, explicit host imports
- Fuel is deterministic but not wall-clock; epochs are wall-clock but non-deterministic — pick per workload, or layer them
- Persistent interpreter state across calls is a host-function-mediated session pattern, not a WASM-level feature
- Not a fit for hostile-multi-tenant workloads on shared hardware — hypervisor isolation (microVMs) remains the right floor there
- Reference implementations in 2026 are alpha; adopt the pattern, audit the host-function surface, and watch for a production-grade embedder before staking high-stakes workloads on it