
OpenSandbox: The Universal Sandbox Platform Every AI Agent Needs

Alibaba open-sourced a general-purpose sandbox platform for AI applications — supporting Coding Agents, GUI Agents, RL Training and more, with multi-language SDKs and Docker/Kubernetes runtimes.

Part 1: Foundations — The Mental Model

Imagine you are an AI agent. You need to write code, run it, browse the web, interact with a desktop, maybe even train a model — all in a safe, isolated environment. The host system must not be affected, yet you need full power within the box.

That is exactly what OpenSandbox by Alibaba provides.

Mental Model: Think of OpenSandbox as a universal remote-controlled sandbox — a standardized socket into which any AI agent (Claude Code, Gemini CLI, LangGraph, Google ADK, etc.) can plug. The sandbox wraps Docker containers or Kubernetes pods and exposes one consistent API for creating environments, running commands, managing files, and interpreting code.

Instead of each AI framework inventing its own execution sandbox, OpenSandbox offers a single, open protocol that all of them can share.


Part 2: The Investigation — Architecture Deep Dive

The Layered Architecture

OpenSandbox is structured into clear layers, each solving one concern:

┌────────────────────────────────────────────────────────────┐
│                 Multi-Language SDKs                        │
│    Python  │  JS/TS  │  Java/Kotlin  │  C#/.NET  │  Go*    │
└────────────────────┬───────────────────────────────────────┘
                     │ Sandbox Protocol (OpenAPI / OSEPs)
┌────────────────────▼───────────────────────────────────────┐
│                OpenSandbox Server                          │
│  (Sandbox lifecycle: create, start, pause, kill)           │
└──────┬──────────────────────────────────────────┬──────────┘
       │                                          │
┌──────▼──────┐                         ┌─────────▼──────────┐
│   Docker    │                         │ Kubernetes Runtime │
│  Runtime    │                         │ (high-performance) │
└─────────────┘                         └────────────────────┘
       │                                          │
┌──────▼──────────────────────────────────────────▼──────────┐
│              Sandbox Environments                          │
│  Commands  │  Files  │  Code Interpreter  │  Browser  │ VNC│
└────────────────────────────────────────────────────────────┘

(*Go SDK is on the roadmap)

Project Structure

| Directory | Purpose |
|---|---|
| sdks/ | Client SDKs (Python, JS/TS, Java, C#) |
| specs/ | OpenAPI + OSEP (OpenSandbox Enhancement Proposals) |
| server/ | The core sandbox server |
| kubernetes/ | Kubernetes runtime for distributed scheduling |
| components/execd/ | Execution daemon inside the sandbox container |
| components/ingress/ | Ingress gateway with multiple routing strategies |
| components/egress/ | Per-sandbox egress/network policy control |
| sandboxes/ | Pre-built sandbox images |
| examples/ | End-to-end integration examples |
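The egress component enforces network policy per sandbox. This matters when agents run untrusted code that could otherwise reach arbitrary hosts. The sketch below illustrates the general idea of a per-sandbox, default-deny allowlist. The policy format, the sandbox IDs and the matching rules are all hypothetical; they are not OpenSandbox's actual egress configuration.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical per-sandbox policies: sandbox ID -> allowed hostname patterns.
POLICIES: Dict[str, List[str]] = {
    "sbx-data-science": ["pypi.org", "*.pythonhosted.org"],
    "sbx-offline-eval": [],  # no outbound traffic at all
}


def egress_allowed(sandbox_id: str, host: str) -> bool:
    """Default-deny check: allow `host` only if it matches the sandbox's allowlist.

    Patterns are exact hostnames or shell-style wildcards such as
    "*.pythonhosted.org" (like any fnmatch "*", it also matches dots).
    """
    host = host.lower().rstrip(".")  # normalize case and a trailing root dot
    patterns = POLICIES.get(sandbox_id, [])  # unknown sandbox -> deny everything
    return any(fnmatchcase(host, p.lower()) for p in patterns)
```

Default-deny is the conservative choice for agent workloads: forgetting to add a host breaks a task visibly instead of silently leaking traffic.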

Sandbox Protocol (OSEPs)

OpenSandbox’s protocol is specified as OpenAPI definitions (see specs/). It evolves through a formal proposal process called OSEP (OpenSandbox Enhancement Proposals). Like PEPs in Python, OSEPs keep changes to the protocol community-driven and well-documented. The protocol defines two classes of APIs:

  • Lifecycle APIs: create, start, pause, resume, kill → manage the sandbox container itself
  • Execution APIs: commands.run, files.write_files, files.read_file, and the code interpreter’s codes.run → interact with what runs inside it
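One way to picture how the two API classes relate is as a small state machine. Lifecycle calls move the sandbox between states, and execution calls only make sense while it is running. The sketch below is an illustrative model only. The state names and the exact set of allowed transitions are assumptions, not OpenSandbox's actual status values.

```python
from enum import Enum


class State(Enum):
    # Illustrative states; OpenSandbox's own status names may differ.
    PENDING = "pending"      # right after "create"
    RUNNING = "running"
    PAUSED = "paused"
    TERMINATED = "terminated"


# Allowed (current state, lifecycle call) -> next state.
TRANSITIONS = {
    (State.PENDING, "start"): State.RUNNING,
    (State.RUNNING, "pause"): State.PAUSED,
    (State.PAUSED, "resume"): State.RUNNING,
    (State.PENDING, "kill"): State.TERMINATED,
    (State.RUNNING, "kill"): State.TERMINATED,
    (State.PAUSED, "kill"): State.TERMINATED,
}

# Execution APIs (commands.run, files.*, codes.run) are only meaningful here.
EXECUTABLE = {State.RUNNING}


def apply(state: State, call: str) -> State:
    """Return the next state, or raise if the lifecycle call is not allowed."""
    try:
        return TRANSITIONS[(state, call)]
    except KeyError:
        raise ValueError(f"cannot {call!r} a sandbox that is {state.value}") from None


if __name__ == "__main__":
    s = State.PENDING
    for call in ["start", "pause", "resume", "kill"]:
        s = apply(s, call)
        print(call, "->", s.value)
```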

Security — Strong Isolation Options

This is where OpenSandbox stands apart from naive Docker-only sandboxes. It natively supports secure container runtimes:

  • gVisor — userspace kernel that intercepts system calls
  • Kata Containers — lightweight VMs with hardware isolation
  • Firecracker microVMs — ultra-fast micro-virtual machines (used by AWS Lambda)

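A standard Docker container (runc) shares the host kernel directly. Each of these runtimes places a stronger boundary between sandbox workloads and the host. gVisor handles system calls in its own userspace kernel. Kata Containers and Firecracker run each workload inside a lightweight VM with its own guest kernel.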
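How a workload opts into one of these runtimes depends on where it runs. With plain Docker, gVisor is typically selected per container with `docker run --runtime=runsc`. On Kubernetes, the generic mechanism is a RuntimeClass referenced from the Pod's `spec.runtimeClassName` field. The sketch below builds such a Pod manifest as a plain Python dict. It is not OpenSandbox's own configuration format. The RuntimeClass names are assumptions and must match what your cluster has installed.

```python
import json

# Isolation level -> Kubernetes RuntimeClass name. These names are assumptions:
# they must match RuntimeClass objects installed on your cluster.
RUNTIME_CLASSES = {
    "runc": None,              # default runtime, shares the host kernel
    "gvisor": "gvisor",        # e.g. a RuntimeClass whose handler is runsc
    "kata": "kata",            # Kata Containers, one lightweight VM per pod
    "firecracker": "kata-fc",  # e.g. Kata configured with the Firecracker hypervisor
}


def sandbox_pod(name: str, image: str, isolation: str = "runc") -> dict:
    """Build a minimal Pod manifest that opts into a sandboxed container runtime."""
    if isolation not in RUNTIME_CLASSES:
        raise ValueError(f"unknown isolation level: {isolation!r}")
    spec: dict = {"containers": [{"name": "sandbox", "image": image}]}
    runtime_class = RUNTIME_CLASSES[isolation]
    if runtime_class is not None:
        # spec.runtimeClassName is the standard Kubernetes field for picking a runtime.
        spec["runtimeClassName"] = runtime_class
    return {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": name}, "spec": spec}


if __name__ == "__main__":
    pod = sandbox_pod("demo", "opensandbox/code-interpreter:v1.0.1", "gvisor")
    print(json.dumps(pod, indent=2))
```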


Part 3: The Diagnosis — What It Does for Developers

Problem 1: Every AI Agent Framework Reinvents the Same Sandbox

Before OpenSandbox, running Claude Code, Gemini CLI, and LangGraph safely side by side typically meant building and maintaining three different sandbox integration layers. OpenSandbox unifies them under one protocol.
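The benefit of a shared protocol can be shown with a minimal sketch. The tool logic is written once against a small interface, and each framework adapter (a LangGraph node, an ADK tool, and so on) only wraps it. `SandboxLike`, `run_snippet` and `FakeSandbox` are hypothetical names for illustration. They are not part of the OpenSandbox SDK, whose real interface (`commands.run`, `files.*`, ...) is richer.

```python
import asyncio
from typing import List, Protocol


class SandboxLike(Protocol):
    """Hypothetical minimal interface an agent tool depends on."""

    async def run_command(self, cmd: str) -> str: ...


class FakeSandbox:
    """In-memory stand-in for tests; it records commands instead of executing them."""

    def __init__(self) -> None:
        self.history: List[str] = []

    async def run_command(self, cmd: str) -> str:
        self.history.append(cmd)
        return f"ran: {cmd}"


async def run_snippet(sandbox: SandboxLike, cmd: str) -> str:
    """Framework-agnostic tool body: every framework adapter calls this same function."""
    return await sandbox.run_command(cmd)


if __name__ == "__main__":
    box = FakeSandbox()
    print(asyncio.run(run_snippet(box, "echo hi")))  # ran: echo hi
```

Only the object passed in changes between tests, a laptop, and production. The tool code itself stays the same.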

Problem 2: Scaling From Laptop to Kubernetes Is Hard

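OpenSandbox’s Docker runtime targets local development. Its Kubernetes runtime (kubernetes/) handles distributed, large-scale scheduling of many sandboxes. The runtime is chosen in the server’s configuration (note the --example docker flag in the quickstart), not in your application code. As a result, the same SDK calls work locally and in production; only the server address your code connects to changes.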
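In practice, this means the server address should live in configuration rather than in code. Below is a minimal sketch of that idea. It assumes your code hands the resolved values to the SDK's connection settings; check the SDK documentation for the exact parameter names. The environment variable names and the `localhost:8080` default are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerTarget:
    """Where the application should send its SDK calls (hypothetical helper)."""

    domain: str
    api_key: Optional[str]


def server_target(env: Optional[Mapping[str, str]] = None) -> ServerTarget:
    """Resolve the OpenSandbox server address from the environment.

    SANDBOX_DOMAIN and SANDBOX_API_KEY are hypothetical variable names; the
    default assumes a server started locally on port 8080.
    """
    env = os.environ if env is None else env
    return ServerTarget(
        domain=env.get("SANDBOX_DOMAIN", "localhost:8080"),
        api_key=env.get("SANDBOX_API_KEY"),
    )
```

On a laptop you leave the variables unset. In the cluster deployment you set them, and the code that creates sandboxes does not change.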

Problem 3: Multi-Language Teams Need Multi-Language SDKs

Currently supported SDKs:

| Language | Status |
|---|---|
| Python | ✅ Stable |
| JavaScript / TypeScript | ✅ Stable |
| Java / Kotlin | ✅ Stable |
| C# / .NET | ✅ Stable |
| Go | 🔜 Roadmap |

Real-World Use Cases

| Scenario | Example |
|---|---|
| Coding Agent | Claude Code, Gemini CLI, OpenAI Codex CLI |
| LLM Workflow | LangGraph state machines creating sandbox jobs |
| GUI Automation | Headless Chrome + Playwright in a sandbox |
| Desktop Environment | VNC + full Linux desktop inside a container |
| Remote Dev | VS Code (code-server) serving from a sandbox |
| RL Training | Run training episodes in isolated containers |
| Agent Evaluation | Reproducible, isolated eval environments |
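For the RL training and agent evaluation scenarios, a common pattern has three parts: one fresh sandbox per episode, a fixed seed per episode for reproducibility, and a cap on how many sandboxes run at once. Below is a minimal asyncio sketch of such a harness. `run_episodes` and `fake_episode` are hypothetical names. `fake_episode` stands in for a function that would create a sandbox, run the episode inside it, and kill it afterwards.

```python
import asyncio
import random
from typing import Awaitable, Callable, Dict, List, Tuple


async def run_episodes(
    run_episode: Callable[[int], Awaitable[float]],
    seeds: List[int],
    max_concurrent: int = 4,
) -> Dict[int, float]:
    """Run one episode per seed, with at most `max_concurrent` in flight."""
    gate = asyncio.Semaphore(max_concurrent)  # bounds simultaneous sandboxes

    async def guarded(seed: int) -> Tuple[int, float]:
        async with gate:
            return seed, await run_episode(seed)

    results = await asyncio.gather(*(guarded(s) for s in seeds))
    return dict(results)


async def fake_episode(seed: int) -> float:
    """Hypothetical stand-in: a seeded RNG makes each episode's score reproducible."""
    rng = random.Random(seed)
    await asyncio.sleep(0)  # placeholder for sandboxed work
    return rng.random()


if __name__ == "__main__":
    print(asyncio.run(run_episodes(fake_episode, seeds=[0, 1, 2, 3, 4])))
```

The semaphore keeps a large sweep from launching more sandboxes than the runtime (or your quota) can handle. The per-seed RNG makes reruns comparable.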

Part 4: The Resolution — How to Use OpenSandbox

Quickstart in 3 Steps

Step 1 — Install and configure the server

uv pip install opensandbox-server
opensandbox-server init-config ~/.sandbox.toml --example docker

Step 2 — Start the sandbox server

opensandbox-server
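Step 2 runs the server in the foreground. Scripts (for example in CI) therefore often need to wait until it accepts connections before moving on. Below is a minimal standard-library sketch. It only checks that the TCP port is open, not that the API is healthy. The default port 8080 is an assumption, so use whatever port your ~/.sandbox.toml configures.

```python
import socket
import time


def wait_for_server(host: str = "127.0.0.1", port: int = 8080, timeout: float = 30.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    This only proves the port is open, not that the API is healthy. Port 8080
    is an assumed default; use the port configured in ~/.sandbox.toml.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:  # connection refused, timed out, etc.
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)
```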

Step 3 — Create a sandbox and run code

import asyncio
import textwrap
from datetime import timedelta

from code_interpreter import CodeInterpreter, SupportedLanguage
from opensandbox import Sandbox
from opensandbox.models import WriteEntry


async def main() -> None:
    # 1. Create a sandbox from a Docker image
    sandbox = await Sandbox.create(
        "opensandbox/code-interpreter:v1.0.1",
        entrypoint=["/opt/opensandbox/code-interpreter.sh"],
        env={"PYTHON_VERSION": "3.11"},
        timeout=timedelta(minutes=10),
    )

    async with sandbox:
        # 2. Run a shell command
        execution = await sandbox.commands.run("echo 'Hello OpenSandbox!'")
        print(execution.logs.stdout[0].text)   # Hello OpenSandbox!

        # 3. Write a file
        await sandbox.files.write_files([
            WriteEntry(path="/tmp/hello.txt", data="Hello World", mode=644)
        ])

        # 4. Read it back
        content = await sandbox.files.read_file("/tmp/hello.txt")
        print(f"Content: {content}")  # Content: Hello World

        # 5. Run Python code inside the sandbox; dedent the snippet so the
        #    interpreter does not see an unexpected leading indent
        interpreter = await CodeInterpreter.create(sandbox)
        result = await interpreter.codes.run(
            textwrap.dedent(
                """
                import sys
                print(sys.version)
                result = 2 + 2
                result
                """
            ),
            language=SupportedLanguage.PYTHON,
        )
        print(result.result[0].text)       # 4
        print(result.logs.stdout[0].text)  # 3.11.x

        # 6. Terminate the remote sandbox explicitly; otherwise it keeps
        #    running until the 10-minute timeout expires
        await sandbox.kill()


if __name__ == "__main__":
    asyncio.run(main())
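If you prefer cleanup to be automatic, you can wrap creation and `kill()` in your own async context manager. The sandbox is then torn down even when the body raises. Below is a minimal sketch. `ephemeral` and `FakeSandbox` are hypothetical helpers; the fake only lets the pattern run without a server. The wrapper relies on just two things: `kill()` and the `async with` protocol the quickstart already uses.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Awaitable, Callable


@asynccontextmanager
async def ephemeral(create: Callable[[], Awaitable[Any]]) -> AsyncIterator[Any]:
    """Create a sandbox, yield it, and always kill it on exit (even on errors)."""
    sandbox = await create()
    async with sandbox:  # same pattern as the quickstart above
        try:
            yield sandbox
        finally:
            await sandbox.kill()  # terminate the remote sandbox


class FakeSandbox:
    """Hypothetical stand-in so the pattern can run without a server."""

    def __init__(self) -> None:
        self.killed = False

    async def __aenter__(self) -> "FakeSandbox":
        return self

    async def __aexit__(self, *exc: object) -> None:
        return None  # do not suppress exceptions

    async def kill(self) -> None:
        self.killed = True


async def demo() -> FakeSandbox:
    async def create() -> FakeSandbox:
        return FakeSandbox()

    async with ephemeral(create) as box:
        pass  # run commands, write files, ...
    return box


if __name__ == "__main__":
    print(asyncio.run(demo()).killed)  # True
```

With the real SDK, you would pass a zero-argument callable that returns the creation coroutine, e.g. `ephemeral(lambda: Sandbox.create("opensandbox/code-interpreter:v1.0.1"))`.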

Integrating with a Coding Agent (Google ADK Example)

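# Simplified sketch based on examples/google-adk: expose OpenSandbox to an ADK agent.
# ADK wraps plain async functions as tools, so no BaseTool subclass is needed.
from google.adk.agents import Agent
from code_interpreter import CodeInterpreter, SupportedLanguage
from opensandbox import Sandbox

async def run_python_in_sandbox(code: str) -> str:
    """Run Python code in a fresh sandbox and return the last expression's value."""
    sandbox = await Sandbox.create(
        "opensandbox/code-interpreter:v1.0.1",
        entrypoint=["/opt/opensandbox/code-interpreter.sh"],
    )
    async with sandbox:
        interpreter = await CodeInterpreter.create(sandbox)
        result = await interpreter.codes.run(code, language=SupportedLanguage.PYTHON)
        await sandbox.kill()  # tear down the remote sandbox once the run is done
        return result.result[0].text if result.result else ""

root_agent = Agent(name="sandbox_coder", model="gemini-2.0-flash",  # example model name
                   instruction="Run Python with run_python_in_sandbox.",
                   tools=[run_python_in_sandbox])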

Running Claude Code or Gemini CLI in a Sandbox

# Clone the examples
git clone https://github.com/alibaba/OpenSandbox.git
cd OpenSandbox/examples/claude-code  # or gemini-cli, codex-cli, etc.

# Follow the README in each example directory

Each example directory contains what you need to run that AI CLI tool inside an OpenSandbox-managed environment, and its README gives the exact steps.


Final Mental Model

┌────────────────────────────────────────────────────────────┐
│                        OpenSandbox                         │
│                                                            │
│  "A universal socket for AI agent execution"               │
│                                                            │
│  What it IS:                                               │
│  → Open protocol sandbox with lifecycle + execution APIs   │
│  → Multi-language SDKs (Python, JS, Java, C#)              │
│  → Docker local dev + Kubernetes production scaling        │
│                                                            │
│  What it SOLVES:                                           │
│  → Fragmented sandbox implementations per AI framework     │
│  → Unsafe code execution without isolation                 │
│  → Scaling from laptop to cloud without code changes       │
│                                                            │
│  What it ENABLES:                                          │
│  → Coding agents (Claude, Gemini, Codex)                   │
│  → GUI agents (Chrome, Playwright, VNC)                    │
│  → RL training + agent evaluation                          │
│  → Remote dev (VS Code inside a sandbox)                   │
│                                                            │
│  Isolation options: gVisor | Kata Containers | Firecracker │
└────────────────────────────────────────────────────────────┘

GitHub: alibaba/OpenSandbox
Docs: open-sandbox.ai
