# Landscape: AI-agent sandbox tools

A broader survey than [`landscape-containerized-claude.md`](landscape-containerized-claude.md), which focused on Claude-Code-specific containerizers. This one covers general AI-agent sandbox / containment projects (some Claude-specific, some agent-agnostic, some hosted SaaS) and contrasts them with bot-bottle's design.

Research conducted 2026-05-11.

## Summary

Eight projects surveyed. None duplicate bot-bottle's combination of local Docker, declarative JSON manifest, per-agent egress allowlist via pipelock, and bottle/agent split. Two clusters stand out:

- **Closest neighbours**: agent-safehouse and litterbox, both local, single-user, thin wrappers over an existing OS primitive (`sandbox-exec`, Podman + Landlock).
- **Different category**: tilde.run (hosted SaaS), boxlite and microsandbox (microVM libraries for platform builders), endo-familiar (capability-security paradigm, no OS isolation).

The microVM cluster (matchlock, smolmachines, boxlite, microsandbox) is the most relevant for the v2 isolation discussion in [`stronger-isolation-alternatives.md`](stronger-isolation-alternatives.md): libkrun and Apple's Virtualization.framework have made local microVMs ergonomic enough that a `"runtime": "microvm"` option on a bottle is now plausible without a heavy stack.

## Per-project notes

### endo-familiar

- **Source**: https://dcfoundation.io/containing-ai-agents-the-endo-familiar-demo/ ; https://github.com/endojs/endo
- **License**: Apache 2.0
- **Isolation**: Object-capability runtime in Hardened JavaScript. Not OS-level: agents simply cannot reference resources they were not handed.
- **Locality**: Local / decentralized; WebSocket relay for capability sharing across machines.
- **Agent integration**: Agent-agnostic, demo only.
- **Config**: Programmatic capability passing; "pet name" system for human-readable capability handles.
- **Network policy**: Capability model is the policy; no allowlist or firewall.
- **Maturity**: Research demo, Foresight Institute grant. Production use of `endo` is via Agoric and MetaMask, not as a containment tool.

### litterbox

- **Source**: https://litterbox.work/ ; https://github.com/Gerharddc/litterbox
- **License**: Apache 2.0 (~66 stars)
- **Isolation**: Podman container on Linux + Wayland socket forwarding; optional Landlock LSM for filesystem restriction.
- **Locality**: Local, Linux only.
- **Agent integration**: Generic dev sandbox; works with any agent that runs inside the container.
- **Config**: Interactive CLI wizard: `define` (Dockerfile template), `build` (prompts), `start` (launch).
- **Network policy**: "Limited isolation by default"; no strict allowlist documented.
- **Notable**: Per-key SSH agent confirmation dialogs.
- **Maturity**: Early-stage, ~66 stars.

### agent-safehouse

- **Source**: https://agent-safehouse.dev/ ; https://github.com/eugene1g/agent-safehouse
- **License**: Apache 2.0 (~1,400 stars)
- **Isolation**: macOS `sandbox-exec` (Seatbelt) profiles: kernel-level syscall interception, no container.
- **Locality**: Local, macOS only.
- **Agent integration**: Explicit multi-agent wrapper for Claude Code, OpenAI Codex, Gemini CLI, Cline, Aider. Usage: `safehouse claude --dangerously-skip-permissions`.
- **Config**: Shell functions or custom `sandbox-exec` profile files; LLM-assisted profile generation supported.
- **Network policy**: Not addressed.
- **Maturity**: Active through March 2026.

### matchlock

- **Source**: https://github.com/jingkaihe/matchlock
- **License**: MIT (~574 stars, v0.2.10)
- **Isolation**: MicroVMs: Firecracker on Linux, Apple Virtualization.framework on macOS. Transparent proxy via nftables DNAT (Linux) or gVisor userspace TCP/IP (macOS).
- **Locality**: Local (Homebrew, .deb, .rpm).
- **Agent integration**: Agent-agnostic; SDK examples for Anthropic Claude API and OpenAI. Go, Python, TypeScript SDKs.
- **Config**: CLI flags (`--allow-host`, `--secret`, `--no-network`) or SDK builder pattern. No manifest file.
- **Network policy**: Default-deny + per-host allowlist.
- **Notable**: Secrets injected in-flight by the host proxy; they never enter the VM.
- **Maturity**: Marked experimental.

### tilde.run

- **Source**: https://tilde.run/
- **License**: Proprietary, hosted SaaS.
- **Isolation**: Cloud-hosted containers; underlying mechanism not publicly stated (unverified whether OCI containers or microVMs).
- **Locality**: Hosted only.
- **Agent integration**: Claude orchestration explicit; CLI (`tilde exec`) and Python SDK; plain-English agent instructions.
- **Config**: DSL for RBAC policies (allow / deny / require human approval per action, per repo, per agent).
- **Network policy**: Default-deny with per-request logging; cloud metadata endpoints and private networks blocked.
- **Persistence**: All changes versioned and rollback-able via lakeFS; atomic commits per run.
- **Maturity**: Private preview, © 2025, built by the lakeFS team.

### boxlite

- **Source**: https://boxlite.ai/ ; https://github.com/boxlite-ai/boxlite
- **License**: Apache 2.0 (~4,700 stars, YC-backed)
- **Isolation**: MicroVMs with a dedicated Linux kernel per box: KVM on Linux, Hypervisor.framework on macOS. Not containers/namespaces.
- **Locality**: Local, no daemon.
- **Agent integration**: Explicitly targets AI agents; MCP server companion (boxlite-ai/boxlite-mcp). Pivoted from dev environments in 2025.
- **Config**: SDK only: Python, Node.js, Rust, C; Go pending. No declarative manifest.
- **Network policy**: "Isolated Network per VM"; details not public *(unverified)*.
- **Notable**: Sub-50ms boot, snapshot / fork / clone of VM state. Self-description: "the SQLite of sandboxing".
- **Maturity**: Active, YC.

### microsandbox

- **Source**: https://github.com/microsandbox/microsandbox (the `superradcompany/microsandbox` URL redirects to the same project).
- **License**: Apache 2.0 (~6,000 stars, YC-backed)
- **Isolation**: MicroVMs via libkrun, OCI-compatible images. Sub-100ms boot, rootless, no daemon, embeddable as a library.
- **Locality**: Local.
- **Agent integration**: Explicit Claude Code + Cursor targeting via "Agent Skills" packages and an MCP server. Agents can create their own sandboxes programmatically.
- **Config**: CLI (`msb`), SDKs (Rust, Python, TypeScript), MCP server.
- **Network policy**: Not detailed in public docs.
- **Maturity**: Beta, breaking changes expected; most-starred project in this set.

### smolmachines

- **Source**: https://smolmachines.com/ ; https://github.com/smol-machines/smolvm
- **License**: Apache 2.0 (~3,100 stars)
- **Isolation**: MicroVMs via libkrun: Hypervisor.framework on macOS, KVM on Linux. No shared kernel.
- **Locality**: Local, no daemon.
- **Agent integration**: Includes an `AGENTS.md`; designed with coding agents in mind but no MCP/Skills turnkey integration.
- **Config**: TOML Smolfiles declaring image, networking, volumes, SSH agent access, GPU acceleration. Portable `.smolmachine` files.
- **Network policy**: Off by default; per-host allowlist via `--allow-host`.
- **Persistence**: Named machines persistent by default; ephemeral runs also supported.
- **Maturity**: Active through April 2026.

## Comparison table

| Axis | bot-bottle | endo-familiar | litterbox | agent-safehouse | matchlock | tilde.run | boxlite | microsandbox | smolmachines |
|---|---|---|---|---|---|---|---|---|---|
| Isolation | Docker + internal net + pipelock; gVisor if present | Object-capability (no OS isolation) | Podman + opt. Landlock | macOS `sandbox-exec` | MicroVM (Firecracker / Virt.fw) | Hosted container (unverified) | MicroVM (KVM / Hypervisor.fw) | MicroVM (libkrun) | MicroVM (libkrun / KVM) |
| Local vs hosted | Local | Local | Local (Linux) | Local (macOS) | Local | Hosted SaaS | Local | Local | Local |
| Open source | Apache 2.0 | Apache 2.0 | Apache 2.0 | Apache 2.0 | MIT | No | Apache 2.0 | Apache 2.0 | Apache 2.0 |
| Agent target | Claude Code | Generic (demo) | Generic | Multi-agent wrapper | Generic (+ Claude/OpenAI SDKs) | Claude focus | Generic | Claude + Cursor (MCP/Skills) | Generic (AGENTS.md) |
| Network policy | Default-deny via pipelock + per-bottle allowlist + DLP | Capability model only | Limited | Not addressed | Default-deny + allowlist + secret-injecting proxy | Default-deny + logging | Per-VM net (unverified) | Not documented | Off by default + allowlist |
| Parallel agents | Yes (one bottle per agent) | n/a | Not addressed | One at a time | Multiple VMs | Yes (dashboard) | SDK-level | SDK-level | Architectural |
| Config | JSON manifest (bottles + agents) | Programmatic refs | CLI wizard | Profile files / shell fns | CLI / SDK | DSL + CLI + SDK | SDK | CLI / SDK / MCP | TOML Smolfile |
| Maturity | Active May 2026 | Research (2022+) | Early (~66 ⭐) | Active (~1.4k ⭐) | Experimental (~574 ⭐) | Private preview | YC, ~4.7k ⭐ | YC, ~6k ⭐, beta | ~3.1k ⭐ |

## What's closest, what's different

**Closest in design and scope.** agent-safehouse and litterbox sit nearest bot-bottle: local, single-user, thin wrappers over an existing OS primitive, with few dependencies. The split is the isolation primitive: bot-bottle uses Docker + pipelock egress (plus gVisor where available), agent-safehouse uses `sandbox-exec`, and litterbox uses Podman + Landlock. matchlock and smolmachines are spiritually close on the *policy* side (default-deny net, per-host allowlist) but use microVMs instead of containers.

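
The policy shape that bot-bottle, matchlock, and smolmachines share (default-deny egress plus a per-host allowlist) can be sketched in a few lines of stdlib Python. This is an illustration of the idea only, not pipelock's or any surveyed tool's implementation; the function name `egress_allowed` and the `*.domain` wildcard convention are assumptions made for the example.

```python
def egress_allowed(host: str, allowlist: list[str]) -> bool:
    """Default-deny check: allow a host only if an allowlist entry matches.

    Hypothetical helper for illustration. Entries are either exact
    hostnames or "*.domain" wildcards (an assumed convention).
    """
    host = host.strip().rstrip(".").lower()
    if not host:
        return False
    for pattern in allowlist:
        pattern = pattern.strip().lower()
        if pattern.startswith("*."):
            # "*.example.com" matches subdomains only, not "example.com"
            # itself and not look-alikes such as "evilexample.com".
            if host.endswith(pattern[1:]):
                return True
        elif host == pattern:
            return True
    # Nothing matched: deny by default.
    return False
```

An empty allowlist denies everything, which corresponds to the "network off by default" behaviour described for smolmachines.
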
**Solving a different problem.** tilde.run is hosted SaaS for team / production agent pipelines with data-versioned rollback, explicitly opposite to bot-bottle's "infrastructure I control" goal. boxlite and microsandbox are infrastructure libraries aimed at platform builders embedding sandboxes into agent frameworks; they would be a *backend* bot-bottle could call, not a competitor to its manifest layer. endo-familiar is in a different paradigm entirely: capability passing rather than kernel boundaries.

## Borrowable ideas

What bot-bottle already has that the survey suggested as differentiators:

- Default-deny egress with a per-agent allowlist (pipelock).
- DLP scanning of outbound traffic.
- Bottle / agent split (manifest layer above the isolation primitive).
- gVisor auto-detection on Linux.

Ideas worth considering, without abandoning the Python-stdlib-first / local-Docker stance:

1. **Per-use SSH key confirmation** (from litterbox). Even with KnownHostKey pinning and pipelock egress, a wrapper SSH agent that prompts on each key use (e.g. via `osascript` / `notify-send`) would catch an agent doing something off-policy with a key it legitimately holds. Pure-stdlib, no new deps.
2. **In-flight secret injection** (from matchlock). Pipelock already does egress allowlisting and DLP; teaching it to *inject* tokens at proxy time so e.g. `GITEA_TOKEN` never appears in the container's env would close the "agent reads its own env and exfiltrates" path. Fits the existing pipelock architecture.
3. **MicroVM backend as an opt-in bottle type** (already on the radar in `stronger-isolation-alternatives.md`). microsandbox, smolmachines, and matchlock all show that libkrun + Apple's Virtualization.framework is ergonomic enough that a `"runtime": "microvm"` field on a bottle is plausible without a heavy stack.

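
As a rough sketch of idea 2 (in-flight secret injection), the host-side proxy could swap a placeholder for the real token only when the request goes to a host the secret is bound to. Everything here is hypothetical: `SecretBinding`, `inject_secrets`, and the `__GITEA_TOKEN__` placeholder are names invented for illustration, and the placeholder approach itself is an assumption rather than a description of how pipelock or matchlock actually work.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SecretBinding:
    """Hypothetical binding of one secret to the hosts allowed to receive it."""
    placeholder: str       # the only value the container ever sees
    value: str             # real secret, held by the host-side proxy
    hosts: frozenset[str]  # destinations where substitution is permitted


def inject_secrets(
    host: str,
    headers: dict[str, str],
    bindings: list[SecretBinding],
) -> dict[str, str]:
    """Return a copy of `headers` with placeholders replaced for bound hosts.

    For any other host the placeholder is forwarded unchanged, so the
    real value never leaves the proxy for an unbound destination.
    """
    host = host.strip().lower()
    rewritten = {}
    for name, value in headers.items():
        for binding in bindings:
            if binding.placeholder in value and host in binding.hosts:
                value = value.replace(binding.placeholder, binding.value)
        rewritten[name] = value
    return rewritten
```

In this shape the container's environment would hold only the placeholder, so an agent that dumps its own env has nothing useful to exfiltrate. A stricter variant might reject requests that carry a placeholder to an unbound host instead of forwarding them.
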

Not worth borrowing: the SDK-first programmatic API style of boxlite / microsandbox (cuts against the declarative-manifest stance), and the hosted-SaaS dashboard model of tilde.run (cuts against the "infrastructure I control" goal).

## Caveats

- Star counts and last-commit dates are point-in-time snapshots.
- Several projects' network and persistence behaviour is not documented publicly; items so derived are marked *(unverified)*.
- The `superradcompany/microsandbox` URL in the original prompt redirects to `microsandbox/microsandbox`; the surveyed project is the same.