# Landscape: AI-agent sandbox tools

A broader survey than [`landscape-containerized-claude.md`](landscape-containerized-claude.md),
which focused on Claude-Code-specific containerizers. This one covers
general AI-agent sandbox / containment projects (some Claude-specific,
some agent-agnostic, some hosted SaaS) and contrasts them with
bot-bottle's design.

Research conducted 2026-05-11.

## Summary

Eight projects surveyed. None duplicate bot-bottle's combination of
local Docker, declarative JSON manifest, per-agent egress allowlist via
pipelock, and bottle/agent split. Two clusters stand out:

- **Closest neighbours**: agent-safehouse and litterbox, which are local,
  single-user, thin wrappers over an existing OS primitive
  (`sandbox-exec`, Podman + Landlock).
- **Different category**: tilde.run (hosted SaaS), boxlite and
  microsandbox (microVM libraries for platform builders), endo-familiar
  (capability-security paradigm, no OS isolation).

The microVM cluster (matchlock, smolmachines, boxlite, microsandbox) is
the most relevant for the v2 isolation discussion in
[`stronger-isolation-alternatives.md`](stronger-isolation-alternatives.md):
libkrun and Apple's Virtualization.framework have made local microVMs
ergonomic enough that a `"runtime": "microvm"` option on a bottle is now
plausible without a heavy stack.

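
To make the `"runtime": "microvm"` idea concrete, here is a minimal sketch of how a manifest loader might resolve the runtime for an agent. Only the `"runtime": "microvm"` field comes from the text above; the `bottles`, `agents`, `bottle`, and `egress_allow` keys, the `"docker"` default, and the `runtime_for` helper are hypothetical and do not describe bot-bottle's actual schema.

```python
import json

# Hypothetical manifest shape, for illustration only.
MANIFEST = """
{
  "bottles": {
    "default": {"egress_allow": ["api.anthropic.com"]},
    "hardened": {"runtime": "microvm", "egress_allow": ["api.anthropic.com"]}
  },
  "agents": {
    "reviewer": {"bottle": "default"},
    "builder": {"bottle": "hardened"}
  }
}
"""

KNOWN_RUNTIMES = {"docker", "microvm"}


def runtime_for(manifest: dict, agent: str) -> str:
    """Return the isolation runtime for an agent, defaulting to Docker."""
    bottle_name = manifest["agents"][agent]["bottle"]
    runtime = manifest["bottles"][bottle_name].get("runtime", "docker")
    if runtime not in KNOWN_RUNTIMES:
        raise ValueError(f"unknown runtime {runtime!r} in bottle {bottle_name!r}")
    return runtime


manifest = json.loads(MANIFEST)
```

Under these assumptions the field is opt-in: a bottle that omits `runtime` keeps the existing Docker behaviour, and an unrecognised value fails loudly instead of silently falling back.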
## Per-project notes

### endo-familiar

- **Source**: https://dcfoundation.io/containing-ai-agents-the-endo-familiar-demo/ ; https://github.com/endojs/endo
- **License**: Apache 2.0
- **Isolation**: Object-capability runtime in Hardened JavaScript. Not
  OS-level: agents simply cannot reference resources they were not
  handed.
- **Locality**: Local / decentralized; WebSocket relay for capability
  sharing across machines.
- **Agent integration**: Agent-agnostic, demo only.
- **Config**: Programmatic capability passing; "pet name" system for
  human-readable capability handles.
- **Network policy**: Capability model is the policy; no allowlist or
  firewall.
- **Maturity**: Research demo, Foresight Institute grant. Production use
  of `endo` is via Agoric and MetaMask, not as a containment tool.

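
The capability idea in the notes above can be illustrated with a toy sketch: the host hands the agent a dictionary of handles keyed by pet names, and the agent works only with what it was given. This is not endo's API; `ReadCap`, `summarizing_agent`, and the pet name `notes` are hypothetical. Plain Python also does not enforce the boundary (an agent could still import `os`), which is the gap a hardened runtime is meant to close.

```python
class ReadCap:
    """Holding this object is the only way to read the text it wraps."""

    def __init__(self, text: str):
        self._text = text

    def read(self) -> str:
        return self._text


def summarizing_agent(handles: dict) -> str:
    """Toy agent: it acts only on the handles passed in by its caller."""
    return handles["notes"].read().splitlines()[0]


# The host decides what to hand over; "notes" is a pet name it chose.
secrets = ReadCap("api-key=hunter2")
notes = ReadCap("Ship v2\nRefactor later")
granted = {"notes": notes}  # `secrets` is deliberately not granted
first_line = summarizing_agent(granted)
```

The policy here is simply the contents of `granted`: there is no allowlist to consult, because the agent has no name by which to ask for `secrets`.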
### litterbox

- **Source**: https://litterbox.work/ ; https://github.com/Gerharddc/litterbox
- **License**: Apache 2.0 (~66 stars)
- **Isolation**: Podman container on Linux + Wayland socket forwarding;
  optional Landlock LSM for filesystem restriction.
- **Locality**: Local, Linux only.
- **Agent integration**: Generic dev sandbox; works with any agent that
  runs inside the container.
- **Config**: Interactive CLI wizard: `define` (Dockerfile template),
  `build` (prompts), `start` (launch).
- **Network policy**: "Limited isolation by default"; no strict
  allowlist documented.
- **Notable**: Per-key SSH agent confirmation dialogs.
- **Maturity**: Early-stage, ~66 stars.

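
The per-key confirmation idea can be sketched as a small gate that asks a human before each key use and treats any failure as a denial. This is not litterbox's implementation; `ask_macos` and `approve_key_use` are hypothetical names, the dialog uses macOS `osascript` as one possible prompt, and a real wrapper would also have to speak the SSH agent protocol, which is omitted here.

```python
import subprocess


def ask_macos(message: str) -> bool:
    """Show a macOS dialog via osascript; True only if "Allow" is clicked."""
    # Escape backslashes and quotes so the message stays inside the AppleScript string.
    safe = message.replace("\\", "\\\\").replace('"', '\\"')
    script = (
        f'display dialog "{safe}" buttons {{"Deny", "Allow"}} '
        'default button "Deny"'
    )
    result = subprocess.run(
        ["osascript", "-e", script], capture_output=True, text=True
    )
    return result.returncode == 0 and "button returned:Allow" in result.stdout


def approve_key_use(key_label: str, host: str, ask=ask_macos) -> bool:
    """Gate one signing request; any error from the prompt counts as a denial."""
    try:
        return bool(ask(f"Allow key '{key_label}' to sign for {host}?"))
    except Exception:
        return False
```

Passing the prompt in as `ask` keeps the decision logic testable without a GUI and lets a Linux prompt be swapped in later.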
### agent-safehouse

- **Source**: https://agent-safehouse.dev/ ; https://github.com/eugene1g/agent-safehouse
- **License**: Apache 2.0 (~1,400 stars)
- **Isolation**: macOS `sandbox-exec` (Seatbelt) profiles: kernel-level
  syscall interception, no container.
- **Locality**: Local, macOS only.
- **Agent integration**: Explicit multi-agent wrapper for Claude Code,
  OpenAI Codex, Gemini CLI, Cline, and Aider. Usage:
  `safehouse claude --dangerously-skip-permissions`.
- **Config**: Shell functions or custom `sandbox-exec` profile files;
  LLM-assisted profile generation supported.
- **Network policy**: Not addressed.
- **Maturity**: Active through March 2026.

### matchlock

- **Source**: https://github.com/jingkaihe/matchlock
- **License**: MIT (~574 stars, v0.2.10)
- **Isolation**: MicroVMs: Firecracker on Linux, Apple
  Virtualization.framework on macOS. Transparent proxy via nftables DNAT
  (Linux) or gVisor userspace TCP/IP (macOS).
- **Locality**: Local (Homebrew, .deb, .rpm).
- **Agent integration**: Agent-agnostic; SDK examples for Anthropic
  Claude API and OpenAI. Go, Python, TypeScript SDKs.
- **Config**: CLI flags (`--allow-host`, `--secret`, `--no-network`) or
  SDK builder pattern. No manifest file.
- **Network policy**: Default-deny + per-host allowlist.
- **Notable**: Secrets are injected in-flight by the host proxy, so they
  never enter the VM.
- **Maturity**: Marked experimental.

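
In-flight secret injection can be sketched as a header rewrite performed on the host side of the proxy: the sandbox sends a placeholder, and the proxy swaps in the real value only for hosts that are entitled to it. This is an illustration of the general technique, not matchlock's code; `PLACEHOLDER`, `inject_secret`, and the `secrets_by_host` mapping are hypothetical.

```python
# Hypothetical marker the sandboxed process sends instead of the real token.
PLACEHOLDER = "{{GITEA_TOKEN}}"


def inject_secret(host: str, headers: dict, secrets_by_host: dict) -> dict:
    """Return a copy of headers with the placeholder swapped for the secret.

    Only hosts listed in secrets_by_host receive a secret; for any other
    host the placeholder is left untouched, so nothing real is revealed.
    """
    secret = secrets_by_host.get(host)
    if secret is None:
        return dict(headers)
    return {
        name: value.replace(PLACEHOLDER, secret)
        for name, value in headers.items()
    }
```

Because the substitution happens outside the sandbox, a process that dumps its own environment or request headers sees only the placeholder.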
### tilde.run

- **Source**: https://tilde.run/
- **License**: Proprietary, hosted SaaS.
- **Isolation**: Cloud-hosted containers; underlying mechanism not
  publicly stated (unverified whether OCI containers or microVMs).
- **Locality**: Hosted only.
- **Agent integration**: Claude orchestration explicit; CLI
  (`tilde exec`) and Python SDK; plain-English agent instructions.
- **Config**: DSL for RBAC policies (allow / deny / require human
  approval per action, per repo, per agent).
- **Network policy**: Default-deny with per-request logging; cloud
  metadata endpoints and private networks blocked.
- **Persistence**: All changes versioned and rollback-able via lakeFS;
  atomic commits per run.
- **Maturity**: Private preview, © 2025, built by the lakeFS team.

### boxlite

- **Source**: https://boxlite.ai/ ; https://github.com/boxlite-ai/boxlite
- **License**: Apache 2.0 (~4,700 stars, YC-backed)
- **Isolation**: MicroVMs with a dedicated Linux kernel per box: KVM on
  Linux, Hypervisor.framework on macOS. Not containers/namespaces.
- **Locality**: Local, no daemon.
- **Agent integration**: Explicitly targets AI agents; MCP server
  companion (boxlite-ai/boxlite-mcp). Pivoted from dev environments in
  2025.
- **Config**: SDK only: Python, Node.js, Rust, C; Go pending. No
  declarative manifest.
- **Network policy**: "Isolated Network per VM"; details not public
  *(unverified)*.
- **Notable**: Sub-50ms boot, snapshot / fork / clone of VM state.
  Self-description: "the SQLite of sandboxing".
- **Maturity**: Active, YC.

### microsandbox

- **Source**: https://github.com/microsandbox/microsandbox (the
  `superradcompany/microsandbox` URL redirects to the same project).
- **License**: Apache 2.0 (~6,000 stars, YC-backed)
- **Isolation**: MicroVMs via libkrun, OCI-compatible images.
  Sub-100ms boot, rootless, no daemon, embeddable as a library.
- **Locality**: Local.
- **Agent integration**: Explicit Claude Code + Cursor targeting via
  "Agent Skills" packages and an MCP server. Agents can create their own
  sandboxes programmatically.
- **Config**: CLI (`msb`), SDKs (Rust, Python, TypeScript), MCP server.
- **Network policy**: Not detailed in public docs.
- **Maturity**: Beta, breaking changes expected; most-starred project in
  this set.

### smolmachines

- **Source**: https://smolmachines.com/ ; https://github.com/smol-machines/smolvm
- **License**: Apache 2.0 (~3,100 stars)
- **Isolation**: MicroVMs via libkrun: Hypervisor.framework on macOS,
  KVM on Linux. No shared kernel.
- **Locality**: Local, no daemon.
- **Agent integration**: Includes an `AGENTS.md`; designed with coding
  agents in mind but no MCP/Skills turnkey integration.
- **Config**: TOML Smolfiles declaring image, networking, volumes, SSH
  agent access, GPU acceleration. Portable `.smolmachine` files.
- **Network policy**: Off by default; per-host allowlist via
  `--allow-host`.
- **Persistence**: Named machines persistent by default; ephemeral runs
  also supported.
- **Maturity**: Active through April 2026.

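
The "off by default plus per-host allowlist" policy that smolmachines and matchlock share can be sketched as a single default-deny check. The matching rule below (exact hostname, case-insensitive, trailing dot ignored) is an assumption for illustration; neither project's actual matching semantics are documented in this survey, and `egress_allowed` is a hypothetical helper.

```python
def egress_allowed(host: str, allowlist: set) -> bool:
    """Default-deny: a host passes only on an exact, case-insensitive match."""
    normalized = host.strip().lower().rstrip(".")
    return normalized in {entry.lower() for entry in allowlist}
```

With an empty allowlist every host is refused, which corresponds to networking being off by default; subdomains of an allowed host are also refused unless they are listed explicitly.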
## Comparison table

| Axis | bot-bottle | endo-familiar | litterbox | agent-safehouse | matchlock | tilde.run | boxlite | microsandbox | smolmachines |
|---|---|---|---|---|---|---|---|---|---|
| Isolation | Docker + internal network + pipelock; gVisor if present | Object-capability (no OS isolation) | Podman + optional Landlock | macOS `sandbox-exec` | MicroVM (Firecracker / Virtualization.framework) | Hosted container (unverified) | MicroVM (KVM / Hypervisor.framework) | MicroVM (libkrun) | MicroVM (libkrun / KVM) |
| Local vs hosted | Local | Local | Local (Linux) | Local (macOS) | Local | Hosted SaaS | Local | Local | Local |
| Open source | Apache 2.0 | Apache 2.0 | Apache 2.0 | Apache 2.0 | MIT | No | Apache 2.0 | Apache 2.0 | Apache 2.0 |
| Agent target | Claude Code | Generic (demo) | Generic | Multi-agent wrapper | Generic (+ Claude/OpenAI SDKs) | Claude focus | Generic | Claude + Cursor (MCP/Skills) | Generic (AGENTS.md) |
| Network policy | Default-deny via pipelock + per-bottle allowlist + DLP | Capability model only | Limited | Not addressed | Default-deny + allowlist + secret-injecting proxy | Default-deny + logging | Per-VM network (unverified) | Not documented | Off by default + allowlist |
| Parallel agents | Yes (one bottle per agent) | n/a | Not addressed | One at a time | Multiple VMs | Yes (dashboard) | SDK-level | SDK-level | Architectural |
| Config | JSON manifest (bottles + agents) | Programmatic refs | CLI wizard | Profile files / shell functions | CLI / SDK | DSL + CLI + SDK | SDK | CLI / SDK / MCP | TOML Smolfile |
| Maturity | Active May 2026 | Research (2022+) | Early (~66 ⭐) | Active (~1.4k ⭐) | Experimental (~574 ⭐) | Private preview | YC, ~4.7k ⭐ | YC, ~6k ⭐, beta | ~3.1k ⭐ |

## What's closest, what's different

**Closest in design and scope.** agent-safehouse and litterbox sit
nearest bot-bottle: local, single-user, thin wrappers over an
existing OS primitive, with few dependencies. The split is the isolation
primitive: bot-bottle uses Docker + pipelock egress (plus gVisor where
available); agent-safehouse uses `sandbox-exec`; litterbox uses Podman +
Landlock. matchlock and smolmachines are spiritually close on the
*policy* side (default-deny network, per-host allowlist) but use microVMs
instead of containers.

**Solving a different problem.** tilde.run is hosted SaaS for team /
production agent pipelines with data-versioned rollback, explicitly
opposite to bot-bottle's "infrastructure I control" goal. boxlite and
microsandbox are infrastructure libraries aimed at platform builders
embedding sandboxes into agent frameworks; they would be a *backend*
bot-bottle could call, not a competitor to its manifest layer.
endo-familiar is in a different paradigm entirely: capability passing
rather than kernel boundaries.

## Borrowable ideas

What bot-bottle already has that the survey suggested as
differentiators:

- Default-deny egress with a per-agent allowlist (pipelock).
- DLP scanning of outbound traffic.
- Bottle / agent split (manifest layer above the isolation primitive).
- gVisor auto-detection on Linux.

Ideas worth considering, without abandoning the Python-stdlib-first / local-Docker
stance:

1. **Per-use SSH key confirmation** (from litterbox). Even with
   KnownHostKey pinning and pipelock egress, a wrapper SSH agent that
   prompts on each key use (e.g. via `osascript` / `notify-send`) would
   catch an agent doing something off-policy with a key it legitimately
   holds. Pure-stdlib, no new deps.
2. **In-flight secret injection** (from matchlock). Pipelock already
   does egress allowlisting and DLP; teaching it to *inject* tokens at
   proxy time, so that e.g. `GITEA_TOKEN` never appears in the
   container's env, would close the "agent reads its own env and
   exfiltrates" path. Fits the existing pipelock architecture.
3. **MicroVM backend as an opt-in bottle type**, already on the radar
   in `stronger-isolation-alternatives.md`. microsandbox, smolmachines,
   and matchlock all show that libkrun and Apple's
   Virtualization.framework are ergonomic enough that a
   `"runtime": "microvm"` field on a bottle is plausible without a heavy
   stack.

Not worth borrowing: the SDK-first programmatic API style of boxlite /
microsandbox (cuts against the declarative-manifest stance), and the
hosted-SaaS dashboard model of tilde.run (cuts against the
"infrastructure I control" goal).

## Caveats

- Star counts and last-commit dates are point-in-time snapshots.
- Several projects' network and persistence behaviour is not
  documented publicly; items so derived are marked *(unverified)*.
- The `superradcompany/microsandbox` URL in the original prompt
  redirects to `microsandbox/microsandbox`; the surveyed project is the
  same.