Compares claude-bottle to endo-familiar, litterbox, agent-safehouse, matchlock, tilde.run, boxlite, microsandbox, and smolmachines. Covers isolation primitive, locality, agent integration, network policy, and maturity, and notes three borrowable ideas (per-use SSH confirmation, in-flight secret injection, microVM backend) that fit the current bash-first / local-Docker stance.
Landscape: AI-agent sandbox tools
A broader survey than landscape-containerized-claude.md,
which focused on Claude-Code-specific containerizers. This one covers
general AI-agent sandbox / containment projects — some Claude-specific,
some agent-agnostic, some hosted SaaS — and contrasts them with
claude-bottle's design.
Research conducted 2026-05-11.
Summary
Eight projects surveyed. None duplicate claude-bottle's combination of local Docker, declarative JSON manifest, per-agent egress allowlist via pipelock, and bottle/agent split. Two clusters stand out:
- Closest neighbours (agent-safehouse and litterbox): local, single-user, thin wrappers over an existing OS primitive (`sandbox-exec`, Podman + Landlock).
- Different category: tilde.run (hosted SaaS), boxlite and microsandbox (microVM libraries for platform builders), endo-familiar (capability-security paradigm, no OS isolation).
The microVM cluster (matchlock, smolmachines, boxlite, microsandbox) is
the most relevant for the v2 isolation discussion in
stronger-isolation-alternatives.md:
libkrun and Apple's Virtualization.framework have made local microVMs
ergonomic enough that a "runtime": "microvm" option on a bottle is now
plausible without a heavy stack.
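As a rough illustration of what such an option could look like, the sketch below parses a manifest and resolves each bottle's runtime. Only the `"runtime": "microvm"` field comes from the text above. The `bottles` key, the bottle names, the `docker` default, and the `bottle_runtimes` helper are assumptions made for the example, not claude-bottle's actual schema.

```python
import json

# Hypothetical manifest. Only the "runtime": "microvm" field is taken from
# the survey; every other key and name here is an illustrative assumption.
MANIFEST = """
{
  "bottles": {
    "default": {},
    "strict": {"runtime": "microvm"}
  }
}
"""

# Assumed set of backends: the current Docker path plus the proposed opt-in.
KNOWN_RUNTIMES = {"docker", "microvm"}


def bottle_runtimes(manifest_text):
    """Return {bottle_name: runtime}, defaulting to "docker" when unset."""
    manifest = json.loads(manifest_text)
    runtimes = {}
    for name, bottle in manifest.get("bottles", {}).items():
        runtime = bottle.get("runtime", "docker")
        if runtime not in KNOWN_RUNTIMES:
            raise ValueError(f"bottle {name!r}: unknown runtime {runtime!r}")
        runtimes[name] = runtime
    return runtimes
```

Keeping the field optional with a Docker default would leave existing manifests valid, which matches the opt-in framing.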
Per-project notes
endo-familiar
- Source: https://dcfoundation.io/containing-ai-agents-the-endo-familiar-demo/ ; https://github.com/endojs/endo
- License: Apache 2.0
- Isolation: Object-capability runtime in Hardened JavaScript. Not OS-level — agents simply cannot reference resources they were not handed.
- Locality: Local / decentralized; WebSocket relay for capability sharing across machines.
- Agent integration: Agent-agnostic, demo only.
- Config: Programmatic capability passing; "pet name" system for human-readable capability handles.
- Network policy: Capability model is the policy; no allowlist or firewall.
- Maturity: Research demo, Foresight Institute grant. Production use of `endo` is via Agoric and MetaMask, not as a containment tool.
litterbox
- Source: https://litterbox.work/ ; https://github.com/Gerharddc/litterbox
- License: Apache 2.0 (~66 stars)
- Isolation: Podman container on Linux + Wayland socket forwarding; optional Landlock LSM for filesystem restriction.
- Locality: Local, Linux only.
- Agent integration: Generic dev sandbox; works with any agent that runs inside the container.
- Config: Interactive CLI wizard: `define` (Dockerfile template), `build` (prompts), `start` (launch).
- Network policy: "Limited isolation by default"; no strict allowlist documented.
- Notable: Per-key SSH agent confirmation dialogs.
- Maturity: Early-stage, ~66 stars.
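The per-key confirmation listed above amounts to a deny-by-default gate in front of each signing request. The sketch below shows that gate only; it is not litterbox's implementation. The function name, the prompt wording, and the injectable `ask` callable are all assumptions made for illustration.

```python
def confirm_key_use(key_name, requester, ask=input):
    """Ask the human before a key is used; anything but an explicit yes denies.

    `ask` is any callable that shows a prompt and returns the reply as a
    string. It defaults to the built-in input(); a GUI dialog could be
    swapped in by passing a different callable.
    """
    answer = ask(f"Allow {requester} to sign with {key_name}? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

A wrapper agent would call this once per signing request and refuse the request whenever it returns `False`, so an empty reply or a dismissed prompt fails closed.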
agent-safehouse
- Source: https://agent-safehouse.dev/ ; https://github.com/eugene1g/agent-safehouse
- License: Apache 2.0 (~1,400 stars)
- Isolation: macOS `sandbox-exec` (Seatbelt) profiles: kernel-level syscall interception, no container.
- Locality: Local, macOS only.
- Agent integration: Explicit multi-agent wrapper for Claude Code, OpenAI Codex, Gemini CLI, Cline, Aider. Usage: `safehouse claude --dangerously-skip-permissions`.
- Config: Shell functions or custom `sandbox-exec` profile files; LLM-assisted profile generation supported.
- Network policy: Not addressed.
- Maturity: Active through March 2026.
matchlock
- Source: https://github.com/jingkaihe/matchlock
- License: MIT (~574 stars, v0.2.10)
- Isolation: MicroVMs — Firecracker on Linux, Apple Virtualization.framework on macOS. Transparent proxy via nftables DNAT (Linux) or gVisor userspace TCP/IP (macOS).
- Locality: Local (Homebrew, .deb, .rpm).
- Agent integration: Agent-agnostic; SDK examples for Anthropic Claude API and OpenAI. Go, Python, TypeScript SDKs.
- Config: CLI flags (`--allow-host`, `--secret`, `--no-network`) or SDK builder pattern. No manifest file.
- Network policy: Default-deny + per-host allowlist.
- Notable: Secrets injected in-flight by the host proxy — they never enter the VM.
- Maturity: Marked experimental.
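The default-deny plus per-host allowlist policy noted above reduces to a small decision function. The sketch below is a generic illustration, not matchlock's or pipelock's code. The `is_allowed` name and the `*.` wildcard convention are assumptions made for the example.

```python
def is_allowed(host, allowlist):
    """Default-deny egress check: True only if host matches an allowlist entry.

    Entries are either exact hostnames or "*.suffix" wildcards, which match
    subdomains of the suffix but not the bare suffix itself.
    """
    host = host.lower().rstrip(".")
    for entry in allowlist:
        entry = entry.lower().rstrip(".")
        if entry.startswith("*."):
            # "*.example.com" matches "api.example.com" but not
            # "example.com" or "evilexample.com".
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False  # nothing matched: deny
```

With an empty allowlist every host is denied, which corresponds to running with no network access at all.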
tilde.run
- Source: https://tilde.run/
- License: Proprietary, hosted SaaS.
- Isolation: Cloud-hosted containers; underlying mechanism not publicly stated (unverified whether OCI containers or microVMs).
- Locality: Hosted only.
- Agent integration: Claude orchestration explicit; CLI (`tilde exec`) and Python SDK; plain-English agent instructions.
- Config: DSL for RBAC policies (allow / deny / require human approval per action, per repo, per agent).
- Network policy: Default-deny with per-request logging; cloud metadata endpoints and private networks blocked.
- Persistence: All changes versioned and rollback-able via lakeFS; atomic commits per run.
- Maturity: Private preview, © 2025, built by the lakeFS team.
boxlite
- Source: https://boxlite.ai/ ; https://github.com/boxlite-ai/boxlite
- License: Apache 2.0 (~4,700 stars, YC-backed)
- Isolation: MicroVMs with dedicated Linux kernel per box — KVM on Linux, Hypervisor.framework on macOS. Not containers/namespaces.
- Locality: Local, no daemon.
- Agent integration: Explicitly targets AI agents; MCP server companion (boxlite-ai/boxlite-mcp). Pivoted from dev environments in 2025.
- Config: SDK only — Python, Node.js, Rust, C; Go pending. No declarative manifest.
- Network policy: "Isolated Network per VM" — details not public (unverified).
- Notable: Sub-50ms boot, snapshot / fork / clone of VM state. Self description: "the SQLite of sandboxing".
- Maturity: Active, YC.
microsandbox
- Source: https://github.com/microsandbox/microsandbox (the `superradcompany/microsandbox` URL redirects to the same project).
- License: Apache 2.0 (~6,000 stars, YC-backed)
- Isolation: MicroVMs via libkrun, OCI-compatible images. Sub-100ms boot, rootless, no daemon, embeddable as a library.
- Locality: Local.
- Agent integration: Explicit Claude Code + Cursor targeting via "Agent Skills" packages and an MCP server. Agents can create their own sandboxes programmatically.
- Config: CLI (`msb`), SDKs (Rust, Python, TypeScript), MCP server.
- Network policy: Not detailed in public docs.
- Maturity: Beta, breaking changes expected; most-starred project in this set.
smolmachines
- Source: https://smolmachines.com/ ; https://github.com/smol-machines/smolvm
- License: Apache 2.0 (~3,100 stars)
- Isolation: MicroVMs via libkrun — Hypervisor.framework on macOS, KVM on Linux. No shared kernel.
- Locality: Local, no daemon.
- Agent integration: Includes an `AGENTS.md`; designed with coding agents in mind but no MCP/Skills turnkey integration.
- Config: TOML Smolfiles declaring image, networking, volumes, SSH agent access, GPU acceleration. Portable `.smolmachine` files.
- Network policy: Off by default; per-host allowlist via `--allow-host`.
- Persistence: Named machines persistent by default; ephemeral runs also supported.
- Maturity: Active through April 2026.
Comparison table
| Axis | claude-bottle | endo-familiar | litterbox | agent-safehouse | matchlock | tilde.run | boxlite | microsandbox | smolmachines |
|---|---|---|---|---|---|---|---|---|---|
| Isolation | Docker + internal net + pipelock; gVisor if present | Object-capability (no OS isolation) | Podman + opt. Landlock | macOS sandbox-exec | MicroVM (Firecracker / Virt.fw) | Hosted container (unverified) | MicroVM (KVM / Hypervisor.fw) | MicroVM (libkrun) | MicroVM (libkrun / KVM) |
| Local vs hosted | Local | Local | Local (Linux) | Local (macOS) | Local | Hosted SaaS | Local | Local | Local |
| Open source | Apache 2.0 | Apache 2.0 | Apache 2.0 | Apache 2.0 | MIT | No | Apache 2.0 | Apache 2.0 | Apache 2.0 |
| Agent target | Claude Code | Generic (demo) | Generic | Multi-agent wrapper | Generic (+ Claude/OpenAI SDKs) | Claude focus | Generic | Claude + Cursor (MCP/Skills) | Generic (AGENTS.md) |
| Network policy | Default-deny via pipelock + per-bottle allowlist + DLP | Capability model only | Limited | Not addressed | Default-deny + allowlist + secret-injecting proxy | Default-deny + logging | Per-VM net (unverified) | Not documented | Off by default + allowlist |
| Parallel agents | Yes (one bottle per agent) | n/a | Not addressed | One at a time | Multiple VMs | Yes (dashboard) | SDK-level | SDK-level | Architectural |
| Config | JSON manifest (bottles + agents) | Programmatic refs | CLI wizard | Profile files / shell fns | CLI / SDK | DSL + CLI + SDK | SDK | CLI / SDK / MCP | TOML Smolfile |
| Maturity | Active May 2026 | Research (2022+) | Early (~66 ⭐) | Active (~1.4k ⭐) | Experimental (~574 ⭐) | Private preview | YC, ~4.7k ⭐ | YC, ~6k ⭐, beta | ~3.1k ⭐ |
What's closest, what's different
Closest in design and scope. agent-safehouse and litterbox sit
nearest claude-bottle: local, single-user, thin wrappers over an
existing OS primitive, with few dependencies. They differ in the isolation primitive:
claude-bottle uses Docker + pipelock egress (plus gVisor where
available); agent-safehouse uses sandbox-exec; litterbox uses Podman +
Landlock. matchlock and smolmachines are spiritually close on the
policy side (default-deny net, per-host allowlist) but use microVMs
instead of containers.
Solving a different problem. tilde.run is hosted SaaS for team / production agent pipelines with data-versioned rollback — explicitly opposite to claude-bottle's "infrastructure I control" goal. boxlite and microsandbox are infrastructure libraries aimed at platform builders embedding sandboxes into agent frameworks; they would be a backend claude-bottle could call, not a competitor to its manifest layer. endo-familiar is in a different paradigm entirely: capability passing rather than kernel boundaries.
Borrowable ideas
What claude-bottle already has that the survey suggested as differentiators:
- Default-deny egress with a per-agent allowlist (pipelock).
- DLP scanning of outbound traffic.
- Bottle / agent split (manifest layer above the isolation primitive).
- gVisor auto-detection on Linux.
Ideas worth considering, without abandoning the bash-first / local-Docker stance:
- Per-use SSH key confirmation (from litterbox). Even with KnownHostKey pinning and pipelock egress, a wrapper SSH agent that prompts on each key use (e.g. via `osascript`/`notify-send`) would catch an agent doing something off-policy with a key it legitimately holds. Pure-stdlib, no new deps.
- In-flight secret injection (from matchlock). Pipelock already does egress allowlisting and DLP; teaching it to inject tokens at proxy time so e.g. `GITEA_TOKEN` never appears in the container's env would close the "agent reads its own env and exfiltrates" path. Fits the existing pipelock architecture.
- MicroVM backend as an opt-in bottle type, already on the radar in stronger-isolation-alternatives.md. microsandbox, smolmachines, and matchlock all show that libkrun + Apple's Virtualization.framework is ergonomic enough that a `"runtime": "microvm"` field on a bottle is plausible without a heavy stack.
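The in-flight injection idea can be sketched as a header rewrite on the host side of the proxy. This is an illustration of the concept only, not pipelock's or matchlock's implementation. The `inject_secret` name, the host-to-token mapping, the example hostname, and the `Bearer` scheme are assumptions, and how the proxy gets to see request headers (for example TLS handling) is outside the sketch.

```python
def inject_secret(host, headers, secrets_by_host):
    """Return the headers to send upstream, adding a token for known hosts.

    The sandboxed process sends the request without credentials; the proxy,
    which runs outside the sandbox, attaches the token just before
    forwarding. `secrets_by_host` (hypothetical) maps hostname -> token and
    exists only on the host side.
    """
    outbound = dict(headers)  # copy so the caller's dict is not mutated
    token = secrets_by_host.get(host.lower())
    if token is not None:
        # Overwrite anything the sandbox tried to supply for this header.
        outbound["Authorization"] = f"Bearer {token}"
    return outbound
```

Because the token is keyed by destination host, a request to any other host leaves the proxy without it, so the secret never needs to exist inside the container's environment.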
Not worth borrowing: the SDK-first programmatic API style of boxlite / microsandbox (cuts against the declarative-manifest stance), and the hosted-SaaS dashboard model of tilde.run (cuts against the "infrastructure I control" goal).
Caveats
- Star counts and last-commit dates are point-in-time snapshots.
- Several projects' network and persistence behaviour is not documented publicly; items so derived are marked (unverified).
- The `superradcompany/microsandbox` URL in the original prompt redirects to `microsandbox/microsandbox`; the surveyed project is the same.