Stronger isolation alternatives: gVisor, Kata, Firecracker, Apple Container
Research into what it would take to replace or augment Docker (with runc)
as the agent runtime in bot-bottle, and what each option would actually
buy in security terms vs. cost in launcher rewrite.
Summary
There is a ladder, not a menu. Three realistic rungs, ordered by effort:
- gVisor (runsc): flip a runtime flag per bottle. ~1–2 days. Adds a userspace syscall boundary; blocks most kernel-CVE escape classes.
- Kata Containers: flip a runtime flag per bottle. Same Docker UX, real microVM underneath. Linux-host only.
- Firecracker direct: replace Docker as the runtime entirely. Weeks of work. Strongest boundary, no macOS support.
A fourth option, Apple Container, is the right macOS-native answer to "I want Kata's isolation model without giving up MacBooks as the dev target." Probably the right v2 if bot-bottle keeps macOS in scope.
The pipelock egress design is portable across all four: every option can
provide a network primitive that means "no default route except through
the proxy" (Docker --internal, Kata's virtualized bridge, TAP-only
Firecracker, Apple Container's per-VM networking). Whichever rung is
chosen, the security-load-bearing part of the v1 design survives.
Threat model recap
The current v1 boundary is a single node:22-slim container running as
uid 1000 under runc, sharing a kernel with the host. This protects
against:
- accidental host-filesystem access by Claude Code,
- network egress not approved by the pipelock allowlist,
- a misbehaving but uncoordinated agent.
It does not protect against:
- a kernel-level container escape (Dirty Pipe / runc CVE class),
- a coordinated attacker with code execution inside the container who targets the host kernel,
- side channels accessible from the shared kernel.
Stronger isolation closes the gaps in the second list. Whether that's worth the effort depends on whether you trust the agent's code-execution surface more or less than you trust the host kernel.
Rung 1: gVisor (runsc)
gVisor is a userspace kernel that registers as a Docker runtime. The agent's syscalls are intercepted and re-implemented in Go rather than forwarded to the host kernel.
What changes in this codebase
- bot_bottle/cli/start.py (where docker run is assembled): add --runtime=runsc to the container args when the bottle requests it. Make it configurable: bottles.<name>.runtime: "runsc" | "runc", default runc.
- bot_bottle/docker.py: add a require_runsc() check that runs docker info --format '{{.Runtimes}}' once and dies with an install pointer if runsc isn't registered.
- network.py, pipelock.py, skills.py, ssh.py: no changes. Docker networks, docker exec, docker cp, volume mounts, and the pipelock sidecar all still work because gVisor is invisible at the Docker API layer.
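A minimal sketch of the two launcher changes described above, assuming the runtime value has already been read from the manifest. The helper names runtime_args and runsc_registered are hypothetical. The check uses the `{{json .Runtimes}}` template rather than the plain `{{.Runtimes}}` one so the output can be parsed with the json module; that is a substitution made for illustration, not necessarily what the codebase should do.

```python
import json
import subprocess
import sys

ALLOWED_RUNTIMES = ("runc", "runsc")


def runtime_args(runtime: str = "runc") -> list[str]:
    """Extra `docker run` arguments for a bottle's runtime setting."""
    if runtime not in ALLOWED_RUNTIMES:
        raise ValueError(f"unknown runtime {runtime!r}; expected one of {ALLOWED_RUNTIMES}")
    # runc is Docker's default runtime, so it needs no flag.
    return [] if runtime == "runc" else [f"--runtime={runtime}"]


def runsc_registered(runtimes_json: str) -> bool:
    """True if `runsc` is a key in the JSON object printed by `docker info`."""
    return "runsc" in json.loads(runtimes_json)


def require_runsc() -> None:
    """Exit with an install pointer if Docker has no runsc runtime registered."""
    out = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    if not runsc_registered(out):
        sys.exit("runsc is not registered with Docker; install gVisor, "
                 "register it as a Docker runtime, and retry")
```

Keeping the parsing in a pure function (runsc_registered) means the check can be unit-tested without a Docker daemon.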
What you get
- A second syscall boundary between the agent and the host kernel. Most container-escape CVEs (Dirty Pipe / runc-escape class) stop at runsc.
- Roughly 2–10% perf hit on syscall-heavy workloads. npm install will feel it; interactive claude typing will not.
Caveats
- macOS does not run runsc natively. It needs a Linux kernel. On Mac, gVisor would run inside Docker Desktop's Linux VM, so the effective boundary becomes "agent ↔ runsc ↔ Docker Desktop's Linux VM ↔ hypervisor ↔ macOS". The hypervisor was already doing the heavy lifting; on Mac, runsc is mostly defense-in-depth. On a Linux host it's a real win.
- Some syscalls are unsupported (a small list: io_uring historically, some ptrace shapes). For Claude Code + git + npm I expect zero issues, but a smoke test (claude --version && git status && npm install) inside the runsc image is worth it.
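The smoke test mentioned above could be wrapped in a small script along these lines. This is a sketch under assumptions: the function names are hypothetical, the image name is whatever the project builds, and the image's working directory is assumed to contain a git checkout with a package.json (otherwise git status and npm install fail for reasons unrelated to gVisor).

```python
import shlex
import subprocess

# The three checks named in the text, run in order inside one shell.
SMOKE_COMMANDS = ["claude --version", "git status", "npm install"]


def smoke_test_argv(image: str, runtime: str = "runsc") -> list[str]:
    """Build a `docker run` command that runs the smoke checks under `runtime`."""
    script = " && ".join(SMOKE_COMMANDS)
    return ["docker", "run", "--rm", f"--runtime={runtime}", image, "sh", "-c", script]


def run_smoke_test(image: str) -> bool:
    """Return True if every smoke command exits 0 under runsc."""
    argv = smoke_test_argv(image)
    print("running:", shlex.join(argv))
    return subprocess.run(argv).returncode == 0
```

Running the same script once with runtime="runc" gives a baseline, so a failure under runsc can be attributed to the runtime rather than to the image.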
Effort
~1–2 days, plus a paragraph in the README. Cleanest first step.
Rung 2: Kata Containers
Kata also registers as a Docker/containerd runtime
(--runtime=kata-runtime), but each container actually runs inside its
own lightweight VM. The VMM under the hood is configurable: Firecracker,
Cloud Hypervisor, or QEMU.
What changes in this codebase
Essentially the same as the gVisor path: flip a runtime flag, add a require-check. Pipelock keeps working unchanged, because Kata virtualizes the network at the VM level but exposes it as a normal Docker network.
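As a sketch of how the runtime flag and require-check might generalize once Kata is a third option: the function below maps a manifest value to a Docker runtime name and refuses to proceed if that runtime is not registered. The function name, the "kata" manifest value, and the shape of the bottle dict are assumptions; kata-runtime is the runtime name used in this note, and the name actually registered depends on how Kata was installed.

```python
# Manifest value -> Docker runtime name (assumed mapping, for illustration).
KNOWN_RUNTIMES = {"runc": "runc", "runsc": "runsc", "kata": "kata-runtime"}


def resolve_runtime(bottle: dict, registered: set[str]) -> list[str]:
    """Return the extra `docker run` args for a bottle, or raise if unusable.

    `registered` is the set of runtime names Docker reports as available.
    """
    choice = bottle.get("runtime", "runc")
    if choice not in KNOWN_RUNTIMES:
        raise ValueError(f"unknown runtime {choice!r}; expected one of {sorted(KNOWN_RUNTIMES)}")
    name = KNOWN_RUNTIMES[choice]
    if name not in registered:
        raise RuntimeError(f"runtime {name!r} is not registered with Docker")
    # runc is the default, so only non-default runtimes need a flag.
    return [] if name == "runc" else [f"--runtime={name}"]
```

With this shape, adding a runtime is one dictionary entry and the rest of the launcher stays untouched, which matches the claim that pipelock keeps working unchanged.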
Tradeoffs vs. gVisor
- Stronger boundary (a real VM with its own guest kernel, not a userspace reimplementation of the syscall surface).
- Slower cold start (hundreds of ms vs. tens). For interactive Claude this is fine; for ephemeral batch agents you would notice.
- Not natively supported on macOS at all — needs a Linux host or a Linux VM you control. This is the moment bot-bottle stops being "works on a Mac dev laptop with Docker Desktop."
When this is the right rung
If the deployment target is "agents run on a small Linux server I administer," Kata is the sweet spot. If the target stays "users run this on their MacBook," skip to the Apple Container option.
Rung 3: Firecracker directly
Firecracker is a VMM, not a container runtime. Adopting it means replacing Docker, not adding to it.
What you would lose / have to rebuild
| Today | With Firecracker |
|---|---|
| Dockerfile → node:22-slim image | A rootfs (ext4 image) + a kernel (vmlinux) you build and pin |
| docker run --network … | TAP devices on the host, connected to a Linux bridge or routed manually |
| docker exec -it for the interactive TTY | vsock + a small in-guest agent, or SSH into the microVM |
| docker cp for skills + pipelock YAML | Bake into the rootfs, mount a virtio-blk overlay, or 9p / virtiofs share |
| Pipelock as a sidecar on a --internal network | Pipelock as a separate microVM (or on the host) with a TAP-only path between the two; the agent VM gets no host route |
| docker rm -f on exit | A SIGTERM to firecracker + cleanup of TAPs, sockets, overlay disks |
Files in this repo that would change
- bot_bottle/docker.py → replaced by a new bot_bottle/firecracker.py that sends PUT requests to the Firecracker API socket per microVM (/boot-source, /drives, /network-interfaces, /actions).
- bot_bottle/network.py → a host-side networking module that creates a Linux bridge per agent, two TAPs (agent-side, pipelock-side), and either iptables rules or no host route at all so the agent VM literally cannot reach anything except pipelock.
- bot_bottle/pipelock.py → instead of a sidecar container, run pipelock as its own microVM (or on the host pinned to the bridge). The hostname-allowlist semantics carry over; the implementation is different.
- bot_bottle/skills.py, ssh.py → can no longer use docker cp. Bake skills into the rootfs at build time, or mount a virtiofs share read-only.
- Dockerfile → replaced by a rootfs builder. Realistically this means using something like firecracker-containerd or building the rootfs with debootstrap / mkosi and a kernel from upstream.
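To give a feel for what a bot_bottle/firecracker.py module would involve, here is a minimal sketch that configures and starts one microVM over the API socket using only the standard library. Firecracker's API configures these resources with PUT requests. The class and function names, the drive and interface ids, and the boot arguments are illustrative assumptions; the kernel, rootfs, and TAP device are assumed to exist already.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self) -> None:
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


def boot_requests(kernel: str, rootfs: str, tap: str) -> list[tuple[str, dict]]:
    """Ordered (path, body) pairs that configure and then start one microVM."""
    return [
        ("/boot-source", {"kernel_image_path": kernel,
                          "boot_args": "console=ttyS0 reboot=k panic=1"}),
        ("/drives/rootfs", {"drive_id": "rootfs", "path_on_host": rootfs,
                            "is_root_device": True, "is_read_only": False}),
        ("/network-interfaces/eth0", {"iface_id": "eth0", "host_dev_name": tap}),
        ("/actions", {"action_type": "InstanceStart"}),
    ]


def start_microvm(socket_path: str, kernel: str, rootfs: str, tap: str) -> None:
    """Send each configuration request in order; raise on the first failure."""
    conn = UnixHTTPConnection(socket_path)
    try:
        for path, body in boot_requests(kernel, rootfs, tap):
            conn.request("PUT", path, body=json.dumps(body),
                         headers={"Content-Type": "application/json"})
            resp = conn.getresponse()
            resp.read()  # drain the body so the connection can be reused
            if resp.status >= 300:
                raise RuntimeError(f"PUT {path} failed with HTTP {resp.status}")
    finally:
        conn.close()
```

This covers only the boot path. TAP creation, the in-guest agent for the interactive TTY, and teardown are the parts the table above flags as needing to be rebuilt, and they are not shown here.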
What you would gain
- A real KVM boundary. The strongest isolation realistically achievable on commodity hardware.
- Sub-second cold starts (Firecracker boots in ~125 ms; rootfs prep dominates).
What you would give up
- macOS support. Firecracker is KVM-only. The only way back to Mac is to nest a Linux VM hosting Firecracker, at which point the security argument gets thin again.
- Ecosystem ergonomics. No docker logs, no docker exec, no docker network inspect. You write all of that yourself or adopt firecracker-containerd or Ignite (which is unmaintained; verify before committing).
Effort
Realistically 2–4 weeks of focused work on the runtime layer. Forces dropping "v1 works on Mac" as a goal. PRD-worthy, not a side quest.
Rung 3.5: Apple Container (macOS-native VM-per-container)
Apple Container is Apple's container CLI, native on Apple Silicon.
Each container runs in its own Virtualization.framework VM. It is the
macOS-native answer to "I want Kata's isolation model on my MacBook."
Why it matters here
The CLI surface mirrors Docker closely (container run, container network create, etc.), so the launcher rewrite is far smaller than
Firecracker's. On Linux hosts you would still take the gVisor or Kata
path. The result is:
- macOS: Apple Container (per-container VM via Virtualization.framework),
- Linux: gVisor or Kata,
- one Python launcher that switches on host OS.
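The "switches on host OS" idea could look roughly like this. It is a sketch only: the function names and the returned dict shape are hypothetical, and the Apple Container branch records nothing beyond the CLI name (container) because its networking and exec behavior are still open questions.

```python
import platform


def choose_backend(system: str, linux_runtime: str = "runsc") -> dict:
    """Map a host OS name (as returned by platform.system()) to a backend."""
    if system == "Darwin":
        # Apple Container: one Virtualization.framework VM per container.
        return {"cli": "container", "runtime": None}
    if system == "Linux":
        if linux_runtime not in ("runsc", "kata-runtime"):
            raise ValueError(f"unsupported Linux runtime: {linux_runtime!r}")
        return {"cli": "docker", "runtime": linux_runtime}
    raise RuntimeError(f"unsupported host OS: {system!r}")


def host_backend() -> dict:
    """Backend for the machine this launcher is running on."""
    return choose_backend(platform.system())
```

Passing the OS name in as an argument, rather than calling platform.system() inside the decision, keeps both branches testable from either platform.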
Open questions before committing
- Does Apple Container support a --internal-equivalent network with no default gateway, so the pipelock topology is reproducible?
- Image format: Apple Container uses OCI images, so the existing Dockerfile should be reusable, but this needs verification.
- exec-equivalent semantics: the launcher relies on docker exec to attach a TTY after the container is up. Confirm container exec behaves equivalently for interactive use.
A short spike (~1 day) answering those three questions would unblock a PRD-level decision.
Recommendation
If this were my project today, given the README still names macOS as in
scope and the manifest example carries /Users/didericis paths:
- Today. Add bottles.<name>.runtime with runc / runsc options. Land it as a small PR (~1–2 days, per the gVisor estimate). README gets a small "Linux hosts can opt into gVisor for stronger isolation" note. Mac users get nothing new but lose nothing.
- If VM-grade isolation on macOS becomes the goal. Skip Firecracker and look at Apple Container. Smaller launcher rewrite than Firecracker; Linux stays on the gVisor / Kata path. Probably the right v2.
- Firecracker only if bot-bottle's deployment target settles on self-hosted Linux, not laptops. At that point the "non-goal: self-hosted VMs" line in AGENTS.md flips and the project's identity changes.
The pipelock egress design ports across all of these, so none of this work threatens the existing security-load-bearing piece of v1.
Caveats
- gVisor's unsupported-syscall list shifts release-to-release; verify against the version pinned in any future image.
- Kata's default VMM is configurable; performance and CVE surface vary by VMM choice.
- Firecracker tooling has churned (Ignite is effectively unmaintained; firecracker-containerd is the active path). Re-survey before committing.
- Apple Container is young; behavior around --internal-style networks and exec semantics needs to be verified directly, not assumed.
- Research conducted 2026-05-10.