docs(prd-0023): pivot to smolvm + TSI single-IP allowlist

Chunk-1's empirical spike against smolvm 0.8.0 contradicted the
research note that motivated the gvproxy network design: smolvm
exposes no virtio-net-over-unixgram attachment. The first draft's
"why gvproxy, not TSI" argument turns out to apply only to
`--outbound-localhost-only`, not to TSI generally.

New design:

- Bundle (PRD 0024) runs on a dedicated per-bottle docker bridge
  with a pinned IP. Smolfile sets `[network] allow_cidrs =
  ["<bundle-ip>/32"]` and nothing else. Agent can reach the bundle
  and nothing else — host loopback, LAN, public internet directly
  are all refused at the VMM (TSI) layer.
- Bind-address mitigation: egress binds 127.0.0.1:9099 inside the
  bundle (pipelock-internal); pipelock / git-gate / supervise
  bind 0.0.0.0 so the agent (across the TSI allowlist) can reach
  them. This is the port-granularity TSI's IP-only allowlist
  doesn't provide.
- Smolfile renderer rewritten in chunk 2 to smolvm 0.8.0's actual
  schema (image / entrypoint / cmd / env / [network] allow_cidrs).
  The chunk-1 renderer (name= / [[net]]= under the gvproxy
  design) emits the wrong shape and will be replaced.
- Drop gvproxy + VZFileHandleNetworkDeviceAttachment + the
  PyObjC fallback. Backend layout loses gvproxy_config.py,
  gvproxy.py, vfkit_attach.py.
- Acceptance plan adds an egress-port-bypass probe in addition
  to the localhost-reach probe.
- Chunks reshape: chunk 1 stays (renderer rewrite is part of
  chunk 2's cost); chunk 2 covers VM lifecycle + bundle + new
  Smolfile renderer; chunk 3 is the bundle bind-address change;
  chunks 4-5 unchanged in spirit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 03:47:03 -04:00
parent b1ad6295a4
commit 5929caa219
+324 -304
@@ -9,36 +9,54 @@
Ship a second concrete `BottleBackend`
(`SmolmachinesBottleBackend`, selected via
`CLAUDE_BOTTLE_BACKEND=smolmachines`) that runs each bottle inside
a per-agent microVM on macOS. The egress topology is enforced by
**gvproxy** (gvisor-tap-vsock), a userspace TCP/IP stack the guest's
virtio-net device is wired into via `VZFileHandleNetworkDeviceAttachment`.
gvproxy's only outbound configuration is an explicit per-bottle
port-forward set into a **single per-bottle sidecar container** that
bundles pipelock + egress + git-gate + supervise behind one supervised
init. Everything else — the host's LAN, the host's loopback
services, the public internet — is unreachable from the guest by
construction.
a per-agent libkrun microVM via `smolvm`. Egress is enforced by
libkrun's TSI ("Transport Socket Interface") allowlist set to a
**single /32** — the docker IP of the per-bottle sidecar bundle
(PRD 0024) on a dedicated docker bridge. Everything else — host
loopback, LAN, public internet directly — is denied at the VMM
layer, before a host-side socket is ever opened.
The sidecar bundle is the same image PRD 0024 introduces for the
docker backend; this PRD consumes it. Inside the bundle, egress is
pipelock's internal upstream over localhost and is not exposed
externally. gvproxy port-forwards three external ports into the
bundle: pipelock (for `HTTPS_PROXY`), git-gate (for git push), and
supervise (for MCP).
This explicitly rejects libkrun's TSI ("Transport Socket Interface")
allowlist as the network primitive. TSI's `--outbound-localhost-only`
is permissive on the entire `127.0.0.0/8` range with no
destination-port filter — the agent can dial any host-side service
bound to loopback (a local Postgres, an IDE plugin, a different
bottle's pipelock). That's the wrong default for a malicious-agent
threat model; see "Why gvproxy, not TSI" below.
The sidecar bundle is the same image PRD 0024 ships for the docker
backend; this PRD consumes it. Inside the bundle, pipelock /
git-gate / supervise bind `0.0.0.0:<port>` so the agent (reaching
the bundle via the allowed /32) can talk to them; egress (the
internal upstream of pipelock) binds `127.0.0.1:9099` so it's only
reachable from pipelock within the bundle — the agent can't dial
it directly even though TSI's allowlist is IP-granular rather than
port-granular.
The Docker backend ships unchanged; this is opt-in via the existing
env-var selector. The acceptance gate is PRD 0022's
`tests/integration/test_sandbox_escape.py` running green against
`CLAUDE_BOTTLE_BACKEND=smolmachines`.
### Design pivot from the first draft
The original PRD landed (PR #53) calling for **gvproxy** as the
network primitive — a userspace TCP/IP stack the guest's virtio-net
device would hook into via `VZFileHandleNetworkDeviceAttachment`,
with explicit `port_forwards` controlling what the guest could
reach. That design was built around the smolmachines research
note's claim that libkrun supports a virtio-net mode separate
from TSI.
Chunk 1's empirical spike against `smolvm 0.8.0`'s actual CLI
contradicted that claim: smolvm exposes only TSI-style egress
filters (`--allow-host`, `--allow-cidr`, `--outbound-localhost-only`),
with no documented option to attach virtio-net to a custom unixgram
socket. The gvproxy path would have required dropping smolvm
entirely and driving `Virtualization.framework` via PyObjC.
Re-examining the "why gvproxy" argument with smolvm's real surface,
the loopback gap PRD 0023 worried about only exists with
`--outbound-localhost-only`. With `--allow-cidr <bundle-ip>/32`
instead — and no `--outbound-localhost-only` — the agent can reach
exactly one IP (the bundle) and nothing else: not host loopback,
not LAN, not public internet. That's the same security property
the gvproxy design was chasing, enforced one layer lower (VMM
socket interception, not a userspace TCP/IP stack we maintain),
with significantly less code.
## Problem
`agent-vm-isolation.md` argues for hardware-isolated microVMs over
@@ -55,8 +73,10 @@ and four things motivate a second one now:
an editor plugin, another bottle's sidecar) without traversing
pipelock. The Docker backend's `--internal` network blocks the
first; nothing in the current stack blocks the second cleanly.
This PRD's gvproxy-based design closes both gaps: the guest can
only reach the explicit port-forward list, period.
This PRD's design closes both gaps via TSI's
`--allow-cidr <bundle-ip>/32`: the guest can only dial that one
IP, period. Host loopback, LAN, and the public internet are
refused at the VMM layer.
- **Isolation ceiling.** On macOS the Docker backend's agent
container shares Docker Desktop's host VM with every other
bottle. Container escape from claude-code lands the agent inside
@@ -77,30 +97,46 @@ and four things motivate a second one now:
clean in places where Docker-specific assumptions have been
tolerated.
## Why gvproxy, not TSI
## How TSI's single-IP allowlist achieves the property
libkrun's TSI hijacks guest socket syscalls inside the VMM and
opens the actual sockets from the host process, with a CIDR
allowlist gate. That works fine for blocking LAN reach (don't
allowlist `192.168.0.0/16`, agent can't dial it). But TSI's
`--outbound-localhost-only` permits the *entire* `127.0.0.0/8`
range across all ports — there is no destination-port filter at
the TSI layer (`smolmachines-as-vm-backend.md` flags this in the
"`--allow-host` semantics" caveat). For our threat model that
means any host-loopback service is reachable from the guest.
opens the actual sockets from the host process, gated by a CIDR
allowlist. Three flags expose the allowlist:
gvproxy implements a full userspace TCP/IP stack on the host side
of a `VZFileHandleNetworkDeviceAttachment` unixgram socket. The
guest has a real virtio-net device; gvproxy is its gateway. The
guest can only reach what gvproxy is configured to forward —
typically a single port forward to the per-bottle pipelock —
and DNS resolves NXDOMAIN by default. There is no "permissive
loopback" mode to mis-configure; if it's not in `port_forwards`,
the guest cannot reach it.
- `--outbound-localhost-only` — opens up the whole `127.0.0.0/8`
range, all ports. This is the flag the first draft of this PRD
rejected, and we still reject it: it would let the agent dial
any host-loopback service (local Postgres, IDE plugins, another
bottle's sidecar).
- `--allow-cidr CIDR` — IP/CIDR allowlist with no port filter.
- `--allow-host HOSTNAME` — resolves the host on the host's DNS
at VM-start time, stores the result as `/32` CIDRs, and also
enables guest-side DNS filtering (only the allowed hostname
resolves).
That property (*explicit allowlist by port forward, not CIDR*)
is the load-bearing reason this PRD chooses gvproxy. TSI shows up
once more in this doc, under Non-goals, where it is closed off.
This backend uses `--allow-cidr <bundle-ip>/32` (single host) and
nothing else. With the bundle running as a docker container with a
known IP on a dedicated docker bridge, the agent can reach exactly
one address: the bundle. Host loopback is denied (not in the
allowlist). LAN is denied. Public internet directly is denied. DNS
inside the guest is denied (no resolver in the allowlist) — the
agent uses an IP literal for `HTTPS_PROXY`.
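
As a rough model of the allowlist semantics described above (not libkrun's actual implementation), the gate can be read as a CIDR membership test in which the destination port is not an input at all. The function name and the `192.168.42.2` address are illustrative assumptions.

```python
import ipaddress


def tsi_would_allow(dest_ip: str, allow_cidrs: list[str]) -> bool:
    """Model of an IP-only allowlist: the destination port is never consulted."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allow_cidrs)


if __name__ == "__main__":
    bundle_ip = "192.168.42.2"  # hypothetical pinned bundle IP
    single_host = [f"{bundle_ip}/32"]

    print(tsi_would_allow(bundle_ip, single_host))       # True: the bundle
    print(tsi_would_allow("127.0.0.1", single_host))     # False: host loopback
    print(tsi_would_allow("192.168.42.1", single_host))  # False: bridge gateway
    print(tsi_would_allow("8.8.8.8", single_host))       # False: public resolver

    # For contrast, a loopback-wide allowlist admits every 127.x.y.z address.
    print(tsi_would_allow("127.0.0.1", ["127.0.0.0/8"]))  # True
```

Because the port never enters the check, every port on the allowed address is reachable as far as this layer is concerned.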
The one wrinkle TSI doesn't directly handle is **port granularity
within the allowed IP**. The bundle runs four daemons; pipelock /
git-gate / supervise are agent-facing, egress is pipelock's
internal upstream. If egress were bound to `0.0.0.0:9099` inside
the bundle, the agent could dial `<bundle-ip>:9099` and bypass
pipelock's DLP. We mitigate by binding egress to `127.0.0.1:9099`
*inside* the bundle so only pipelock — also in the bundle, on the
same localhost — can reach it. The bind-address strategy gives us
port-level isolation that TSI's IP-only allowlist doesn't.
Net result: same security property the first draft chased with
gvproxy, enforced at the VMM layer rather than via a userspace
TCP/IP stack, with significantly less code (no gvproxy lifecycle,
no `VZFileHandleNetworkDeviceAttachment` plumbing, no Smolfile
virtio-net carve-out smolvm doesn't expose anyway).
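
A minimal sketch of the kind of connect probe that could exercise the bind-address mitigation. The helper names are hypothetical, and the demo only runs against the local loopback interface: in the real acceptance test the dial would originate inside the guest and target `<bundle-ip>:9099`.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def listen_on(bind_addr: str, port: int = 0) -> socket.socket:
    """Open a TCP listener on bind_addr; port 0 asks the OS for a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((bind_addr, port))
    srv.listen()
    return srv


if __name__ == "__main__":
    # Stand-in for egress: bound to loopback only, like 127.0.0.1:9099 in the bundle.
    egress_like = listen_on("127.0.0.1")
    port = egress_like.getsockname()[1]

    # A peer on the same loopback (pipelock's position) can reach it.
    print(can_connect("127.0.0.1", port))

    egress_like.close()
    # With nothing listening, the same probe reports failure.
    print(can_connect("127.0.0.1", port))
```

A listener bound to `127.0.0.1` accepts only connections addressed to loopback, so a dial to the container's bridge address on the same port finds nothing listening. That is the behaviour the design relies on.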
## Goals / Success Criteria
@@ -133,33 +169,33 @@ The feature is **done** when all of the following ship:
- `SmolmachinesBottleBackend` registered under the
`"smolmachines"` key in `claude_bottle/backend/__init__.py:_BACKENDS`.
- Per-bottle Smolfile generation: a runtime-rendered TOML written
to the bottle's stage dir, analogous to the compose file the
Docker backend writes today. The Smolfile pins `command`,
`env`, and a virtio-net device backed by a unixgram socket
pointed at the per-bottle gvproxy. There is no TSI
`--allow-cidr` / `--outbound-localhost-only` / `--allow-host`
in the Smolfile — TSI is not used.
- Per-bottle gvproxy: one `gvproxy` process per bottle, started
before the VM, listening on a unixgram socket the VM's
virtio-net device hooks into. The gvproxy config has up to
three `port_forwards` entries (pipelock / git-gate / supervise
— git-gate and supervise only when the bottle uses them) all
pointing at the per-bottle sidecar bundle's exposed ports, plus
a DNS section that resolves only `proxy.internal`. Every other
hostname returns NXDOMAIN; every other destination is
unreachable.
to the bottle's stage dir using smolvm 0.8.0's actual schema
(`image`, `entrypoint`, `cmd`, `env = ["K=V", …]`, `[network]
allow_cidrs = ["<bundle-ip>/32"]`). The renderer chunk 1
shipped emits the wrong shape (built around the gvproxy
unixgram attachment) — it gets rewritten in this chunk plan as
the cost of the design pivot.
- Per-bottle docker bridge for the bundle: the sidecar bundle
runs as a docker container on a dedicated per-bottle bridge
network with a pinned IP (`--ip <bundle-ip>` against a
per-slug `/24` derived from the slug hash). The pinned IP is
what TSI's allowlist points at; without pinning we'd need to
inspect the running container's IP and feed it back into the
Smolfile, which is a race.
- Per-bottle sidecar bundle: one container per bottle running the
bundle image defined in PRD 0024. The bundle exposes up to
three host ports (pipelock for `HTTPS_PROXY`, git-gate for git
push, supervise for MCP), bound to `127.0.0.1` on dynamically
allocated ports. egress runs *inside* the bundle as pipelock's
upstream over localhost and is not exposed externally. The
agent's environment carries the resolved URLs (e.g.
`HTTPS_PROXY=http://proxy.internal:<pipelock-gateway-port>`).
bundle image defined in PRD 0024. pipelock / git-gate /
supervise bind `0.0.0.0:<port>` so the agent (reaching the
bundle via the allowed /32) can reach them. egress binds
`127.0.0.1:9099` inside the bundle so only pipelock can reach
it — the agent sees `<bundle-ip>:9099` refuse the connection
even though TSI's allowlist permits the IP. The agent's
environment carries IP-literal URLs (e.g.
`HTTPS_PROXY=http://<bundle-ip>:8888`).
- The agent guest image is produced from the existing `Dockerfile`
(or a thin variant), exported as an OCI archive, and consumed by
`smolvm machine create`. The image build step is part of `prepare`,
analogous to `docker_mod.build_image`.
via `smolvm pack create` → `.smolmachine` artifact, then loaded
into smolvm via `machine create --from <path>`. The image build
step is part of `prepare`, analogous to
`docker_mod.build_image`.
- The PRD 0022 sandbox-escape suite, run with
`CLAUDE_BOTTLE_BACKEND=smolmachines`, passes locally on a
smolmachines-capable host. The suite is updated to skip cleanly
@@ -182,15 +218,16 @@ The feature is **done** when all of the following ship:
value of `CLAUDE_BOTTLE_BACKEND`; smolmachines is strictly
opt-in until it has been load-bearing on at least one operator's
workflow for a release cycle.
- **No TSI for network policy.** libkrun's TSI mode is rejected
for this backend — it lacks per-port filtering on `127.0.0.0/8`
and would expose every host-loopback service to the guest. The
Smolfile must select libkrun's virtio-net mode and attach to
the per-bottle gvproxy unixgram socket; if that combination is
not supported by the pinned smolmachines version (see open
question 1), the implementation falls back to driving
Virtualization.framework directly via PyObjC and reuses the
same gvproxy attachment.
- **No `--outbound-localhost-only`.** That TSI flag opens the
entire `127.0.0.0/8` range and is the loopback gap the original
draft of this PRD called out. Use `--allow-cidr <bundle-ip>/32`
instead so the agent reaches one IP and one IP only.
- **No gvproxy.** Rejected after the chunk-1 spike against the
real smolvm CLI: smolvm 0.8.0 exposes no virtio-net-over-unixgram
attachment. Adopting gvproxy would have required dropping smolvm
and driving Virtualization.framework via PyObjC; the TSI
single-IP approach gives the same property at a fraction of the
cost.
- **No host bind mounts.** The smolmachines research note flagged
that `-v HOST:GUEST` mounts via virtiofs would defeat the
isolation goal. The manifest already has no concept of host
@@ -216,30 +253,35 @@ The feature is **done** when all of the following ship:
- New `claude_bottle/backend/smolmachines/` subpackage with the
full set of `BottleBackend` overrides.
- Smolfile generator (TOML), analogous to
`backend/docker/compose.py`'s `bottle_plan_to_compose`.
- Smolfile generator (TOML) emitting the smolvm 0.8.0 schema:
top-level `image`, `entrypoint`, `cmd`, `env = [...]`,
`[network] allow_cidrs = ["<bundle-ip>/32"]`. (The renderer
that chunk 1 shipped under the gvproxy design — `name=`,
`[[net]]` — gets rewritten as part of this chunk plan.)
- A host-side sidecar-bundle lifecycle manager that brings up
one container per bottle (the bundle image defined in PRD 0024),
publishes its one to three host ports, waits for readiness,
and tears it down with the bottle. This backend depends on
PRD 0024's bundle image; it does not own the bundle's
Dockerfile or init.
one container per bottle on a dedicated per-bottle docker
bridge with a pinned IP (`--ip <bundle-ip>`), waits for the
daemons to bind their ports, and tears it down with the bottle.
This backend depends on PRD 0024's bundle image; it does not
own the bundle's Dockerfile or init.
- Per-bottle CA install path: the bundle's CA cert lands inside
the microVM via `smolvm machine exec` after start
(analogous to the existing `provision_ca` for Docker).
- gvproxy lifecycle: per-bottle `gvproxy` started by the backend
before VM bringup, torn down after VM teardown, configured with
up to three `port_forwards` entries (gateway port → host
bundle port for each of pipelock / git-gate / supervise) and a
DNS section that resolves only `proxy.internal`. Subnet and
gateway IP are derived from the bottle slug so two concurrent
bottles don't collide.
- DNS policy: the bottle's `egress.allowlist` does *not* go into
gvproxy's DNS — the agent resolves only `proxy.internal`, and
pipelock on the host enforces the egress allowlist against
the actual upstream connect target. This keeps the DNS-exfil
attack (PRD 0022 test 4) blocked because gvproxy answers
NXDOMAIN for every name except `proxy.internal`.
- Per-bottle docker bridge: a `claude-bottle-bundle-<slug>`
network with a /24 subnet derived from the slug hash; the
bundle gets a pinned IP at `.2` (gateway is `.1`). Pinning the
IP at start time avoids a race between the bundle's IP being
assigned and the Smolfile being written.
- TSI policy: the Smolfile sets `[network] allow_cidrs =
["<bundle-ip>/32"]` and nothing else. The agent can reach the
bundle's IP (any port) and nothing else; no DNS resolution is
available inside the guest, so the agent uses IP-literal URLs.
- Bundle bind addresses: egress binds `127.0.0.1:9099` inside
the bundle (pipelock-only); pipelock / git-gate / supervise
bind `0.0.0.0` so the agent can reach them. This is the
port-granularity TSI's IP-only allowlist doesn't provide.
PRD 0024's bundle init may need a config knob for this;
raised as open question 4.
- Preflight `smolvm` check: if the user selects this backend and
`smolvm` isn't on `$PATH`, die with an install pointer (brew tap
+ version pin TBD in implementation; see open question 3).
@@ -248,7 +290,7 @@ The feature is **done** when all of the following ship:
rejects host mounts; this is a forward-compat check).
- Tests:
- Smoke unit-level test: Smolfile renderer produces the
expected TOML for a fixture bottle.
expected TOML for a fixture bottle (smolvm 0.8.0 shape).
- Integration test: `prepare → launch → exec("echo hi") →
teardown` on a smolmachines-capable host (skips otherwise
via the same env/platform gate the Docker integration tests
@@ -282,80 +324,65 @@ claude_bottle/backend/smolmachines/
launch.py @contextmanager launch(plan) -> SmolmachinesBottle
cleanup.py prepare_cleanup / cleanup / list_active
smolfile.py bottle_plan_to_smolfile(...) -> dict + render
gvproxy.py per-bottle gvproxy config render + process lifecycle
sidecar_bundle.py host-side lifecycle for the PRD 0024 bundle container
smolvm.py thin subprocess wrapper: machine create/start/exec/stop
vfkit_attach.py VZFileHandleNetworkDeviceAttachment + VFKT handshake
util.py slugify, port allocation, OCI archive helpers
sidecar_bundle.py host-side bundle lifecycle (per-bottle docker bridge + pinned IP)
smolvm.py thin subprocess wrapper: machine create/start/exec/stop, pack create
util.py slugify, subnet derivation, OCI archive helpers
provision/ ca.py, prompt.py, skills.py, git.py, supervise.py
```
Note what's NOT here vs. the original draft: `gvproxy.py`,
`vfkit_attach.py`. The gvproxy design needed both; the TSI single-IP
design needs neither.
### Network + egress topology
```
┌── macOS host ─────────────────────────────────────────────────────┐
│                                                                   │
│   ┌── per-bottle sidecar bundle (one container per microVM) ─┐    │
│   │  init.py (Python supervisor)                             │    │
│   │   ├─ pipelock           (binds 0.0.0.0:8888 in container)│    │
│   │   ├─ egress (mitmproxy) (binds 127.0.0.1:p_internal)     │    │
│   │   ├─ git-gate           (binds 0.0.0.0:8889)             │    │
│   │   └─ supervise (MCP)    (binds 0.0.0.0:8890)             │    │
│   │  pipelock's upstream is 127.0.0.1:p_internal (egress);   │    │
│   │  egress is not exposed outside the bundle.               │    │
│   └────────────────────────────────────────────────────┬─────┘    │
│   Host ports published (loopback, dynamic):                       │
│     pipelock   127.0.0.1:<p1>                                     │
│     git-gate   127.0.0.1:<p2>   (conditional)                     │
│     supervise  127.0.0.1:<p3>   (conditional)                     │
│     ▲ host TCP, reached via gvproxy port-forward                  │
│   ┌── gvproxy (per bottle) ─────────────────────────────┐         │
│   │  subnet: 192.168.127.X/24 (X derived from slug)     │         │
│   │  gateway: 192.168.127.X.1                           │         │
│   │  port_forwards:                                     │         │
│   │    - gateway 8888 → host 127.0.0.1:<p1>             │         │
│   │    - gateway 8889 → host 127.0.0.1:<p2> (cond)      │         │
│   │    - gateway 8890 → host 127.0.0.1:<p3> (cond)      │         │
│   │    # nothing else                                   │         │
│   │  DNS: proxy.internal → gateway IP; * → NXDOMAIN     │         │
│   └─────────────────────────────────────────────────────┘         │
│     ▲ unixgram socket (VFKT handshake)                            │
│     │                                                             │
│   ┌── microVM (per bottle) ─────────────────────────────┐         │
│   │  virtio-net device backed by VZFileHandle...        │         │
│   │  env: HTTPS_PROXY=http://proxy.internal:8888        │         │
│   │       GIT_GATE_URL=http://proxy.internal:8889       │         │
│   │       MCP_SUPERVISE_URL=http://proxy.internal:8890  │         │
│   │  no other host visible                              │         │
│   └─────────────────────────────────────────────────────┘         │
│   ┌── per-bottle docker bridge claude-bottle-bundle-<slug> ──┐    │
│   │  subnet: 192.168.X.0/24 (X = hash(slug) mod 254)         │    │
│   │                                                          │    │
│   │  ┌── bundle container (pinned --ip 192.168.X.2) ─────┐  │    │
│   │  │  init.py (PRD 0024 Python supervisor)              │  │    │
│   │  │   ├─ pipelock           (binds 0.0.0.0:8888)       │  │    │
│   │  │   ├─ egress (mitmproxy) (binds 127.0.0.1:9099)     │  │    │
│   │  │   ├─ git-gate           (binds 0.0.0.0:9418)       │  │    │
│   │  │   └─ supervise          (binds 0.0.0.0:9100)       │  │    │
│   │  │  Internal-only egress is unreachable from outside  │  │    │
│   │  │  the bundle even though TSI permits the IP.        │  │    │
│   │  └────────────────────────────────────────────────────┘  │    │
│   └──────────────────────────────────────────────────────┬───┘    │
│   ┌── microVM (per bottle, libkrun via smolvm) ──────────▼─┐      │
│   │  Smolfile: [network] allow_cidrs = ["192.168.X.2/32"]  │      │
│   │  env: HTTPS_PROXY=http://192.168.X.2:8888              │      │
│   │       GIT_GATE_URL=git://192.168.X.2:9418 (cond.)      │      │
│   │       MCP_SUPERVISE_URL=http://192.168.X.2:9100 (cond) │      │
│   │  No other host reachable — TSI denies any connect()   │      │
│   │  that isn't to 192.168.X.2. No DNS inside the guest   │      │
│   │  (no resolver in the allowlist).                      │      │
│   └────────────────────────────────────────────────────────┘      │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘
```
What the guest can reach, exhaustively: **only `proxy.internal`
on the gateway-port set we configured.** Everything else —
host LAN, host loopback (Postgres, IDE plugins, other bottles'
sidecars), public internet directly — is gone, enforced at the
gvproxy userspace stack rather than relying on guest cooperation.
What the guest can reach, exhaustively: **only `<bundle-ip>` on
ports the bundle binds to 0.0.0.0**. Egress's 127.0.0.1-only bind
makes it bundle-internal; host loopback / LAN / public internet
direct are all refused by TSI's allowlist.
Three changes vs. the Docker backend:
1. **One sidecar container per bottle, not four.** The bundle
defined in PRD 0024 is the unit of sidecar lifecycle on both
backends. egress is internal to the bundle as pipelock's
upstream, never directly addressed.
2. **Sidecar container is on the host, not a sibling on a Docker
internal network.** Isolation primitive is gvproxy's explicit
port-forward list, not Docker's `--internal` flag.
3. **The agent's first hop is `proxy.internal`, not a sidecar's
container hostname.** Same scanning + DLP + auth-injection
chain, but the first hop crosses a userspace TCP/IP stack we
own, not a Docker-managed bridge.
git-gate and supervise are conditional port forwards: only
emitted into gvproxy's config when the bottle actually uses
them, narrowing the attack surface for bottles that don't.
1. **One sidecar container per bottle, not four.** Same bundle
image PRD 0024 ships for the docker backend.
2. **Sidecar container is on a per-bottle docker bridge with a
pinned IP**, reached directly by the smolvm guest's allowed
/32 — no localhost port allocation, no userspace TCP/IP stack
in the middle.
3. **The agent dials IP literals, not hostnames.** TSI doesn't
filter DNS at the protocol level, and we don't put DNS
resolvers in the allowlist, so name resolution is denied by
construction.
### Lifecycle
@@ -363,61 +390,59 @@ them, narrowing the attack surface for bottles that don't.
1. Cross-backend validation via `BottleBackend._validate` (skills,
git identity files).
2. Allocate one to three host loopback ports for the sidecar
bundle (pipelock always; git-gate and supervise conditional on
manifest — egress is internal to the bundle and gets no host
port).
3. Resolve the agent OCI archive path (build if missing, cache by
Dockerfile + agent-name hash). The sidecar-bundle image
(`claude-bottle-sidecars:<pinned>`) is pulled or built per
PRD 0024; this backend does not own its build.
4. Pick a per-bottle gvproxy subnet (e.g. `192.168.127.X/24` where
`X` is derived from the slug) and render
`stage_dir/gvproxy.yaml`: one DNS entry for `proxy.internal`
and one `port_forwards` entry per active sidecar port
(gateway port → host loopback port on the bundle).
5. Render the per-bottle Smolfile to `stage_dir/smolfile.toml`,
pinning command / env / a virtio-net device backed by the
gvproxy unixgram socket path. No TSI flags.
6. Resolve the in-VM CA paths so launch knows where to copy
2. Derive a per-bottle docker subnet from `sha256(slug) % 254`
(skipping the docker-default 17): `192.168.X.0/24`. The bundle
IP is always `192.168.X.2` (gateway is `.1`).
3. Resolve the agent guest image: convert the existing
`Dockerfile` into a `.smolmachine` artifact via
`smolvm pack create --image <name> -o <stage>/agent.smolmachine`
(idempotent, layer-cached).
4. Render the per-bottle Smolfile to `stage_dir/smolfile.toml`
using smolvm 0.8.0's schema:
- `image` / `entrypoint` / `cmd` — bundled into the
`.smolmachine` from the previous step (one Smolfile, one
artifact).
- `env = [...]` — `HTTPS_PROXY`, `NO_PROXY`, `NODE_EXTRA_CA_CERTS`,
etc., all pointing at IP-literal URLs (`http://192.168.X.2:8888`).
- `[network] allow_cidrs = ["192.168.X.2/32"]` — TSI's single
/32 allowlist.
5. Resolve the in-VM CA paths so launch knows where to copy
pipelock's CA after start.
7. Return a `SmolmachinesBottlePlan` carrying the slug, port map,
OCI archive path, Smolfile path, gvproxy config path, and
the bundle's container/run spec.
6. Return a `SmolmachinesBottlePlan` carrying the slug, bundle
subnet/IP, `.smolmachine` artifact path, Smolfile path, and
bundle run spec.
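
The subnet derivation in step 2 could be sketched roughly as follows. How the value 17 is skipped is not specified in this PRD, so bumping to the next octet is an assumption here, as is the function name. Collision avoidance between concurrent bottles is deliberately left out of the sketch.

```python
import hashlib
import ipaddress

SKIPPED_OCTET = 17  # the value the PRD says to skip


def derive_bundle_subnet(slug: str):
    """Map a bottle slug to (subnet, gateway, bundle_ip) under 192.168.X.0/24."""
    digest = hashlib.sha256(slug.encode("utf-8")).digest()
    x = int.from_bytes(digest, "big") % 254
    if x == SKIPPED_OCTET:
        x += 1  # assumption: step over the skipped octet to the next one
    subnet = ipaddress.ip_network(f"192.168.{x}.0/24")
    gateway = subnet.network_address + 1    # .1, the bridge gateway
    bundle_ip = subnet.network_address + 2  # .2, the pinned bundle address
    return subnet, gateway, bundle_ip


if __name__ == "__main__":
    subnet, gateway, bundle_ip = derive_bundle_subnet("demo-bottle")
    print(subnet, gateway, bundle_ip)
    print(f'allow_cidrs = ["{bundle_ip}/32"]')
```

Since the result depends only on the slug, `prepare` can write the Smolfile before the bundle container exists, which is the race the pinned IP is meant to avoid.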
`SmolmachinesBottleBackend.launch(plan)`:
1. Start the sidecar bundle container with `docker run` (still
using the local Docker daemon for sidecars; the VM is what's
moving off Docker). Wait for its three readiness signals:
pipelock listening, git-gate listening (if enabled), supervise
listening (if enabled). Register the teardown callback.
2. Start the per-bottle `gvproxy` against the unixgram socket
path the Smolfile references, with `port_forwards` pointed at
the bundle's published host ports. Wait for the socket to
appear (the spike-style poll loop from `agent-vm-isolation.md`).
3. `smolvm machine create --smolfile <path>` and
`smolvm machine start <name>`. The Smolfile's virtio-net
device handshakes (`VFKT` magic) with gvproxy on start.
4. Provisioning: CA install → prompt → skills → git → supervise
config, each via `smolvm machine exec` (analogous to
`docker exec`).
5. Yield a `SmolmachinesBottle` whose `exec_claude` / `exec` /
1. Create the per-bottle docker bridge network
(`claude-bottle-bundle-<slug>` with the resolved subnet) and
start the sidecar bundle container with `docker run --network
... --ip <bundle-ip> ...`. Wait for its daemons to bind:
pipelock on 8888, git-gate on 9418 (conditional), supervise
on 9100 (conditional). Register teardown callbacks.
2. `smolvm machine create --from <stage>/agent.smolmachine
--smolfile <stage>/smolfile.toml <name>` and
`smolvm machine start --name <name>`. The Smolfile's TSI
allowlist gates outbound to the bundle's /32; libkrun's TSI
layer enforces it.
3. Provisioning: CA install → prompt → skills → git → supervise
config, each via `smolvm machine exec` / `smolvm machine cp`.
4. Yield a `SmolmachinesBottle` whose `exec_claude` / `exec` /
`cp_in` all funnel through `smolvm machine exec` /
`smolvm machine cp`.
6. Teardown: stop and remove the VM → stop gvproxy → stop +
remove the sidecar bundle container.
5. Teardown: stop and delete the VM → stop + remove the bundle
container → remove the per-bottle docker network.
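
The docker side of launch and teardown might be assembled along these lines. This is a sketch that only builds argument lists and runs nothing. The container name and the example subnet are hypothetical, and the smolvm steps are omitted because their exact subcommands are still being pinned down in this PRD.

```python
def bundle_docker_commands(slug: str, subnet: str, gateway: str,
                           bundle_ip: str, image: str) -> dict:
    """Build docker argv lists for the per-bottle bridge and bundle container."""
    network = f"claude-bottle-bundle-{slug}"
    container = f"{network}-ctr"  # hypothetical container name
    return {
        "launch": [
            ["docker", "network", "create",
             "--subnet", subnet, "--gateway", gateway, network],
            ["docker", "run", "--detach", "--name", container,
             "--network", network, "--ip", bundle_ip, image],
        ],
        # Teardown mirrors launch in reverse: container first, then network.
        "teardown": [
            ["docker", "rm", "--force", container],
            ["docker", "network", "rm", network],
        ],
    }


if __name__ == "__main__":
    cmds = bundle_docker_commands(
        slug="demo",
        subnet="192.168.42.0/24",
        gateway="192.168.42.1",
        bundle_ip="192.168.42.2",
        image="claude-bottle-sidecars:<pinned>",
    )
    for phase, argvs in cmds.items():
        for argv in argvs:
            print(phase, " ".join(argv))
```

Docker only honours `--ip` on a user-defined network created with an explicit `--subnet`, which is why the bridge is created before the container is started.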
### Data model
No manifest schema change. `bottles[]` continues to carry
`egress.allowlist`, `env`, `git`, `skills` references, etc.; the
smolmachines backend reads the same fields as the docker backend.
`egress.allowlist` is enforced by pipelock on the host side
(unchanged from the docker backend); gvproxy's DNS resolves only
`proxy.internal` regardless of the allowlist's contents, so an
agent that bypasses pipelock by raw IP cannot resolve any name
gvproxy doesn't know about.
`egress.allowlist` is enforced by pipelock inside the bundle
(unchanged from the docker backend); the guest has no DNS resolver
in TSI's allowlist, so an agent that tries to dial an arbitrary
hostname can't resolve it in the first place — the DNS-exfil
attack from PRD 0022 test 4 is blocked at the resolver step.
The `BottleSpec` dataclass and the `Bottle` ABC do not change.
@@ -442,37 +467,38 @@ The existing "unknown backend" `die()` path stays as-is.
- `smolvm` CLI binary on `$PATH` (one new external dep, gated by
the preflight check). Pinned version policy is deferred to the
open questions; v1 reads `smolvm --version` and refuses to launch
outside a known-good range.
- `gvproxy` binary on `$PATH`
(`go install github.com/containers/gvisor-tap-vsock/cmd/gvproxy@latest`,
or vendored). Same preflight pattern as `smolvm`.
- `pyobjc-framework-Virtualization` *only* if smolmachines does
not expose a way to attach virtio-net to a unixgram socket and
we fall back to driving Virtualization.framework directly (see
open question 1). Default path is "no PyObjC needed."
outside a known-good range (currently 0.8.x).
- No `gvproxy` dep (the original draft listed it; dropped after
the chunk-1 spike).
- No `pyobjc-framework-Virtualization` dep (dropped from the
original draft for the same reason).
- No new pure-Python packages. Subprocess + stdlib `tomllib` for
Smolfile authoring; the gvproxy YAML is small enough to render
by hand from a `dict[str, Any]`.
Smolfile authoring.
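
A hand-rolled renderer for the Smolfile shape described in this PRD could look like the sketch below. Note that stdlib `tomllib` only parses TOML, so the writing side has to be done some other way. The function name is hypothetical, treating `entrypoint` and `cmd` as arrays of strings is an assumption about smolvm 0.8.0's schema, and `json.dumps` is used as a shortcut for TOML basic-string quoting, which holds for ordinary printable values.

```python
import json


def _toml_array(items) -> str:
    """Render a list of strings as a single-line TOML array."""
    return "[" + ", ".join(json.dumps(str(item)) for item in items) + "]"


def render_smolfile(image: str, entrypoint: list, cmd: list,
                    env: dict, bundle_ip: str) -> str:
    """Render the top-level keys plus a [network] table with one /32 entry."""
    env_items = [f"{key}={value}" for key, value in env.items()]
    lines = [
        f"image = {json.dumps(image)}",
        f"entrypoint = {_toml_array(entrypoint)}",  # assumed to be an array
        f"cmd = {_toml_array(cmd)}",                # assumed to be an array
        f"env = {_toml_array(env_items)}",
        "",
        "[network]",
        f"allow_cidrs = {_toml_array([bundle_ip + '/32'])}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_smolfile(
        image="claude-bottle-agent:latest",  # hypothetical image name
        entrypoint=["/bin/sh"],
        cmd=["-c", "echo hi"],
        env={"HTTPS_PROXY": "http://192.168.42.2:8888"},
        bundle_ip="192.168.42.2",
    ))
```

Keeping `[network]` last and emitting exactly one key under it makes the "and nothing else" assertion in the unit test a simple string or parse check.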
### Acceptance test plan
- **Unit (smolfile):** `tests/unit/test_smolfile.py` verifies the
renderer produces the expected TOML for a fixture bottle:
command line, env entries, virtio-net device referencing the
expected unixgram socket path, no TSI flags.
- **Unit (gvproxy config):** `tests/unit/test_gvproxy_config.py`
verifies the per-bottle YAML has exactly one DNS entry
(`proxy.internal`), one `port_forwards` entry per active
sidecar pointed at the resolved host loopback port, and a
per-bottle subnet/gateway derived from the slug.
renderer produces the expected TOML for a fixture bottle in
smolvm 0.8.0's schema — top-level `image` / `entrypoint` /
`cmd` / `env`, plus `[network] allow_cidrs = ["<bundle-ip>/32"]`
and nothing else under `[network]`.
- **Unit (subnet derivation):** the existing
`test_smolmachines_util.py` covers the per-bottle subnet hash
+ collision-avoidance and stays as-is.
- **Integration smoke:** `tests/integration/test_smolmachines_smoke.py`
with `prepare → launch → exec → teardown`, guarded by a
`smolvm` + `gvproxy` presence check + macOS / KVM platform check.
`smolvm` presence check + macOS / KVM platform check.
- **Localhost-reach probe:** a focused integration test that
brings up a bottle, has the host bind a test service on
`127.0.0.1:<unused-port>`, and asserts the in-bottle agent
cannot connect to it. This is the regression test for the
exact gap that motivated choosing gvproxy over TSI.
exact gap `--outbound-localhost-only` would have introduced —
with `--allow-cidr <bundle-ip>/32` only, the probe must fail.
- **Egress-port-bypass probe:** brings up a bottle and asserts
that the in-bottle agent's connection attempt to
`<bundle-ip>:9099` (egress's port) is refused, confirming that
binding egress to `127.0.0.1` inside the bundle supplies the
port-granularity layer that TSI's IP-only allowlist does not.
- **PRD 0022 re-run:** with `CLAUDE_BOTTLE_BACKEND=smolmachines`,
all five attack categories return sandbox-block markers and the
suite passes. The test code does not change beyond the env-var
@@ -484,28 +510,32 @@ The existing "unknown backend" `die()` path stays as-is.
PRD 0024's bundle image is a prerequisite — this PRD assumes
`claude-bottle-sidecars:<pinned>` is available when chunk 3 lands.
1. **Backend skeleton + selection + Smolfile + gvproxy renderers.**
Subpackage layout, `_resolve_plan` stub that emits both a
TOML Smolfile and a gvproxy YAML but doesn't launch anything,
`_BACKENDS` registration, preflight `smolvm` + `gvproxy`
checks. Unit tests on both renderers. No VM bringup yet.
2. **gvproxy + VM lifecycle + OCI archive build.** `smolvm.py`
and `gvproxy.py` subprocess wrappers, prepare-time image
build (existing Dockerfile → OCI archive), launch path that
starts gvproxy, brings up the VM attached to gvproxy's socket
via VFKT handshake, exec into the VM, tear everything down.
1. **Backend skeleton + selection + Smolfile/gvproxy renderers.**
*Shipped (PR #62), but under the now-rejected gvproxy design.*
The Smolfile renderer emits `name = …` / `[[net]]` instead of
smolvm 0.8.0's `image` / `[network] allow_cidrs`. The gvproxy
renderer is dead. Chunk 2 rewrites the Smolfile renderer and
deletes `gvproxy_config.py` / its tests.
2. **VM lifecycle + bundle bringup + Smolfile rewrite.**
`smolvm.py` subprocess wrapper, prepare-time image conversion
(`smolvm pack create` → `.smolmachine`), per-bottle docker
bridge + bundle container with pinned IP, launch path that
starts the bundle and brings up the VM (`smolvm machine create
--from --smolfile`), exec into the VM, tear everything down.
Smoke integration test: `exec("echo hi")` inside a started
VM. Includes the localhost-reach probe test from the
acceptance plan.
3. **Sidecar bundle lifecycle.** `sidecar_bundle.py`: per-bottle
bundle container brought up via `docker run`, with one to
three published host ports, gvproxy `port_forwards` pointed
at them, and teardown integrated into the bottle's lifecycle.
Port allocator. No provisioning yet beyond what the bundle
needs.
VM. Includes the localhost-reach probe + egress-port-bypass
probe from the acceptance plan. The chunk-1 Smolfile renderer
gets rewritten to the smolvm 0.8.0 schema; `gvproxy_config.py`
and `gvproxy.py` (if any) get deleted.
3. **Bundle bind-address mitigation.** Update PRD 0024's bundle
init to bind egress on `127.0.0.1:9099` instead of `0.0.0.0`
(or expose a config knob — open question 4). Reverify the
egress-port-bypass probe. Pipelock / git-gate / supervise
continue to bind `0.0.0.0`.
4. **Provisioning parity with Docker.** CA install via
`smolvm machine exec`, prompt/skills/.git copy-in, supervise
MCP config. End-to-end `start` works for a real agent manifest.
`smolvm machine cp`, prompt/skills/.git copy-in, supervise
MCP config. End-to-end `start` works for a real agent
manifest.
5. **PRD 0022 sandbox-escape suite green.** Skip-guard update,
small adjustments to test helpers if any (the test uses
`bottle.exec(script)` and inspects `returncode` + body for
@@ -514,78 +544,68 @@ PRD 0024's bundle image is a prerequisite — this PRD assumes
## Open questions
1. **VMM choice: smolmachines vs PyObjC + Virtualization.framework.**
The network design requires libkrun's virtio-net mode attached
to a unixgram socket (so gvproxy is the gateway). The
smolmachines research note says libkrun *has* a virtio-net
mode but says it "does not support policy" — meaning libkrun
itself enforces no allowlist in that mode, which is exactly
what we want (gvproxy is the policy). What's unverified is
whether the Smolfile surface lets us point virtio-net at a
custom unixgram socket. If yes: this is a smolmachines backend
verbatim. If no: chunk 2 drops `smolvm` and drives
`Virtualization.framework` via PyObjC directly (the recipe in
`agent-vm-isolation.md` § "gvisor-tap-vsock + PyObjC +
Pipelock"), keeping the backend name "smolmachines" because
the operator-facing UX is unchanged. Resolve in chunk 1 via a
spike against the pinned smolmachines version.
2. **`smolvm` + `gvproxy` install policy.** Pin via brew /
`go install` versions, or vendor binaries in the repo. v1
likely runs `smolvm --version` / `gvproxy --help` at preflight
and accepts a documented range; vendoring is heavier but
reduces "works on my Mac" drift.
3. **CA install inside the OCI overlay.** Two paths: bake at
prepare time (one OCI archive per CA fingerprint, big cache
key) vs. inject at start time via `smolvm machine exec` after
the VM is up. PRD 0006 chose the runtime path for Docker
(docker-cp + `update-ca-certificates`); smolvm has the same
shape via `machine exec`. Default to runtime injection unless
it conflicts with VM start order.
4. **gvproxy subnet collision.** Two concurrent bottles must not
land on the same `192.168.127.X/24` subnet — they'd both want
the same gateway IP. Derive the third octet from a hash of
the slug (mod 254, skip the docker-default 17), and at launch
time confirm the subnet isn't already in use by another
bottle's gvproxy. Resolve the hash-collision policy in
chunk 2.
1. **~~VMM choice~~ Resolved.** Chunk-1 spike against `smolvm
0.8.0` confirmed there's no virtio-net-over-unixgram option;
the gvproxy design isn't viable on top of smolvm. Resolved
by switching to TSI `--allow-cidr <bundle-ip>/32` + bundle
bind-address mitigation; smolvm stays as the VMM. See the
"Design pivot from the first draft" section.
2. **`smolvm` install policy.** Pin via brew / `curl install.sh`,
or vendor a binary in the repo. v1 likely runs
`smolvm --version` at preflight and accepts a documented range
(currently 0.8.x). The
`curl -sSL https://smolmachines.com/install.sh | sh` path is
what the operator used; document it in the README.
3. **CA install inside the agent guest.** Two paths: bake at
prepare time (one `.smolmachine` artifact per CA fingerprint,
big cache key) vs. inject at start time via `smolvm machine
cp` after the VM is up. PRD 0006 chose the runtime path for
Docker (docker-cp + `update-ca-certificates`); smolvm has the
same shape via `machine cp` + `machine exec`. Default to
runtime injection.
4. **Bundle bind-address knob.** PRD 0024's bundle currently runs
all four daemons under one supervisor with daemon argv
hardcoded. To make egress bind `127.0.0.1:9099` instead of
`0.0.0.0:9099`, either: (a) edit the supervisor's
`_DAEMONS` entry to pass a `--listen-host 127.0.0.1` flag to
mitmdump, OR (b) introduce a per-daemon `bind_localhost`
knob the renderer can set. Option (a) is simpler and fits the
fact that egress is bundle-internal regardless of backend;
resolve in chunk 3.
5. **`bottle.exec(script)` exit-code fidelity.** The PRD 0022 test
suite reads `returncode` + stdout + stderr from
`ExecResult`. Confirm the VM-exec path (`smolvm machine exec`
or its PyObjC equivalent) propagates exit codes and separated
streams. The research note's "external integration is the CLI"
implies yes, but the embedded SDK bug it flagged suggests we
should verify before coding around it.
`ExecResult`. Confirm `smolvm machine exec` propagates exit
codes and keeps stdout and stderr separate. The CLI help
mentions a `--stream` flag for streaming output; the default
(non-stream) mode is the one we rely on, so verify its
behavior in chunk 2.
6. **CI gating.** Gitea's act_runner is Linux without nested KVM,
so this backend's integration tests will skip there for the
same structural reason the Docker bringup tests do (no real
isolation primitive available on the runner). The skip
predicate becomes `not (smolvm_available() and gvproxy_available()
and platform.system() == "Darwin")`. CI coverage for this
backend will come from local runs on the maintainer's macOS
host until a Darwin runner is wired up; ack that as a known
gap.
same structural reason the Docker bringup tests do. The skip
predicate becomes `not (smolvm_available() and
platform.system() == "Darwin")`. CI coverage for this backend
will come from local runs on the maintainer's macOS host
until a Darwin runner is wired up; acknowledge that as a known gap.
7. **Active bottle discovery.** Docker uses container labels to
enumerate active bottles (`list_active` queries the daemon).
The microVM enumeration story is `smolvm machine list`
(or the PyObjC backend's own bookkeeping); the plan is to
mirror the label scheme via Smolfile metadata
(`labels = { "claude-bottle" = "1" }`-style entries, if the
format supports it; otherwise via a deterministic name prefix
`claude-bottle-<slug>` + on-disk metadata under
`state/<slug>/`).
The microVM enumeration story is `smolvm machine ls --json`;
the plan is to filter on a deterministic name prefix
`claude-bottle-<slug>` + cross-reference with on-disk metadata
under `state/<slug>/`.
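
The name-prefix discovery plan in open question 7 could look roughly like the sketch below. It is illustrative only: the shape of the `smolvm machine ls --json` output is an assumption (a JSON array of objects with a `name` key; the real schema is unverified), and `active_bottle_slugs`, `NAME_PREFIX`, and `known_slugs` are hypothetical names, not existing code. The function takes the JSON text as an argument so it can be unit-tested without a `smolvm` binary.

```python
import json

# Deterministic name prefix from the plan above.
NAME_PREFIX = "claude-bottle-"


def active_bottle_slugs(ls_json: str, known_slugs: set[str]) -> list[str]:
    """Return the slugs of machines that look like active bottles.

    ASSUMPTION: `ls_json` is a JSON array of objects, each with a
    "name" key. The real `smolvm machine ls --json` schema has not
    been verified and may differ.

    `known_slugs` stands in for the on-disk metadata: the set of
    directory names found under `state/`.
    """
    slugs = []
    for machine in json.loads(ls_json):
        name = machine.get("name", "")
        if not name.startswith(NAME_PREFIX):
            continue  # some other VM on this host, not a bottle
        slug = name[len(NAME_PREFIX):]
        # Cross-reference: ignore machines with no on-disk metadata.
        if slug in known_slugs:
            slugs.append(slug)
    return sorted(slugs)
```

Requiring both the prefix match and the on-disk entry means a stray VM that happens to share the prefix is not reported as a bottle, at the cost of hiding a bottle whose `state/<slug>/` directory was lost.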
## References
- `docs/research/agent-vm-isolation.md` — primary reference for
the gvproxy + `VZFileHandleNetworkDeviceAttachment` network
attachment used here. The "Full Setup: gvisor-tap-vsock +
PyObjC + Pipelock" section is the recipe the PyObjC fallback
in open question 1 would adopt verbatim.
- `docs/research/agent-vm-isolation.md` — describes the
gvproxy + `VZFileHandleNetworkDeviceAttachment` path. The
current design no longer needs that recipe (the TSI single-IP
approach replaced it after the chunk-1 spike); kept for
historical context if a future operator needs to drop smolvm
and own the VM lifecycle directly.
- `docs/research/smolmachines-as-vm-backend.md` — evaluation of
smolmachines as the VM lifecycle wrapper. This PRD diverges
from its conclusion on the *network* primitive (rejecting TSI
in favor of gvproxy) but keeps its VM-lifecycle conclusion
conditional on the libkrun-virtio-net spike in open question 1.
smolmachines as the VM lifecycle wrapper. The research note's
argument against TSI (the host-loopback gap) turned out to
apply only to `--outbound-localhost-only`, not to TSI
generally; this PRD uses `--allow-cidr <bundle-ip>/32`
instead, which sidesteps the gap.
- `docs/research/agent-sandbox-landscape.md` — identifies
`"runtime": "microvm"`-style opt-in as the borrowable idea;
smolmachines is the concrete implementation.