From 5929caa21934175a1c9e6effa67349219dda7134 Mon Sep 17 00:00:00 2001
From: claude
Date: Wed, 27 May 2026 03:47:03 -0400
Subject: [PATCH] docs(prd-0023): pivot to smolvm + TSI single-IP allowlist
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Chunk-1's empirical spike against smolvm 0.8.0 contradicted the research
note that motivated the gvproxy network design: smolvm exposes no
virtio-net-over-unixgram attachment. The first draft's "why gvproxy, not
TSI" argument turns out to apply only to `--outbound-localhost-only`, not
to TSI generally.

New design:

- Bundle (PRD 0024) runs on a dedicated per-bottle docker bridge with a
  pinned IP. Smolfile sets `[network] allow_cidrs = ["/32"]` and nothing
  else. Agent can reach the bundle and nothing else — host loopback, LAN,
  public internet directly are all refused at the VMM (TSI) layer.
- Bind-address mitigation: egress binds 127.0.0.1:9099 inside the bundle
  (pipelock-internal); pipelock / git-gate / supervise bind 0.0.0.0 so
  the agent (across the TSI allowlist) can reach them. This is the
  port-granularity TSI's IP-only allowlist doesn't provide.
- Smolfile renderer rewritten in chunk 2 to smolvm 0.8.0's actual schema
  (image / entrypoint / cmd / env / [network] allow_cidrs). The chunk-1
  renderer (name= / [[net]]= under the gvproxy design) emits the wrong
  shape and will be replaced.
- Drop gvproxy + VZFileHandleNetworkDeviceAttachment + the PyObjC
  fallback. Backend layout loses gvproxy_config.py, gvproxy.py,
  vfkit_attach.py.
- Acceptance plan adds an egress-port-bypass probe in addition to the
  localhost-reach probe.
- Chunks reshape: chunk 1 stays (renderer rewrite is part of chunk 2's
  cost); chunk 2 covers VM lifecycle + bundle + new Smolfile renderer;
  chunk 3 is the bundle bind-address change; chunks 4-5 unchanged in
  spirit.

Co-Authored-By: Claude Opus 4.7 --- docs/prds/0023-smolmachines-backend.md | 628 +++++++++++++------------ 1 file changed, 324 insertions(+), 304 deletions(-) diff --git a/docs/prds/0023-smolmachines-backend.md b/docs/prds/0023-smolmachines-backend.md index 502cc4f..3fe1e3f 100644 --- a/docs/prds/0023-smolmachines-backend.md +++ b/docs/prds/0023-smolmachines-backend.md @@ -9,36 +9,54 @@ Ship a second concrete `BottleBackend` — `SmolmachinesBottleBackend`, selected via `CLAUDE_BOTTLE_BACKEND=smolmachines` — that runs each bottle inside -a per-agent microVM on macOS. The egress topology is enforced by -**gvproxy** (gvisor-tap-vsock), a userspace TCP/IP stack the guest's -virtio-net device is wired into via `VZFileHandleNetworkDeviceAttachment`. -gvproxy's only outbound configuration is an explicit per-bottle -port-forward set into a **single per-bottle sidecar container** that -bundles pipelock + egress + git-gate + supervise behind one supervised -init. Everything else — the host's LAN, the host's loopback -services, the public internet — is unreachable from the guest by -construction. +a per-agent libkrun microVM via `smolvm`. Egress is enforced by +libkrun's TSI ("Transport Socket Interface") allowlist set to a +**single /32** — the docker IP of the per-bottle sidecar bundle +(PRD 0024) on a dedicated docker bridge. Everything else — host +loopback, LAN, public internet directly — is denied at the VMM +layer, before a host-side socket is ever opened. -The sidecar bundle is the same image PRD 0024 introduces for the -docker backend; this PRD consumes it. Inside the bundle, egress is -pipelock's internal upstream over localhost and is not exposed -externally. gvproxy port-forwards three external ports into the -bundle: pipelock (for `HTTPS_PROXY`), git-gate (for git push), and -supervise (for MCP). - -This explicitly rejects libkrun's TSI ("Transport Socket Interface") -allowlist as the network primitive. 
TSI's `--outbound-localhost-only` -is permissive on the entire `127.0.0.0/8` range with no -destination-port filter — the agent can dial any host-side service -bound to loopback (a local Postgres, an IDE plugin, a different -bottle's pipelock). That's the wrong default for a malicious-agent -threat model; see "Why gvproxy, not TSI" below. +The sidecar bundle is the same image PRD 0024 ships for the docker +backend; this PRD consumes it. Inside the bundle, pipelock / +git-gate / supervise bind `0.0.0.0:` so the agent (reaching +the bundle via the allowed /32) can talk to them; egress (the +internal upstream of pipelock) binds `127.0.0.1:9099` so it's only +reachable from pipelock within the bundle — the agent can't dial +it directly even though TSI's allowlist is IP-granular rather than +port-granular. The Docker backend ships unchanged; this is opt-in via the existing env-var selector. The acceptance gate is PRD 0022's `tests/integration/test_sandbox_escape.py` running green against `CLAUDE_BOTTLE_BACKEND=smolmachines`. +### Design pivot from the first draft + +The original PRD landed (PR #53) calling for **gvproxy** as the +network primitive — a userspace TCP/IP stack the guest's virtio-net +device would hook into via `VZFileHandleNetworkDeviceAttachment`, +with explicit `port_forwards` controlling what the guest could +reach. That design was built around the smolmachines research +note's claim that libkrun supports a virtio-net mode separate +from TSI. + +Chunk 1's empirical spike against `smolvm 0.8.0`'s actual CLI +contradicted that claim: smolvm exposes only TSI-style egress +filters (`--allow-host`, `--allow-cidr`, `--outbound-localhost-only`), +with no documented option to attach virtio-net to a custom unixgram +socket. The gvproxy path would have required dropping smolvm +entirely and driving `Virtualization.framework` via PyObjC. 
+ +Re-examining the "why gvproxy" argument with smolvm's real surface, +the loopback gap PRD 0023 worried about only exists with +`--outbound-localhost-only`. With `--allow-cidr /32` +instead — and no `--outbound-localhost-only` — the agent can reach +exactly one IP (the bundle) and nothing else: not host loopback, +not LAN, not public internet. That's the same security property +the gvproxy design was chasing, enforced one layer lower (VMM +socket interception, not a userspace TCP/IP stack we maintain), +with significantly less code. + ## Problem `agent-vm-isolation.md` argues for hardware-isolated microVMs over @@ -55,8 +73,10 @@ and four things motivate a second one now: an editor plugin, another bottle's sidecar) without traversing pipelock. The Docker backend's `--internal` network blocks the first; nothing in the current stack blocks the second cleanly. - This PRD's gvproxy-based design closes both gaps: the guest can - only reach the explicit port-forward list, period. + This PRD's design closes both gaps via TSI's + `--allow-cidr /32`: the guest can only dial that one + IP, period. Host loopback, LAN, and the public internet are + refused at the VMM layer. - **Isolation ceiling.** On macOS the Docker backend's agent container shares Docker Desktop's host VM with every other bottle. Container escape from claude-code lands the agent inside @@ -77,30 +97,46 @@ and four things motivate a second one now: clean in places where Docker-specific assumptions have been tolerated. -## Why gvproxy, not TSI +## How TSI's single-IP allowlist achieves the property libkrun's TSI hijacks guest socket syscalls inside the VMM and -opens the actual sockets from the host process, with a CIDR -allowlist gate. That works fine for blocking LAN reach (don't -allowlist `192.168.0.0/16`, agent can't dial it). 
But TSI's -`--outbound-localhost-only` permits the *entire* `127.0.0.0/8` -range across all ports — there is no destination-port filter at -the TSI layer (`smolmachines-as-vm-backend.md` flags this in the -"`--allow-host` semantics" caveat). For our threat model that -means any host-loopback service is reachable from the guest. +opens the actual sockets from the host process, gated by a CIDR +allowlist. Three flags expose the allowlist: -gvproxy implements a full userspace TCP/IP stack on the host side -of a `VZFileHandleNetworkDeviceAttachment` unixgram socket. The -guest has a real virtio-net device; gvproxy is its gateway. The -guest can only reach what gvproxy is configured to forward — -typically a single port forward to the per-bottle pipelock — -and DNS resolves NXDOMAIN by default. There is no "permissive -loopback" mode to mis-configure; if it's not in `port_forwards`, -the guest cannot reach it. +- `--outbound-localhost-only` — opens up the whole `127.0.0.0/8` + range, all ports. This is the flag the first draft of this PRD + rejected, and we still reject it: it would let the agent dial + any host-loopback service (local Postgres, IDE plugins, another + bottle's sidecar). +- `--allow-cidr CIDR` — IP/CIDR allowlist with no port filter. +- `--allow-host HOSTNAME` — resolves the host on the host's DNS + at VM-start time, stores the result as `/32` CIDRs, and also + enables guest-side DNS filtering (only the allowed hostname + resolves). -That property — *explicit allowlist by port forward, not CIDR* — -is the load-bearing reason this PRD chooses gvproxy. TSI shows up -once more in this doc, under Non-goals, where it is closed off. +This backend uses `--allow-cidr /32` (single host) and +nothing else. With the bundle running as a docker container with a +known IP on a dedicated docker bridge, the agent can reach exactly +one address: the bundle. Host loopback is denied (not in the +allowlist). LAN is denied. Public internet directly is denied. 
DNS +inside the guest is denied (no resolver in the allowlist) — the +agent uses an IP literal for `HTTPS_PROXY`. + +The one wrinkle TSI doesn't directly handle is **port granularity +within the allowed IP**. The bundle runs four daemons; pipelock / +git-gate / supervise are agent-facing, egress is pipelock's +internal upstream. If egress were bound to `0.0.0.0:9099` inside +the bundle, the agent could dial `:9099` and bypass +pipelock's DLP. We mitigate by binding egress to `127.0.0.1:9099` +*inside* the bundle so only pipelock — also in the bundle, on the +same localhost — can reach it. The bind-address strategy gives us +port-level isolation that TSI's IP-only allowlist doesn't. + +Net result: same security property the first draft chased with +gvproxy, enforced at the VMM layer rather than via a userspace +TCP/IP stack, with significantly less code (no gvproxy lifecycle, +no `VZFileHandleNetworkDeviceAttachment` plumbing, no Smolfile +virtio-net carve-out smolvm doesn't expose anyway). ## Goals / Success Criteria @@ -133,33 +169,33 @@ The feature is **done** when all of the following ship: - `SmolmachinesBottleBackend` registered under the `"smolmachines"` key in `claude_bottle/backend/__init__.py:_BACKENDS`. - Per-bottle Smolfile generation: a runtime-rendered TOML written - to the bottle's stage dir, analogous to the compose file the - Docker backend writes today. The Smolfile pins `command`, - `env`, and a virtio-net device backed by a unixgram socket - pointed at the per-bottle gvproxy. There is no TSI - `--allow-cidr` / `--outbound-localhost-only` / `--allow-host` - in the Smolfile — TSI is not used. -- Per-bottle gvproxy: one `gvproxy` process per bottle, started - before the VM, listening on a unixgram socket the VM's - virtio-net device hooks into. 
The gvproxy config has up to - three `port_forwards` entries (pipelock / git-gate / supervise - — git-gate and supervise only when the bottle uses them) all - pointing at the per-bottle sidecar bundle's exposed ports, plus - a DNS section that resolves only `proxy.internal`. Every other - hostname returns NXDOMAIN; every other destination is - unreachable. + to the bottle's stage dir using smolvm 0.8.0's actual schema + (`image`, `entrypoint`, `cmd`, `env = ["K=V", …]`, `[network] + allow_cidrs = ["/32"]`). The renderer chunk 1 + shipped emits the wrong shape (built around the gvproxy + unixgram attachment) — it gets rewritten in this chunk plan as + the cost of the design pivot. +- Per-bottle docker bridge for the bundle: the sidecar bundle + runs as a docker container on a dedicated per-bottle bridge + network with a pinned IP (`--ip ` against a + per-slug `/24` derived from the slug hash). The pinned IP is + what TSI's allowlist points at; without pinning we'd need to + inspect the running container's IP and feed it back into the + Smolfile, which is a race. - Per-bottle sidecar bundle: one container per bottle running the - bundle image defined in PRD 0024. The bundle exposes up to - three host ports (pipelock for `HTTPS_PROXY`, git-gate for git - push, supervise for MCP), bound to `127.0.0.1` on dynamically - allocated ports. egress runs *inside* the bundle as pipelock's - upstream over localhost and is not exposed externally. The - agent's environment carries the resolved URLs (e.g. - `HTTPS_PROXY=http://proxy.internal:`). + bundle image defined in PRD 0024. pipelock / git-gate / + supervise bind `0.0.0.0:` so the agent (reaching the + bundle via the allowed /32) can reach them. egress binds + `127.0.0.1:9099` inside the bundle so only pipelock can reach + it — the agent sees `:9099` refuse the connection + even though TSI's allowlist permits the IP. The agent's + environment carries IP-literal URLs (e.g. + `HTTPS_PROXY=http://:8888`). 
- The agent guest image is produced from the existing `Dockerfile` - (or a thin variant), exported as an OCI archive, and consumed by - `smolvm machine create`. The image build step is part of `prepare`, - analogous to `docker_mod.build_image`. + via `smolvm pack create` → `.smolmachine` artifact, then loaded + into smolvm via `machine create --from `. The image build + step is part of `prepare`, analogous to + `docker_mod.build_image`. - The PRD 0022 sandbox-escape suite, run with `CLAUDE_BOTTLE_BACKEND=smolmachines`, passes locally on a smolmachines-capable host. The suite is updated to skip cleanly @@ -182,15 +218,16 @@ The feature is **done** when all of the following ship: value of `CLAUDE_BOTTLE_BACKEND`; smolmachines is strictly opt-in until it has been load-bearing on at least one operator's workflow for a release cycle. -- **No TSI for network policy.** libkrun's TSI mode is rejected - for this backend — it lacks per-port filtering on `127.0.0.0/8` - and would expose every host-loopback service to the guest. The - Smolfile must select libkrun's virtio-net mode and attach to - the per-bottle gvproxy unixgram socket; if that combination is - not supported by the pinned smolmachines version (see open - question 1), the implementation falls back to driving - Virtualization.framework directly via PyObjC and reuses the - same gvproxy attachment. +- **No `--outbound-localhost-only`.** That TSI flag opens the + entire `127.0.0.0/8` range and is the loopback gap the original + draft of this PRD called out. Use `--allow-cidr /32` + instead so the agent reaches one IP and one IP only. +- **No gvproxy.** Rejected after the chunk-1 spike against the + real smolvm CLI: smolvm 0.8.0 exposes no virtio-net-over-unixgram + attachment. Adopting gvproxy would have required dropping smolvm + and driving Virtualization.framework via PyObjC; the TSI + single-IP approach gives the same property at a fraction of the + cost. 
- **No host bind mounts.** The smolmachines research note flagged that `-v HOST:GUEST` mounts via virtiofs would defeat the isolation goal. The manifest already has no concept of host @@ -216,30 +253,35 @@ The feature is **done** when all of the following ship: - New `claude_bottle/backend/smolmachines/` subpackage with the full set of `BottleBackend` overrides. -- Smolfile generator (TOML), analogous to - `backend/docker/compose.py`'s `bottle_plan_to_compose`. +- Smolfile generator (TOML) emitting the smolvm 0.8.0 schema: + top-level `image`, `entrypoint`, `cmd`, `env = [...]`, + `[network] allow_cidrs = ["/32"]`. (The renderer + that chunk 1 shipped under the gvproxy design — `name=`, + `[[net]]` — gets rewritten as part of this chunk plan.) - A host-side sidecar-bundle lifecycle manager that brings up - one container per bottle (the bundle image defined in PRD 0024), - publishes its one to three host ports, waits for readiness, - and tears it down with the bottle. This backend depends on - PRD 0024's bundle image; it does not own the bundle's - Dockerfile or init. + one container per bottle on a dedicated per-bottle docker + bridge with a pinned IP (`--ip `), waits for the + daemons to bind their ports, and tears it down with the bottle. + This backend depends on PRD 0024's bundle image; it does not + own the bundle's Dockerfile or init. - Per-bottle CA install path: the bundle's CA cert lands inside the microVM via `smolvm machine exec` after start (analogous to the existing `provision_ca` for Docker). -- gvproxy lifecycle: per-bottle `gvproxy` started by the backend - before VM bringup, torn down after VM teardown, configured with - up to three `port_forwards` entries (gateway port → host - bundle port for each of pipelock / git-gate / supervise) and a - DNS section that resolves only `proxy.internal`. Subnet and - gateway IP are derived from the bottle slug so two concurrent - bottles don't collide. 
-- DNS policy: the bottle's `egress.allowlist` does *not* go into - gvproxy's DNS — the agent resolves only `proxy.internal`, and - pipelock on the host enforces the egress allowlist against - the actual upstream connect target. This keeps the DNS-exfil - attack (PRD 0022 test 4) blocked because gvproxy answers - NXDOMAIN for every name except `proxy.internal`. +- Per-bottle docker bridge: a `claude-bottle-bundle-` + network with a /24 subnet derived from the slug hash; the + bundle gets a pinned IP at `.2` (gateway is `.1`). Pinning the + IP at start time avoids a race between the bundle's IP being + assigned and the Smolfile being written. +- TSI policy: the Smolfile sets `[network] allow_cidrs = + ["/32"]` and nothing else. The agent can reach the + bundle's IP (any port) and nothing else; no DNS resolution is + available inside the guest, so the agent uses IP-literal URLs. +- Bundle bind addresses: egress binds `127.0.0.1:9099` inside + the bundle (pipelock-only); pipelock / git-gate / supervise + bind `0.0.0.0` so the agent can reach them. This is the + port-granularity TSI's IP-only allowlist doesn't provide. + PRD 0024's bundle init may need a config knob for this; + raised as open question 4. - Preflight `smolvm` check: if the user selects this backend and `smolvm` isn't on `$PATH`, die with an install pointer (brew tap + version pin TBD in implementation; see open question 3). @@ -248,7 +290,7 @@ The feature is **done** when all of the following ship: rejects host mounts; this is a forward-compat check). - Tests: - Smoke unit-level test: Smolfile renderer produces the - expected TOML for a fixture bottle. + expected TOML for a fixture bottle (smolvm 0.8.0 shape). 
- Integration test: `prepare → launch → exec("echo hi") → teardown` on a smolmachines-capable host (skips otherwise via the same env/platform gate the Docker integration tests @@ -282,80 +324,65 @@ claude_bottle/backend/smolmachines/ launch.py @contextmanager launch(plan) -> SmolmachinesBottle cleanup.py prepare_cleanup / cleanup / list_active smolfile.py bottle_plan_to_smolfile(...) -> dict + render - gvproxy.py per-bottle gvproxy config render + process lifecycle - sidecar_bundle.py host-side lifecycle for the PRD 0024 bundle container - smolvm.py thin subprocess wrapper: machine create/start/exec/stop - vfkit_attach.py VZFileHandleNetworkDeviceAttachment + VFKT handshake - util.py slugify, port allocation, OCI archive helpers + sidecar_bundle.py host-side bundle lifecycle (per-bottle docker bridge + pinned IP) + smolvm.py thin subprocess wrapper: machine create/start/exec/stop, pack create + util.py slugify, subnet derivation, OCI archive helpers provision/ ca.py, prompt.py, skills.py, git.py, supervise.py ``` +Note what's NOT here vs. the original draft: `gvproxy.py`, +`vfkit_attach.py`. The gvproxy design needed both; the TSI single-IP +design needs neither. + ### Network + egress topology ``` ┌── macOS host ─────────────────────────────────────────────────────┐ │ │ - │ ┌── per-bottle sidecar bundle (one container per microVM) ─┐ │ - │ │ init.py (Python supervisor) │ │ - │ │ ├─ pipelock (binds 0.0.0.0:8888 in container) │ │ - │ │ ├─ egress (mitmproxy) (binds 127.0.0.1:p_internal) │ │ - │ │ ├─ git-gate (binds 0.0.0.0:8889) │ │ - │ │ └─ supervise (MCP) (binds 0.0.0.0:8890) │ │ - │ │ pipelock's upstream is 127.0.0.1:p_internal (egress); │ │ - │ │ egress is not exposed outside the bundle. 
│ │ - │ └─────────────────────────────────────────────────────┬─────┘ │ - │ Host ports published (loopback, dynamic): │ │ - │ pipelock 127.0.0.1: │ │ - │ git-gate 127.0.0.1: (conditional) │ │ - │ supervise 127.0.0.1: (conditional) │ │ - │ ▲ host TCP, reached via gvproxy port-forward │ - │ │ │ - │ ┌── gvproxy (per bottle) ─────────────────────────────┐ │ - │ │ subnet: 192.168.127.X/24 (X derived from slug) │ │ - │ │ gateway: 192.168.127.X.1 │ │ - │ │ port_forwards: │ │ - │ │ - gateway 8888 → host 127.0.0.1: │ │ - │ │ - gateway 8889 → host 127.0.0.1: (cond) │ │ - │ │ - gateway 8890 → host 127.0.0.1: (cond) │ │ - │ │ # nothing else │ │ - │ │ DNS: proxy.internal → gateway IP; * → NXDOMAIN │ │ - │ └─────────────────────────────────────────────────────┘ │ - │ ▲ unixgram socket (VFKT handshake) │ - │ │ │ - │ ┌── microVM (per bottle) ─────────────────────────────┐ │ - │ │ virtio-net device backed by VZFileHandle... │ │ - │ │ env: HTTPS_PROXY=http://proxy.internal:8888 │ │ - │ │ GIT_GATE_URL=http://proxy.internal:8889 │ │ - │ │ MCP_SUPERVISE_URL=http://proxy.internal:8890 │ │ - │ │ no other host visible │ │ - │ └─────────────────────────────────────────────────────┘ │ + │ ┌── per-bottle docker bridge claude-bottle-bundle- ──┐ │ + │ │ subnet: 192.168.X.0/24 (X = hash(slug) mod 254) │ │ + │ │ │ │ + │ │ ┌── bundle container (pinned --ip 192.168.X.2) ────────┐ │ │ + │ │ │ init.py (PRD 0024 Python supervisor) │ │ │ + │ │ │ ├─ pipelock (binds 0.0.0.0:8888) │ │ │ + │ │ │ ├─ egress (mitmproxy) (binds 127.0.0.1:9099) │ │ │ + │ │ │ ├─ git-gate (binds 0.0.0.0:9418) │ │ │ + │ │ │ └─ supervise (binds 0.0.0.0:9100) │ │ │ + │ │ │ Internal-only egress is unreachable from outside │ │ │ + │ │ │ the bundle even though TSI permits the IP. 
│ │ │ + │ │ └──────────────────────────────────────────────────────┘ │ │ + │ └──────────────────────────────────────────────────────┬─────┘ │ + │ │ │ + │ ┌── microVM (per bottle, libkrun via smolvm) ──────────▼─┐ │ + │ │ Smolfile: [network] allow_cidrs = ["192.168.X.2/32"] │ │ + │ │ env: HTTPS_PROXY=http://192.168.X.2:8888 │ │ + │ │ GIT_GATE_URL=git://192.168.X.2:9418 (cond.) │ │ + │ │ MCP_SUPERVISE_URL=http://192.168.X.2:9100 (cond) │ │ + │ │ No other host reachable — TSI denies any connect() │ │ + │ │ that isn't to 192.168.X.2. No DNS inside the guest │ │ + │ │ (no resolver in the allowlist). │ │ + │ └────────────────────────────────────────────────────────┘ │ │ │ └───────────────────────────────────────────────────────────────────┘ ``` -What the guest can reach, exhaustively: **only `proxy.internal` -on the gateway-port set we configured.** Everything else — -host LAN, host loopback (Postgres, IDE plugins, other bottles' -sidecars), public internet directly — is gone, enforced at the -gvproxy userspace stack rather than relying on guest cooperation. +What the guest can reach, exhaustively: **only `` on +ports the bundle binds to 0.0.0.0**. Egress's 127.0.0.1-only bind +makes it bundle-internal; host loopback / LAN / public internet +direct are all refused by TSI's allowlist. Three changes vs. the Docker backend: -1. **One sidecar container per bottle, not four.** The bundle - defined in PRD 0024 is the unit of sidecar lifecycle on both - backends. egress is internal to the bundle as pipelock's - upstream, never directly addressed. -2. **Sidecar container is on the host, not a sibling on a Docker - internal network.** Isolation primitive is gvproxy's explicit - port-forward list, not Docker's `--internal` flag. -3. **The agent's first hop is `proxy.internal`, not a sidecar's - container hostname.** Same scanning + DLP + auth-injection - chain, but the first hop crosses a userspace TCP/IP stack we - own, not a Docker-managed bridge. 
- -git-gate and supervise are conditional port forwards: only -emitted into gvproxy's config when the bottle actually uses -them, narrowing the attack surface for bottles that don't. +1. **One sidecar container per bottle, not four.** Same bundle + image PRD 0024 ships for the docker backend. +2. **Sidecar container is on a per-bottle docker bridge with a + pinned IP**, reached directly by the smolvm guest's allowed + /32 — no localhost port allocation, no userspace TCP/IP stack + in the middle. +3. **The agent dials IP literals, not hostnames.** TSI doesn't + filter DNS at the protocol level, and we don't put DNS + resolvers in the allowlist, so name resolution is denied by + construction. ### Lifecycle @@ -363,61 +390,59 @@ them, narrowing the attack surface for bottles that don't. 1. Cross-backend validation via `BottleBackend._validate` (skills, git identity files). -2. Allocate one to three host loopback ports for the sidecar - bundle (pipelock always; git-gate and supervise conditional on - manifest — egress is internal to the bundle and gets no host - port). -3. Resolve the agent OCI archive path (build if missing, cache by - Dockerfile + agent-name hash). The sidecar-bundle image - (`claude-bottle-sidecars:`) is pulled or built per - PRD 0024; this backend does not own its build. -4. Pick a per-bottle gvproxy subnet (e.g. `192.168.127.X/24` where - `X` is derived from the slug) and render - `stage_dir/gvproxy.yaml`: one DNS entry for `proxy.internal` - and one `port_forwards` entry per active sidecar port - (gateway port → host loopback port on the bundle). -5. Render the per-bottle Smolfile to `stage_dir/smolfile.toml`, - pinning command / env / a virtio-net device backed by the - gvproxy unixgram socket path. No TSI flags. -6. Resolve the in-VM CA paths so launch knows where to copy +2. Derive a per-bottle docker subnet from `sha256(slug) % 254` + (skipping the docker-default 17): `192.168.X.0/24`. 
The bundle + IP is always `192.168.X.2` (gateway is `.1`). +3. Resolve the agent guest image: convert the existing + `Dockerfile` into a `.smolmachine` artifact via + `smolvm pack create --image -o /agent.smolmachine` + (idempotent, layer-cached). +4. Render the per-bottle Smolfile to `stage_dir/smolfile.toml` + using smolvm 0.8.0's schema: + - `image` / `entrypoint` / `cmd` — bundled into the + `.smolmachine` from the previous step (one Smolfile, one + artifact). + - `env = [...]` — `HTTPS_PROXY`, `NO_PROXY`, `NODE_EXTRA_CA_CERTS`, + etc., all pointing at IP-literal URLs (`http://192.168.X.2:8888`). + - `[network] allow_cidrs = ["192.168.X.2/32"]` — TSI's single + /32 allowlist. +5. Resolve the in-VM CA paths so launch knows where to copy pipelock's CA after start. -7. Return a `SmolmachinesBottlePlan` carrying the slug, port map, - OCI archive path, Smolfile path, gvproxy config path, and - the bundle's container/run spec. +6. Return a `SmolmachinesBottlePlan` carrying the slug, bundle + subnet/IP, `.smolmachine` artifact path, Smolfile path, and + bundle run spec. `SmolmachinesBottleBackend.launch(plan)`: -1. Start the sidecar bundle container with `docker run` (still - using the local Docker daemon for sidecars; the VM is what's - moving off Docker). Wait for its three readiness signals: - pipelock listening, git-gate listening (if enabled), supervise - listening (if enabled). Register the teardown callback. -2. Start the per-bottle `gvproxy` against the unixgram socket - path the Smolfile references, with `port_forwards` pointed at - the bundle's published host ports. Wait for the socket to - appear (the spike-style poll loop from `agent-vm-isolation.md`). -3. `smolvm machine create --smolfile ` and - `smolvm machine start `. The Smolfile's virtio-net - device handshakes (`VFKT` magic) with gvproxy on start. -4. Provisioning: CA install → prompt → skills → git → supervise - config, each via `smolvm machine exec` (analogous to - `docker exec`). -5. 
Yield a `SmolmachinesBottle` whose `exec_claude` / `exec` / +1. Create the per-bottle docker bridge network + (`claude-bottle-bundle-` with the resolved subnet) and + start the sidecar bundle container with `docker run --network + ... --ip ...`. Wait for its daemons to bind: + pipelock on 8888, git-gate on 9418 (conditional), supervise + on 9100 (conditional). Register teardown callbacks. +2. `smolvm machine create --from /agent.smolmachine + --smolfile /smolfile.toml ` and + `smolvm machine start --name `. The Smolfile's TSI + allowlist gates outbound to the bundle's /32; libkrun's TSI + layer enforces it. +3. Provisioning: CA install → prompt → skills → git → supervise + config, each via `smolvm machine exec` / `smolvm machine cp`. +4. Yield a `SmolmachinesBottle` whose `exec_claude` / `exec` / `cp_in` all funnel through `smolvm machine exec` / `smolvm machine cp`. -6. Teardown: stop and remove the VM → stop gvproxy → stop + - remove the sidecar bundle container. +5. Teardown: stop and delete the VM → stop + remove the bundle + container → remove the per-bottle docker network. ### Data model No manifest schema change. `bottles[]` continues to carry `egress.allowlist`, `env`, `git`, `skills` references, etc.; the smolmachines backend reads the same fields as the docker backend. -`egress.allowlist` is enforced by pipelock on the host side -(unchanged from the docker backend); gvproxy's DNS resolves only -`proxy.internal` regardless of the allowlist's contents, so an -agent that bypasses pipelock by raw IP cannot resolve any name -gvproxy doesn't know about. +`egress.allowlist` is enforced by pipelock inside the bundle +(unchanged from the docker backend); the guest has no DNS resolver +in TSI's allowlist, so an agent that tries to dial an arbitrary +hostname can't resolve it in the first place — the DNS-exfil +attack from PRD 0022 test 4 is blocked at the resolver step. The `BottleSpec` dataclass and the `Bottle` ABC do not change. 
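The `prepare` steps described above (subnet derivation and Smolfile rendering) can be illustrated with a minimal Python sketch. It is not the backend's actual code: the function names, the choice to bump 17 to 18 when skipping it, the assumption that `entrypoint` and `cmd` are string arrays, and the image name in the usage lines are all hypothetical. Only the key names (`image`, `entrypoint`, `cmd`, `env`, `[network] allow_cidrs`) and the `192.168.X.2` convention come from this PRD.

```python
import hashlib
import json


def bundle_subnet_octet(slug: str) -> int:
    """Return the X in 192.168.X.0/24 for a bottle slug (assumed scheme)."""
    digest = hashlib.sha256(slug.encode("utf-8")).digest()
    octet = int.from_bytes(digest, "big") % 254  # always in 0..253
    # The PRD says 17 is skipped; bumping to 18 is this sketch's guess.
    return 18 if octet == 17 else octet


def bundle_ip(slug: str) -> str:
    """Pinned bundle address: .2 on the per-bottle bridge (.1 is the gateway)."""
    return f"192.168.{bundle_subnet_octet(slug)}.2"


def _toml_str(value: str) -> str:
    # JSON escaping matches TOML basic strings for ordinary text values.
    return json.dumps(value, ensure_ascii=False)


def _toml_list(values: list[str]) -> str:
    return "[" + ", ".join(_toml_str(v) for v in values) + "]"


def render_smolfile(image: str, entrypoint: list[str], cmd: list[str],
                    env: dict[str, str], allowed_ip: str) -> str:
    """Render a Smolfile as TOML text with a single /32 TSI allowlist entry."""
    env_items = [f"{key}={value}" for key, value in env.items()]
    lines = [
        f"image = {_toml_str(image)}",
        f"entrypoint = {_toml_list(entrypoint)}",
        f"cmd = {_toml_list(cmd)}",
        f"env = {_toml_list(env_items)}",
        "",
        "[network]",
        f"allow_cidrs = {_toml_list([allowed_ip + '/32'])}",
    ]
    return "\n".join(lines) + "\n"


# Usage with hypothetical values: the proxy URL is an IP literal because
# the guest has no DNS resolver in its allowlist.
ip = bundle_ip("demo-bottle")
print(render_smolfile("agent:latest", ["/bin/sh"], ["-c", "claude"],
                      {"HTTPS_PROXY": f"http://{ip}:8888"}, ip))
```

Deriving the address from the slug alone means the Smolfile can be written before the bundle container exists, which is the race the pinned IP is meant to avoid.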
@@ -442,37 +467,38 @@ The existing "unknown backend" `die()` path stays as-is. - `smolvm` CLI binary on `$PATH` (one new external dep, gated by the preflight check). Pinned version policy is deferred to the open questions; v1 reads `smolvm --version` and refuses to launch - outside a known-good range. -- `gvproxy` binary on `$PATH` - (`go install github.com/containers/gvisor-tap-vsock/cmd/gvproxy@latest`, - or vendored). Same preflight pattern as `smolvm`. -- `pyobjc-framework-Virtualization` *only* if smolmachines does - not expose a way to attach virtio-net to a unixgram socket and - we fall back to driving Virtualization.framework directly (see - open question 1). Default path is "no PyObjC needed." + outside a known-good range (currently 0.8.x). +- No `gvproxy` dep (the original draft listed it; dropped after + the chunk-1 spike). +- No `pyobjc-framework-Virtualization` dep (dropped from the + original draft for the same reason). - No new pure-Python packages. Subprocess + stdlib `tomllib` for - Smolfile authoring; the gvproxy YAML is small enough to render - by hand from a `dict[str, Any]`. + Smolfile authoring (`tomllib` only parses TOML, so the render + step in `smolfile.py` emits the TOML text itself). ### Acceptance test plan - **Unit (smolfile):** `tests/unit/test_smolfile.py` verifies the - renderer produces the expected TOML for a fixture bottle — - command line, env entries, virtio-net device referencing the - expected unixgram socket path, no TSI flags. -- **Unit (gvproxy config):** `tests/unit/test_gvproxy_config.py` - verifies the per-bottle YAML has exactly one DNS entry - (`proxy.internal`), one `port_forwards` entry per active - sidecar pointed at the resolved host loopback port, and a - per-bottle subnet/gateway derived from the slug. + renderer produces the expected TOML for a fixture bottle in + smolvm 0.8.0's schema — top-level `image` / `entrypoint` / + `cmd` / `env`, plus `[network] allow_cidrs = ["/32"]` + and nothing else under `[network]`. 
+- **Unit (subnet derivation):** the existing + `test_smolmachines_util.py` covers the per-bottle subnet hash + + collision-avoidance and stays as-is. - **Integration smoke:** `tests/integration/test_smolmachines_smoke.py` with `prepare → launch → exec → teardown`, guarded by a - `smolvm` + `gvproxy` presence check + macOS / KVM platform check. + `smolvm` presence check + macOS / KVM platform check. - **Localhost-reach probe:** a focused integration test that brings up a bottle, has the host bind a test service on `127.0.0.1:`, and asserts the in-bottle agent cannot connect to it. This is the regression test for the - exact gap that motivated choosing gvproxy over TSI. + exact gap `--outbound-localhost-only` would have introduced — + with `--allow-cidr /32` only, the probe must fail. +- **Egress-port-bypass probe:** also brings up a bottle and + asserts the in-bottle agent's connect to `:9099` + (egress's port) is refused — confirming the bundle-internal + bind of egress to `127.0.0.1` works as the port-granularity + layer TSI doesn't provide. - **PRD 0022 re-run:** with `CLAUDE_BOTTLE_BACKEND=smolmachines`, all five attack categories return sandbox-block markers and the suite passes. The test code does not change beyond the env-var @@ -484,28 +510,32 @@ The existing "unknown backend" `die()` path stays as-is. PRD 0024's bundle image is a prerequisite — this PRD assumes `claude-bottle-sidecars:` is available when chunk 3 lands. -1. **Backend skeleton + selection + Smolfile + gvproxy renderers.** - Subpackage layout, `_resolve_plan` stub that emits both a - TOML Smolfile and a gvproxy YAML but doesn't launch anything, - `_BACKENDS` registration, preflight `smolvm` + `gvproxy` - checks. Unit tests on both renderers. No VM bringup yet. -2. 
**gvproxy + VM lifecycle + OCI archive build.** `smolvm.py` - and `gvproxy.py` subprocess wrappers, prepare-time image - build (existing Dockerfile → OCI archive), launch path that - starts gvproxy, brings up the VM attached to gvproxy's socket - via VFKT handshake, exec into the VM, tear everything down. +1. **Backend skeleton + selection + Smolfile/gvproxy renderers.** + *Shipped (PR #62), but under the now-rejected gvproxy design.* + The Smolfile renderer emits `name = …` / `[[net]]` instead of + smolvm 0.8.0's `image` / `[network] allow_cidrs`. The gvproxy + renderer is dead. Chunk 2 rewrites the Smolfile renderer and + deletes `gvproxy_config.py` / its tests. +2. **VM lifecycle + bundle bringup + Smolfile rewrite.** + `smolvm.py` subprocess wrapper, prepare-time image conversion + (`smolvm pack create` → `.smolmachine`), per-bottle docker + bridge + bundle container with pinned IP, launch path that + starts the bundle and brings up the VM (`smolvm machine create + --from --smolfile`), exec into the VM, tear everything down. Smoke integration test: `exec("echo hi")` inside a started - VM. Includes the localhost-reach probe test from the - acceptance plan. -3. **Sidecar bundle lifecycle.** `sidecar_bundle.py`: per-bottle - bundle container brought up via `docker run`, with one to - three published host ports, gvproxy `port_forwards` pointed - at them, and teardown integrated into the bottle's lifecycle. - Port allocator. No provisioning yet beyond what the bundle - needs. + VM. Includes the localhost-reach probe + egress-port-bypass + probe from the acceptance plan. The chunk-1 Smolfile renderer + gets rewritten to the smolvm 0.8.0 schema; `gvproxy_config.py` + and `gvproxy.py` (if any) get deleted. +3. **Bundle bind-address mitigation.** Update PRD 0024's bundle + init to bind egress on `127.0.0.1:9099` instead of `0.0.0.0` + (or expose a config knob — open question 4). Reverify the + egress-port-bypass probe. 
Pipelock / git-gate / supervise + continue to bind `0.0.0.0`. 4. **Provisioning parity with Docker.** CA install via - `smolvm machine exec`, prompt/skills/.git copy-in, supervise - MCP config. End-to-end `start` works for a real agent manifest. + `smolvm machine cp`, prompt/skills/.git copy-in, supervise + MCP config. End-to-end `start` works for a real agent + manifest. 5. **PRD 0022 sandbox-escape suite green.** Skip-guard update, small adjustments to test helpers if any (the test uses `bottle.exec(script)` and inspects `returncode` + body for @@ -514,78 +544,68 @@ PRD 0024's bundle image is a prerequisite — this PRD assumes ## Open questions -1. **VMM choice: smolmachines vs PyObjC + Virtualization.framework.** - The network design requires libkrun's virtio-net mode attached - to a unixgram socket (so gvproxy is the gateway). The - smolmachines research note says libkrun *has* a virtio-net - mode but says it "does not support policy" — meaning libkrun - itself enforces no allowlist in that mode, which is exactly - what we want (gvproxy is the policy). What's unverified is - whether the Smolfile surface lets us point virtio-net at a - custom unixgram socket. If yes: this is a smolmachines backend - verbatim. If no: chunk 2 drops `smolvm` and drives - `Virtualization.framework` via PyObjC directly (the recipe in - `agent-vm-isolation.md` § "gvisor-tap-vsock + PyObjC + - Pipelock"), keeping the backend name "smolmachines" because - the operator-facing UX is unchanged. Resolve in chunk 1 via a - spike against the pinned smolmachines version. -2. **`smolvm` + `gvproxy` install policy.** Pin via brew / - `go install` versions, or vendor binaries in the repo. v1 - likely runs `smolvm --version` / `gvproxy --help` at preflight - and accepts a documented range; vendoring is heavier but - reduces "works on my Mac" drift. -3. **CA install inside the OCI overlay.** Two paths: bake at - prepare time (one OCI archive per CA fingerprint, big cache - key) vs. 
inject at start time via `smolvm machine exec` after - the VM is up. PRD 0006 chose the runtime path for Docker - (docker-cp + `update-ca-certificates`); smolvm has the same - shape via `machine exec`. Default to runtime injection unless - it conflicts with VM start order. -4. **gvproxy subnet collision.** Two concurrent bottles must not - land on the same `192.168.127.X/24` subnet — they'd both want - the same gateway IP. Derive the third octet from a hash of - the slug (mod 254, skip the docker-default 17), and at launch - time confirm the subnet isn't already in use by another - bottle's gvproxy. Resolve the hash-collision policy in - chunk 2. +1. **~~VMM choice~~ Resolved.** Chunk-1 spike against `smolvm + 0.8.0` confirmed there's no virtio-net-over-unixgram option; + the gvproxy design isn't viable on top of smolvm. Resolved + by switching to TSI `--allow-cidr /32` + bundle + bind-address mitigation; smolvm stays as the VMM. See the + "Design pivot from the first draft" section. +2. **`smolvm` install policy.** Pin via brew / `curl install.sh`, + or vendor a binary in the repo. v1 likely runs + `smolvm --version` at preflight and accepts a documented range + (currently 0.8.x). The + `curl -sSL https://smolmachines.com/install.sh | sh` path is + what the operator used; document it in the README. +3. **CA install inside the agent guest.** Two paths: bake at + prepare time (one `.smolmachine` artifact per CA fingerprint, + big cache key) vs. inject at start time via `smolvm machine + cp` after the VM is up. PRD 0006 chose the runtime path for + Docker (docker-cp + `update-ca-certificates`); smolvm has the + same shape via `machine cp` + `machine exec`. Default to + runtime injection. +4. **Bundle bind-address knob.** PRD 0024's bundle currently runs + all four daemons under one supervisor with daemon argv + hardcoded. 
To make egress bind `127.0.0.1:9099` instead of + `0.0.0.0:9099`, either: (a) edit the supervisor's + `_DAEMONS` entry to pass a `--listen-host 127.0.0.1` flag to + mitmdump, OR (b) introduce a per-daemon `bind_localhost` + knob the renderer can set. Option (a) is simpler and matches + that egress is bundle-internal regardless of backend; resolve + in chunk 3. 5. **`bottle.exec(script)` exit-code fidelity.** The PRD 0022 test suite reads `returncode` + stdout + stderr from - `ExecResult`. Confirm the VM-exec path (`smolvm machine exec` - or its PyObjC equivalent) propagates exit codes and separated - streams. The research note's "external integration is the CLI" - implies yes, but the embedded SDK bug it flagged suggests we - should verify before coding around it. + `ExecResult`. Confirm `smolvm machine exec` propagates exit + codes and separated streams. The CLI help mentions a + `--stream` flag for streaming output; behavior under default + (non-stream) mode is what we want — verify in chunk 2. 6. **CI gating.** Gitea's act_runner is Linux without nested KVM, so this backend's integration tests will skip there for the - same structural reason the Docker bringup tests do (no real - isolation primitive available on the runner). The skip - predicate becomes `not (smolvm_available() and gvproxy_available() - and platform.system() == "Darwin")`. CI coverage for this - backend will come from local runs on the maintainer's macOS - host until a Darwin runner is wired up; ack that as a known - gap. + same structural reason the Docker bringup tests do. The skip + predicate becomes `not (smolvm_available() and + platform.system() == "Darwin")`. CI coverage for this backend + will come from local runs on the maintainer's macOS host + until a Darwin runner is wired up; ack that as a known gap. 7. **Active bottle discovery.** Docker uses container labels to enumerate active bottles (`list_active` queries the daemon). 
- The microVM enumeration story is `smolvm machine list` - (or the PyObjC backend's own bookkeeping); the plan is to - mirror the label scheme via Smolfile metadata - (`labels = { "claude-bottle" = "1" }`-style entries, if the - format supports it; otherwise via a deterministic name prefix - `claude-bottle-` + on-disk metadata under - `state//`). + The microVM enumeration story is `smolvm machine ls --json`; + the plan is to filter on a deterministic name prefix + `claude-bottle-` + cross-reference with on-disk metadata + under `state//`. ## References -- `docs/research/agent-vm-isolation.md` — primary reference for - the gvproxy + `VZFileHandleNetworkDeviceAttachment` network - attachment used here. The "Full Setup: gvisor-tap-vsock + - PyObjC + Pipelock" section is the recipe the PyObjC fallback - in open question 1 would adopt verbatim. +- `docs/research/agent-vm-isolation.md` — describes the + gvproxy + `VZFileHandleNetworkDeviceAttachment` path. The + current design no longer needs that recipe (the TSI single-IP + approach replaced it after the chunk-1 spike); kept for + historical context if a future operator needs to drop smolvm + and own the VM lifecycle directly. - `docs/research/smolmachines-as-vm-backend.md` — evaluation of - smolmachines as the VM lifecycle wrapper. This PRD diverges - from its conclusion on the *network* primitive (rejecting TSI - in favor of gvproxy) but keeps its VM-lifecycle conclusion - conditional on the libkrun-virtio-net spike in open question 1. + smolmachines as the VM lifecycle wrapper. The research note's + TSI-bad-due-to-loopback-gap argument turned out to apply only + to `--outbound-localhost-only`, not to TSI generally; this PRD + uses `--allow-cidr /32` instead, sidestepping the + gap. - `docs/research/agent-sandbox-landscape.md` — identifies `"runtime": "microvm"`-style opt-in as the borrowable idea; smolmachines is the concrete implementation.
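The Smolfile shape named in the acceptance plan (top-level `image` / `entrypoint` / `cmd` / `env`, plus a single `allow_cidrs` entry under `[network]`) could be rendered roughly as below. This is a sketch under assumptions: the value types (arrays of strings for `entrypoint` / `cmd`, `KEY=VALUE` strings for `env`), the function names, and the fixture values are illustrative and not confirmed against smolvm 0.8.0. Note that stdlib `tomllib` only parses TOML, so the sketch renders by hand and uses `tomllib` solely to read the result back.

```python
import ipaddress
import json
from typing import Any, Dict, List


def _s(value: str) -> str:
    # JSON string escapes (\" \\ \n \uXXXX) are also valid in TOML basic strings.
    # ensure_ascii=False avoids surrogate-pair escapes, which TOML rejects.
    # A raw DEL (0x7f) character is not handled; fine for a sketch.
    return json.dumps(value, ensure_ascii=False)


def _arr(values: List[str]) -> str:
    return "[" + ", ".join(_s(v) for v in values) + "]"


def render_smolfile(
    image: str,
    entrypoint: List[str],
    cmd: List[str],
    env: Dict[str, str],
    bundle_ip: str,
) -> str:
    """Render a Smolfile whose only network setting is the bundle's /32."""
    ip = ipaddress.IPv4Address(bundle_ip)  # raises ValueError if malformed
    env_entries = [f"{key}={value}" for key, value in sorted(env.items())]
    lines = [
        f"image = {_s(image)}",
        f"entrypoint = {_arr(entrypoint)}",
        f"cmd = {_arr(cmd)}",
        f"env = {_arr(env_entries)}",  # assumed shape: array of KEY=VALUE strings
        "",
        "[network]",
        f"allow_cidrs = {_arr([f'{ip}/32'])}",
    ]
    return "\n".join(lines) + "\n"


def parse_back(text: str) -> Dict[str, Any]:
    """Read the rendered text back with tomllib (Python 3.11+, parse-only)."""
    import tomllib

    return tomllib.loads(text)
```

A unit test in the spirit of `tests/unit/test_smolfile.py` could then assert that `parse_back(...)["network"]` contains exactly one key, `allow_cidrs`, holding exactly one `/32` entry.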