diff --git a/docs/prds/0023-smolmachines-backend.md b/docs/prds/0023-smolmachines-backend.md index 441005b..502cc4f 100644 --- a/docs/prds/0023-smolmachines-backend.md +++ b/docs/prds/0023-smolmachines-backend.md @@ -12,12 +12,19 @@ Ship a second concrete `BottleBackend` — a per-agent microVM on macOS. The egress topology is enforced by **gvproxy** (gvisor-tap-vsock), a userspace TCP/IP stack the guest's virtio-net device is wired into via `VZFileHandleNetworkDeviceAttachment`. -gvproxy's only outbound configuration is an explicit per-bottle port -forward to a host-side pipelock; everything else — the host's LAN, -the host's loopback services, the public internet — is unreachable -from the guest by construction. pipelock + egress + git-gate + -supervise stay as host-side processes on per-bottle loopback ports, -reached *only* through gvproxy's forwarded ports. +gvproxy's only outbound configuration is an explicit per-bottle +port-forward set into a **single per-bottle sidecar container** that +bundles pipelock + egress + git-gate + supervise behind one supervised +init. Everything else — the host's LAN, the host's loopback +services, the public internet — is unreachable from the guest by +construction. + +The sidecar bundle is the same image PRD 0024 introduces for the +docker backend; this PRD consumes it. Inside the bundle, egress is +pipelock's internal upstream over localhost and is not exposed +externally. gvproxy port-forwards three external ports into the +bundle: pipelock (for `HTTPS_PROXY`), git-gate (for git push), and +supervise (for MCP). This explicitly rejects libkrun's TSI ("Transport Socket Interface") allowlist as the network primitive. TSI's `--outbound-localhost-only` @@ -134,19 +141,21 @@ The feature is **done** when all of the following ship: in the Smolfile — TSI is not used. - Per-bottle gvproxy: one `gvproxy` process per bottle, started before the VM, listening on a unixgram socket the VM's - virtio-net device hooks into. 
The gvproxy config has exactly - one `port_forwards` entry — gateway-port to the per-bottle - pipelock's host port — and a DNS section that resolves only - `proxy.internal`. Every other hostname returns NXDOMAIN; every - other destination is unreachable. -- Host-side sidecar relocation: pipelock, egress, git-gate, and - supervise each run as host processes (one set per bottle), - bound to `127.0.0.1` on per-bottle dynamically-allocated ports. - The agent's environment carries the resolved URLs (e.g. + virtio-net device hooks into. The gvproxy config has up to + three `port_forwards` entries (pipelock / git-gate / supervise + — git-gate and supervise only when the bottle uses them) all + pointing at the per-bottle sidecar bundle's exposed ports, plus + a DNS section that resolves only `proxy.internal`. Every other + hostname returns NXDOMAIN; every other destination is + unreachable. +- Per-bottle sidecar bundle: one container per bottle running the + bundle image defined in PRD 0024. The bundle exposes up to + three host ports (pipelock for `HTTPS_PROXY`, git-gate for git + push, supervise for MCP), bound to `127.0.0.1` on dynamically + allocated ports. egress runs *inside* the bundle as pipelock's + upstream over localhost and is not exposed externally. The + agent's environment carries the resolved URLs (e.g. `HTTPS_PROXY=http://proxy.internal:`). - Only pipelock is exposed through gvproxy; egress / git-gate / - supervise are chained *behind* pipelock on the host side and - are not reachable directly from the guest. - The agent guest image is produced from the existing `Dockerfile` (or a thin variant), exported as an OCI archive, and consumed by `smolvm machine create`. The image build step is part of `prepare`, @@ -209,17 +218,19 @@ The feature is **done** when all of the following ship: full set of `BottleBackend` overrides. - Smolfile generator (TOML), analogous to `backend/docker/compose.py`'s `bottle_plan_to_compose`. 
-- A host-side sidecar process manager that owns the lifecycle of - pipelock + egress + git-gate + supervise for one bottle, binding - them to per-bottle loopback ports and tearing them down with the - bottle. This is the smolmachines-specific replacement for - `docker compose up`/`down`. -- Per-bottle CA install path: the egress sidecar's CA cert lands - inside the microVM via `smolvm machine exec` after start +- A host-side sidecar-bundle lifecycle manager that brings up + one container per bottle (the bundle image defined in PRD 0024), + publishes its one to three host ports, waits for readiness, + and tears it down with the bottle. This backend depends on + PRD 0024's bundle image; it does not own the bundle's + Dockerfile or init. +- Per-bottle CA install path: the bundle's CA cert lands inside + the microVM via `smolvm machine exec` after start (analogous to the existing `provision_ca` for Docker). - gvproxy lifecycle: per-bottle `gvproxy` started by the backend before VM bringup, torn down after VM teardown, configured with - one `port_forwards` entry (gateway → host pipelock port) and a + up to three `port_forwards` entries (gateway port → host + bundle port for each of pipelock / git-gate / supervise) and a DNS section that resolves only `proxy.internal`. Subnet and gateway IP are derived from the bottle slug so two concurrent bottles don't collide. @@ -272,7 +283,7 @@ claude_bottle/backend/smolmachines/ cleanup.py prepare_cleanup / cleanup / list_active smolfile.py bottle_plan_to_smolfile(...) 
-> dict + render gvproxy.py per-bottle gvproxy config render + process lifecycle - sidecars.py host-side pipelock/egress/git-gate/supervise lifecycle + sidecar_bundle.py host-side lifecycle for the PRD 0024 bundle container smolvm.py thin subprocess wrapper: machine create/start/exec/stop vfkit_attach.py VZFileHandleNetworkDeviceAttachment + VFKT handshake util.py slugify, port allocation, OCI archive helpers @@ -284,16 +295,19 @@ claude_bottle/backend/smolmachines/ ``` ┌── macOS host ─────────────────────────────────────────────────────┐ │ │ - │ ┌── per-bottle sidecar chain (one set per microVM) ────┐ │ - │ │ agent ──HTTPS_PROXY──► pipelock ──► egress ──► internet │ - │ │ 127.0.0.1:p1 (DLP) (MITM, │ - │ │ auth-inject) │ - │ │ │ - │ │ git push ──► git-gate ──► upstream │ - │ │ 127.0.0.1:p3 (gitleaks) │ - │ │ │ - │ │ MCP ──► supervise 127.0.0.1:p4 │ - │ └────────────────────────────────────────────────────────────────┘ + │ ┌── per-bottle sidecar bundle (one container per microVM) ─┐ │ + │ │ init.py (Python supervisor) │ │ + │ │ ├─ pipelock (binds 0.0.0.0:8888 in container) │ │ + │ │ ├─ egress (mitmproxy) (binds 127.0.0.1:p_internal) │ │ + │ │ ├─ git-gate (binds 0.0.0.0:8889) │ │ + │ │ └─ supervise (MCP) (binds 0.0.0.0:8890) │ │ + │ │ pipelock's upstream is 127.0.0.1:p_internal (egress); │ │ + │ │ egress is not exposed outside the bundle. 
│ │ + │ └─────────────────────────────────────────────────────┬─────┘ │ + │ Host ports published (loopback, dynamic): │ │ + │ pipelock 127.0.0.1: │ │ + │ git-gate 127.0.0.1: (conditional) │ │ + │ supervise 127.0.0.1: (conditional) │ │ │ ▲ host TCP, reached via gvproxy port-forward │ │ │ │ │ ┌── gvproxy (per bottle) ─────────────────────────────┐ │ @@ -301,6 +315,8 @@ claude_bottle/backend/smolmachines/ │ │ gateway: 192.168.127.X.1 │ │ │ │ port_forwards: │ │ │ │ - gateway 8888 → host 127.0.0.1: │ │ + │ │ - gateway 8889 → host 127.0.0.1: (cond) │ │ + │ │ - gateway 8890 → host 127.0.0.1: (cond) │ │ │ │ # nothing else │ │ │ │ DNS: proxy.internal → gateway IP; * → NXDOMAIN │ │ │ └─────────────────────────────────────────────────────┘ │ @@ -323,24 +339,23 @@ host LAN, host loopback (Postgres, IDE plugins, other bottles' sidecars), public internet directly — is gone, enforced at the gvproxy userspace stack rather than relying on guest cooperation. -Two changes vs. the Docker backend: +Three changes vs. the Docker backend: -1. **Sidecars are host processes, not sibling containers.** No - internal Docker network. The isolation primitive is gvproxy's - explicit port-forward list, not Docker's `--internal` flag. -2. **The agent's first hop is `proxy.internal`, not a sidecar's - container hostname.** Egress out to the public internet still - happens through pipelock + egress — same scanning + DLP + - auth-injection chain — but the first hop crosses a userspace - TCP/IP stack we own, not a Docker-managed bridge. +1. **One sidecar container per bottle, not four.** The bundle + defined in PRD 0024 is the unit of sidecar lifecycle on both + backends. egress is internal to the bundle as pipelock's + upstream, never directly addressed. +2. **Sidecar container is on the host, not a sibling on a Docker + internal network.** Isolation primitive is gvproxy's explicit + port-forward list, not Docker's `--internal` flag. +3. 
**The agent's first hop is `proxy.internal`, not a sidecar's + container hostname.** Same scanning + DLP + auth-injection + chain, but the first hop crosses a userspace TCP/IP stack we + own, not a Docker-managed bridge. -The chain `agent → pipelock → egress → internet` collapses on -the host side: pipelock listens on 127.0.0.1:p1, makes its -upstream connect against egress at 127.0.0.1:p2, which makes its -upstream connect against the public internet. git-gate and -supervise are separate gateway ports if and only if the bottle -uses them — otherwise they're omitted from gvproxy's -`port_forwards`, narrowing the attack surface further. +git-gate and supervise are conditional port forwards: only +emitted into gvproxy's config when the bottle actually uses +them, narrowing the attack surface for bottles that don't. ### Lifecycle @@ -348,16 +363,19 @@ uses them — otherwise they're omitted from gvproxy's 1. Cross-backend validation via `BottleBackend._validate` (skills, git identity files). -2. Allocate host loopback ports for each sidecar the bottle uses - (pipelock always; egress / git-gate / supervise conditional on - manifest). +2. Allocate one to three host loopback ports for the sidecar + bundle (pipelock always; git-gate and supervise conditional on + manifest — egress is internal to the bundle and gets no host + port). 3. Resolve the agent OCI archive path (build if missing, cache by - Dockerfile + agent-name hash). + Dockerfile + agent-name hash). The sidecar-bundle image + (`claude-bottle-sidecars:`) is pulled or built per + PRD 0024; this backend does not own its build. 4. Pick a per-bottle gvproxy subnet (e.g. `192.168.127.X/24` where `X` is derived from the slug) and render - `stage_dir/gvproxy.yaml`: one DNS entry for `proxy.internal`, - one `port_forwards` entry per active sidecar (gateway port → - host loopback port). 
+ `stage_dir/gvproxy.yaml`: one DNS entry for `proxy.internal` + and one `port_forwards` entry per active sidecar port + (gateway port → host loopback port on the bundle). 5. Render the per-bottle Smolfile to `stage_dir/smolfile.toml`, pinning command / env / a virtio-net device backed by the gvproxy unixgram socket path. No TSI flags. @@ -365,17 +383,19 @@ uses them — otherwise they're omitted from gvproxy's pipelock's CA after start. 7. Return a `SmolmachinesBottlePlan` carrying the slug, port map, OCI archive path, Smolfile path, gvproxy config path, and - host sidecar specs. + the bundle's container/run spec. `SmolmachinesBottleBackend.launch(plan)`: -1. Start host sidecars in dependency order (egress → pipelock → - git-gate → supervise — egress before pipelock so pipelock's - upstream resolves; pipelock is the only one exposed through - gvproxy). Register teardown callbacks in reverse order. +1. Start the sidecar bundle container with `docker run` (still + using the local Docker daemon for sidecars; the VM is what's + moving off Docker). Wait for up to three readiness signals: + pipelock listening, git-gate listening (if enabled), supervise + listening (if enabled). Register the teardown callback. 2. Start the per-bottle `gvproxy` against the unixgram socket - path the Smolfile references. Wait for the socket to appear - (the spike-style poll loop from `agent-vm-isolation.md`). + path the Smolfile references, with `port_forwards` pointed at + the bundle's published host ports. Wait for the socket to + appear (the spike-style poll loop from `agent-vm-isolation.md`). 3. `smolvm machine create --smolfile ` and `smolvm machine start `. The Smolfile's virtio-net device handshakes (`VFKT` magic) with gvproxy on start. @@ -385,8 +405,8 @@ uses them — otherwise they're omitted from gvproxy's 5. Yield a `SmolmachinesBottle` whose `exec_claude` / `exec` / `cp_in` all funnel through `smolvm machine exec` / `smolvm machine cp`. -6. 
Teardown: stop and remove the VM → stop gvproxy → stop - sidecars (in reverse start order). +6. Teardown: stop and remove the VM → stop gvproxy → stop + + remove the sidecar bundle container. ### Data model @@ -461,6 +481,9 @@ The existing "unknown backend" `die()` path stays as-is. ## Sizing — into chunks +PRD 0024's bundle image is a prerequisite — this PRD assumes +`claude-bottle-sidecars:` is available when chunk 3 lands. + 1. **Backend skeleton + selection + Smolfile + gvproxy renderers.** Subpackage layout, `_resolve_plan` stub that emits both a TOML Smolfile and a gvproxy YAML but doesn't launch anything, @@ -474,11 +497,12 @@ The existing "unknown backend" `die()` path stays as-is. Smoke integration test: `exec("echo hi")` inside a started VM. Includes the localhost-reach probe test from the acceptance plan. -3. **Host-side sidecar relocation.** `sidecars.py`: per-bottle - pipelock + egress + git-gate + supervise as host processes on - loopback, with gvproxy `port_forwards` wired only for the - sidecars the bottle actually uses. Port allocator. Teardown - ordering. No provisioning yet beyond what the sidecars need. +3. **Sidecar bundle lifecycle.** `sidecar_bundle.py`: per-bottle + bundle container brought up via `docker run`, with one to + three published host ports, gvproxy `port_forwards` pointed + at them, and teardown integrated into the bottle's lifecycle. + Port allocator. No provisioning yet beyond what the bundle + needs. 4. **Provisioning parity with Docker.** CA install via `smolvm machine exec`, prompt/skills/.git copy-in, supervise MCP config. End-to-end `start` works for a real agent manifest. @@ -569,10 +593,14 @@ The existing "unknown backend" `die()` path stays as-is. backend abstraction this PRD is the first non-Docker consumer of. - PRD 0017 (`docs/prds/0017-egress-proxy-via-mitmproxy.md`) — the - egress sidecar the host-side relocation reuses verbatim, only - with a different transport. 
+ egress sidecar the bundle reuses verbatim as pipelock's internal + upstream. - PRD 0022 (`docs/prds/0022-sandbox-escape-integration-test.md`) — the acceptance gate for this PRD; the suite already runs through `get_bottle_backend()` so the env-var flip is the only change needed to exercise the smolmachines path. +- PRD 0024 + (`docs/prds/0024-consolidate-sidecar-bundle.md`) — defines the + single bundle image (`claude-bottle-sidecars`) this PRD + consumes. Prerequisite for chunk 3 of this PRD.
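
The conditional `port_forwards` rule this patch introduces (pipelock always, git-gate and supervise only when the bottle uses them, egress never) can be sketched roughly as below. This is an illustration only, not the backend's actual renderer: `render_port_forwards`, `GATEWAY_PORTS`, and the `gateway_port` / `host` entry keys are hypothetical names and do not claim to match gvproxy's real config schema. The gateway ports 8888 / 8889 / 8890 are taken from the topology diagram in the patch.

```python
from __future__ import annotations

# Gateway-side ports as shown in the PRD's topology diagram.
# egress is deliberately absent: it is internal to the bundle.
GATEWAY_PORTS = {"pipelock": 8888, "git-gate": 8889, "supervise": 8890}


def render_port_forwards(host_ports: dict[str, int]) -> list[dict[str, object]]:
    """Build one forward entry per sidecar port the bottle actually uses.

    `host_ports` maps a sidecar name to the dynamically allocated loopback
    port published by the bundle container. The entry shape returned here is
    hypothetical, chosen only to make the rule visible.
    """
    if "pipelock" not in host_ports:
        raise ValueError("pipelock must always have a published host port")

    unknown = set(host_ports) - set(GATEWAY_PORTS)
    if unknown:
        # e.g. someone tried to expose egress, which the PRD forbids
        raise ValueError(f"not forwardable through gvproxy: {sorted(unknown)}")

    # Iterate GATEWAY_PORTS (not host_ports) so the output order is stable.
    return [
        {"gateway_port": GATEWAY_PORTS[name], "host": f"127.0.0.1:{host_ports[name]}"}
        for name in GATEWAY_PORTS
        if name in host_ports
    ]
```

A bottle with no git-gate or supervise would call `render_port_forwards({"pipelock": 50001})` and get a single entry, which is the narrowed attack surface the patch describes for bottles that do not use those sidecars.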