docs(prd-0023): smolmachines bottle backend

Specs a second concrete BottleBackend selectable via
CLAUDE_BOTTLE_BACKEND=smolmachines: per-agent libkrun microVM on
macOS, sidecars relocated to host-side loopback ports plumbed via
Smolfile env, PRD 0022's sandbox-escape suite as the acceptance
gate (the env-var flip is the only change required). Docker
backend ships unchanged and remains default.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 23:19:08 -04:00
parent e8a14fd860
commit a2ac124d5c
+427
@@ -0,0 +1,427 @@
# PRD 0023: smolmachines bottle backend
- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-26
## Summary
Ship a second concrete `BottleBackend`, `SmolmachinesBottleBackend`,
selected via `CLAUDE_BOTTLE_BACKEND=smolmachines`, that runs a
bottle inside a per-agent libkrun microVM on macOS (and KVM on Linux,
opportunistically). The egress topology moves out of an internal
Docker network and onto libkrun's TSI ("Transparent Socket Impersonation")
allowlist plus a host-side pipelock/egress/git-gate/supervise stack
listening on per-bottle loopback ports. The Docker backend ships
unchanged; this is opt-in via the existing env-var selector.
The acceptance gate is PRD 0022's `tests/integration/test_sandbox_escape.py`
running green against `CLAUDE_BOTTLE_BACKEND=smolmachines`.
## Problem
`agent-vm-isolation.md` argues for hardware-isolated microVMs over
container-based bottles on macOS; `smolmachines-as-vm-backend.md`
concludes that smolmachines is the most plausible concrete VMM for
this project. Today, the only backend in the registry is Docker
(`claude_bottle/backend/__init__.py:_BACKENDS = {"docker": ...}`),
and three things motivate a second one now:
- **Isolation ceiling.** On macOS the Docker backend's agent
container shares Docker Desktop's host VM with every other bottle.
Container escape from claude-code lands the agent inside that
shared VM. A per-bottle libkrun microVM gets hardware page tables
via `Hypervisor.framework`; cross-bottle isolation becomes
enforced by the CPU's MMU instead of namespace bookkeeping.
- **PRD 0022 is backend-agnostic by design** but currently only
exercises the Docker backend. The suite was written with
`CLAUDE_BOTTLE_BACKEND` selection in mind precisely so the
smolmachines path could be validated against the same five
attacks. Until a second backend exists, the abstraction is
unproven.
- **CI carve-outs.** Most bottle-bringup integration tests skip
under `GITEA_ACTIONS=true` because act_runner shares the host
Docker socket but not the host filesystem. A smolmachines path
doesn't share that constraint shape (it has constraints of its
own, just different ones), so adding the backend forces the abstraction to be
clean in places where Docker-specific assumptions have been
tolerated.
The smolmachines research note's `## Recommendation` ("adopt
smolmachines as the bottle VM backend on macOS; keep pipelock DIY")
is the design hypothesis under test here.
## Goals / Success Criteria
The feature works when all of the following are observable on a
macOS host with smolmachines installed:
- `CLAUDE_BOTTLE_BACKEND=smolmachines python3 cli.py start <agent>`
brings up a microVM, runs claude-code inside it, and tears it
down on exit. Same y/N preflight UX as Docker — only the
resolved-runtime line differs.
- The sandbox-escape suite in `tests/integration/test_sandbox_escape.py`
runs green against the smolmachines backend (all five attack
categories blocked).
- Selecting the backend on a host without `smolvm` installed dies
at startup with an install pointer; no silent fall-through to
Docker.
- Active bottles show up under
`python3 cli.py list-bottles` regardless of backend.
- `python3 cli.py stop <bottle>` and orphan cleanup work for both
Docker bottles and smolmachines bottles via the same CLI surface.
The feature is **done** when all of the following ship:
- A new `claude_bottle/backend/smolmachines/` subpackage exists,
mirroring the layout of `claude_bottle/backend/docker/`
(`backend.py`, `bottle.py`, `bottle_plan.py`,
`bottle_cleanup_plan.py`, `prepare.py`, `launch.py`,
`cleanup.py`, `util.py`, and a `provision/` subpackage for the
five `provision_*` methods).
- `SmolmachinesBottleBackend` registered under the
`"smolmachines"` key in `claude_bottle/backend/__init__.py:_BACKENDS`.
- Per-bottle Smolfile generation: a runtime-rendered TOML written
to the bottle's stage dir, analogous to the compose file the
Docker backend writes today. The Smolfile pins `command`,
`env`, `--outbound-localhost-only`, and the per-bottle DNS
allowlist.
- Host-side sidecar relocation: pipelock, egress, git-gate, and
supervise each run as host processes (one set per bottle),
bound to `127.0.0.1` on per-bottle dynamically-allocated ports.
The agent's environment carries the resolved URLs (e.g.
`HTTPS_PROXY=http://127.0.0.1:<pipelock-port>`).
- The agent guest image is produced from the existing `Dockerfile`
(or a thin variant), exported as an OCI archive, and consumed by
`smolvm machine create`. The image build step is part of `prepare`,
analogous to `docker_mod.build_image`.
- The PRD 0022 sandbox-escape suite, run with
`CLAUDE_BOTTLE_BACKEND=smolmachines`, passes locally on a
smolmachines-capable host. The suite is updated to skip cleanly
on hosts that can't reach smolmachines (same shape as the
existing `GITEA_ACTIONS == "true"` skip), not to fail.
- README + `CLAUDE.md` updated to document the env-var selection,
the macOS-only scope for v1, and the `smolvm` install
prerequisite.
## Non-goals
- **No Linux KVM support shipped in this PRD.** smolmachines works
on Linux via KVM, but the abstraction win is biggest on macOS
where Docker's shared-VM topology hurts most. Linux can come
later behind the same selector.
- **No removal of the Docker backend.** Both backends ship side by
side. Selection stays env-driven; the manifest does not gain a
`backend` field.
- **No default-backend change.** `docker` remains the default
value of `CLAUDE_BOTTLE_BACKEND`; smolmachines is strictly
opt-in until it has been load-bearing on at least one operator's
workflow for a release cycle.
- **No host bind mounts.** The smolmachines research note flagged
that `-v HOST:GUEST` mounts via virtiofs would defeat the
isolation goal. The manifest already has no concept of host
mounts; this PRD does not introduce one. If a future PRD wants
agent-side access to host files, it must come through a
controlled channel (vsock relay, OCI overlay, supervise sidecar
endpoint).
- **No HTTP API mode.** `smolvm serve` is the long-term-clean
control plane, but v1 drives smolmachines via CLI subprocess
invocations — the lower-overhead first iteration the research
note already endorses.
- **No custom kernel / initrd.** smolmachines uses libkrunfw
only; the agent image is an OCI ref, not a kernel + rootfs pair.
- **No warm-pool or snapshot/restore.** Each bottle gets a fresh
microVM; cold-start cost is paid up front.
- **No supervise/agent-credential rewrites for the new backend.**
Provisioning logic ports as-is; only the *transport* (host-side
port URLs instead of in-network DNS names) changes.
## Scope
### In scope
- New `claude_bottle/backend/smolmachines/` subpackage with the
full set of `BottleBackend` overrides.
- Smolfile generator (TOML), analogous to
`backend/docker/compose.py`'s `bottle_plan_to_compose`.
- A host-side sidecar process manager that owns the lifecycle of
pipelock + egress + git-gate + supervise for one bottle, binding
them to per-bottle loopback ports and tearing them down with the
bottle. This is the smolmachines-specific replacement for
`docker compose up`/`down`.
- Per-bottle CA install path: the egress sidecar's CA cert lands
inside the microVM via `smolvm machine exec` after start
(analogous to the existing `provision_ca` for Docker).
- DNS allowlist plumbing: every host in `bottle.egress.allowlist`
goes into the Smolfile's DNS filter section (vsock port 6002),
so the VMM-layer DNS filter and the bottle's policy stay in
sync. The two layers close different bypasses: the DNS filter
refuses to resolve hostnames outside the allowlist, and TSI plus
the CIDR allowlist stops the agent from connecting to raw IP
literals that skip DNS entirely.
- Preflight `smolvm` check: if the user selects this backend and
`smolvm` isn't on `$PATH`, die with an install pointer (brew tap
+ version pin TBD in implementation; see open question 2).
- Manifest validation: refuse any bottle field this backend can't
honor (today there are none, since the Docker backend already
rejects host mounts; this is a forward-compat check).
- Tests:
  - Smoke unit-level test: Smolfile renderer produces the
    expected TOML for a fixture bottle.
  - Integration test: `prepare → launch → exec("echo hi") →
    teardown` on a smolmachines-capable host (skips otherwise
    via the same env/platform gate the Docker integration tests
    use).
  - PRD 0022 suite, re-run with the env var flipped, passes.
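The preflight check above could look roughly like the sketch below. It is a standalone illustration, not the backend's code: `PreflightError` stands in for the project's `die()`, and the version bounds are hypothetical placeholders pending open question 2. It also assumes `smolvm --version` prints a dotted `X.Y.Z` somewhere in its output, which this PRD does not pin down.

```python
from __future__ import annotations

import re
import shutil
import subprocess

# Hypothetical known-good range; the real bounds are open question 2.
MIN_VERSION = (0, 1, 0)
MAX_VERSION_EXCLUSIVE = (0, 2, 0)


class PreflightError(RuntimeError):
    """Raised instead of the project's die() so the sketch stays standalone."""


def parse_version(text: str) -> tuple[int, int, int]:
    """Return the first X.Y.Z triple found in `smolvm --version` output.

    Assumes the output contains a dotted version number somewhere.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise PreflightError(f"could not parse a smolvm version from {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def check_smolvm(version_output: str | None = None) -> tuple[int, int, int]:
    """Fail fast if smolvm is missing or outside the accepted range."""
    if version_output is None:
        if shutil.which("smolvm") is None:
            raise PreflightError(
                "CLAUDE_BOTTLE_BACKEND=smolmachines needs the smolvm CLI on $PATH; "
                "see the README for install instructions"
            )
        version_output = subprocess.run(
            ["smolvm", "--version"], capture_output=True, text=True, check=True
        ).stdout
    version = parse_version(version_output)
    if not (MIN_VERSION <= version < MAX_VERSION_EXCLUSIVE):
        shown = ".".join(str(part) for part in version)
        raise PreflightError(f"smolvm {shown} is outside the supported range")
    return version
```

Taking `version_output` as an optional argument keeps the parsing and range logic unit-testable without `smolvm` installed.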
### Out of scope
- VM image caching across bottles (each prepare rebuilds from the
OCI archive; layer reuse is whatever smolmachines provides).
- Cross-host bottle relocation (the OCI archive is local-only).
- Operator-facing knobs for vCPU / memory / overlay size (use
sensible defaults; expose as manifest fields in a later PRD if
needed).
- Integration with the `supervise` plane's permission-prompt UX
beyond port plumbing — supervise already speaks HTTP and binds
to whatever loopback the backend hands it.
## Proposed Design
### Backend layout
```
claude_bottle/backend/smolmachines/
__init__.py re-exports SmolmachinesBottleBackend
backend.py SmolmachinesBottleBackend façade
bottle.py SmolmachinesBottle (exec_claude / exec / cp_in / close)
bottle_plan.py SmolmachinesBottlePlan + .print()
bottle_cleanup_plan.py SmolmachinesBottleCleanupPlan
prepare.py resolve_plan(spec, stage_dir, ...) -> SmolmachinesBottlePlan
launch.py @contextmanager launch(plan) -> SmolmachinesBottle
cleanup.py prepare_cleanup / cleanup / list_active
smolfile.py bottle_plan_to_smolfile(...) -> dict + render
sidecars.py host-side pipelock/egress/git-gate/supervise lifecycle
smolvm.py thin subprocess wrapper: machine create/start/exec/stop
util.py slugify, port allocation, OCI archive helpers
provision/ ca.py, prompt.py, skills.py, git.py, supervise.py
```
### Network + egress topology
```
┌── macOS host ─────────────────────────────────────────────┐
│                                                           │
│  ┌── per-bottle host sidecars (one set per microVM) ───┐  │
│  │ pipelock    127.0.0.1:<p1>                          │  │
│  │ egress      127.0.0.1:<p2>                          │  │
│  │ git-gate    127.0.0.1:<p3>                          │  │
│  │ supervise   127.0.0.1:<p4>                          │  │
│  └─────────────────────────────────────────────────────┘  │
│                             ▲                             │
│                             │ TSI passthrough (localhost) │
│                             │                             │
│  ┌── libkrun microVM (per bottle) ─────────────────────┐  │
│  │ env: HTTPS_PROXY=http://127.0.0.1:<p1>              │  │
│  │      EGRESS_URL=http://127.0.0.1:<p2>               │  │
│  │      GIT_GATE_URL=http://127.0.0.1:<p3>             │  │
│  │      MCP_SUPERVISE_URL=http://127.0.0.1:<p4>        │  │
│  │ --outbound-localhost-only                           │  │
│  │ DNS filter (vsock:6002) → host allowlist            │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
└───────────────────────────────────────────────────────────┘
```
Two changes vs. the Docker backend:
1. **Sidecars are host processes, not sibling containers.** No
internal Docker network; isolation comes from TSI plus the
per-bottle loopback port set.
2. **The "internal" allowlist becomes localhost-only.** Egress out
to the public internet still happens through pipelock + egress
— the same scanning + DLP + auth-injection chain — but the
agent's first hop is `127.0.0.1:<p1>` reached via TSI, not a
sidecar's IP on a Docker-managed bridge.
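The agent-side half of this topology is just four environment variables pointing at the bottle's loopback ports. A minimal sketch of how the env block might be assembled, assuming a hypothetical `SidecarPorts` record and `agent_env` helper (neither is named in this PRD):

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class SidecarPorts:
    """Per-bottle loopback ports; field names are illustrative."""

    pipelock: int
    egress: int
    git_gate: int
    supervise: int


def agent_env(ports: SidecarPorts, manifest_env: Mapping[str, str] | None = None) -> dict[str, str]:
    """Build the env block the Smolfile pins for the agent.

    Routing variables are written last so a manifest `env` entry cannot
    point the agent's first hop somewhere other than the bottle's sidecars.
    """
    env = dict(manifest_env or {})
    env.update(
        {
            "HTTPS_PROXY": f"http://127.0.0.1:{ports.pipelock}",
            "EGRESS_URL": f"http://127.0.0.1:{ports.egress}",
            "GIT_GATE_URL": f"http://127.0.0.1:{ports.git_gate}",
            "MCP_SUPERVISE_URL": f"http://127.0.0.1:{ports.supervise}",
        }
    )
    return env
```

Letting the routing variables override manifest `env` is a design choice in this sketch, not something the PRD specifies.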
### Lifecycle
`SmolmachinesBottleBackend.prepare(spec, stage_dir)`:
1. Cross-backend validation via `BottleBackend._validate` (skills,
git identity files).
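2. Allocate four loopback ports (bind each to port 0 on
`127.0.0.1`, read the kernel-assigned port, release; record on
the plan). A port is only known to be free at allocation time,
so a sidecar that later fails to bind must abort the launch:
the Smolfile env already names that port.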
3. Resolve the agent OCI archive path (build if missing, cache by
Dockerfile + agent-name hash).
4. Render the per-bottle Smolfile to `stage_dir/smolfile.toml`,
pinning command/env/`--outbound-localhost-only` + DNS allowlist.
5. Resolve the in-VM CA paths so launch knows where to copy
pipelock's CA after start.
6. Return a `SmolmachinesBottlePlan` carrying the slug, port map,
OCI archive path, Smolfile path, and host sidecar specs.
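Step 2 can be done with the standard library alone. A minimal sketch, with `allocate_loopback_ports` as a hypothetical helper name (the PRD only says port allocation lives in `util.py`):

```python
import socket


def allocate_loopback_ports(count: int) -> list[int]:
    """Reserve `count` distinct free TCP ports on 127.0.0.1, then release them.

    All sockets stay open until every port is read, so the kernel cannot
    hand out the same port twice within one call.
    """
    sockets: list[socket.socket] = []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sockets.append(sock)
            sock.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        return [sock.getsockname()[1] for sock in sockets]
    finally:
        # Released here; each sidecar must bind its port before anything else does.
        for sock in sockets:
            sock.close()
```

A plan could then record `dict(zip(("pipelock", "egress", "git-gate", "supervise"), allocate_loopback_ports(4)))`.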
`SmolmachinesBottleBackend.launch(plan)`:
1. Start the four host sidecars in dependency order (pipelock →
egress → git-gate → supervise), bound to the plan's allocated
ports. Register teardown callbacks in reverse order.
2. `smolvm machine create --smolfile <path>` and
`smolvm machine start <name>`.
3. Provisioning: CA install → prompt → skills → git → supervise
config, each via `smolvm machine exec` (analogous to
`docker exec`).
4. Yield a `SmolmachinesBottle` whose `exec_claude` / `exec` /
`cp_in` all funnel through `smolvm machine exec` /
`smolvm machine cp`.
5. Teardown: stop and remove the VM, then stop the sidecars (in
reverse start order).
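The ordering in steps 1 and 5 maps naturally onto `contextlib.ExitStack`, whose callbacks run last-in, first-out. The sketch below assumes a hypothetical `start(name, port)` callable that launches one host sidecar and returns a stop function. How sidecars are actually spawned is left to `sidecars.py`.

```python
from __future__ import annotations

import contextlib
from collections.abc import Callable, Iterator, Mapping

# Dependency order from the launch steps above; teardown runs in reverse.
SIDECAR_ORDER = ("pipelock", "egress", "git-gate", "supervise")

# A starter launches one sidecar on a port and returns a function that stops it.
Starter = Callable[[str, int], Callable[[], None]]


@contextlib.contextmanager
def run_sidecars(start: Starter, ports: Mapping[str, int]) -> Iterator[None]:
    with contextlib.ExitStack() as stack:
        for name in SIDECAR_ORDER:
            stop = start(name, ports[name])
            # ExitStack runs callbacks last-in, first-out, which gives reverse
            # start order on exit, including when a later sidecar fails to start.
            stack.callback(stop)
        yield
```

Nesting the VM's own context manager inside `run_sidecars` gives the step 5 ordering: the VM is stopped and removed before any sidecar stops.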
### Data model
No manifest schema change. `bottles[]` continues to carry
`egress.allowlist`, `env`, `git`, `skills` references, etc.; the
smolmachines backend reads the same fields as the docker backend.
The DNS allowlist plumbed into the Smolfile is just
`bottle.egress.allowlist` re-encoded as TOML.
The `BottleSpec` dataclass and the `Bottle` ABC do not change.
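To make "re-encoded as TOML" concrete, here is a sketch of the intermediate dict that a Smolfile renderer might consume. The key names (`command`, `env`, `network.outbound_localhost_only`, `network.dns_allowlist`) are placeholders, not the real Smolfile schema, and `smolfile_dict` is a hypothetical helper. Normalizing and sorting the allowlist keeps rendered output deterministic, so the renderer's unit-test fixture stays stable.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping, Sequence
from typing import Any


def normalize_allowlist(hosts: Iterable[str]) -> list[str]:
    """Lowercase, strip whitespace and trailing dots, drop empties, dedupe, sort."""
    cleaned = {host.strip().lower().rstrip(".") for host in hosts}
    cleaned.discard("")
    return sorted(cleaned)


def smolfile_dict(command: Sequence[str], env: Mapping[str, str], allowlist: Iterable[str]) -> dict[str, Any]:
    """Hypothetical Smolfile layout: key names are placeholders, not the real schema."""
    return {
        "command": list(command),
        "env": dict(env),
        "network": {
            "outbound_localhost_only": True,
            "dns_allowlist": normalize_allowlist(allowlist),
        },
    }
```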
### Selection wiring
In `claude_bottle/backend/__init__.py`:
```python
from typing import Any

from .docker import DockerBottleBackend
from .smolmachines import SmolmachinesBottleBackend

# BottleBackend is the existing ABC; import it as the module does today.
_BACKENDS: dict[str, BottleBackend[Any, Any]] = {
    "docker": DockerBottleBackend(),
    "smolmachines": SmolmachinesBottleBackend(),
}
```
The existing "unknown backend" `die()` path stays as-is.
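For illustration, the selection contract (default to `docker`, never fall back silently on an unknown name) fits in a few lines. `select_backend` is a hypothetical standalone function, not the body of the existing `get_bottle_backend()`, and `SystemExit` stands in for `die()`.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from typing import TypeVar

B = TypeVar("B")

DEFAULT_BACKEND = "docker"


def select_backend(registry: Mapping[str, B], environ: Mapping[str, str] | None = None) -> B:
    """Pick a backend from CLAUDE_BOTTLE_BACKEND, defaulting to docker.

    Unknown names exit with an error instead of falling back, so a typo
    never silently runs the Docker backend.
    """
    env = os.environ if environ is None else environ
    name = env.get("CLAUDE_BOTTLE_BACKEND") or DEFAULT_BACKEND
    if name not in registry:
        known = ", ".join(sorted(registry))
        raise SystemExit(f"unknown CLAUDE_BOTTLE_BACKEND={name!r} (known: {known})")
    return registry[name]
```

Treating an empty `CLAUDE_BOTTLE_BACKEND=` as unset is a choice made in this sketch.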
### External dependencies
- `smolvm` CLI binary on `$PATH` (one new external dep, gated by
the preflight check). Pinned version policy is deferred to the
open questions; v1 reads `smolvm --version` and refuses to launch
outside a known-good range.
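- No new Python packages. Smolfile authoring uses subprocess plus a
small hand-written TOML renderer over a `dict[str, Any]` (the
Smolfile shape is small). Stdlib `tomllib` (Python 3.11+) only
parses TOML, and `tomli_w` is a third-party package, not stdlib,
so using it would add a dependency. `tomllib` is still useful in
tests to parse the rendered Smolfile back.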
### Acceptance test plan
- **Unit:** `tests/unit/test_smolfile.py` verifies the renderer
produces the expected TOML for a fixture bottle (allowlist →
DNS rules, env → `env =`, command line, outbound-localhost
flag).
- **Integration smoke:** `tests/integration/test_smolmachines_smoke.py`
with `prepare → launch → exec → teardown`, guarded by a
`smolvm` presence check + macOS / KVM platform check.
- **PRD 0022 re-run:** with `CLAUDE_BOTTLE_BACKEND=smolmachines`,
all five attack categories return sandbox-block markers and the
suite passes. The test code does not change beyond the env-var
flip — that's the contract the PRD 0022 abstraction was
designed for.
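The presence and platform guard for the smoke test can be written directly from the predicate in open question 6. The helper names `smolvm_available` and `kvm_available` come from that predicate. Their bodies here are assumptions: `shutil.which` for the binary, and a read/write check on `/dev/kvm` for KVM.

```python
from __future__ import annotations

import os
import platform
import shutil


def smolvm_available() -> bool:
    return shutil.which("smolvm") is not None


def kvm_available() -> bool:
    # /dev/kvm must exist and be openable read-write by the current user.
    return os.path.exists("/dev/kvm") and os.access("/dev/kvm", os.R_OK | os.W_OK)


def should_skip(system: str, has_smolvm: bool, has_kvm: bool) -> bool:
    """The skip predicate from open question 6, with its inputs injected."""
    return not (has_smolvm and (system == "Darwin" or has_kvm))


def skip_smolmachines_tests() -> bool:
    return should_skip(platform.system(), smolvm_available(), kvm_available())
```

In a test module this would feed `pytest.mark.skipif(skip_smolmachines_tests(), reason=...)`, the same shape as the existing `GITEA_ACTIONS` skip.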
## Sizing — into chunks
1. **Backend skeleton + selection + Smolfile renderer.** Subpackage
layout, `_resolve_plan` stub that emits a TOML file but doesn't
launch anything, `_BACKENDS` registration, preflight `smolvm`
check. Unit test on the renderer. No VM bringup yet.
2. **VM lifecycle + OCI archive build.** `smolvm.py` subprocess
wrapper, prepare-time image build (existing Dockerfile → OCI
archive), launch path that creates + starts + stops a VM with
no sidecars wired. Smoke integration test: `exec("echo hi")`
inside a started VM.
3. **Host-side sidecar relocation.** `sidecars.py`: per-bottle
pipelock + egress + git-gate + supervise as host processes on
loopback. Port allocator. Teardown ordering. No provisioning
yet beyond what the sidecars need.
4. **Provisioning parity with Docker.** CA install via
`smolvm machine exec`, prompt/skills/.git copy-in, supervise
MCP config. End-to-end `start` works for a real agent manifest.
5. **PRD 0022 sandbox-escape suite green.** Skip-guard update,
small adjustments to test helpers if any (the test uses
`bottle.exec(script)` and inspects `returncode` + body for
sandbox markers — should be transport-agnostic, but verify).
Document the macOS-only scope in README.
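The transport-agnostic check in chunk 5 (and open question 5) comes down to one probe: a script with a known exit code and distinct output on each stream must round-trip unchanged. A host-side sketch follows. `ExecResult` here is a stand-in with the three fields the suite reads, not the project's class. The in-VM version would wrap the same probe in whatever `smolvm machine exec` invocation chunk 2 settles on, which this PRD does not specify.

```python
from __future__ import annotations

import subprocess
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecResult:
    """Stand-in with the three fields the PRD 0022 suite reads."""

    returncode: int
    stdout: str
    stderr: str


def run_captured(argv: Sequence[str], timeout: float | None = None) -> ExecResult:
    # check=False: a non-zero exit is data for the caller, not an exception.
    proc = subprocess.run(list(argv), capture_output=True, text=True, timeout=timeout, check=False)
    return ExecResult(proc.returncode, proc.stdout, proc.stderr)


# One line on each stream plus exit 3; a faithful transport preserves all three.
PROBE = "echo out; echo err >&2; exit 3"


def probe_ok(result: ExecResult) -> bool:
    return result.returncode == 3 and result.stdout == "out\n" and result.stderr == "err\n"
```

Running `run_captured(["sh", "-c", PROBE])` on the host gives the baseline. If the VM path fails `probe_ok`, for example because streams arrive merged or the exit code collapses to 1, the suite's sandbox-marker checks cannot be trusted on that backend.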
## Open questions
1. **Sidecar locality: host process vs in-VM init.** This PRD
defaults to host-process sidecars (proposed design above). The
alternative — bake pipelock + egress + git-gate + supervise
into the OCI image and start them via init in the same VM —
would simplify port plumbing (the agent reaches sidecars over
localhost inside the VM, not over TSI) but expands the trust
boundary of the agent VM. Default to host-process sidecars
unless someone identifies a TSI loopback edge case during chunk 3.
2. **`smolvm` install policy.** Pin via brew formula version, or
build-from-source step, or vendored binary checked into the
repo. v1 most likely runs `smolvm --version` at preflight and
accepts a documented range; vendoring is heavier but reduces
"works on my Mac" drift.
3. **CA install inside the OCI overlay.** Two paths: bake at
prepare time (one OCI archive per CA fingerprint, big cache
key) vs. inject at start time via `smolvm machine exec` after
the VM is up. PRD 0006 chose the runtime path for Docker
(docker-cp + `update-ca-certificates`); smolvm has the same
shape via `machine exec`. Default to runtime injection unless
it conflicts with `--outbound-localhost-only` start order.
4. **DNS filter granularity.** smolmachines's vsock-6002 filter
accepts an allowlist of hostnames; we want to enforce both
"agent can only resolve names on the bottle's allowlist" *and*
"agent can only egress via TSI to 127.0.0.1." Confirm
empirically (smoke test in chunk 2) that the allowlist applies
to *guest-initiated* DNS only and doesn't accidentally NXDOMAIN
the host-side pipelock's upstream lookups.
5. **`bottle.exec(script)` exit-code fidelity.** The PRD 0022 test
suite reads `returncode` + stdout + stderr from
`ExecResult`. Confirm `smolvm machine exec` propagates exit
codes and separated streams — the research note's
"external integration is the CLI" implies yes, but the
embedded SDK bug it flagged suggests we should verify before
coding around it.
6. **CI gating.** Gitea's act_runner is Linux without nested KVM,
so smolmachines integration tests will skip there for the same
structural reason the Docker bringup tests do (no real
isolation primitive available on the runner). The skip
predicate becomes `not (smolvm_available() and
(platform.system() == "Darwin" or kvm_available()))`. CI
coverage for this backend will come from local runs on the
maintainer's macOS host until a Darwin runner is wired up;
ack that as a known gap.
7. **Active bottle discovery.** Docker uses container labels to
enumerate active bottles (`list_active` queries the daemon).
smolmachines's enumeration story is `smolvm machine list`; the
plan is to mirror the label scheme via Smolfile metadata
(`labels = { "claude-bottle" = "1" }`-style entries, if the
format supports it; otherwise via a deterministic name prefix
`claude-bottle-<slug>`).
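If the Smolfile format has no labels, the name-prefix fallback in open question 7 reduces to string filtering over whatever machine names `smolvm machine list` reports. Parsing that command's output is left out because its format is not described here. The slug regex is a hypothetical stand-in for what `util.slugify` produces.

```python
from __future__ import annotations

import re
from collections.abc import Iterable

NAME_PREFIX = "claude-bottle-"
# Hypothetical slug rule; the real one is whatever util.slugify produces.
_SLUG = re.compile(r"[a-z0-9][a-z0-9-]*")


def vm_name(slug: str) -> str:
    if not _SLUG.fullmatch(slug):
        raise ValueError(f"not a valid bottle slug: {slug!r}")
    return NAME_PREFIX + slug


def active_slugs(machine_names: Iterable[str]) -> list[str]:
    """Slugs of VMs this tool owns, given the machine names smolvm reports."""
    return sorted(
        name[len(NAME_PREFIX):]
        for name in machine_names
        if name.startswith(NAME_PREFIX) and _SLUG.fullmatch(name[len(NAME_PREFIX):])
    )
```

Requiring the suffix to be a valid slug means a VM named exactly `claude-bottle-`, or one carrying a foreign suffix, is ignored rather than treated as an orphan to clean up.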
## References
- `docs/research/smolmachines-as-vm-backend.md` — primary research
note recommending this adoption; PRD 0023's design hypothesis.
- `docs/research/agent-vm-isolation.md` — the broader microVM /
gvproxy / pipelock landscape this PRD lands inside of.
- `docs/research/agent-sandbox-landscape.md` — identifies
`"runtime": "microvm"`-style opt-in as the borrowable idea;
smolmachines is the concrete implementation.
- PRD 0003 (`docs/prds/0003-bottle-backend-abstraction.md`) — the
backend abstraction this PRD is the first non-Docker consumer
of.
- PRD 0017 (`docs/prds/0017-egress-proxy-via-mitmproxy.md`) — the
egress sidecar the host-side relocation reuses verbatim, only
with a different transport.
- PRD 0022
(`docs/prds/0022-sandbox-escape-integration-test.md`) — the
acceptance gate for this PRD; the suite already runs through
`get_bottle_backend()` so the env-var flip is the only change
needed to exercise the smolmachines path.