bot-bottle/docs/prds/0023-smolmachines-backend.md
didericis a2ac124d5c
docs(prd-0023): smolmachines bottle backend
Specs a second concrete BottleBackend selectable via
CLAUDE_BOTTLE_BACKEND=smolmachines: per-agent libkrun microVM on
macOS, sidecars relocated to host-side loopback ports plumbed via
Smolfile env, PRD 0022's sandbox-escape suite as the acceptance
gate (the env-var flip is the only change required). Docker
backend ships unchanged and remains default.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 23:19:08 -04:00

PRD 0023: smolmachines bottle backend

  • Status: Draft
  • Author: didericis
  • Created: 2026-05-26

Summary

Ship a second concrete BottleBackend, SmolmachinesBottleBackend, selected via CLAUDE_BOTTLE_BACKEND=smolmachines, that runs a bottle inside a per-agent libkrun microVM on macOS (and KVM on Linux, opportunistically). The egress topology moves out of an internal Docker network and onto libkrun's TSI ("Transparent Socket Impersonation") allowlist plus a host-side pipelock/egress/git-gate/supervise stack listening on per-bottle loopback ports. The Docker backend ships unchanged; this is opt-in via the existing env-var selector.

The acceptance gate is PRD 0022's tests/integration/test_sandbox_escape.py running green against CLAUDE_BOTTLE_BACKEND=smolmachines.

Problem

agent-vm-isolation.md argues for hardware-isolated microVMs over container-based bottles on macOS; smolmachines-as-vm-backend.md concludes that smolmachines is the most plausible concrete VMM for this project. Today, the only backend in the registry is Docker (claude_bottle/backend/__init__.py:_BACKENDS = {"docker": ...}), and three things motivate a second one now:

  • Isolation ceiling. On macOS the Docker backend's agent container shares Docker Desktop's host VM with every other bottle. Container escape from claude-code lands the agent inside that shared VM. A per-bottle libkrun microVM gets hardware page tables via Hypervisor.framework; cross-bottle isolation becomes enforced by the CPU's MMU instead of namespace bookkeeping.
  • PRD 0022 is backend-agnostic by design but currently only exercises the Docker backend. The suite was written with CLAUDE_BOTTLE_BACKEND selection in mind precisely so the smolmachines path could be validated against the same five attacks. Until a second backend exists, the abstraction is unproven.
  • CI carve-outs. Most bottle-bringup integration tests skip under GITEA_ACTIONS=true because act_runner shares the host Docker socket but not the host filesystem. A smolmachines path is not bound by that constraint (it has different ones of its own), so adding the backend forces the abstraction to be clean in places where Docker-specific assumptions have been tolerated.

The smolmachines research note's ## Recommendation ("adopt smolmachines as the bottle VM backend on macOS; keep pipelock DIY") is the design hypothesis under test here.

Goals / Success Criteria

The feature works when all of the following are observable on a macOS host with smolmachines installed:

  • CLAUDE_BOTTLE_BACKEND=smolmachines python3 cli.py start <agent> brings up a microVM, runs claude-code inside it, and tears it down on exit. Same y/N preflight UX as Docker — only the resolved-runtime line differs.
  • The sandbox-escape suite in tests/integration/test_sandbox_escape.py runs green against the smolmachines backend (all five attack categories blocked).
  • Selecting the backend on a host without smolvm installed dies at startup with an install pointer; no silent fall-through to Docker.
  • Active bottles show up under python3 cli.py list-bottles regardless of backend.
  • python3 cli.py stop <bottle> and orphan cleanup work for both Docker bottles and smolmachines bottles via the same CLI surface.

The feature is done when all of the following ship:

  • A new claude_bottle/backend/smolmachines/ subpackage exists, mirroring the layout of claude_bottle/backend/docker/ (backend.py, bottle.py, bottle_plan.py, bottle_cleanup_plan.py, prepare.py, launch.py, cleanup.py, util.py, and a provision/ subpackage for the five provision_* methods).
  • SmolmachinesBottleBackend registered under the "smolmachines" key in claude_bottle/backend/__init__.py:_BACKENDS.
  • Per-bottle Smolfile generation: a runtime-rendered TOML written to the bottle's stage dir, analogous to the compose file the Docker backend writes today. The Smolfile pins command, env, --outbound-localhost-only, and the per-bottle DNS allowlist.
  • Host-side sidecar relocation: pipelock, egress, git-gate, and supervise each run as host processes (one set per bottle), bound to 127.0.0.1 on per-bottle dynamically-allocated ports. The agent's environment carries the resolved URLs (e.g. HTTPS_PROXY=http://127.0.0.1:<pipelock-port>).
  • The agent guest image is produced from the existing Dockerfile (or a thin variant), exported as an OCI archive, and consumed by smolvm machine create. The image build step is part of prepare, analogous to docker_mod.build_image.
  • The PRD 0022 sandbox-escape suite, run with CLAUDE_BOTTLE_BACKEND=smolmachines, passes locally on a smolmachines-capable host. The suite is updated to skip cleanly on hosts that can't reach smolmachines (same shape as the existing GITEA_ACTIONS == "true" skip), not to fail.
  • README + CLAUDE.md updated to document the env-var selection, the macOS-only scope for v1, and the smolvm install prerequisite.

Non-goals

  • No Linux KVM support shipped in this PRD. smolmachines works on Linux via KVM, but the abstraction win is biggest on macOS where Docker's shared-VM topology hurts most. Linux can come later behind the same selector.
  • No removal of the Docker backend. Both backends ship side by side. Selection stays env-driven; the manifest does not gain a backend field.
  • No default-backend change. docker remains the default value of CLAUDE_BOTTLE_BACKEND; smolmachines is strictly opt-in until it has been load-bearing on at least one operator's workflow for a release cycle.
  • No host bind mounts. The smolmachines research note flagged that -v HOST:GUEST mounts via virtiofs would defeat the isolation goal. The manifest already has no concept of host mounts; this PRD does not introduce one. If a future PRD wants agent-side access to host files, it must come through a controlled channel (vsock relay, OCI overlay, supervise sidecar endpoint).
  • No HTTP API mode. smolvm serve is the long-term-clean control plane, but v1 drives smolmachines via CLI subprocess invocations — the lower-overhead first iteration the research note already endorses.
  • No custom kernel / initrd. smolmachines uses libkrunfw only; the agent image is an OCI ref, not a kernel + rootfs pair.
  • No warm-pool or snapshot/restore. Each bottle gets a fresh microVM; cold-start cost is paid up front.
  • No supervise/agent-credential rewrites for the new backend. Provisioning logic ports as-is; only the transport (host-side port URLs instead of in-network DNS names) changes.

Scope

In scope

  • New claude_bottle/backend/smolmachines/ subpackage with the full set of BottleBackend overrides.
  • Smolfile generator (TOML), analogous to backend/docker/compose.py's bottle_plan_to_compose.
  • A host-side sidecar process manager that owns the lifecycle of pipelock + egress + git-gate + supervise for one bottle, binding them to per-bottle loopback ports and tearing them down with the bottle. This is the smolmachines-specific replacement for docker compose up/down.
  • Per-bottle CA install path: the egress sidecar's CA cert lands inside the microVM via smolvm machine exec after start (analogous to the existing provision_ca for Docker).
  • DNS allowlist plumbing: every host in bottle.egress.allowlist goes into the Smolfile's DNS filter section (vsock port 6002), so the VMM-layer DNS filter and the bottle's policy stay in sync: the agent can't dig its way out via raw IP literals (TSI + CIDR allowlist enforces this; DNS filter denies hostname resolution).
  • Preflight smolvm check: if the user selects this backend and smolvm isn't on $PATH, die with an install pointer (brew tap + version pin TBD in implementation; see open question 2).
  • Manifest validation: refuse any bottle field this backend can't honor (today there are none, since the Docker backend already rejects host mounts; this is a forward-compat check).
  • Tests:
    • Smoke unit-level test: Smolfile renderer produces the expected TOML for a fixture bottle.
    • Integration test: prepare → launch → exec("echo hi") → teardown on a smolmachines-capable host (skips otherwise via the same env/platform gate the Docker integration tests use).
    • PRD 0022 suite, re-run with the env var flipped, passes.

Out of scope

  • VM image caching across bottles (each prepare rebuilds from the OCI archive; layer reuse is whatever smolmachines provides).
  • Cross-host bottle relocation (the OCI archive is local-only).
  • Operator-facing knobs for vCPU / memory / overlay size (use sensible defaults; expose as manifest fields in a later PRD if needed).
  • Integration with the supervise plane's permission-prompt UX beyond port plumbing — supervise already speaks HTTP and binds to whatever loopback the backend hands it.

Proposed Design

Backend layout

claude_bottle/backend/smolmachines/
  __init__.py            re-exports SmolmachinesBottleBackend
  backend.py             SmolmachinesBottleBackend façade
  bottle.py              SmolmachinesBottle (exec_claude / exec / cp_in / close)
  bottle_plan.py         SmolmachinesBottlePlan + .print()
  bottle_cleanup_plan.py SmolmachinesBottleCleanupPlan
  prepare.py             resolve_plan(spec, stage_dir, ...) -> SmolmachinesBottlePlan
  launch.py              @contextmanager launch(plan) -> SmolmachinesBottle
  cleanup.py             prepare_cleanup / cleanup / list_active
  smolfile.py            bottle_plan_to_smolfile(...) -> dict + render
  sidecars.py            host-side pipelock/egress/git-gate/supervise lifecycle
  smolvm.py              thin subprocess wrapper: machine create/start/exec/stop
  util.py                slugify, port allocation, OCI archive helpers
  provision/             ca.py, prompt.py, skills.py, git.py, supervise.py
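The smolvm.py wrapper in the layout above has to hand back exit codes and separated streams, since the PRD 0022 suite reads them from ExecResult. A minimal sketch of that capture layer follows. It assumes an ExecResult shape with returncode, stdout, and stderr (the real dataclass may differ), and run_capture is a hypothetical helper name. The argv for smolvm machine exec is deliberately left to the caller, because its exact flags are not specified here.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ExecResult:
    """Stand-in for the project's ExecResult; the field names are assumed."""
    returncode: int
    stdout: str
    stderr: str


def run_capture(argv: Sequence[str], timeout: float = 60.0) -> ExecResult:
    """Run argv without a shell, keeping stdout and stderr separate."""
    proc = subprocess.run(
        list(argv),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # a non-zero exit is data for the caller, not an exception
    )
    return ExecResult(proc.returncode, proc.stdout, proc.stderr)


if __name__ == "__main__":
    # A child Python process stands in for a guest command here; a real caller
    # would pass the smolvm argv instead.
    child = "import sys; print('out'); print('err', file=sys.stderr); sys.exit(3)"
    result = run_capture([sys.executable, "-c", child])
    print(result.returncode, result.stdout.strip(), result.stderr.strip())
```

Keeping check=False matters for the sandbox-escape suite: a blocked attack is expected to exit non-zero, and the test inspects that code rather than treating it as a wrapper failure.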

Network + egress topology

  ┌── macOS host ─────────────────────────────────────────────┐
  │                                                           │
  │  ┌── per-bottle host sidecars (one set per microVM) ─┐    │
  │  │ pipelock     127.0.0.1:<p1>                       │    │
  │  │ egress       127.0.0.1:<p2>                       │    │
  │  │ git-gate     127.0.0.1:<p3>                       │    │
  │  │ supervise    127.0.0.1:<p4>                       │    │
  │  └───────────────────────────────────────────────────┘    │
  │                          ▲                                │
  │                          │ TSI passthrough (localhost)    │
  │                          │                                │
  │  ┌── libkrun microVM (per bottle) ───────────────────┐    │
  │  │ env: HTTPS_PROXY=http://127.0.0.1:<p1>            │    │
  │  │      EGRESS_URL=http://127.0.0.1:<p2>             │    │
  │  │      GIT_GATE_URL=http://127.0.0.1:<p3>           │    │
  │  │      MCP_SUPERVISE_URL=http://127.0.0.1:<p4>      │    │
  │  │ --outbound-localhost-only                         │    │
  │  │ DNS filter (vsock:6002) → host allowlist          │    │
  │  └───────────────────────────────────────────────────┘    │
  │                                                           │
  └───────────────────────────────────────────────────────────┘

Two changes vs. the Docker backend:

  1. Sidecars are host processes, not sibling containers. No internal Docker network; isolation comes from TSI plus the per-bottle loopback port set.
  2. The "internal" allowlist becomes localhost-only. Egress out to the public internet still happens through pipelock + egress — the same scanning + DLP + auth-injection chain — but the agent's first hop is 127.0.0.1:<p1> reached via TSI, not a sidecar's IP on a Docker-managed bridge.
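The mapping from allocated ports to the agent's environment, as drawn in the diagram, can be sketched as below. The four variable names come from the diagram; SidecarPorts and agent_env are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SidecarPorts:
    """Hypothetical container for the four per-bottle loopback ports."""
    pipelock: int
    egress: int
    git_gate: int
    supervise: int


def agent_env(ports: SidecarPorts) -> dict[str, str]:
    """Build the env entries that point the agent at its host-side sidecars."""

    def url(port: int) -> str:
        # Every sidecar is reached over loopback via TSI, never a bridge IP.
        return f"http://127.0.0.1:{port}"

    return {
        "HTTPS_PROXY": url(ports.pipelock),
        "EGRESS_URL": url(ports.egress),
        "GIT_GATE_URL": url(ports.git_gate),
        "MCP_SUPERVISE_URL": url(ports.supervise),
    }


if __name__ == "__main__":
    for key, value in agent_env(SidecarPorts(41001, 41002, 41003, 41004)).items():
        print(f"{key}={value}")
```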

Lifecycle

SmolmachinesBottleBackend.prepare(spec, stage_dir):

  1. Cross-backend validation via BottleBackend._validate (skills, git identity files).
  2. Allocate four loopback ports (bind, get free port, release; record on plan).
  3. Resolve the agent OCI archive path (build if missing, cache by Dockerfile + agent-name hash).
  4. Render the per-bottle Smolfile to stage_dir/smolfile.toml, pinning command/env/--outbound-localhost-only + DNS allowlist.
  5. Resolve the in-VM CA paths so launch knows where to copy pipelock's CA after start.
  6. Return a SmolmachinesBottlePlan carrying the slug, port map, OCI archive path, Smolfile path, and host sidecar specs.
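Step 2 (bind, get free port, release) could look roughly like the sketch below. allocate_loopback_ports is a hypothetical name. The sockets are held open until all ports are known, so the kernel cannot hand out the same port twice within one allocation.

```python
import socket


def allocate_loopback_ports(count: int) -> list[int]:
    """Pick `count` distinct free TCP ports on 127.0.0.1, then release them."""
    held: list[socket.socket] = []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            held.append(sock)  # track first so the finally block always closes it
            sock.bind(("127.0.0.1", 0))  # port 0 asks the kernel for a free port
        # All sockets are still bound here, so the ports are distinct.
        return [sock.getsockname()[1] for sock in held]
    finally:
        for sock in held:
            sock.close()


if __name__ == "__main__":
    print(allocate_loopback_ports(4))
```

One caveat with this approach: between release and the sidecar's own bind, another process can take the port. A sidecar that fails to bind should therefore surface a clear error (or trigger a re-allocation) rather than be assumed to have started.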

SmolmachinesBottleBackend.launch(plan):

  1. Start the four host sidecars in dependency order (pipelock → egress → git-gate → supervise), bound to the plan's allocated ports. Register teardown callbacks in reverse order.
  2. smolvm machine create --smolfile <path> and smolvm machine start <name>.
  3. Provisioning: CA install → prompt → skills → git → supervise config, each via smolvm machine exec (analogous to docker exec).
  4. Yield a SmolmachinesBottle whose exec_claude / exec / cp_in all funnel through smolvm machine exec / smolvm machine cp.
  5. Teardown: stop and remove the VM, then stop the sidecars (in reverse start order).
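The start and teardown ordering above maps naturally onto contextlib.ExitStack, which runs its callbacks last-in, first-out. The sketch below is an illustration rather than the planned launch.py: launch_stack, start, and stop are hypothetical names, and the real code would start processes and the VM instead of calling injected functions.

```python
from contextlib import ExitStack, contextmanager
from typing import Callable, Iterator

# Dependency order from step 1; the VM comes up after its sidecars.
SIDECAR_ORDER = ("pipelock", "egress", "git-gate", "supervise")


@contextmanager
def launch_stack(
    start: Callable[[str], None], stop: Callable[[str], None]
) -> Iterator[None]:
    """Start sidecars then the VM; tear down in exact reverse order."""
    with ExitStack() as stack:
        for name in SIDECAR_ORDER:
            start(name)
            # Registered only after a successful start, so a failure midway
            # stops exactly the components that are already running.
            stack.callback(stop, name)
        start("vm")
        stack.callback(stop, "vm")
        yield  # the bottle is usable here


if __name__ == "__main__":
    log: list[str] = []
    with launch_stack(lambda n: log.append(f"start {n}"), lambda n: log.append(f"stop {n}")):
        log.append("run")
    print("\n".join(log))
```

Because the VM's stop callback is registered last, it runs first on exit, which gives the "VM, then sidecars in reverse start order" teardown without any explicit bookkeeping.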

Data model

No manifest schema change. bottles[] continues to carry egress.allowlist, env, git, skills references, etc.; the smolmachines backend reads the same fields as the docker backend. The DNS allowlist plumbed into the Smolfile is just bottle.egress.allowlist re-encoded as TOML.

The BottleSpec dataclass and the Bottle ABC do not change.

Selection wiring

In claude_bottle/backend/__init__.py:

from .docker import DockerBottleBackend
from .smolmachines import SmolmachinesBottleBackend

_BACKENDS: dict[str, BottleBackend[Any, Any]] = {
    "docker": DockerBottleBackend(),
    "smolmachines": SmolmachinesBottleBackend(),
}

The existing "unknown backend" die() path stays as-is.
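For context, the lookup that sits on top of the registry might look like the following. get_bottle_backend and die are named in this repo, but their bodies here are assumptions, and the two backend classes are empty stand-ins so the sketch runs on its own.

```python
import os
import sys
from typing import Mapping, NoReturn


class DockerBottleBackend:
    """Empty stand-in for the real Docker backend."""


class SmolmachinesBottleBackend:
    """Empty stand-in for the backend this PRD adds."""


_BACKENDS = {
    "docker": DockerBottleBackend(),
    "smolmachines": SmolmachinesBottleBackend(),
}


def die(message: str) -> NoReturn:
    # Assumed behavior: print to stderr and exit non-zero.
    print(f"error: {message}", file=sys.stderr)
    raise SystemExit(1)


def get_bottle_backend(env: Mapping[str, str] = os.environ):
    """Resolve the backend from CLAUDE_BOTTLE_BACKEND; docker is the default."""
    name = env.get("CLAUDE_BOTTLE_BACKEND", "docker")
    backend = _BACKENDS.get(name)
    if backend is None:
        # No silent fall-through: an unknown name is a hard error.
        die(f"unknown backend {name!r}; expected one of {sorted(_BACKENDS)}")
    return backend


if __name__ == "__main__":
    print(type(get_bottle_backend({"CLAUDE_BOTTLE_BACKEND": "smolmachines"})).__name__)
```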

External dependencies

  • smolvm CLI binary on $PATH (one new external dep, gated by the preflight check). Pinned version policy is deferred to the open questions; v1 reads smolvm --version and refuses to launch outside a known-good range.
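  • No new Python packages. Subprocess + stdlib only. The stdlib tomllib module (Python 3.11+) parses TOML but cannot write it, and tomli_w is a third-party package rather than stdlib, so Smolfile authoring renders TOML by hand from a dict[str, Any]; the Smolfile shape is small. (tomli_w remains the only candidate module if a writer dependency is ever accepted.)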
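The preflight described in the first bullet (find smolvm, read smolvm --version, refuse outside a known-good range) can be sketched as below. The version bounds are invented placeholders, since the pinning policy is still an open question, and the sketch assumes the --version output contains a dotted MAJOR.MINOR.PATCH string somewhere in it. preflight_smolvm, parse_version, and version_supported are hypothetical names.

```python
import re
import shutil
import subprocess

# Placeholder bounds for illustration only; the real range is undecided.
MIN_VERSION = (0, 1, 0)
MAX_VERSION_EXCLUSIVE = (1, 0, 0)


def parse_version(text: str) -> tuple[int, int, int]:
    """Extract the first MAJOR.MINOR.PATCH triple found in `text`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no dotted version found in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def version_supported(version: tuple[int, int, int]) -> bool:
    # Tuples compare element by element, which matches version ordering.
    return MIN_VERSION <= version < MAX_VERSION_EXCLUSIVE


def preflight_smolvm() -> tuple[int, int, int]:
    """Exit with a message unless a supported smolvm is on PATH."""
    if shutil.which("smolvm") is None:
        raise SystemExit("smolvm not found on PATH; see the README for install steps")
    proc = subprocess.run(
        ["smolvm", "--version"], capture_output=True, text=True, check=True
    )
    version = parse_version(proc.stdout)
    if not version_supported(version):
        raise SystemExit(f"smolvm {version} is outside the supported range")
    return version


if __name__ == "__main__":
    print(parse_version("smolvm 0.5.2"), version_supported((0, 5, 2)))
```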

Acceptance test plan

  • Unit: tests/unit/test_smolfile.py verifies the renderer produces the expected TOML for a fixture bottle (allowlist → DNS rules, env → env =, command line, outbound-localhost flag).
  • Integration smoke: tests/integration/test_smolmachines_smoke.py with prepare → launch → exec → teardown, guarded by a smolvm presence check + macOS / KVM platform check.
  • PRD 0022 re-run: with CLAUDE_BOTTLE_BACKEND=smolmachines, all five attack categories return sandbox-block markers and the suite passes. The test code does not change beyond the env-var flip — that's the contract the PRD 0022 abstraction was designed for.
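The platform guard mentioned for the smoke test can be expressed as a pure predicate plus thin probes, which keeps the decision itself unit-testable on any host. In this sketch, probing /dev/kvm for read/write access is an assumption about how KVM availability would be detected, and all three function names are hypothetical.

```python
import os
import platform
import shutil


def smolvm_available() -> bool:
    """True when a smolvm binary is on PATH."""
    return shutil.which("smolvm") is not None


def kvm_available() -> bool:
    # Assumption: a usable KVM shows up as a read/write-accessible /dev/kvm.
    return os.access("/dev/kvm", os.R_OK | os.W_OK)


def should_skip(has_smolvm: bool, system: str, has_kvm: bool) -> bool:
    """Skip unless smolvm exists and the host has a usable hypervisor."""
    return not (has_smolvm and (system == "Darwin" or has_kvm))


def should_skip_here() -> bool:
    return should_skip(smolvm_available(), platform.system(), kvm_available())


if __name__ == "__main__":
    print("skip smolmachines tests:", should_skip_here())
```

A test module could feed should_skip_here() to unittest.skipIf (or an equivalent marker) so that unsupported hosts report a skip rather than a failure.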

Sizing — into chunks

  1. Backend skeleton + selection + Smolfile renderer. Subpackage layout, _resolve_plan stub that emits a TOML file but doesn't launch anything, _BACKENDS registration, preflight smolvm check. Unit test on the renderer. No VM bringup yet.
  2. VM lifecycle + OCI archive build. smolvm.py subprocess wrapper, prepare-time image build (existing Dockerfile → OCI archive), launch path that creates + starts + stops a VM with no sidecars wired. Smoke integration test: exec("echo hi") inside a started VM.
  3. Host-side sidecar relocation. sidecars.py: per-bottle pipelock + egress + git-gate + supervise as host processes on loopback. Port allocator. Teardown ordering. No provisioning yet beyond what the sidecars need.
  4. Provisioning parity with Docker. CA install via smolvm machine exec, prompt/skills/.git copy-in, supervise MCP config. End-to-end start works for a real agent manifest.
  5. PRD 0022 sandbox-escape suite green. Skip-guard update, small adjustments to test helpers if any (the test uses bottle.exec(script) and inspects returncode + body for sandbox markers — should be transport-agnostic, but verify). Document the macOS-only scope in README.
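For chunk 2, the "cache by Dockerfile + agent-name hash" rule from the prepare step could be sketched as follows. oci_cache_key and oci_archive_path are hypothetical names, and the 16-character key length and .tar suffix are arbitrary choices for illustration.

```python
import hashlib
from pathlib import Path


def oci_cache_key(dockerfile_bytes: bytes, agent_name: str) -> str:
    """Derive a short, stable key from the agent name and Dockerfile content."""
    digest = hashlib.sha256()
    digest.update(agent_name.encode("utf-8"))
    digest.update(b"\0")  # separator so name/content boundaries cannot blur
    digest.update(dockerfile_bytes)
    return digest.hexdigest()[:16]


def oci_archive_path(cache_dir: Path, dockerfile: Path, agent_name: str) -> Path:
    """Where the archive for this (Dockerfile, agent) pair would be cached."""
    key = oci_cache_key(dockerfile.read_bytes(), agent_name)
    return cache_dir / f"{agent_name}-{key}.tar"


if __name__ == "__main__":
    print(oci_cache_key(b"FROM debian:stable\n", "reviewer"))
```

Hashing only the Dockerfile means that changes to files it copies in from the build context do not invalidate the cache; if that matters in practice, the key would need to cover the context as well.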

Open questions

  1. Sidecar locality: host process vs in-VM init. This PRD defaults to host-process sidecars (proposed design above). The alternative (bake pipelock + egress + git-gate + supervise into the OCI image and start them via init in the same VM) would simplify port plumbing, since the agent would reach sidecars over localhost inside the VM rather than over TSI, but it expands the trust boundary of the agent VM. Stay with host-process sidecars unless someone identifies a TSI loopback edge case during chunk 3.
  2. smolvm install policy. Pin via brew formula version, or build-from-source step, or vendored binary checked into the repo. v1 most likely runs smolvm --version at preflight and accepts a documented range; vendoring is heavier but reduces "works on my Mac" drift.
  3. CA install inside the OCI overlay. Two paths: bake at prepare time (one OCI archive per CA fingerprint, big cache key) vs. inject at start time via smolvm machine exec after the VM is up. PRD 0006 chose the runtime path for Docker (docker-cp + update-ca-certificates); smolvm has the same shape via machine exec. Default to runtime injection unless it conflicts with --outbound-localhost-only start order.
  4. DNS filter granularity. smolmachines's vsock-6002 filter accepts an allowlist of hostnames; we want to enforce both "agent can only resolve names on the bottle's allowlist" and "agent can only egress via TSI to 127.0.0.1." Confirm empirically (smoke test in chunk 2) that the allowlist applies to guest-initiated DNS only and doesn't accidentally NXDOMAIN the host-side pipelock's upstream lookups.
  5. bottle.exec(script) exit-code fidelity. The PRD 0022 test suite reads returncode + stdout + stderr from ExecResult. Confirm smolvm machine exec propagates exit codes and separated streams — the research note's "external integration is the CLI" implies yes, but the embedded SDK bug it flagged suggests we should verify before coding around it.
  6. CI gating. Gitea's act_runner is Linux without nested KVM, so smolmachines integration tests will skip there for the same structural reason the Docker bringup tests do (no real isolation primitive available on the runner). The skip predicate becomes not (smolvm_available() and (platform.system() == "Darwin" or kvm_available())). CI coverage for this backend will come from local runs on the maintainer's macOS host until a Darwin runner is wired up; ack that as a known gap.
  7. Active bottle discovery. Docker uses container labels to enumerate active bottles (list_active queries the daemon). smolmachines's enumeration story is smolvm machine list; the plan is to mirror the label scheme via Smolfile metadata (labels = { "claude-bottle" = "1" }-style entries, if the format supports it; otherwise via a deterministic name prefix claude-bottle-<slug>).
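The fallback in question 7, a deterministic claude-bottle-<slug> name prefix, needs only a slug function and a filter over whatever names the machine listing returns. The sketch below takes the names as a plain list, so it makes no assumption about the output format of smolvm machine list; slugify's exact rules and the helper names are illustrative.

```python
import re

PREFIX = "claude-bottle-"


def slugify(name: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics into one '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive a slug from {name!r}")
    return slug


def machine_name(agent: str) -> str:
    """Deterministic VM name for an agent."""
    return PREFIX + slugify(agent)


def active_bottles(machine_names: list[str]) -> list[str]:
    """Return the slugs of machines that carry the bottle prefix."""
    return sorted(
        name[len(PREFIX):]
        for name in machine_names
        if name.startswith(PREFIX) and len(name) > len(PREFIX)
    )


if __name__ == "__main__":
    print(machine_name("My Agent!"))
    print(active_bottles(["claude-bottle-reviewer", "scratch-vm", "claude-bottle-docs"]))
```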

References

  • docs/research/smolmachines-as-vm-backend.md — primary research note recommending this adoption; PRD 0023's design hypothesis.
  • docs/research/agent-vm-isolation.md — the broader microVM / gvproxy / pipelock landscape this PRD lands inside of.
  • docs/research/agent-sandbox-landscape.md — identifies "runtime": "microvm"-style opt-in as the borrowable idea; smolmachines is the concrete implementation.
  • PRD 0003 (docs/prds/0003-bottle-backend-abstraction.md) — the backend abstraction this PRD is the first non-Docker consumer of.
  • PRD 0017 (docs/prds/0017-egress-proxy-via-mitmproxy.md) — the egress sidecar the host-side relocation reuses verbatim, only with a different transport.
  • PRD 0022 (docs/prds/0022-sandbox-escape-integration-test.md) — the acceptance gate for this PRD; the suite already runs through get_bottle_backend() so the env-var flip is the only change needed to exercise the smolmachines path.