From b97807ac716dbec36275b363c64958c85527ca01 Mon Sep 17 00:00:00 2001
From: didericis
Date: Mon, 11 May 2026 16:32:04 -0400
Subject: [PATCH] docs(research): evaluate smolmachines as VM backend

Compares smolmachines against the six subsystems in
agent-vm-isolation.md. smolmachines replaces the microVM runtime,
network attachment (libkrun TSI with built-in DNS-over-vsock filter),
vsock control plane, and Python lifecycle wrapper. Pipelock stays;
disk-image story shifts to OCI + writable overlay. Recommends adopting
smolmachines as the macOS VM backend after smoke-testing TSI
passthrough to a host-side pipelock.
---
 docs/research/smolmachines-as-vm-backend.md | 102 ++++++++++++++++++++
 1 file changed, 102 insertions(+)
 create mode 100644 docs/research/smolmachines-as-vm-backend.md

diff --git a/docs/research/smolmachines-as-vm-backend.md b/docs/research/smolmachines-as-vm-backend.md
new file mode 100644
index 0000000..3faa2df
--- /dev/null
+++ b/docs/research/smolmachines-as-vm-backend.md
@@ -0,0 +1,102 @@
+# smolmachines as a VM backend for claude-bottle
+
+Evaluation of whether [smolmachines](https://smolmachines.com/) would
+simplify the macOS agent-VM-isolation work spelled out in
+[`agent-vm-isolation.md`](agent-vm-isolation.md).
+
+Research conducted 2026-05-11.
+
+## Summary
+
+smolmachines replaces **four of the six subsystems** in
+`agent-vm-isolation.md` cleanly, including the two hardest ones — the
+`VZFileHandleNetworkDeviceAttachment` + gvproxy wiring and the PyObjC
+lifecycle wrapper. Pipelock stays. The disk image story changes from
+"sealed `.img`" to "OCI image + writable overlay," which is fine for the
+isolation goal as long as `-v` host mounts are forbidden in any bottle
+that maps to a smolmachine.
+
+Recommendation: adopt smolmachines as the macOS VM backend; keep
+pipelock DIY; wire the two via `--outbound-localhost-only` plus
+`HTTPS_PROXY` in the Smolfile, after smoke-testing that TSI passes
+through 127.0.0.1 traffic to a host-side pipelock.
+
+## What smolmachines actually is
+
+- libkrun VMM linked as a library (no daemon); rides directly on Apple
+  Hypervisor.framework on macOS and KVM on Linux.
+- Custom kernel is **not** supported — you get libkrunfw only. Day-to-day
+  knobs are `command` and `env` in a TOML Smolfile.
+- Networking model: libkrun **TSI** ("Transparent Socket Impersonation") —
+  userspace socket hijacking inside the VMM library itself. DNS
+  filtering is built in via vsock port 6002 — the guest's
+  `/etc/resolv.conf` points at `127.0.0.1` and a guest-side DNS proxy
+  tunnels queries over vsock to the host, which returns NXDOMAIN for
+  anything not allow-listed.
+- vsock control plane is fully implemented with well-known ports: 5000
+  workload control, 5001 log streaming, 6000 agent OCI ops, 6001 SSH
+  agent, 6002 DNS filter.
+- External integration is the CLI (`smolvm machine create/start/stop/exec`)
+  or the HTTP API (`smolvm serve`). No Python SDK yet; Node.js embedded
+  SDK exists but has a known bug where machines aren't visible to the
+  CLI.
+
+## Subsystem-by-subsystem comparison
+
+| # | Subsystem | DIY recipe (today) | smolmachines | Verdict | Caveats |
+|---|---|---|---|---|---|
+| 1 | MicroVM runtime | vfkit or PyObjC + Virtualization.framework, minimal device model | libkrun (library, no daemon) over Hypervisor.framework / KVM. libkrunfw kernel only. | Replaces | No custom kernel/initrd. |
+| 2 | Network attachment | `VZFileHandleNetworkDeviceAttachment` + unixgram socket → gvproxy userspace stack; DNS NXDOMAIN by default | libkrun TSI — userspace socket hijacking inside the VMM. CIDR allowlist enforced at the VMM layer; guest cannot bypass by dialing IPs. DNS filter via vsock port 6002. | Replaces | TSI is enabled when `--allow-cidr` / `--allow-host` is used; the alternative `virtio-net` backend does not support policy. |
+| 3 | Egress proxy (pipelock) | pipelock at `127.0.0.1:8888`, HTTPS MITM + DLP + allowlist | No analogue. Integration: `--outbound-localhost-only` + `env = ["HTTPS_PROXY=http://127.0.0.1:8888"]` in the Smolfile. | Irrelevant — keep pipelock | Whether TSI passes 127.0.0.1 traffic through to a host-side proxy is *unverified*; smoke test required. |
+| 4 | Control plane (vsock) | `VZVirtioSocketDeviceConfiguration` + `AF_VSOCK` in guest, Unix socket on host | Full vsock plane built in. External use via `smolvm machine exec` or the `smolvm serve` HTTP API. | Replaces | The well-known vsock ports are internal to smolmachines. Custom task protocols must use the HTTP API or open a fresh vsock port inside the guest. |
+| 5 | Disk image | Sealed virtio-blk raw image, no host mounts | OCI image + writable overlay (default 2 GiB, `--overlay` to tune). `-v HOST:GUEST` mounts use virtiofs. `.smolmachine` packs the whole rootfs. | Partial | Overlay is writable and lives on the host. For "no host filesystem visible to the guest," forbid `-v` mounts in bottles that map to smolmachines. |
+| 6 | Lifecycle wrapper | ~100 lines PyObjC + `subprocess.Popen` for gvproxy | CLI or `smolvm serve` HTTP API. | Replaces | No Python SDK yet. Drive via `httpx` to the HTTP API, or shell out to the CLI. Embedded Node.js SDK has a known bug (machines invisible to CLI) — avoid for now. |
+
+## Caveats worth flagging before commitment
+
+- **No custom kernel.** If the agent-vm-isolation work assumed a
+  hand-rolled kernel cmdline, that flexibility goes away. Smolfile `env`
+  and `command` cover the everyday cases.
+- **`--allow-host` semantics.** Hostnames are resolved at VM start time
+  and stored as `/32` CIDRs. All ports on resolved IPs are permitted —
+  there is no destination-port filtering at the smolmachines layer.
+  For the pipelock integration path this is acceptable because the
+  right flag is `--outbound-localhost-only`, not `--allow-host`.
+- **TSI passthrough to `127.0.0.1`.** The TSI code path for localhost
+  isn't explicitly documented. Validate with a pipelock instance
+  *before* building around it: curl-from-guest → pipelock-on-host
+  should succeed; curl to any other host should be blocked.
+- **Embedded SDK bug.** Machines created via the Node.js embedded SDK
+  are currently invisible to the CLI. Use the HTTP API instead.
+- **Volume policy.** "No host filesystem visible to the guest" needs to
+  be a manifest-validation rule (no `-v` mounts in microvm-backed
+  bottles), not just a documentation note.
+
+## Recommendation
+
+**Adopt smolmachines as the bottle VM backend on macOS; keep pipelock
+DIY.**
+
+The work in `agent-vm-isolation.md` is mostly the network-attachment
+plumbing and the PyObjC wrapper — exactly the parts smolmachines
+eliminates. What remains (pipelock integration, picking the right
+networking flag, deciding on volumes vs. sealed overlay) is the work
+that needs doing regardless of the VM backend.
+
+This aligns with the borrowable idea identified in
+[`agent-sandbox-landscape.md`](agent-sandbox-landscape.md) — a
+`"runtime": "microvm"` opt-in field on a bottle. smolmachines is the
+most plausible concrete implementation of that field on macOS today.
+
+### Prereqs before this becomes more than a research note
+
+1. **Smoke-test TSI → pipelock on localhost.** Confirm the guest can
+   reach `127.0.0.1:8888` on the host through TSI when launched with
+   `--outbound-localhost-only`, and that all other hosts are blocked.
+2. **Decide volume policy.** Add a manifest-validation rule disallowing
+   `-v` mounts in any bottle with `"runtime": "microvm"`.
+3. **Decide control-plane shape.** Either drive smolmachines via the
+   HTTP API (`smolvm serve` as a long-running sidecar) or via CLI
+   subprocess invocation per bottle. The HTTP API is the cleaner
+   long-term path; CLI subprocesses are the lower-overhead first
+   iteration.
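
Prereq 2 (the manifest-validation rule for volumes) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not claude-bottle's actual validator: only the `"runtime": "microvm"` field comes from the note, while the `volumes` key, its `HOST:GUEST` string form, and the names `validate_bottle_manifest` and `ManifestError` are hypothetical.

```python
from typing import Any


class ManifestError(ValueError):
    """Raised when a bottle manifest violates the isolation policy."""


def validate_bottle_manifest(manifest: dict[str, Any]) -> None:
    """Reject host mounts on microvm-backed bottles.

    Assumed (hypothetical) schema: "runtime" is a string and "volumes" is a
    list of "HOST:GUEST" strings mirroring the `-v` flag.
    """
    if manifest.get("runtime") != "microvm":
        return  # the rule only applies to microvm-backed bottles

    volumes = manifest.get("volumes") or []
    if volumes:
        raise ManifestError(
            "microvm-backed bottles must not declare host mounts: "
            + ", ".join(str(v) for v in volumes)
        )


if __name__ == "__main__":
    # A microvm bottle without volumes passes silently.
    validate_bottle_manifest({"runtime": "microvm"})

    # A microvm bottle with a host mount is rejected.
    try:
        validate_bottle_manifest(
            {"runtime": "microvm", "volumes": ["/src:/work"]}
        )
    except ManifestError as exc:
        print(f"rejected: {exc}")
```

Failing at manifest-validation time, before any VM is created, is what turns "no host filesystem visible to the guest" into an enforced rule instead of a documentation note.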