# PRD prd-new: smolmachines backend on Linux

- **Status:** Draft
- **Author:** Claude
- **Created:** 2026-06-25
- **Issue:** #283

## Summary

Make the `smolmachines` backend (PRD 0023) runnable on Linux, not just macOS. `smolvm` already supports Linux via KVM (`/dev/kvm`); the gap is entirely in bot-bottle's host-side glue, which hard-codes macOS assumptions in three places:

1. **Preflight** only checks that `smolvm` is on `PATH`; it never checks the Linux KVM prerequisite, so a misconfigured host fails deep in the launch flow with an opaque `smolvm` error.
2. **The TSI allowlist enforcement** (`force_allowlist`), the security property that confines the agent VM to its sidecar bundle's `/32`, **no-ops on Linux today, failing _open_**. The smolvm state-DB path it patches is hard-coded to macOS's `~/Library/Application Support/...`.
3. **Per-bottle loopback scoping** (`allocate`) returns the shared `127.0.0.1` on Linux, which would let the agent VM reach every service on host loopback, a downgrade from the per-bottle alias isolation macOS gets.

This PRD closes all three so a bottle launched with `BOT_BOTTLE_BACKEND=smolmachines` on Linux gets the same isolation guarantee it gets on macOS, and documents the Linux/NixOS host setup. The primary validation target is NixOS, but the changes are distro-agnostic.

## Problem

The smolmachines backend runs each bottle's agent inside a libkrun microVM via `smolvm`, with egress confined by TSI's `--allow-cidr` allowlist set to a single `/32`: the sidecar bundle's loopback address. Everything else (host loopback, LAN, internet) is denied at the VMM layer. That security property is the entire reason the backend exists.

libkrun runs on Hypervisor.framework (macOS) **and** KVM (Linux), and `smolvm` ships Linux x86_64 / aarch64 builds that require `/dev/kvm`. So the microVM layer already works on Linux. What does not work is bot-bottle's host integration, which PRD 0023 explicitly scoped to macOS-only for v1.
Three concrete blockers:

- **No KVM preflight.** On a Linux host without `/dev/kvm` (kernel module not loaded) or without access to it (user not in the `kvm` group), the failure surfaces as a cryptic `smolvm` non-zero exit mid-launch instead of an actionable message.
- **TSI enforcement fails open on Linux.** `force_allowlist` early-returns on non-macOS. It exists because `smolvm` 0.8.0 silently drops `--allow-cidr` when combined with `--from`, so the allowlist has to be patched into smolvm's persisted state DB before `machine start`. On Linux that patch never runs **and** the DB path is the macOS path, so the booted VM's TSI allowlist is whatever smolvm defaulted to, potentially all of `127.0.0.0/8`. That is the exact sandbox escape the backend is supposed to prevent.
- **No per-bottle loopback isolation on Linux.** `allocate` returns `127.0.0.1` on Linux. Even with a correct allowlist, `127.0.0.1/32` is shared by every service on host loopback, so the agent could reach other bottles' published ports and host services. On macOS this is solved with per-bottle `127.0.0.16..31` aliases added via `sudo ifconfig lo0 alias`. On Linux the whole `127.0.0.0/8` is already routed to `lo`, so docker can publish to `127.0.0.` with **no `ifconfig`/sudo step at all**; the isolation is actually cheaper to achieve than on macOS.

## Goals / Success Criteria

- `BOT_BOTTLE_BACKEND=smolmachines ./cli.py start ` launches, runs, and tears down a bottle on a Linux host with `/dev/kvm`.
- The TSI allowlist is enforced on Linux: PRD 0022's `tests/integration/test_sandbox_escape.py` passes against `BOT_BOTTLE_BACKEND=smolmachines` on Linux (the acceptance gate).
- Each Linux bottle is scoped to its own `127.0.0./32`, matching the macOS per-bottle isolation property.
- A clear, actionable preflight error when `/dev/kvm` is missing or inaccessible, with remediation (load `kvm-intel`/`kvm-amd`, join the `kvm` group).
- **Fail-closed:** if bot-bottle cannot positively confirm the TSI allowlist was persisted for a machine (DB missing, row missing, patch didn't take), it `die()`s before `machine start` rather than booting a VM with an unverified allowlist.
- macOS behavior is unchanged.
- README documents Linux + NixOS host setup.

## Non-goals

- Rootless / non-KVM fallbacks (e.g. software emulation). Linux smolmachines requires `/dev/kvm`, full stop.
- Removing Docker as a host dependency: the sidecar bundle and image-build pipeline still use Docker on Linux, same as macOS.
- Auto-installing `smolvm` or configuring KVM on the operator's behalf. Preflight reports; the operator remediates.
- Nested-virtualization tuning for running the runner itself inside a VM (documented as a caveat, not solved here).

## Design

### Platform detection

Reuse the existing `platform.system()` check already in `loopback_alias.py` (`_is_macos()`). "Linux" is "not macOS" for every branch below; no new third-platform path.

### Preflight: KVM gate (`util.smolmachines_preflight`)

After the existing `smolvm`-on-`PATH` check, add a Linux-only gate:

- `/dev/kvm` must exist → else `die()` with "enable KVM (`kvm-intel`/`kvm-amd` kernel module)".
- `/dev/kvm` must be readable + writable by the current user (`os.access(..., R_OK | W_OK)`) → else `die()` with "add your user to the `kvm` group (and re-login)".

macOS is unaffected (Hypervisor.framework needs no device node).

### smolvm state-DB path (platform-aware)

`loopback_alias._SMOLVM_DB_PATH` becomes platform-derived:

- macOS: `~/Library/Application Support/smolvm/server/smolvm.db` (unchanged).
- Linux: `$XDG_DATA_HOME/smolvm/server/smolvm.db`, defaulting to `~/.local/share/smolvm/server/smolvm.db`.

> **Verification note:** the Linux DB location is inferred from
> smolvm's documented `~/.local/share` install layout and the XDG
> base-dir spec.
> It must be confirmed on a real Linux smolvm install;
> if smolvm uses a different path or schema, the fail-closed check
> below turns that into a clear `die()` at launch rather than a silent
> escape.

### TSI enforcement: cross-platform + fail-closed (`force_allowlist`)

Rework `force_allowlist(machine_name, allowed_cidrs)` to run on **both** platforms and to fail closed:

1. Resolve the state DB; if the file is missing, `die()` (cannot confirm enforcement → refuse to launch).
2. Read the machine's persisted row; if the row is missing, `die()`.
3. If the row's `allowed_cidrs` already equals the requested list (e.g. a newer `smolvm` that honors `--allow-cidr` at create), do nothing (no write).
4. Otherwise patch `allowed_cidrs` (the existing BLOB-encoded write) and re-read.
5. If, after the patch, `allowed_cidrs` still does not equal the requested list, `die()`.

This is robust across smolvm versions: it works whether `--allow-cidr` is silently dropped (0.8.0) or honored (newer), and it never boots a VM whose persisted allowlist it could not confirm. It is a strict improvement on macOS too (today's code writes unconditionally and never verifies).

> The persisted-row check confirms our write took, not that smolvm's
> runtime TSI enforces it. The runtime guarantee is covered by the
> sandbox-escape acceptance test; the persisted check is the cheap
> fail-closed guard at launch.

### Per-bottle loopback scoping on Linux (`allocate`)

`allocate` runs the same docker-state-driven allocation on Linux as on macOS (`_allocate_locked`, the file lock, and `_aliases_in_use` via `docker inspect` are all already cross-platform). The only macOS-only step, `ensure_pool` (the `sudo ifconfig lo0 alias` dance), stays macOS-only: on Linux `127.0.0.0/8` is already loopback, so docker can publish bundle ports directly on `127.0.0.` with no setup.

Net effect: Linux bottles get per-bottle `127.0.0.16..31/32` scoping identical to macOS, without sudo.
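As a rough illustration of the per-bottle scoping described in this section, the sketch below picks the lowest free address in the `127.0.0.16..31` pool and derives the single-entry `/32` allowlist from it. The names `pick_free_loopback` and `allowlist_for` are hypothetical and are not the real `allocate` / `_allocate_locked` code; the real implementation derives the in-use set from `docker inspect` under a file lock, which is deliberately left out here.

```python
import ipaddress

# Pool bounds taken from the 127.0.0.16..31 range named in this PRD.
POOL_FIRST_OCTET = 16
POOL_LAST_OCTET = 31


def pick_free_loopback(in_use):
    """Return the lowest pool address not present in `in_use`.

    `in_use` is an iterable of dotted-quad strings. In the real code this
    set would come from docker state; here it is passed in (assumption).
    """
    taken = {ipaddress.IPv4Address(addr) for addr in in_use}
    for octet in range(POOL_FIRST_OCTET, POOL_LAST_OCTET + 1):
        candidate = ipaddress.IPv4Address(f"127.0.0.{octet}")
        if candidate not in taken:
            return str(candidate)
    # Fail loudly instead of falling back to the shared 127.0.0.1.
    raise RuntimeError("loopback pool 127.0.0.16..31 is exhausted")


def allowlist_for(address):
    """Build the single-entry /32 allowlist for one bottle's address."""
    network = ipaddress.IPv4Network(f"{address}/32")
    return [str(network)]


if __name__ == "__main__":
    addr = pick_free_loopback(["127.0.0.16", "127.0.0.17"])
    print(addr, allowlist_for(addr))
```

Raising on pool exhaustion, rather than returning `127.0.0.1`, mirrors the fail-closed stance taken elsewhere in this design; whether the real allocator behaves that way is an assumption of this sketch.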
### Launch flow

`launch.py` needs no structural change: `_allocate_resources` already calls `ensure_pool()` (now a Linux no-op) then `allocate()` (now per-bottle on Linux), and `_launch_vm` already calls `force_allowlist()` (now active on Linux). Only the macOS-specific docstrings are updated to describe the cross-platform behavior.

## Implementation chunks

1. **Preflight KVM gate**: `util.smolmachines_preflight` + unit tests for the missing-device and no-access branches.
2. **Platform-aware DB path + fail-closed `force_allowlist`**: `loopback_alias.py`; update/extend `TestForceAllowlist`.
3. **Cross-platform `allocate`**: drop the Linux early-return; update `TestAllocate` / `TestAllocateLock` for the new Linux behavior.
4. **Docstring + comment cleanup** in `launch.py` and module headers.
5. **Docs**: README requirements + a Linux/NixOS host-setup section.

## Testing Strategy

- **Unit (CI, any OS):** the suite mocks `platform.system()` / `subprocess` and patches `_SMOLVM_DB_PATH`, so the new Linux branches are testable on the macOS/Linux CI runner without `smolvm` or KVM. Covers: KVM preflight branches, fail-closed `force_allowlist` (DB missing, row missing, patch-doesn't-take), per-bottle Linux allocation + locking, platform-derived DB path.
- **Integration (Linux host with KVM, the acceptance gate):** `tests/integration/test_sandbox_escape.py` against `BOT_BOTTLE_BACKEND=smolmachines`. This cannot run on the macOS dev box and must be executed on NixOS before merge.

## Open questions / verification pending

- **Confirm the Linux smolvm state-DB path and schema** on a real install (the `~/.local/share/...` inference above).
- **Confirm whether the current smolvm Linux build still drops `--allow-cidr` with `--from`** (the 0.8.0 bug). The fail-closed design handles either answer, but knowing lets us drop the DB patch if upstream fixed it.
- **Confirm docker publishing to `127.0.0.` on Linux** behaves as expected end-to-end with TSI (high confidence; standard loopback behavior, but unverified on the target host).

## References

- PRD 0023: smolmachines bottle backend (macOS v1).
- PRD 0022: `test_sandbox_escape.py` acceptance gate.
- PRD 0024: sidecar bundle image.
- smolvm: https://github.com/smol-machines/smolvm