Design for porting the smolmachines backend off macOS-only: KVM preflight, platform-aware smolvm state-DB path, fail-closed TSI allowlist enforcement, and per-bottle loopback scoping on Linux. NixOS is the primary validation target. Issue: #283 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
PRD prd-new: smolmachines backend on Linux
- Status: Draft
- Author: Claude
- Created: 2026-06-25
- Issue: #283
Summary
Make the smolmachines backend (PRD 0023) runnable on Linux, not
just macOS. smolvm already supports Linux via KVM (/dev/kvm);
the gap is entirely in bot-bottle's host-side glue, which hard-codes
macOS assumptions in three places:
- Preflight only checks that smolvm is on PATH. It never checks the Linux KVM prerequisite, so a misconfigured host fails deep in the launch flow with an opaque smolvm error.
- The TSI allowlist enforcement (force_allowlist), the security property that confines the agent VM to its sidecar bundle's /32, is a no-op on Linux today and so fails open. The smolvm state-DB path it patches is hard-coded to macOS's ~/Library/Application Support/... location.
- Per-bottle loopback scoping (allocate) returns the shared 127.0.0.1 on Linux, which would let the agent VM reach every service on host loopback: a downgrade from the per-bottle alias isolation macOS gets.
This PRD closes all three so a bottle launched with
BOT_BOTTLE_BACKEND=smolmachines on Linux gets the same isolation
guarantee it gets on macOS, and documents the Linux/NixOS host
setup. The primary validation target is NixOS, but the changes are
distro-agnostic.
Problem
The smolmachines backend runs each bottle's agent inside a libkrun
microVM via smolvm, with egress confined by TSI's --allow-cidr
allowlist set to a single /32 — the sidecar bundle's loopback
address. Everything else (host loopback, LAN, internet) is denied at
the VMM layer. That security property is the entire reason the
backend exists.
libkrun runs on Hypervisor.framework (macOS) and KVM (Linux), and
smolvm ships Linux x86_64 / aarch64 builds that require /dev/kvm.
So the microVM layer already works on Linux. What does not work is
bot-bottle's host integration, which PRD 0023 explicitly scoped to
macOS-only for v1. Three concrete blockers:
- No KVM preflight. On a Linux host without /dev/kvm (kernel module not loaded) or without access to it (user not in the kvm group), the failure surfaces as a cryptic smolvm non-zero exit mid-launch instead of an actionable message.
- TSI enforcement fails open on Linux. force_allowlist early-returns on non-macOS. It exists because smolvm 0.8.0 silently drops --allow-cidr when combined with --from, so the allowlist has to be patched into smolvm's persisted state DB before machine start. On Linux that patch never runs, and the hard-coded DB path is the macOS one, so the booted VM's TSI allowlist is whatever smolvm defaulted to, potentially all of 127.0.0.0/8. That is the exact sandbox escape the backend is supposed to prevent.
- No per-bottle loopback isolation on Linux. allocate returns 127.0.0.1 on Linux. Even with a correct allowlist, 127.0.0.1/32 is shared by every service on host loopback, so the agent could reach other bottles' published ports and host services. On macOS this is solved with per-bottle 127.0.0.16..31 aliases added via sudo ifconfig lo0 alias. On Linux the whole 127.0.0.0/8 is already routed to lo, so docker can publish to 127.0.0.<N> with no ifconfig/sudo step at all; the isolation is actually cheaper to achieve than on macOS.
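To make the second and third blockers concrete, the sketch below models CIDR allowlist membership with Python's ipaddress module. It is only an illustration of the address arithmetic, not TSI's actual enforcement code; the is_allowed helper and the 127.0.0.17 example address are hypothetical.

```python
import ipaddress
from typing import List


def is_allowed(dest: str, allowed_cidrs: List[str]) -> bool:
    """Return True if dest falls inside any allowlisted network."""
    addr = ipaddress.ip_address(dest)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


# Fail-open case: the whole loopback block is reachable from the VM.
WIDE = ["127.0.0.0/8"]
# Shared address: anything the host binds to 127.0.0.1 is reachable.
SHARED = ["127.0.0.1/32"]
# Per-bottle address (hypothetical): only this bottle's bundle is reachable.
SCOPED = ["127.0.0.17/32"]

for label, cidrs in (("wide", WIDE), ("shared", SHARED), ("scoped", SCOPED)):
    reachable = [d for d in ("127.0.0.1", "127.0.0.17", "127.0.0.18")
                 if is_allowed(d, cidrs)]
    print(label, reachable)
```

Only the scoped allowlist excludes both 127.0.0.1 and the neighbouring bottle's address, which is the property the per-bottle /32 is meant to provide.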
Goals / Success Criteria
- BOT_BOTTLE_BACKEND=smolmachines ./cli.py start <agent> launches, runs, and tears down a bottle on a Linux host with /dev/kvm.
- The TSI allowlist is enforced on Linux: PRD 0022's tests/integration/test_sandbox_escape.py passes against BOT_BOTTLE_BACKEND=smolmachines on Linux (the acceptance gate).
- Each Linux bottle is scoped to its own 127.0.0.<N>/32, matching the macOS per-bottle isolation property.
- A clear, actionable preflight error when /dev/kvm is missing or inaccessible, with remediation (load kvm-intel/kvm-amd, join the kvm group).
- Fail-closed: if bot-bottle cannot positively confirm the TSI allowlist was persisted for a machine (DB missing, row missing, patch didn't take), it die()s before machine start rather than booting a VM with an unverified allowlist.
- macOS behavior is unchanged.
- README documents Linux + NixOS host setup.
Non-goals
- Rootless / non-KVM fallbacks (e.g. software emulation). Linux smolmachines requires /dev/kvm, full stop.
- Removing Docker as a host dependency. The sidecar bundle and image-build pipeline still use Docker on Linux, same as macOS.
- Auto-installing smolvm or configuring KVM on the operator's behalf. Preflight reports; the operator remediates.
- Nested-virtualization tuning for running the runner itself inside a VM (documented as a caveat, not solved here).
Design
Platform detection
Reuse the existing platform.system() check already in
loopback_alias.py (_is_macos()). "Linux" is "not macOS" for every
branch below; no new third-platform path.
Preflight: KVM gate (util.smolmachines_preflight)
After the existing smolvm-on-PATH check, add a Linux-only gate:
- /dev/kvm must exist; otherwise die() with "enable KVM (kvm-intel/kvm-amd kernel module)".
- /dev/kvm must be readable and writable by the current user (os.access(..., R_OK | W_OK)); otherwise die() with "add your user to the kvm group (and re-login)".
macOS is unaffected (Hypervisor.framework needs no device node).
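A minimal sketch of the gate, assuming it can be expressed as a pure function that returns the remediation message for die() to print. The name kvm_preflight_error and its device/system parameters are hypothetical and exist only to make the sketch testable; the real check lives in util.smolmachines_preflight.

```python
import os
import platform
from typing import Optional

KVM_DEVICE = "/dev/kvm"


def kvm_preflight_error(device: str = KVM_DEVICE,
                        system: Optional[str] = None) -> Optional[str]:
    """Return a remediation message, or None if the host passes the gate."""
    system = system or platform.system()
    if system == "Darwin":
        # macOS: Hypervisor.framework needs no device node.
        return None
    if not os.path.exists(device):
        return f"{device} not found: enable KVM (kvm-intel/kvm-amd kernel module)"
    if not os.access(device, os.R_OK | os.W_OK):
        return (f"{device} is not readable and writable: "
                "add your user to the kvm group (and re-login)")
    return None


if __name__ == "__main__":
    problem = kvm_preflight_error()
    print(problem or "KVM preflight passed")
```

Returning the message instead of exiting inside the helper keeps both branches easy to unit-test; the caller would pass any non-None result to die().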
smolvm state-DB path (platform-aware)
loopback_alias._SMOLVM_DB_PATH becomes platform-derived:
- macOS: ~/Library/Application Support/smolvm/server/smolvm.db (unchanged).
- Linux: $XDG_DATA_HOME/smolvm/server/smolvm.db, defaulting to ~/.local/share/smolvm/server/smolvm.db.
Verification note: the Linux DB location is inferred from smolvm's documented ~/.local/share install layout and the XDG base-dir spec. It must be confirmed on a real Linux smolvm install; if smolvm uses a different path or schema, the fail-closed check below turns that into a clear die() at launch rather than a silent escape.
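The path derivation could look roughly like this. The function name smolvm_db_path and its injectable system/env/home parameters are hypothetical, and the Linux location carries the same unverified assumption as the note above.

```python
import os
import platform
from pathlib import Path
from typing import Mapping, Optional


def smolvm_db_path(system: Optional[str] = None,
                   env: Optional[Mapping[str, str]] = None,
                   home: Optional[Path] = None) -> Path:
    """Derive the smolvm state-DB path for the current platform."""
    system = system or platform.system()
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    if system == "Darwin":
        base = home / "Library" / "Application Support"
    else:
        # XDG base-dir spec: an unset or empty XDG_DATA_HOME means ~/.local/share.
        xdg = env.get("XDG_DATA_HOME", "")
        base = Path(xdg) if xdg else home / ".local" / "share"
    return base / "smolvm" / "server" / "smolvm.db"


if __name__ == "__main__":
    print(smolvm_db_path())
```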
TSI enforcement: cross-platform + fail-closed (force_allowlist)
Rework force_allowlist(machine_name, allowed_cidrs) to run on
both platforms and to fail closed:
- Resolve the state DB; if the file is missing, die() (cannot confirm enforcement, so refuse to launch).
- Read the machine's persisted row; if the row is missing, die().
- If the row's allowed_cidrs already equals the requested list (e.g. a newer smolvm that honors --allow-cidr at create), do nothing: no write.
- Otherwise patch allowed_cidrs (the existing BLOB-encoded write) and re-read.
- If, after the patch, allowed_cidrs still does not equal the requested list, die().
This is robust across smolvm versions: it works whether --allow-cidr
is silently dropped (0.8.0) or honored (newer), and it never boots a
VM whose persisted allowlist it could not confirm. It is a strict
improvement on macOS too (today's code writes unconditionally and
never verifies).
The persisted-row check confirms our write took, not that smolvm's runtime TSI enforces it. The runtime guarantee is covered by the sandbox-escape acceptance test; the persisted check is the cheap fail-closed guard at launch.
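The control flow of the five steps can be sketched as follows. Because the smolvm schema and BLOB encoding are not confirmed here, DB access is abstracted behind two hypothetical callables (read_cidrs, write_cidrs), and a hypothetical AllowlistUnverified exception stands in for die().

```python
from pathlib import Path
from typing import Callable, List, Optional


class AllowlistUnverified(RuntimeError):
    """Raised instead of booting a VM whose allowlist cannot be confirmed."""


def force_allowlist(
    db_path: Path,
    machine_name: str,
    allowed_cidrs: List[str],
    read_cidrs: Callable[[Path, str], Optional[List[str]]],
    write_cidrs: Callable[[Path, str, List[str]], None],
) -> bool:
    """Return True if a patch was written, False if the row was already correct."""
    # Step 1: no DB file means enforcement cannot be confirmed.
    if not db_path.is_file():
        raise AllowlistUnverified(f"smolvm state DB not found: {db_path}")
    # Step 2: the machine must have a persisted row (None means missing).
    current = read_cidrs(db_path, machine_name)
    if current is None:
        raise AllowlistUnverified(f"no persisted row for machine {machine_name!r}")
    # Step 3: already correct, so skip the write entirely.
    if current == allowed_cidrs:
        return False
    # Step 4: patch, then re-read.
    write_cidrs(db_path, machine_name, allowed_cidrs)
    # Step 5: refuse to continue if the patch did not take.
    if read_cidrs(db_path, machine_name) != allowed_cidrs:
        raise AllowlistUnverified(
            f"allowlist patch did not persist for machine {machine_name!r}")
    return True
```

Injecting the reader and writer keeps the fail-closed logic independent of the storage format, so the same checks hold if the schema turns out to differ on Linux.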
Per-bottle loopback scoping on Linux (allocate)
allocate runs the same docker-state-driven allocation on Linux as on
macOS (_allocate_locked, the file lock, and _aliases_in_use via
docker inspect are all already cross-platform). The only macOS-only
step, ensure_pool (the sudo ifconfig lo0 alias dance), stays
macOS-only: on Linux 127.0.0.0/8 is already loopback, so docker can
publish bundle ports directly on 127.0.0.<N> with no setup.
Net effect: Linux bottles get per-bottle 127.0.0.16..31/32 scoping
identical to macOS, without sudo.
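As an illustration of the address selection only, the sketch below picks a free address from the 127.0.0.16..31 pool given the set already in use. The helper name pick_free_address and the lowest-free-first policy are assumptions; the file lock and the docker inspect query that the real _allocate_locked relies on are omitted.

```python
import ipaddress
from typing import Iterable

POOL_FIRST = ipaddress.IPv4Address("127.0.0.16")
POOL_SIZE = 16  # covers 127.0.0.16 through 127.0.0.31


def pick_free_address(in_use: Iterable[str]) -> str:
    """Return the lowest pool address not present in in_use."""
    taken = {ipaddress.IPv4Address(a) for a in in_use}
    for offset in range(POOL_SIZE):
        candidate = POOL_FIRST + offset
        if candidate not in taken:
            return str(candidate)
    raise RuntimeError("loopback pool 127.0.0.16..31 is exhausted")


if __name__ == "__main__":
    addr = pick_free_address(["127.0.0.16"])
    # The TSI allowlist for this bottle would then be the single /32.
    print(f"{addr}/32")
```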
Launch flow
launch.py needs no structural change — _allocate_resources already
calls ensure_pool() (now a Linux no-op) then allocate() (now
per-bottle on Linux), and _launch_vm already calls
force_allowlist() (now active on Linux). Only the macOS-specific
docstrings are updated to describe the cross-platform behavior.
Implementation chunks
- Preflight KVM gate: util.smolmachines_preflight + unit tests for the missing-device and no-access branches.
- Platform-aware DB path + fail-closed force_allowlist: loopback_alias.py; update/extend TestForceAllowlist.
- Cross-platform allocate: drop the Linux early-return; update TestAllocate / TestAllocateLock for the new Linux behavior.
- Docstring + comment cleanup in launch.py and module headers.
- Docs: README requirements + a Linux/NixOS host-setup section.
Testing Strategy
- Unit (CI, any OS): the suite mocks platform.system() / subprocess and patches _SMOLVM_DB_PATH, so the new Linux branches are testable on the macOS/Linux CI runner without smolvm or KVM. Covers: KVM preflight branches, fail-closed force_allowlist (DB missing, row missing, patch-doesn't-take), per-bottle Linux allocation + locking, platform-derived DB path.
- Integration (Linux host with KVM, the acceptance gate): tests/integration/test_sandbox_escape.py against BOT_BOTTLE_BACKEND=smolmachines. This cannot run on the macOS dev box and must be executed on NixOS before merge.
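The mocking approach for the unit tier might look like the sketch below, which exercises Linux branches on any OS by patching platform.system, os.path.exists, and os.access. The function kvm_gate_passes is a hypothetical stand-in for the code under test, not the project's actual preflight.

```python
import os
import platform
import unittest
from unittest import mock


def kvm_gate_passes() -> bool:
    """Hypothetical stand-in for the Linux branch of the preflight."""
    if platform.system() == "Darwin":
        return True
    return os.path.exists("/dev/kvm") and os.access("/dev/kvm", os.R_OK | os.W_OK)


class TestKvmGate(unittest.TestCase):
    def test_linux_missing_device(self):
        with mock.patch("platform.system", return_value="Linux"), \
             mock.patch("os.path.exists", return_value=False):
            self.assertFalse(kvm_gate_passes())

    def test_linux_no_access(self):
        with mock.patch("platform.system", return_value="Linux"), \
             mock.patch("os.path.exists", return_value=True), \
             mock.patch("os.access", return_value=False):
            self.assertFalse(kvm_gate_passes())

    def test_macos_skips_gate(self):
        # Even with no device node, macOS must pass.
        with mock.patch("platform.system", return_value="Darwin"), \
             mock.patch("os.path.exists", return_value=False):
            self.assertTrue(kvm_gate_passes())
```

Saved as a test module, this runs under python -m unittest on either a macOS or a Linux runner, since nothing touches a real /dev/kvm.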
Open questions / verification pending
- Confirm the Linux smolvm state-DB path and schema on a real install (the ~/.local/share/... inference above).
- Confirm whether the current smolvm Linux build still drops --allow-cidr with --from (the 0.8.0 bug). The fail-closed design handles either answer, but knowing lets us drop the DB patch if upstream fixed it.
- Confirm docker publishing to 127.0.0.<N> on Linux behaves as expected end-to-end with TSI (high confidence; standard loopback behavior, but unverified on the target host).
References
- PRD 0023: smolmachines bottle backend (macOS v1).
- PRD 0022: test_sandbox_escape.py acceptance gate.
- PRD 0024: sidecar bundle image.
- smolvm: https://github.com/smol-machines/smolvm