feat(smolmachines): run backend on Linux
Port the smolmachines backend so BOT_BOTTLE_BACKEND=smolmachines works on Linux (KVM), not just macOS: - Preflight gates /dev/kvm presence + accessibility on Linux with actionable remediation (kvm module, kvm group). - smolvm state-DB path is platform-derived (XDG on Linux). - force_allowlist runs on both platforms and is fail-closed: it verifies the persisted TSI allowlist and dies rather than booting a VM whose egress confinement it can't confirm. Previously it no-oped on Linux, failing OPEN. - allocate() does per-bottle 127.0.0.<N> scoping on Linux too (no ifconfig needed — all of 127/8 is already loopback); only ensure_pool's lo0 aliasing stays macOS-only. - README documents Linux + NixOS host setup. Linux/KVM integration (the sandbox-escape acceptance gate) is pending verification on a NixOS host; unit tests cover the new platform branches. Issue: #283 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
This commit is contained in:
@@ -141,10 +141,12 @@ def _allocate_resources(
|
||||
) -> tuple[str, str]:
|
||||
"""Reserve a loopback alias and create the per-bottle docker bridge.
|
||||
|
||||
macOS only routes 127.0.0.1 by default; the per-bottle alias
|
||||
scopes TSI's allowlist to this bottle's published ports so the
|
||||
agent can't reach other bottles' or host services' ports on
|
||||
loopback. No-op on Linux."""
|
||||
The per-bottle alias scopes TSI's allowlist to this bottle's
|
||||
published ports so the agent can't reach other bottles' or host
|
||||
services' ports on loopback. On macOS `ensure_pool` first
|
||||
sudo-aliases the pool on `lo0`; on Linux that's a no-op since
|
||||
all of 127.0.0.0/8 is already loopback, but the per-bottle
|
||||
allocation runs on both."""
|
||||
_loopback.ensure_pool()
|
||||
loopback_ip = _loopback.allocate(plan.slug)
|
||||
network = _bundle.bundle_network_name(plan.slug)
|
||||
@@ -190,9 +192,11 @@ def _discover_urls(
|
||||
return the plan with URLs + guest_env stamped in.
|
||||
|
||||
Docker container IPs (192.168.x.x in the daemon's bridge)
|
||||
aren't reachable from the smolvm guest on macOS — TSI uses
|
||||
macOS networking, and macOS sees the daemon's bridge via the
|
||||
published-port loopback forward only.
|
||||
aren't reachable from the smolvm guest — TSI proxies the
|
||||
guest's connects through the host, and the host reaches the
|
||||
bundle only via its published-port loopback forward (the
|
||||
daemon's bridge isn't on the TSI allowlist). The agent dials
|
||||
the published port on the per-bottle loopback alias.
|
||||
|
||||
NO_PROXY includes the per-bottle loopback alias so the
|
||||
supervise + git-gate URLs bypass HTTPS_PROXY."""
|
||||
@@ -252,10 +256,11 @@ def _launch_vm(
|
||||
"""Create, patch, and start the smolvm VM; register teardown.
|
||||
|
||||
--allow-cidr is the per-bottle loopback alias so the guest can
|
||||
only reach this bottle's bundle ports. force_allowlist patches
|
||||
smolvm 0.8.0's silent-drop of --allow-cidr when combined with
|
||||
--from. Smolfile isn't usable here — smolvm 0.8.0 makes --from
|
||||
and --smolfile mutually exclusive."""
|
||||
only reach this bottle's bundle ports. force_allowlist then
|
||||
confirms the allowlist persisted (patching smolvm 0.8.0's
|
||||
silent-drop of --allow-cidr when combined with --from) and
|
||||
fails closed if it can't. Smolfile isn't usable here — smolvm
|
||||
0.8.0 makes --from and --smolfile mutually exclusive."""
|
||||
_smolvm.machine_create(
|
||||
plan.machine_name,
|
||||
from_path=agent_from_path,
|
||||
@@ -263,9 +268,10 @@ def _launch_vm(
|
||||
env=plan.guest_env,
|
||||
)
|
||||
stack.callback(_smolvm.machine_delete, plan.machine_name)
|
||||
# Workaround smolvm 0.8.0: `--allow-cidr` is silently dropped
|
||||
# when combined with `--from`. Patch the persisted state DB
|
||||
# before start so the booted VM's TSI actually enforces.
|
||||
# Confirm the booted VM's TSI allowlist will actually enforce the
|
||||
# /32 before start (smolvm 0.8.0 silently drops `--allow-cidr`
|
||||
# with `--from`, so the persisted state DB is patched if needed).
|
||||
# Fails closed if enforcement can't be confirmed.
|
||||
_loopback.force_allowlist(plan.machine_name, [f"{loopback_ip}/32"])
|
||||
_smolvm.machine_start(plan.machine_name)
|
||||
stack.callback(_smolvm.machine_stop, plan.machine_name)
|
||||
@@ -275,7 +281,9 @@ def _init_vm(plan: SmolmachinesBottlePlan) -> None:
|
||||
"""Repair filesystem ownership and wait for exec channel readiness.
|
||||
|
||||
Ownership repair: smolvm's pack process remaps files to the host
|
||||
invoker's uid (501 on macOS). /home/node must be node:node so
|
||||
invoker's uid (e.g. 501 on macOS, 1000 on Linux). The chowns use
|
||||
names not numbers so they're correct on either. /home/node must
|
||||
be node:node so
|
||||
Claude Code can write ~/.claude.json; /tmp + /var/tmp need root
|
||||
mode 1777 so non-root processes can create per-uid scratch dirs.
|
||||
All folded into one sh -c to avoid back-to-back exec calls
|
||||
|
||||
@@ -33,10 +33,13 @@ sudo-add the missing pool on first use per boot — the aliases
|
||||
persist on `lo0` until reboot, so subsequent launches don't
|
||||
prompt.
|
||||
|
||||
Linux native daemons share the host's network namespace; the
|
||||
whole `127.0.0.0/8` is reachable by default and aliases are
|
||||
unnecessary. The pool logic detects native-Linux and skips sudo
|
||||
entirely; the DB patch is also gated on macOS.
|
||||
On Linux the whole `127.0.0.0/8` is already routed to `lo`, so
|
||||
docker can publish a bundle's ports directly on `127.0.0.<N>`
|
||||
with no `ifconfig`/sudo step. `ensure_pool` is therefore a no-op
|
||||
on Linux, but per-bottle alias *allocation* and the TSI allowlist
|
||||
DB patch run on both platforms — the isolation property is
|
||||
identical, it's just cheaper to set up on Linux. The state-DB
|
||||
path differs per platform (see `_smolvm_db_path`).
|
||||
|
||||
Allocation is coordinated by inspecting running bundle
|
||||
containers' published host IPs — each bottle's bundle owns the
|
||||
@@ -47,6 +50,7 @@ from __future__ import annotations
|
||||
|
||||
import fcntl
|
||||
import json
|
||||
import os
|
||||
import platform
|
||||
import re
|
||||
import sqlite3
|
||||
@@ -57,20 +61,34 @@ from typing import Iterable
|
||||
from ...log import die, info
|
||||
|
||||
|
||||
# smolvm's persistent VM state on macOS — a SQLite DB whose `vms`
|
||||
# table holds one JSON BLOB per machine. The Linux path is
|
||||
# different, but smolmachines is macOS-only in v1 (PRD 0023) so
|
||||
# we hard-code this. If the file moves under us we'll see a
|
||||
# clear FileNotFoundError; not worth defensive cross-platform
|
||||
# detection until the backend actually needs Linux.
|
||||
_SMOLVM_DB_PATH = (
|
||||
Path.home()
|
||||
/ "Library"
|
||||
/ "Application Support"
|
||||
/ "smolvm"
|
||||
/ "server"
|
||||
/ "smolvm.db"
|
||||
)
|
||||
def _smolvm_db_path() -> Path:
|
||||
"""smolvm's persistent VM state — a SQLite DB whose `vms` table
|
||||
holds one JSON BLOB per machine. macOS stores it under
|
||||
`Application Support`; Linux follows the XDG base-dir spec
|
||||
(`$XDG_DATA_HOME`, default `~/.local/share`).
|
||||
|
||||
NOTE: the Linux location is inferred from smolvm's documented
|
||||
`~/.local/share` install layout and must be confirmed against a
|
||||
real Linux smolvm install. If it's wrong, `force_allowlist`'s
|
||||
fail-closed check turns it into a clear launch-time error rather
|
||||
than a silent escape."""
|
||||
if platform.system() == "Darwin":
|
||||
return (
|
||||
Path.home()
|
||||
/ "Library"
|
||||
/ "Application Support"
|
||||
/ "smolvm"
|
||||
/ "server"
|
||||
/ "smolvm.db"
|
||||
)
|
||||
xdg_data = os.environ.get("XDG_DATA_HOME")
|
||||
base = Path(xdg_data) if xdg_data else Path.home() / ".local" / "share"
|
||||
return base / "smolvm" / "server" / "smolvm.db"
|
||||
|
||||
|
||||
# Resolved once at import: the host platform doesn't change within a
|
||||
# process. Tests patch this attribute directly.
|
||||
_SMOLVM_DB_PATH = _smolvm_db_path()
|
||||
|
||||
|
||||
# Sixteen aliases by default. Tunable for hosts that want more
|
||||
@@ -131,51 +149,74 @@ def ensure_pool() -> None:
|
||||
|
||||
|
||||
def force_allowlist(machine_name: str, allowed_cidrs: list[str]) -> None:
|
||||
"""Patch smolvm's persistent VM-state DB to set the machine's
|
||||
`allowed_cidrs` to the given list. Workaround for smolvm
|
||||
0.8.0's silent-drop of `--allow-cidr` when used with `--from`.
|
||||
"""Ensure the machine's persisted TSI allowlist equals
|
||||
`allowed_cidrs`, failing **closed** if that can't be confirmed.
|
||||
|
||||
Must run AFTER `smolvm machine create` (the row has to
|
||||
exist) and BEFORE `smolvm machine start` (smolvm reads the
|
||||
row on start; in-flight VMs don't pick up changes). Once
|
||||
smolvm honors the CLI flag upstream this whole function is
|
||||
redundant — flag-respecting create + remove this call from
|
||||
launch.
|
||||
Runs on both macOS and Linux. It exists because smolvm 0.8.0
|
||||
silently drops `--allow-cidr` when combined with `--from`, so
|
||||
the allowlist has to be written into smolvm's persistent state
|
||||
DB before `machine start`. Rather than assume the flag was
|
||||
dropped, we read the persisted row and only patch when it
|
||||
doesn't already match — so a newer smolvm that honors the flag
|
||||
is left untouched.
|
||||
|
||||
No-op on non-macOS — the DB path differs and the Linux
|
||||
smolmachines code path isn't exercised in v1."""
|
||||
if not _is_macos():
|
||||
return
|
||||
Must run AFTER `smolvm machine create` (the row has to exist)
|
||||
and BEFORE `smolvm machine start` (smolvm reads the row on
|
||||
start; in-flight VMs don't pick up changes).
|
||||
|
||||
Fail-closed: if the state DB is missing, the row is missing, or
|
||||
the allowlist still doesn't match after patching, we `die()`
|
||||
rather than boot a VM whose egress confinement we can't verify
|
||||
— an unconfirmed allowlist is a sandbox-escape risk (the agent
|
||||
VM could reach all of host loopback)."""
|
||||
want = list(allowed_cidrs)
|
||||
if not _SMOLVM_DB_PATH.is_file():
|
||||
die(
|
||||
f"smolvm state DB not found at {_SMOLVM_DB_PATH}. "
|
||||
f"smolvm 0.8.0 expected? `smolvm --version` to check."
|
||||
f"smolvm state DB not found at {_SMOLVM_DB_PATH}; cannot "
|
||||
f"confirm the TSI allowlist is enforced. Refusing to launch "
|
||||
f"(fail-closed). Check `smolvm --version` and the DB "
|
||||
f"location for your platform."
|
||||
)
|
||||
con = sqlite3.connect(str(_SMOLVM_DB_PATH))
|
||||
try:
|
||||
cur = con.cursor()
|
||||
row = cur.execute(
|
||||
"SELECT data FROM vms WHERE name = ?", (machine_name,),
|
||||
).fetchone()
|
||||
if row is None:
|
||||
die(
|
||||
f"smolvm DB has no row for machine {machine_name!r} — "
|
||||
f"machine_create must run before force_allowlist."
|
||||
cfg = _read_machine_cfg(con, machine_name)
|
||||
if cfg.get("allowed_cidrs") != want:
|
||||
cfg["allowed_cidrs"] = want
|
||||
# Write as BLOB (the column type smolvm uses) — passing a
|
||||
# plain str makes sqlite store it as Text and smolvm then
|
||||
# fails to read it.
|
||||
con.execute(
|
||||
"UPDATE vms SET data = ? WHERE name = ?",
|
||||
(sqlite3.Binary(json.dumps(cfg).encode()), machine_name),
|
||||
)
|
||||
con.commit()
|
||||
cfg = _read_machine_cfg(con, machine_name)
|
||||
if cfg.get("allowed_cidrs") != want:
|
||||
die(
|
||||
f"could not enforce TSI allowlist {want!r} for machine "
|
||||
f"{machine_name!r} (persisted value is "
|
||||
f"{cfg.get('allowed_cidrs')!r}). Refusing to launch "
|
||||
f"(fail-closed)."
|
||||
)
|
||||
cfg = json.loads(row[0])
|
||||
cfg["allowed_cidrs"] = list(allowed_cidrs)
|
||||
# Write as BLOB (the column type smolvm uses) — passing a
|
||||
# plain str makes sqlite store it as Text and smolvm then
|
||||
# fails to read it.
|
||||
cur.execute(
|
||||
"UPDATE vms SET data = ? WHERE name = ?",
|
||||
(sqlite3.Binary(json.dumps(cfg).encode()), machine_name),
|
||||
)
|
||||
con.commit()
|
||||
finally:
|
||||
con.close()
|
||||
|
||||
|
||||
def _read_machine_cfg(con: sqlite3.Connection, machine_name: str) -> dict[str, object]:
|
||||
"""Read + JSON-decode a machine's `data` BLOB from the smolvm
|
||||
state DB. Dies (fail-closed) if the row is missing — the caller
|
||||
can't confirm enforcement without it."""
|
||||
row = con.execute(
|
||||
"SELECT data FROM vms WHERE name = ?", (machine_name,),
|
||||
).fetchone()
|
||||
if row is None:
|
||||
die(
|
||||
f"smolvm DB has no row for machine {machine_name!r} — "
|
||||
f"machine_create must run before force_allowlist."
|
||||
)
|
||||
return json.loads(row[0])
|
||||
|
||||
|
||||
def allocate(_slug: str) -> str:
|
||||
"""Pick the lowest-numbered alias from the pool not already
|
||||
in use by a running smolmachines bundle. Bails when the pool
|
||||
@@ -184,16 +225,17 @@ def allocate(_slug: str) -> str:
|
||||
used (no on-disk reservation, allocation is purely
|
||||
docker-state-driven).
|
||||
|
||||
On non-macOS the whole `127.0.0.0/8` is loopback by default;
|
||||
`127.0.0.1` is fine to share and we skip the alias dance.
|
||||
This still returns a deterministic address so launch.py's
|
||||
callers don't have to branch on platform.
|
||||
Runs on both platforms: the allocation logic (docker-state
|
||||
inspection + the file lock) is platform-independent. macOS
|
||||
needs `ensure_pool` to have aliased the addresses on `lo0`
|
||||
first; on Linux all of `127.0.0.0/8` is already loopback, so
|
||||
docker can publish on the chosen `127.0.0.<N>` with no setup.
|
||||
Per-bottle scoping (so the agent can't reach other bottles' or
|
||||
host services' loopback ports) therefore holds on both.
|
||||
|
||||
An exclusive file lock serialises concurrent calls so two
|
||||
simultaneous launches don't read the same docker state and
|
||||
claim the same alias."""
|
||||
if not _is_macos():
|
||||
return "127.0.0.1"
|
||||
_ALLOC_LOCK_PATH.parent.mkdir(parents=True, exist_ok=True)
|
||||
with open(_ALLOC_LOCK_PATH, "w", encoding="utf-8") as lf:
|
||||
fcntl.flock(lf, fcntl.LOCK_EX)
|
||||
|
||||
@@ -5,26 +5,58 @@ unit-tested without importing the docker subprocess paths."""
|
||||
from __future__ import annotations
|
||||
|
||||
import hashlib
|
||||
import os
|
||||
import platform
|
||||
import shutil
|
||||
|
||||
from ...log import die
|
||||
|
||||
# libkrun's Linux backend drives the guest through KVM, so the host
|
||||
# must expose `/dev/kvm` and the invoking user must be able to open
|
||||
# it. macOS uses Hypervisor.framework and needs no device node.
|
||||
_KVM_DEVICE = "/dev/kvm"
|
||||
|
||||
|
||||
def smolmachines_preflight() -> None:
|
||||
"""Ensure `smolvm` is on PATH before the launch flow runs.
|
||||
Called from `_resolve_plan`; gives the operator a clear
|
||||
install pointer rather than a cryptic FileNotFoundError
|
||||
later. `gvproxy` is no longer required — see the PRD's design
|
||||
pivot section."""
|
||||
if shutil.which("smolvm") is not None:
|
||||
return
|
||||
die(
|
||||
"BOT_BOTTLE_BACKEND=smolmachines requires `smolvm` on "
|
||||
"PATH. Install with: "
|
||||
"curl -sSL https://smolmachines.com/install.sh | sh. "
|
||||
"To use the legacy Docker backend instead, set "
|
||||
"BOT_BOTTLE_BACKEND=docker or pass --backend=docker."
|
||||
)
|
||||
"""Ensure the host can run the smolmachines backend before the
|
||||
launch flow starts. Called from `_resolve_plan`; surfaces a
|
||||
clear, actionable error instead of a cryptic `smolvm` failure
|
||||
deep in launch.
|
||||
|
||||
Checks `smolvm` is on PATH (both platforms) and, on Linux,
|
||||
that `/dev/kvm` exists and is accessible. `gvproxy` is no
|
||||
longer required — see the PRD's design pivot section."""
|
||||
if shutil.which("smolvm") is None:
|
||||
die(
|
||||
"BOT_BOTTLE_BACKEND=smolmachines requires `smolvm` on "
|
||||
"PATH. Install with: "
|
||||
"curl -sSL https://smolmachines.com/install.sh | sh. "
|
||||
"To use the legacy Docker backend instead, set "
|
||||
"BOT_BOTTLE_BACKEND=docker or pass --backend=docker."
|
||||
)
|
||||
if platform.system() == "Linux":
|
||||
_preflight_kvm()
|
||||
|
||||
|
||||
def _preflight_kvm() -> None:
|
||||
"""Linux-only: libkrun needs `/dev/kvm`. Distinguish 'KVM not
|
||||
enabled' from 'no permission' so the operator knows which to
|
||||
fix."""
|
||||
if not os.path.exists(_KVM_DEVICE):
|
||||
die(
|
||||
f"BOT_BOTTLE_BACKEND=smolmachines needs {_KVM_DEVICE} on "
|
||||
"Linux but it is missing. Enable KVM: load the kvm-intel "
|
||||
"or kvm-amd kernel module (and confirm virtualization is "
|
||||
"enabled in BIOS/firmware). To use the legacy Docker "
|
||||
"backend instead, set BOT_BOTTLE_BACKEND=docker."
|
||||
)
|
||||
if not os.access(_KVM_DEVICE, os.R_OK | os.W_OK):
|
||||
die(
|
||||
f"{_KVM_DEVICE} exists but is not readable/writable by the "
|
||||
"current user. Add your user to the `kvm` group "
|
||||
"(`sudo usermod -aG kvm \"$USER\"`) and re-login, or run "
|
||||
"with access to the device."
|
||||
)
|
||||
|
||||
|
||||
def smolmachines_bundle_subnet(slug: str) -> tuple[str, str, str]:
|
||||
|
||||
Reference in New Issue
Block a user