Commit Graph

11 Commits

Author SHA1 Message Date
didericis-claude b3c6d66850 style(smolmachines): address PR #83 review comments
test / unit (push) Successful in 27s
test / integration (push) Successful in 42s
- bottle.py:_PTY_RESIZE_SCRIPT docstring: strip the speculative
  cwd-dependence explanation. The real reason to use absolute
  path is just that the wrapper is self-contained; the original
  rationale (tmux pane cwd) was a hypothesis we never confirmed
  and wasn't load-bearing once we found the libkrun race.

- pty_resize.py:main: drop the long comment duplicating
  `_STARTUP_SYNC_DELAY_SEC`'s docstring. Keep a one-liner
  pointing at the constant + the operational note about
  daemon=True.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 21:05:33 -04:00
didericis-claude 794e8666e1 fix(smolmachines): invoke pty_resize by absolute path, not python -m
test / unit (pull_request) Successful in 25s
test / integration (pull_request) Successful in 40s
The dashboard's launch path crashed inside tmux but worked
outside it. Root cause: `python -m
claude_bottle.backend.smolmachines.pty_resize` needs the
`claude_bottle` package on `sys.path`, which by default comes
from cwd. The outside-tmux path is `subprocess.run(...)` —
inherits the dashboard process's cwd (the repo root, where
`claude_bottle/` lives), so the import resolves. The
inside-tmux path is `tmux split-window / respawn-pane <argv>`,
and tmux opens the new pane with the pane's OWN cwd, not the
cwd of the process invoking split-window. If the operator
started their tmux pane anywhere outside the repo (typical:
`$HOME`), the wrapper hit `ModuleNotFoundError: No module
named 'claude_bottle'` and tmux closed the pane immediately.

Sidestep the cwd dependence by invoking the wrapper as
`python <absolute-path-to-pty_resize.py>` instead of
`python -m <dotted-path>`. The wrapper has no
`claude_bottle.*` imports — it's stdlib-only — so it runs as
a standalone script anywhere on the filesystem. The absolute
path comes from `pty_resize.__file__` at module-load time.

Tests:
- `test_pty_resize_wrapper_prefix`: updated to assert the
  absolute-script-path shape rather than the `-m <dotted>`
  shape.
- `test_no_wrapper_when_tty_false`: the substring check now
  uses `any("pty_resize" in a for a in argv)` instead of
  string-joining (so the absolute path's "pty_resize.py"
  filename match still catches a regression).

636 unit tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 20:26:42 -04:00
didericis-claude 3fb305f654 fix(smolmachines): bridge host SIGWINCH into the VM PTY (issue #82)
test / unit (pull_request) Successful in 28s
test / integration (pull_request) Successful in 41s
`smolvm 0.8.0 machine exec -t` allocates an in-VM PTY but never
forwards the host terminal's window size — the PTY starts at
`0 0` and host resizes (tmux pane resize, terminal window
resize) go unnoticed, so the claude TUI inside a smolmachines
bottle renders for whatever tiny box it last saw and ignores
operator resizes. `docker exec -it` propagates window-size
changes automatically; smolvm doesn't.

Workaround: a small Python wrapper
(`backend/smolmachines/pty_resize.py`) that interposes between
the operator's terminal and `smolvm machine exec`. It spawns
smolvm as a child, traps host SIGWINCH, and on every resize
(plus once at startup) runs a side-channel
`smolvm machine exec --name <M> -- sh -c 'for f in /dev/pts/*;
do stty -F $f cols X rows Y; done'`. The kernel delivers
SIGWINCH to the in-VM foreground process group when the slave
PTY's size changes, so claude picks up the new dimensions
without extra signalling.

`SmolmachinesBottle.claude_argv` prepends
`[sys.executable, -m, claude_bottle.backend.smolmachines.
pty_resize, <machine>, --, ...]` to the existing smolvm argv
in TTY mode. Non-TTY mode (provisioning shell-outs) skips the
wrapper — no PTY to resize.

The wrapper survives the dashboard's
`_build_resume_argv_with_fallback` shell-wrap because the
split-at-`claude` token still finds the right position — the
wrapper's prefix wraps the entire smolvm-exec framing.

Tests:
- `test_smolmachines_pty_resize.py` (new): argv parsing, the
  side-channel command shape (cols/rows / for-loop over
  /dev/pts/*), and `_read_winsize`'s fallback across
  stdin/stdout/stderr including the smolvm-allocated-PTY-
  reports-`0 0` ironic case.
- `test_smolmachines_bottle.py`: updated TTY-mode assertions
  to unwrap the pty_resize prefix; added `TestClaudeArgvNoTTY`
  to lock the non-TTY skip.

636 unit tests pass.

Removable when smolvm grows native SIGWINCH forwarding.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 20:15:11 -04:00
didericis-claude 3103266053 fix(dashboard): hoist claude_argv to Bottle ABC so smolmachines pane attach works
test / unit (pull_request) Successful in 27s
test / integration (pull_request) Successful in 42s
test / unit (push) Successful in 26s
test / integration (push) Successful in 45s
Launching a smolmachines agent from the dashboard inside tmux
crashed with

  AttributeError: 'SmolmachinesBottle' object has no attribute
  'claude_docker_argv'

because the tmux pane-respawn path called
`bottle.claude_docker_argv(...)` directly — a method that only
existed on DockerBottle. The foreground-handoff path (curses
endwin → subprocess.run → restore) doesn't hit it; it goes
through `bottle.exec_claude` which is on the ABC.

- Move the argv builder onto the `Bottle` ABC as
  `claude_argv(argv, *, tty=True) -> list[str]`. Both backends
  implement it; both `exec_claude` impls collapse to
  `subprocess.run(self.claude_argv(argv, tty=tty), check=False)`.

- DockerBottle: rename `claude_docker_argv` → `claude_argv`,
  body unchanged.

- SmolmachinesBottle: extract the argv-building from
  `exec_claude` into `claude_argv`; the new method returns the
  full `smolvm machine exec --name … -- runuser -u node --
  claude …` argv. The `runuser` switch lives on the
  exec-framing prefix so the dashboard's
  `_build_resume_argv_with_fallback` split-at-"claude" trick
  keeps the UID switch when wrapping the claude tail in
  `sh -c "… --continue || …"`.

- Dashboard: drop the docker-specific wording — local + helper
  arg names `docker_argv` → `claude_argv`; docstrings on
  `_build_resume_argv_with_fallback`, `_build_split_pane_argv`,
  `_build_respawn_pane_argv` now say "backend-exec argv". The
  shell-fallback wrap is unchanged; the existing logic works
  for smolmachines because `claude` is still the marker token.

Tests:
- `tests/unit/test_smolmachines_bottle.py` (new): locks down
  the smolmachines argv shape — prompt-file flag injection,
  guest-env `-e K=V` forwarding, TTY toggle, runuser-precedes-
  claude invariant.
- `test_docker_bottle.py`: TestClaudeDockerArgv →
  TestClaudeArgv; method renames follow.
- `test_dashboard_active_agents.py`: docstring follow.

615 unit tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 19:52:02 -04:00
didericis-claude 91955ec59f fix(smolmachines): forward guest env on every exec + chown /home/node
test / unit (pull_request) Successful in 26s
test / integration (pull_request) Successful in 40s
Two issues kept claude's TUI from drawing after launch:

1. smolvm pack remaps OCI-layer ownership to the host invoker's
   uid (501 on macOS) instead of preserving the image's
   USER node (uid 1000). /home/node ends up owned by some uid
   that doesn't exist in the VM, so when claude runs as node it
   can't appendFileSync to ~/.claude.json on startup — fails
   with ENOENT and the TUI hangs. Fix: chown -R node:node
   /home/node after machine_start, before provision.

2. smolvm machine_create -e sets env on PID 1 but it doesn't
   propagate to fresh exec process trees (verified empirically:
   `smolvm machine exec -- printenv` shows none of the
   machine_create env vars). Claude was running with no
   HTTPS_PROXY / CLAUDE_CODE_OAUTH_TOKEN / NODE_EXTRA_CA_CERTS,
   so even the auth-validation step bailed silently. Fix:
   thread `guest_env` through to the SmolmachinesBottle handle
   and re-pass every entry via `-e K=V` on every machine_exec
   call (interactive claude and shell exec both).

Also fills in the same `CLAUDE_CODE_OAUTH_TOKEN=egress-
placeholder` + telemetry-off env the docker backend's
forwarded_env carries, plus the NODE_EXTRA_CA_CERTS /
SSL_CERT_FILE / REQUESTS_CA_BUNDLE trust trio.

Verified end-to-end on Docker Desktop / macOS: claude's TUI
renders cleanly with the bypass-permissions banner.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 15:18:21 -04:00
didericis-claude 35edf50f21 fix(smolmachines): drop runuser -l in favor of UID switch + explicit HOME/USER
test / unit (pull_request) Successful in 27s
test / integration (pull_request) Successful in 39s
Interactive claude session hung silently after
`attaching interactive claude session...` — `runuser -l` invokes
a login shell that triggers PAM session setup / /etc/profile
sourcing, and the minimal Debian agent VM doesn't have the PAM
config files for that to complete cleanly. claude never got to
draw its TUI.

Switch UID via plain `runuser -u <user> --` (no `-l`) and inject
HOME / USER through `smolvm machine exec -e` so the child
process sees them. Avoids login-shell wiring entirely. Same
pattern in `exec_claude` and `exec(script)`.

`_HOME_FOR` maps the two users the codebase currently asks for
(`node`, `root`); anything else falls back to `/home/<user>`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 15:03:51 -04:00
didericis-claude af65c10361 refactor: Bottle.exec takes a user= kwarg, default node
test / unit (pull_request) Successful in 26s
test / integration (pull_request) Successful in 41s
Promote the user-switch from a hardcoded `node` to a keyword arg
so callers can opt into root (or any other user) when needed.
Default stays `node` — matches the docker image's USER and the
smolmachines runuser default.

Lifts the change through the base ABC, docker, and smolmachines
backends:
- Base: `def exec(self, script, *, user="node")`.
- Docker: adds `-u <user>` to `docker exec` (no-op when user is
  node, the image's default).
- Smolmachines: `runuser -l <user> -c <script>` — `runuser -l
  root` is the trivial no-op form when the caller asked for root.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 15:00:13 -04:00
didericis-claude e26d459a97 fix(smolmachines): run claude + shell exec as the node user
test / unit (pull_request) Successful in 26s
test / integration (pull_request) Successful in 44s
`smolvm machine exec` runs commands as root in the VM, but the
agent image's USER is `node`. claude-code refuses
`--dangerously-skip-permissions` when invoked as root, killing
the interactive session right after `attaching interactive claude
session...`:

  --dangerously-skip-permissions cannot be used with root/sudo
    privileges for security reasons

Wrap both `exec_claude` and `exec(script)` in
`runuser -l node -c ...` so commands run as the node user with
node's $HOME / $USER (login shell). The docker backend gets
this behavior for free via the image's USER directive; this
restores parity.

shlex-quote each claude argv element when stitching the runuser
-c shell command so paths / flags with shell-special chars
survive the parse.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 14:58:34 -04:00
didericis-claude 9e3b7e441e feat(smolmachines): provision_prompt + provision_skills (PRD 0023 chunk 4a)
test / unit (pull_request) Successful in 21s
test / integration (pull_request) Successful in 43s
First slice of chunk 4: implement the two provisioning methods
that don't depend on agent-image tooling beyond `cp` and
`mkdir`. provision_ca / provision_git / provision_supervise
land once the agent-image gap is solved (chunk 4b+) — they need
update-ca-certificates, git, and the claude binary respectively,
none of which the chunk-2d alpine placeholder provides.

What this PR ships:

- `claude_bottle/backend/smolmachines/provision/` subpackage
  with `prompt.py` + `skills.py`. Each routes through
  `smolvm.machine_cp` / `machine_exec`. provision_prompt mirrors
  the docker contract (file always copied; return value drives
  --append-system-prompt-file iff the agent has a non-empty
  prompt). provision_skills mkdir + cp per skill, matching
  the docker backend's loop.
- prepare.py now writes the prompt file under
  agent_state_dir(slug) with the agent's `prompt` body, mode
  0o600. The in-guest path is `/root/.claude-bottle-prompt.txt`
  (alpine has no `node` user; will become `/home/node/...` once
  the real claude-bottle image lands).
- launch.py calls `provision(plan, machine_name)` after
  machine_start. The returned prompt path threads to
  SmolmachinesBottle so exec_claude can add
  --append-system-prompt-file when the agent has a prompt.
- backend.py: provision_prompt / provision_skills now real;
  provision_git is a deliberate stub (waiting on the git-gate
  inner Plan + git in the agent image). provision_supervise
  stays the chunk-2d stub.

Tests:
- 7 new unit cases (test_smolmachines_provision.py): argv
  shape (mocked smolvm.machine_cp / .machine_exec),
  prompt return-value contract, no-op-with-no-skills,
  CLAUDE_BOTTLE_GUEST_SKILLS_DIR override, fail-on-missing-skill.
- 1 new integration case in test_smolmachines_launch.py:
  end-to-end verification that the prompt file lands in the
  alpine guest at /root/.claude-bottle-prompt.txt with the
  expected content (via `bottle.exec("cat ...")`). The smoke +
  the two TSI probes stay green.

552 unit + 4 integration (Darwin+smolvm+docker gated) passing.

What's left in chunk 4:
- 4b: thread the inner Plans (PipelockProxyPlan / EgressPlan /
  GitGatePlan / SupervisePlan) through prepare + launch so the
  bundle daemons actually run (currently daemons_csv="").
- 4c: the agent-image-conversion gap — get claude-code + git +
  curl + ca-certificates into the guest image (build a
  .smolmachine via `pack create --from-vm` after manual setup,
  or push the docker image to a registry smolvm can pull).
- 4d: provision_ca + provision_git + provision_supervise once
  4b + 4c land.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 05:08:17 -04:00
didericis-claude 9f65b137b9 feat(smolmachines): end-to-end launch + Bottle.exec + smoke + probes (PRD 0023 chunk 2d)
test / unit (pull_request) Successful in 21s
test / integration (pull_request) Successful in 41s
test / unit (push) Successful in 22s
test / integration (push) Successful in 41s
End-to-end launch flow for the smolmachines backend. Brings up
the per-bottle docker bridge + sidecar bundle, creates and
starts the smolvm guest pointed at the bundle's pinned IP via
TSI's `--allow-cidr <bundle-ip>/32`, yields a SmolmachinesBottle
handle that routes exec/cp through `smolvm machine exec / cp`,
tears everything down on context exit.

launch.py:
- ExitStack-managed: create_bundle_network → start_bundle →
  machine_create → machine_start (each registered for reverse
  teardown).
- daemons_csv="" for chunk 2d — bundle init logs "no daemons
  selected" and idles. Real daemon bringup with inner-Plan-driven
  env + volumes lands in chunk 4.

bottle.py:
- SmolmachinesBottle.exec → smolvm.machine_exec (captured).
- SmolmachinesBottle.exec_claude → direct subprocess.run with
  inherited TTY for interactive sessions.
- SmolmachinesBottle.cp_in → smolvm.machine_cp.

Architecture pivots forced by smolvm 0.8.0's CLI shape:
1. `--from <smolmachine>` and `--smolfile <toml>` are MUTUALLY
   EXCLUSIVE in smolvm 0.8.0. We need --from to avoid the
   registry-pull race that bit us on machine_start (libkrun
   agent's network attempt got refused by macOS with
   "connect: permission denied" on IPv6). So Smolfile is dropped
   entirely; per-bottle env + allow_cidrs flow as CLI flags
   (`--allow-cidr CIDR`, `-e K=V`) directly to machine_create.
2. `smolvm pack create --image` doesn't pull from the local
   docker daemon — only OCI registries via crane. The real
   claude-bottle:latest image lives in the local docker daemon
   and isn't reachable that way. Chunk 2d ships with an alpine
   placeholder; the agent-image-conversion gap belongs to
   chunk 4 (push the image to a registry, or smolvm grows a
   docker-daemon transport).

Other changes:
- machine_create grew `image=` / `from_path=` / `allow_cidrs=`
  / `env=` kwargs; smolfile= dropped.
- bottle_plan: smolfile_path → agent_from_path + guest_env.
- prepare: pack_create against `alpine:latest`, cached under
  ~/.cache/claude-bottle/smolmachines/ keyed by image ref.
- Deleted smolfile.py + test_smolfile.py (dead code now).

Tests:
- Unit: 540 passing (smolvm wrapper grew 4 new flag forms; one
  test renamed to reflect --from + --allow-cidr + -e combo).
- Integration: 3 new cases in tests/integration/
  test_smolmachines_launch.py, gated on Darwin + smolvm on PATH
  + docker + not GITEA_ACTIONS:
    * smoke: bottle.exec("echo hello-from-vm") round-trips with
      the correct stdout + returncode.
    * localhost-reach probe: agent dials 127.0.0.1:9 → connect
      refused (TSI's <bundle-ip>/32 allowlist doesn't include
      loopback). The regression test for the gap the PRD design
      pivot was about.
    * egress-port-bypass probe: agent dials <bundle-ip>:9099
      (egress's port) → connect refused. Chunk 2d has no
      daemons running so nothing's listening anyway; chunk 3
      will preserve this property once egress is up but bound
      to 127.0.0.1 inside the bundle.

End-to-end smoke + both probes green locally on macOS with
smolvm 0.8.0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 04:39:52 -04:00
didericis 20f411b22e feat(smolmachines): backend skeleton + Smolfile/gvproxy renderers (PRD 0023 chunk 1)
test / unit (pull_request) Successful in 22s
test / integration (pull_request) Successful in 43s
Ships the smolmachines backend's prepare side: subpackage layout,
`_BACKENDS` registration under "smolmachines", preflight check
for `smolvm` + `gvproxy` on PATH, and the two config-file
renderers (Smolfile TOML + gvproxy YAML). Launch raises
NotImplementedError until chunk 2.

New module layout (mirrors backend/docker/):
  claude_bottle/backend/smolmachines/
    __init__.py            re-exports SmolmachinesBottleBackend
    backend.py             SmolmachinesBottleBackend façade
    bottle.py              SmolmachinesBottle stub (NotImpl until ch2)
    bottle_plan.py         SmolmachinesBottlePlan + .print()
    bottle_cleanup_plan.py SmolmachinesBottleCleanupPlan stub
    prepare.py             resolve_plan: writes both config files
    smolfile.py            TOML renderer (stdlib, no tomli_w dep)
    gvproxy_config.py      YAML renderer (same shape as pipelock_yaml)
    util.py                preflight + per-slug subnet + loopback port

The renderers are pure functions. `resolve_plan` runs the
preflight, allocates one host-side loopback port per active
sidecar (pipelock always; git-gate / supervise conditional),
derives a per-slug gvproxy subnet (hash-mod-254, skipping the
docker-default 17), and writes:

  - <stage>/gvproxy.yaml: subnet + DNS rule resolving only
    `proxy.internal` + port_forwards (one per active sidecar).
  - <stage>/smolfile.toml: guest command/env + virtio-net device
    backed by gvproxy's unixgram socket. No TSI flags — see
    PRD 0023 "Why gvproxy, not TSI".

The agent's HTTPS_PROXY etc. point at `proxy.internal:<gateway-
port>` so the guest dials through gvproxy. gvproxy resolves only
`proxy.internal` → the gateway IP, and forwards exactly the
listed ports to the host-side sidecar bundle (PRD 0024); every
other destination — host LAN, host loopback, public internet
directly — is unreachable by construction.

29 new unit tests covering renderer correctness, subnet
derivation stability + collision-avoidance, loopback port
allocation, and preflight error paths. Full unit suite: 532
passing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 02:22:08 -04:00