456 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
d0712fb757 |
docs(readme): document git_user manifest field (issue #86)
Add a `git_user:` block to the example bottle frontmatter with a one-paragraph note on what it does + that either field can be set independently. Other doc surfaces (manifest module docstring, provisioner module docstrings) were updated alongside the implementation commits.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
c9cdd41110 |
feat(smolmachines): apply git_user via git config --global on provision (issue #86)
Mirror the docker backend's third provisioning subcase in `backend/smolmachines/provision/git.py`:
_provision_git_user(plan, target)
Runs `smolvm machine exec --name <M> -e HOME=/home/node -e USER=node -- runuser -u node -- git config --global user.<X> <value>` for each git_user field. No-op when `git_user.is_empty()`.
`runuser -u node --` switches the UID without invoking a login shell (matching the existing `Bottle.exec_claude` pattern). HOME / USER are forced via `smolvm -e` because bare runuser inherits root's HOME=/root, which would put --global in /root/.gitconfig instead of /home/node/.gitconfig (where the existing `_provision_git_gate_config` writes).
4 unit tests in test_smolmachines_provision.TestProvisionGitUser: no-op, both-set (asserts runuser prefix + HOME/USER env), name-only, email-only. 661 unit tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
9e69aaa99a |
feat(docker): apply git_user via git config --global on provision (issue #86)
Add a third provisioning subcase to
`backend/docker/provision/git.py`:
_provision_git_user(plan, target)
Runs `docker exec -u node <container> git config --global
user.{name,email} <value>` for each field the bottle's
`git_user` declares. No-op when `git_user.is_empty()`.
`-u node` so `--global` lands in /home/node/.gitconfig (matching
the existing `_provision_git_gate_config` write location, so
agent-side `git` reads both configs from the same dotfile).
Name and email apply independently — a bottle declaring only
name runs just the user.name line, etc.
4 unit tests in `test_docker_provision_git_user.py`: no-op,
both-set, name-only, email-only. 657 unit tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
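The per-field behaviour described in the commit above can be sketched as a pure argv builder. This is a minimal illustration, not the repository's `_provision_git_user`: the function name `git_user_commands` and its plain-string parameters are assumptions made here, and only the `docker exec -u node <container> git config --global user.<field> <value>` shape comes from the commit message.

```python
def git_user_commands(container: str, name: str = "", email: str = "") -> list[list[str]]:
    """Build one `docker exec` argv per declared git_user field.

    Hypothetical helper: an empty list models the no-op path the commit
    describes for an empty git_user.
    """
    commands: list[list[str]] = []
    for key, value in (("user.name", name), ("user.email", email)):
        if value:  # name and email apply independently
            commands.append(
                ["docker", "exec", "-u", "node", container,
                 "git", "config", "--global", key, value]
            )
    return commands


if __name__ == "__main__":
    for argv in git_user_commands("bottle-1", name="Ada", email="ada@example.com"):
        print(" ".join(argv))
```

Returning argv lists instead of running them keeps the sketch testable without a docker daemon; a real provisioner would hand each list to `subprocess.run`.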
|
|
689675160a |
feat(manifest): add git_user bottle field (issue #86)
Per-bottle `git config --global user.name` / `user.email` pair
so the agent's commits inside the bottle land with a known
identity rather than the agent image's default (no user, or
whatever the image dropped in).
Schema:
git_user:
name: "Eric Bauerfeld"
email: "eric+claude@dideric.is"
Either field can be set independently — name-only / email-only
configs are valid and apply just the field that's set. An
explicit `git_user:` block with both fields empty dies at parse
time rather than silently no-op'ing; an omitted block is the
no-op path (default GitUser is empty, provisioner skips).
Parse-time validation:
- Unknown sub-keys die (e.g., typo of `username`).
- Non-string name/email dies.
- Both-empty dies (half-finished edit hint).
11 unit tests in `test_manifest_git_user.py`; 653 unit tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
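A minimal sketch of the parse-time rules listed in the commit above, assuming the manifest arrives as an already-decoded dict. `ManifestError`, `parse_git_user`, and this `GitUser` dataclass are illustrative names chosen here (the commit only says invalid input "dies" and that the default `GitUser` is empty); the real parser may differ.

```python
from dataclasses import dataclass


class ManifestError(ValueError):
    """Hypothetical error type standing in for the parser's failure path."""


@dataclass(frozen=True)
class GitUser:
    name: str = ""
    email: str = ""

    def is_empty(self) -> bool:
        return not self.name and not self.email


def parse_git_user(manifest: dict) -> GitUser:
    """Apply the three validation rules to an optional `git_user` block."""
    if "git_user" not in manifest:
        return GitUser()  # omitted block: the no-op path
    raw = manifest["git_user"] or {}  # a bare `git_user:` decodes to None
    if not isinstance(raw, dict):
        raise ManifestError("git_user must be a mapping")
    unknown = sorted(str(k) for k in raw if k not in ("name", "email"))
    if unknown:
        raise ManifestError(f"unknown git_user keys: {unknown}")
    for key in ("name", "email"):
        if key in raw and not isinstance(raw[key], str):
            raise ManifestError(f"git_user.{key} must be a string")
    user = GitUser(name=raw.get("name", ""), email=raw.get("email", ""))
    if user.is_empty():
        raise ManifestError("git_user is present but both fields are empty")
    return user
```

The explicit-but-empty case is rejected after the type checks so that a typo such as `username:` is reported as an unknown key, not as an empty block.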
|
|
574551e2eb |
fix(sidecar_init): strip EGRESS_TOKEN_* from non-egress daemons' env (issue #84)
Pipelock was 403-blocking legitimate egress cred-injected traffic with 'blocked: request header contains secret'.
The chain is `agent → egress → pipelock → internet`: egress injects `Authorization: Bearer <token>` for routes with an `auth_scheme`, then forwards upstream to pipelock. Pipelock has `scan_env: true` + `scan_headers: true` + `header_mode: all`, and the bundle supervisor spawned every daemon (egress, pipelock, git-gate, supervise) inheriting the bundle container's full env, including the `EGRESS_TOKEN_<n>` slots set via `docker run -e`. So pipelock had the token value egress injected sitting in its own env, matched it in the request headers, and blocked.
The agent itself runs in a different machine and never sees `EGRESS_TOKEN_*`, so stripping these from non-egress daemons' env loses no DLP coverage: pipelock can't catch the exfil of a value the agent doesn't have in the first place.
New helper `_env_for_daemon(name, base_env)` returns the unchanged base for `egress` and a copy with `EGRESS_TOKEN_*` filtered for everyone else. `_spawn` now passes the scoped env to `subprocess.Popen`. Prefix-based filter (not exact-match) so future egress-only env slots don't have to update this code.
Tests:
- `TestEnvForDaemon`: egress gets full env, pipelock / git-gate / supervise lose `EGRESS_TOKEN_0` + `EGRESS_TOKEN_1` but keep `PATH`, `EGRESS_UPSTREAM_PROXY`, `SUPERVISE_PORT`.
- Independent-dict invariant locked so callers can't accidentally mutate the supervisor's env.
642 unit tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
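The prefix-based env scoping from the commit above can be sketched in a few lines. This is an illustration of the described behaviour, not the repository's `_env_for_daemon`; in particular, returning a fresh dict for `egress` as well is an assumption made here to satisfy the independent-dict invariant the tests mention.

```python
EGRESS_ONLY_PREFIX = "EGRESS_TOKEN_"


def env_for_daemon(name: str, base_env: dict[str, str]) -> dict[str, str]:
    """Return the env a bundle daemon should be spawned with.

    `egress` keeps every variable; every other daemon gets a copy with the
    EGRESS_TOKEN_* slots removed. A new dict is returned in both cases so
    callers cannot mutate the supervisor's own env (assumed detail).
    """
    if name == "egress":
        return dict(base_env)
    return {k: v for k, v in base_env.items() if not k.startswith(EGRESS_ONLY_PREFIX)}


if __name__ == "__main__":
    base = {"PATH": "/usr/bin", "EGRESS_TOKEN_0": "secret", "SUPERVISE_PORT": "9100"}
    print(sorted(env_for_daemon("pipelock", base)))
```

Filtering on the prefix, not on a fixed list of slot names, is what lets new `EGRESS_TOKEN_<n>` slots appear without touching this function.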
|
|
b3c6d66850 |
style(smolmachines): address PR #83 review comments
- bottle.py:_PTY_RESIZE_SCRIPT docstring: strip the speculative cwd-dependence explanation. The real reason to use absolute path is just that the wrapper is self-contained; the original rationale (tmux pane cwd) was a hypothesis we never confirmed and wasn't load-bearing once we found the libkrun race.
- pty_resize.py:main: drop the long comment duplicating `_STARTUP_SYNC_DELAY_SEC`'s docstring. Keep a one-liner pointing at the constant + the operational note about daemon=True.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
aa5aa1f031 |
fix(smolmachines): defer pty_resize startup sync to dodge libkrun's bringup race
The |
||
|
|
9c83ea6428 |
chore(smolmachines): re-add pty_resize debug log (temp, for issue diagnosis)
User reports the launch still crashes in tmux after b9853ae's stdin=DEVNULL fix. Re-instrument to capture the next failure mode (argv, ppid, sync size, child exit, Popen tracebacks). Removable once the inside-tmux launch is confirmed stable.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
b9853ae0c7 |
fix(smolmachines): give pty_resize side-channel DEVNULL stdin so it survives under tmux
Inside tmux the dashboard's smolmachines launch crashed within
~100ms of the wrapper Popen-ing the main smolvm exec child —
sometimes with rc=137 (SIGKILL), sometimes with smolvm
spitting a runc-style "load `config.json`: cannot parse the
data: parse error: trailing garbage" and exiting 1. The same
wrapper ran fine outside tmux. Diagnostic logs showed the
SIGKILL landed ~100ms after the wrapper kicked off its
initial `sync()` (which fires the side-channel smolvm exec).
Root cause: the side-channel `subprocess.run([smolvm, machine,
exec, --, sh, -c, ...])` did not specify `stdin=`, so it
inherited the wrapper's stdin — the tmux pane PTY. The main
smolvm child (the agent session) also had that PTY as stdin.
Two concurrent smolvm processes sharing the PTY's
foreground-process-group / input plumbing caused smolvm to
abort one of them. iTerm's PTY plumbing apparently tolerated
this; tmux's didn't.
Fix is one line in `_push_size`: `stdin=subprocess.DEVNULL`.
The side-channel never needs stdin — it runs a fire-and-forget
`stty` and exits. Verified end-to-end: pre-fix the wrapper
crashed under `tmux respawn-pane` against a live VM; post-fix
the same invocation completes cleanly.
Also drop the diagnostic log added in
|
||
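The one-line fix in the commit above amounts to detaching the side-channel's stdin from the pane PTY. A minimal sketch, assuming the command shape quoted in the SIGWINCH-bridge commit elsewhere in this log; `side_channel_argv`, `push_size`, and the injectable `run` parameter are names invented here for illustration and testing, not the wrapper's real `_push_size`.

```python
import subprocess


def side_channel_argv(machine: str, cols: int, rows: int) -> list[str]:
    """Build the fire-and-forget resize command run inside the VM."""
    script = f"for f in /dev/pts/*; do stty -F $f cols {cols} rows {rows}; done"
    return ["smolvm", "machine", "exec", "--name", machine, "--", "sh", "-c", script]


def push_size(machine: str, cols: int, rows: int, run=subprocess.run):
    """Run the side-channel without inheriting the caller's stdin.

    stdin=DEVNULL keeps this short-lived process off the terminal that the
    main agent session is already using as its stdin.
    """
    return run(
        side_channel_argv(machine, int(cols), int(rows)),
        stdin=subprocess.DEVNULL,
        check=False,  # best-effort: a failed resize should not kill the wrapper
    )
```

Coercing `cols` and `rows` through `int()` also keeps arbitrary strings out of the `sh -c` script.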
|
|
37bd11b375 |
chore(smolmachines): instrument pty_resize wrapper for crash diagnosis
User reports launch crashing only inside tmux (works outside). The wrapper itself runs fine in standalone tmux repros, so the break is in some interaction we can't see: curses eats stderr, default tmux remain-on-exit is off, and the pane closes before the operator can read anything.
Add an always-on per-pid log at ~/.claude-bottle/pty_resize.log:
- start record: argv, cwd, PATH, TMUX status
- sync record: window size observed
- child pid + exit rc
- any KeyboardInterrupt forwarding
- Popen failure traceback if it dies
Append-mode, small overhead, easy to grep + share. Removable (along with the wrapper itself) once smolvm forwards SIGWINCH natively.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
794e8666e1 |
fix(smolmachines): invoke pty_resize by absolute path, not python -m
The dashboard's launch path crashed inside tmux but worked
outside it. Root cause: `python -m
claude_bottle.backend.smolmachines.pty_resize` needs the
`claude_bottle` package on `sys.path`, which by default comes
from cwd. The outside-tmux path is `subprocess.run(...)` —
inherits the dashboard process's cwd (the repo root, where
`claude_bottle/` lives), so the import resolves. The
inside-tmux path is `tmux split-window / respawn-pane <argv>`,
and tmux opens the new pane with the pane's OWN cwd, not the
cwd of the process invoking split-window. If the operator
started their tmux pane anywhere outside the repo (typical:
`$HOME`), the wrapper hit `ModuleNotFoundError: No module
named 'claude_bottle'` and tmux closed the pane immediately.
Sidestep the cwd dependence by invoking the wrapper as
`python <absolute-path-to-pty_resize.py>` instead of
`python -m <dotted-path>`. The wrapper has no
`claude_bottle.*` imports — it's stdlib-only — so it runs as
a standalone script anywhere on the filesystem. The absolute
path comes from `pty_resize.__file__` at module-load time.
Tests:
- `test_pty_resize_wrapper_prefix`: updated to assert the
absolute-script-path shape rather than the `-m <dotted>`
shape.
- `test_no_wrapper_when_tty_false`: the substring check now
uses `any("pty_resize" in a for a in argv)` instead of
string-joining (so the absolute path's "pty_resize.py"
filename match still catches a regression).
636 unit tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
3fb305f654 |
fix(smolmachines): bridge host SIGWINCH into the VM PTY (issue #82)
`smolvm 0.8.0 machine exec -t` allocates an in-VM PTY but never forwards the host terminal's window size: the PTY starts at `0 0` and host resizes (tmux pane resize, terminal window resize) go unnoticed, so the claude TUI inside a smolmachines bottle renders for whatever tiny box it last saw and ignores operator resizes. `docker exec -it` propagates window-size changes automatically; smolvm doesn't.
Workaround: a small Python wrapper (`backend/smolmachines/pty_resize.py`) that interposes between the operator's terminal and `smolvm machine exec`. It spawns smolvm as a child, traps host SIGWINCH, and on every resize (plus once at startup) runs a side-channel `smolvm machine exec --name <M> -- sh -c 'for f in /dev/pts/*; do stty -F $f cols X rows Y; done'`. The kernel delivers SIGWINCH to the in-VM foreground process group when the slave PTY's size changes, so claude picks up the new dimensions without extra signalling.
`SmolmachinesBottle.claude_argv` prepends `[sys.executable, -m, claude_bottle.backend.smolmachines.pty_resize, <machine>, --, ...]` to the existing smolvm argv in TTY mode. Non-TTY mode (provisioning shell-outs) skips the wrapper: no PTY to resize. The wrapper survives the dashboard's `_build_resume_argv_with_fallback` shell-wrap because the split-at-`claude` token still finds the right position: the wrapper's prefix wraps the entire smolvm-exec framing.
Tests:
- `test_smolmachines_pty_resize.py` (new): argv parsing, the side-channel command shape (cols/rows / for-loop over /dev/pts/*), and `_read_winsize`'s fallback across stdin/stdout/stderr including the smolvm-allocated-PTY-reports-`0 0` ironic case.
- `test_smolmachines_bottle.py`: updated TTY-mode assertions to unwrap the pty_resize prefix; added `TestClaudeArgvNoTTY` to lock the non-TTY skip.
636 unit tests pass. Removable when smolvm grows native SIGWINCH forwarding.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
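The window-size fallback the tests above mention (try stdin, stdout, stderr in turn and skip a PTY that reports `0 0`) could look roughly like the sketch below. It is an assumption-laden illustration, not the wrapper's `_read_winsize`: the name `read_winsize`, the `probe` parameter, and the `None` return for "no usable size" are choices made here.

```python
import os
from typing import Callable, Iterable, Optional, Tuple


def read_winsize(
    fds: Iterable[int] = (0, 1, 2),
    probe: Callable[[int], os.terminal_size] = os.get_terminal_size,
) -> Optional[Tuple[int, int]]:
    """Return (cols, rows) from the first fd that reports a usable size.

    Descriptors that are not terminals raise OSError and are skipped, as is
    a terminal reporting 0x0.
    """
    for fd in fds:
        try:
            size = probe(fd)
        except OSError:
            continue
        if size.columns > 0 and size.lines > 0:
            return size.columns, size.lines
    return None
```

A SIGWINCH handler installed with `signal.signal(signal.SIGWINCH, ...)` would call this on each resize and forward the result to the VM.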
|
|
a3a9ec065e |
feat(cleanup): walk every backend, reap smolmachines orphans too
`./cli.py cleanup` previously called only the env-var-selected backend's `prepare_cleanup` / `cleanup`, so a leftover smolvm machine + bundle container + bundle network from a crashed smolmachines bottle would survive a default `docker`-mode cleanup indefinitely.
Smolmachines now has a real `cleanup` module (alongside `enumerate.py` from issue #77) that walks:
- smolvm machines named `claude-bottle-*` (via `smolvm machine ls --json`)
- bundle containers `claude-bottle-sidecars-*`
- bundle networks `claude-bottle-bundle-*`
Cleanup runs stop+delete on the machines, force-rm on the containers, network rm on the networks. Each step is best-effort so a failed rm doesn't block the rest.
`cli.py cleanup` walks every backend in `known_backend_names()` and runs each backend's `cleanup` after a single y/N prompt that shows a combined plan.
State dirs (`~/.claude-bottle/state/<slug>/`) are shared layout with the docker backend, which still owns the orphan-state-dir bucket. It now consults `enumerate_active_bottles()` for the cross-backend live identity set so a running smolmachines bottle's state dir isn't reaped during a cleanup.
Tests: smolmachines cleanup (prepare + cleanup ordering + failure handling); cross-backend orphan protection on the docker state-dir check; CLI cmd_cleanup walks both backends, short-circuits on all-empty, aborts on N. 617 unit tests pass.
End-to-end verified on this host:
$ smolvm machine ls --json | jq '.[].name'
"claude-bottle-researcher-m3hxd"
$ ./cli.py cleanup
--- smolmachines backend ---
smolvm machine: claude-bottle-researcher-m3hxd
remove all of the above? [y/N]
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
3103266053 |
fix(dashboard): hoist claude_argv to Bottle ABC so smolmachines pane attach works
Launching a smolmachines agent from the dashboard inside tmux crashed with
AttributeError: 'SmolmachinesBottle' object has no attribute 'claude_docker_argv'
because the tmux pane-respawn path called `bottle.claude_docker_argv(...)` directly, a method that only existed on DockerBottle. The foreground-handoff path (curses endwin → subprocess.run → restore) doesn't hit it; it goes through `bottle.exec_claude` which is on the ABC.
- Move the argv builder onto the `Bottle` ABC as `claude_argv(argv, *, tty=True) -> list[str]`. Both backends implement it; both `exec_claude` impls collapse to `subprocess.run(self.claude_argv(argv, tty=tty), check=False)`.
- DockerBottle: rename `claude_docker_argv` → `claude_argv`, body unchanged.
- SmolmachinesBottle: extract the argv-building from `exec_claude` into `claude_argv`; the new method returns the full `smolvm machine exec --name … -- runuser -u node -- claude …` argv. The `runuser` switch lives on the exec-framing prefix so the dashboard's `_build_resume_argv_with_fallback` split-at-"claude" trick keeps the UID switch when wrapping the claude tail in `sh -c "… --continue || …"`.
- Dashboard: drop the docker-specific wording: local + helper arg names `docker_argv` → `claude_argv`; docstrings on `_build_resume_argv_with_fallback`, `_build_split_pane_argv`, `_build_respawn_pane_argv` now say "backend-exec argv". The shell-fallback wrap is unchanged; the existing logic works for smolmachines because `claude` is still the marker token.
Tests:
- `tests/unit/test_smolmachines_bottle.py` (new): locks down the smolmachines argv shape: prompt-file flag injection, guest-env `-e K=V` forwarding, TTY toggle, runuser-precedes-claude invariant.
- `test_docker_bottle.py`: TestClaudeDockerArgv → TestClaudeArgv; method renames follow.
- `test_dashboard_active_agents.py`: docstring follow.
615 unit tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
5e0130b56f |
fix(smolmachines): build agent image in launch, not prepare
When starting a smolmachines agent from the dashboard the docker-build output rendered on top of the curses preflight modal: the build was kicked off before the operator had confirmed launch. The docker backend's `prepare` is pure resolution (no docker calls); smolmachines was inconsistent because `prepare` called `_ensure_smolmachine` which ran `docker build` → `docker save` → `crane push` → `smolvm pack create`, several seconds of stderr noise rendered before the y/N prompt.
Move the pipeline:
- `_ensure_smolmachine` (+ `_SMOLMACHINE_CACHE_DIR` + `_REPO_DIR` + the local-registry / smolvm imports) moves from `backend/smolmachines/prepare.py` to `backend/smolmachines/launch.py`. Called right before `_smolvm.machine_create` so the resulting `.smolmachine` sidecar path lands as a local in `launch`, not on the plan.
- `SmolmachinesBottlePlan.agent_from_path: Path` becomes `agent_image_ref: str`. `prepare` stashes only the docker tag (`$CLAUDE_BOTTLE_IMAGE` || `claude-bottle:latest`); `launch` resolves it into the artifact at bringup.
This puts smolmachines on the same prepare-vs-launch boundary the docker backend uses: the preflight summary in the dashboard prints, the operator confirms, then `launch` runs, and its stderr is routed via `_route_op_to_right_pane` (in tmux) or via `curses.endwin` (foreground handoff) so the build output lands cleanly.
Tests:
- `tests/unit/test_smolmachines_prepare_image.py` → `tests/unit/test_smolmachines_launch_image.py`, updated to import `_ensure_smolmachine` from `launch` rather than `prepare`.
- `test_smolmachines_provision.py`: plan fixture switches `agent_from_path` → `agent_image_ref`.
593 unit tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
5d740a6948 |
style(backend): drop stale "moved/removed" pointer comments
PR #78 review comments 580, 582, 584. Each was a comment describing what the previous refactor removed or relocated: information that's in git history, not load-bearing for a reader of the file as-is.
- claude_bottle/backend/docker/cleanup.py: drop trailing "enumerate_active moved to enumerate.py" note.
- tests/unit/test_dashboard_active_agents.py: drop module docstring paragraph about which tests moved where.
- tests/unit/test_docker_enumerate_active.py: drop "noop-when-docker-missing lives at the cross-backend gate now" trailing comment.
607 tests still pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
3b418580a9 |
refactor(backend): has_backend() helper + docker/enumerate split + ActiveAgent rename
Addresses PR #78 review feedback:
- New `has_backend(name)` on the backend package + abstract `BottleBackend.is_available()` on each concrete subclass. Replaces inline `shutil.which("docker") is None` checks in docker/cleanup.py:178 and smolmachines/enumerate.py:73. Docker → `shutil.which("docker") is not None`; smolmachines → `smolvm.is_available()`. Cross-backend `enumerate_active_agents()` skips backends whose `is_available()` is False so a docker-only host doesn't fail when iterating past smolmachines (and vice versa).
- Move docker `enumerate_active` + parser helpers out of cleanup.py into a new `backend/docker/enumerate.py`, mirroring the smolmachines/enumerate.py layout. cleanup.py is now purely about prepare_cleanup / cleanup; the active-listing concern owns its own file.
- Drop the `ActiveAgent = ActiveBottle` alias in dashboard.py. The canonical name is `ActiveAgent` (the thing running inside a bottle is always called "agent" in this codebase; the bottle is the container). Renamed `enumerate_active_bottles` → `enumerate_active_agents` to match.
Tests:
- `test_backend_selection.TestEnumerateActiveAgents.test_skips_unavailable_backends` locks down the `is_available()`-gated iteration.
- New `TestHasBackend` covers `has_backend("docker")` consulting the backend's `is_available`, and unknown-name → False.
- Existing tests follow the rename; the docker-availability-side-effect test in `test_docker_enumerate_active` moves up to the cross-backend layer (where the gate lives now).
607 unit tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
adff1263d8 |
feat(cli): cross-backend list active + --backend flag + dashboard picker (issue #77)
CLI and dashboard now share one cross-backend abstraction for listing + launching bottles, so adding a backend (docker / smolmachines) lights up in both places without separate wiring.
Backend abstraction:
- New `ActiveBottle` dataclass (`backend_name`, `slug`, `agent_name`, `started_at`, `services`) replaces the docker-specific `ActiveAgent`. Same field surface for the existing dashboard consumers; `ActiveAgent` becomes a typed alias for source-compat.
- New `BottleBackend.enumerate_active() -> Sequence[ActiveBottle]` replaces the old `list_active() -> None` (which printed and returned nothing). Docker implements it via the existing compose query; smolmachines implements it via `smolvm machine ls --json` cross-referenced with each bundle container's `CLAUDE_BOTTLE_SIDECAR_DAEMONS` env (`backend/smolmachines/enumerate.py`).
- New `enumerate_active_bottles()` and `known_backend_names()` module-level helpers fold every backend into one call.
- `get_bottle_backend(name=None)` takes an optional explicit name (precedence: arg > $CLAUDE_BOTTLE_BACKEND > "docker").
CLI:
- `./cli.py list active` enumerates every backend, prints tab-separated `<backend>\t<slug>\t<agent>\t<services>`. The smolmachines bottle the user was looking for now shows up.
- `./cli.py start` grows `--backend=<docker|smolmachines>` (choices pulled live from `known_backend_names()`). Threaded through `prepare_with_preflight(backend_name=...)` so the resume path picks up the flag too.
Dashboard:
- Active agents pane lists both backends (the row formatter now prefixes `[docker]` / `[smolmachines]`).
- New-agent flow inserts a backend picker modal between agent pick and preflight (`_backend_picker_modal`). Short-circuits when only one backend is registered.
- `discover_active_agents()` collapses to `enumerate_active_bottles()`; `_parse_services_by_project` and `_query_services_by_project` move to `backend/docker/cleanup.py` where the docker enumerator owns them.
Tests: parser + enumerate-active tests relocated to `test_docker_enumerate_active.py`. New `test_backend_selection.py` covers `get_bottle_backend`, `known_backend_names`, `enumerate_active_bottles`. New `test_cli_start_backend_flag.py` covers `--backend`'s argparse shape + the explicit-over-env precedence. 605 unit tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
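The selection precedence stated in the commit above (explicit argument, then `$CLAUDE_BOTTLE_BACKEND`, then `"docker"`) is small enough to sketch directly. `resolve_backend_name` and its `env` parameter are hypothetical names used here for illustration; the real `get_bottle_backend` returns a backend object, which this sketch does not model.

```python
import os
from typing import Mapping, Optional

DEFAULT_BACKEND = "docker"


def resolve_backend_name(
    name: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a backend name: explicit arg > $CLAUDE_BOTTLE_BACKEND > "docker"."""
    env = os.environ if env is None else env
    # Empty strings fall through, so an unset-but-exported variable still
    # yields the default (an assumption about the desired behaviour).
    return name or env.get("CLAUDE_BOTTLE_BACKEND") or DEFAULT_BACKEND
```

Passing `env` explicitly keeps the precedence testable without touching the real process environment.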
|
|
1e82aed54b | Merge pull request 'feat(smolmachines): per-bottle loopback alias scopes TSI to single /32' (#76) from smolmachines-loopback-alias-scoping into main | ||
|
|
2f143c7142 |
fix(smolmachines): include per-bottle alias in NO_PROXY
claude's HTTPS_PROXY was catching the supervise MCP URL (`http://<alias>:<port>/`) because NO_PROXY was hardcoded to `localhost,127.0.0.1` and didn't include the per-bottle loopback alias. Claude proxied the MCP POST through egress, egress had no route for the alias, and the connection failed: `/mcp` showed "supervise · ✘ failed" inside the bottle.
Append the loopback alias to NO_PROXY in launch.py so direct MCP calls bypass the proxy. The git-gate URL uses `git://`, which proxies don't touch, so this only affects MCP / HTTP paths to the bundle.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
7eda2a66ec |
feat(smolmachines): patch smolvm state DB to actually enforce per-bottle allowlist
Earlier commit framed this PR as "infrastructure landed, TSI enforcement blocked on upstream smolvm 0.8.0." Found a clean workaround that lets us enforce now.
Smolvm persists each machine's config (including `allowed_cidrs`) as a JSON BLOB in `~/Library/Application Support/smolvm/server/smolvm.db`, `vms.data`. `machine create --allow-cidr X/32` silently writes `allowed_cidrs: null` to that row when combined with `--from`, but smolvm reads the row at `machine start`, so patching the row between create and start sets the allowlist for real.
New `loopback_alias.force_allowlist(machine_name, cidrs)` opens the SQLite DB, JSON-decodes the row, sets `allowed_cidrs`, and writes back as BLOB (Text type silently corrupts smolvm's later reads). launch.py calls it immediately after `machine_create` and before `machine_start`.
Verified end-to-end on macOS / Docker Desktop:
VM allowlist after start: ["127.0.0.16/32"]
VM → 127.0.0.1:3000 → BLOCKED (Permission denied)
VM → 8.8.8.8:53 → BLOCKED (Permission denied)
VM → 127.0.0.16:<bundle> → CONNECTED
The DB-patch hack is correct only because smolvm reads `allowed_cidrs` from the row at start time (not derived in-process). When upstream honors `--allow-cidr` with `--from`, the call becomes redundant: drop the call and the workaround is gone.
Tests: 4 new for `force_allowlist` (BLOB round-trip; Linux no-op; missing DB; missing row). Total 593 unit tests pass.
README + PRD updated to reflect the fix landed (no longer "infrastructure pending upstream"). gitea#75 can close.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
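The row patch described in the commit above could be sketched as follows. Only the table name `vms`, the JSON column `data`, the `allowed_cidrs` key, and the BLOB requirement come from the commit message; the `name` key column, the explicit `db_path` parameter, and the boolean return are assumptions made for this illustration, and the Linux no-op path is left out.

```python
import json
import os
import sqlite3
from typing import Iterable


def force_allowlist(db_path: str, machine_name: str, cidrs: Iterable[str]) -> bool:
    """Set `allowed_cidrs` in a machine's JSON config row; True if patched.

    Assumes a `vms(name, data)` layout; the `name` column is hypothetical.
    """
    if not os.path.exists(db_path):
        return False  # missing DB: nothing to patch
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT data FROM vms WHERE name = ?", (machine_name,)
        ).fetchone()
        if row is None:
            return False  # missing row
        config = json.loads(row[0])
        config["allowed_cidrs"] = list(cidrs)
        # Bind bytes, not str, so SQLite stores the value as a BLOB.
        blob = json.dumps(config).encode("utf-8")
        conn.execute("UPDATE vms SET data = ? WHERE name = ?", (blob, machine_name))
        conn.commit()
        return True
    finally:
        conn.close()
```

Python's `sqlite3` maps `bytes` parameters to BLOB and `str` parameters to TEXT, which is why the encode step matters for the storage class the commit calls out.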
|
|
a919268d5e |
docs: honest framing of upstream smolvm 0.8.0 allowlist bug
PR #76 originally claimed the per-bottle alias scoping closed gitea#75 ("agent can reach host loopback"). Verified empirically that's not actually true: `smolvm 0.8.0 machine create --from <smolmachine> --net --allow-cidr X/32` silently drops the allowlist (`agent.config.json` shows `allowed_cidrs: null`, and the running VM reaches all of `127.0.0.0/8` regardless).
So the alias-allocation + alias-bind infrastructure is correct pre-work, but the actual TSI enforcement is blocked on an upstream smolvm bug. README + PRD 0023 + the module docstring get reworded to say so plainly. gitea#75 stays open.
Workarounds tried (all dead-ends):
- `machine update --allow-cidr` doesn't exist
- stop-edit-`agent.config.json`-restart fails (smolvm removes the file on stop)
- `--smolfile` is mutually exclusive with `--from`
- `--image localhost:<port>/...` fails because smolvm's agent process can't reach host loopback during pull
When upstream lands a fix, our existing code (alias allocation, port-bind, --allow-cidr in launch) will scope correctly without further changes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
2edc1abb9a |
feat(smolmachines): per-bottle loopback alias scopes TSI to single /32
PR #74's Docker-Desktop fix routed the agent through `127.0.0.1:<random>` loopback forwards, but TSI filters by IP only, so the allowlist `127.0.0.1/32` let the agent VM reach **any** host service on macOS loopback (postgres, dev servers, other bottles' published ports, mDNSResponder, ...). Real downgrade vs the docker backend's `--internal` network.
Resolution: per-bottle loopback alias.
- New `loopback_alias` module manages a pool of `127.0.0.16` .. `127.0.0.31` on `lo0`. macOS only routes `127.0.0.1` by default; the extras need `sudo ifconfig lo0 alias`. `ensure_pool()` lazily adds the missing entries via one sudo prompt on first launch per reboot; aliases persist on `lo0` until reboot, so subsequent launches skip the prompt entirely.
- `allocate(slug)` picks the lowest-numbered unused alias by inspecting running bundle containers' port-binding HostIps. No on-disk reservation; docker is the source of truth.
- Bundle bringup binds published ports to the allocated alias (`docker run -p <alias>::<port>`) instead of `127.0.0.1`.
- TSI allowlist becomes the alias's /32, which narrows reachability to this bottle's bundle only.
- Linux native daemons share the host's network namespace; `127.0.0.0/8` works without aliases, so the module no-ops on non-Darwin and returns `127.0.0.1` from `allocate`.
Tracking issue closed: gitea/issues/75.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
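The "lowest-numbered unused alias" rule from the commit above can be sketched independently of docker. In this illustration the set of in-use addresses is passed in directly; how the real `allocate(slug)` collects them from running bundle containers is not shown, and the names `ALIAS_POOL` and `allocate_alias` are assumptions made here.

```python
import ipaddress
from typing import Iterable

# Pool from the commit message: 127.0.0.16 .. 127.0.0.31 inclusive.
_POOL_START = ipaddress.IPv4Address("127.0.0.16")
ALIAS_POOL = [str(_POOL_START + offset) for offset in range(16)]


def allocate_alias(in_use: Iterable[str]) -> str:
    """Return the lowest-numbered pool address not present in `in_use`."""
    taken = set(in_use)
    for alias in ALIAS_POOL:
        if alias not in taken:
            return alias
    raise RuntimeError("loopback alias pool exhausted")


if __name__ == "__main__":
    print(allocate_alias({"127.0.0.16", "127.0.0.18"}))
```

Because nothing is persisted, two launches that race could pick the same address; the sketch does not address that case.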
|
|
bad195e910 | Merge pull request 'feat(smolmachines): PRD 0022 sandbox-escape suite green under smolmachines (PRD 0023 chunk 5)' (#73) from prd-0023-chunk-5-sandbox-suite-smolmachines into main | ||
|
|
d7cef27584 |
feat(smolmachines): PRD 0022 sandbox-escape suite green under smolmachines (PRD 0023 chunk 5)
Final PRD 0023 chunk. The PRD 0022 attack suite was already backend-agnostic: it goes through get_bottle_backend(), so the right dispatch happens based on CLAUDE_BOTTLE_BACKEND. Two cleanups to make it actually run cleanly under CLAUDE_BOTTLE_BACKEND=smolmachines:
- setUpClass raises unittest.SkipTest with a useful message when CLAUDE_BOTTLE_BACKEND=smolmachines but smolvm isn't on PATH, or when the host isn't macOS (libkrun + TSI single-IP allowlist is macOS-only in v1). Without this, the test would die deep inside backend.prepare's smolmachines_preflight rather than skipping.
- test_5_readme_push_blocked switches from a hardcoded `git://git-gate/...` remote URL (only resolvable on docker via the bundle's short alias) to the bottle's declared upstream URL (`ssh://git@unreachable.invalid:22/throwaway.git`). The agent's ~/.gitconfig insteadOf rewrite (set up by provision_git on both backends) transparently redirects to the gate, so the same test exercises docker's `git://git-gate/...` and smolmachines's `git://<bundle_ip>:9418/...` URLs without branching on backend.
README gets a "Backend selection" subsection under Quickstart documenting CLAUDE_BOTTLE_BACKEND, the macOS-only v1 scope for smolmachines, and the `curl -sSL .../install.sh | sh` install prerequisite, per PRD 0023's acceptance criteria.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
eceba96c68 | Merge pull request 'fix(smolmachines): docker push fails on Docker Desktop — daemon-side route differs from host loopback' (#74) from fix-local-registry-docker-desktop into main | ||
|
|
d02fe50193 |
fix(smolmachines): run claude mcp add as node so config lands in node's home
provision_supervise dispatched `claude mcp add --scope user` through `smolvm machine_exec`, which runs as root by default. The MCP entry got written to root's ~/.claude.json, but the agent's claude reads /home/node/.claude.json, so `/mcp` showed "No MCP servers configured" inside the bottle.
Wrap the exec in `runuser -u node -- env HOME=/home/node ...` so the config writes to the right home. Same pattern as the interactive exec_claude / Bottle.exec wrappers: `smolvm machine_exec` is always root, so any command that touches user state has to switch UID + set HOME explicitly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
515306cd4a |
fix(smolmachines): restore /tmp + /var/tmp perms after smolvm pack remap
smolvm's pack process remaps OCI-layer ownership to the host invoker's uid for *every* directory, not just /home/node, so /tmp lands as `0755 501:dialout` instead of the standard `1777 root:root`. Non-root processes can't create per-uid scratch dirs in there. Claude-code's first Bash tool call fails with `EACCES: permission denied, mkdir '/tmp/claude-1000'`.
Same workaround folded into the existing perms-repair sh -c: `chown root:root /tmp /var/tmp && chmod 1777 /tmp /var/tmp` next to the /home/node chown. One machine_exec round trip total.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
45c821a8f3 |
docs(smolmachines): note loopback-scope limitation + tracking issue
PR #74's Docker-Desktop pivot widened the smolmachines TSI allowlist from `<bundle-ip>/32` to `127.0.0.1/32` (TSI can't filter by port, and docker bridge IPs aren't reachable from macOS networking). The agent VM can therefore reach any service on macOS's loopback while the bottle is running, not just the bundle's published ports.
README gets a "Smolmachines backend" subsection under Quickstart spelling this out as a known v1 limitation. PRD 0023 grows a new open question #8 with the proposed v2 fix (per-bottle loopback alias + TSI allowlist scoped to that /32, via sudo `ifconfig lo0 alias`).
Tracking issue: gitea.dideric.is/didericis/claude-bottle/issues/75.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
5486170be1 |
fix(smolmachines): route agent through egress when routes declared, wait for VM warm-up
Two related bugs:
1. Auth chain bypassed egress. After the Docker-Desktop port pivot, the agent always dialed pipelock directly, meaning egress (which holds the real OAuth token and rewrites the Authorization header) wasn't in the request path. Bearer placeholder reached anthropic verbatim → 401 "Invalid bearer token".
Fix: when the bottle declares egress.routes, the agent's first hop is egress (publish egress port 9099 to host loopback, leave pipelock bundle-internal). Without routes, the agent dials pipelock directly. Same hop order as the docker backend.
2. provision_ca's update-ca-certificates SIGKILLed at ~100ms on Docker Desktop. Back-to-back `smolvm machine exec` calls immediately after machine_start hit a VM warm-up race in libkrun's exec channel; the second exec's child got SIGKILL'd before producing more than the first line of stdout. The agent's trust store never got the egress MITM CA's hash symlink, so curl/openssl couldn't validate the TLS chain.
Fix: 1.5s sleep after machine_start (empirically enough), plus fold provision_ca's chown + chmod + update-ca-certificates into one `sh -c` so we only pay one exec round trip. Bail with a clear error if update-ca-certificates doesn't report "1 added" (failing silently was how the original SIGKILL went unnoticed).
Net effect on Docker Desktop / macOS: claude's HTTPS_PROXY is `http://127.0.0.1:<egress port>`, egress rewrites auth, pipelock allowlists + DLPs, request reaches api.anthropic.com with a real token. End-to-end verified.
Also drops the PRD-0023-chunk-3 EGRESS_LISTEN_HOST=127.0.0.1 mitigation. The original concern (agent bypassing pipelock by dialing egress's port on the bundle IP) doesn't apply in this topology: the agent can only reach whatever port we publish on host loopback, and egress is the only HTTP/HTTPS chokepoint that gets published.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
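A rough Python sketch of the single-`sh -c` fold and the "1 added" check described in the commit above. The function names, the `root:root` owner, and the `0644` mode are assumptions for illustration; the commit only states that chown, chmod, and update-ca-certificates run in one exec and that the output is checked.

```python
import shlex


def build_ca_install_command(cert_path: str) -> list[str]:
    """Fold chown + chmod + update-ca-certificates into one `sh -c` argv.

    Owner and mode below are illustrative assumptions, not values from
    the commit.
    """
    quoted = shlex.quote(cert_path)
    script = " && ".join(
        [
            f"chown root:root {quoted}",
            f"chmod 0644 {quoted}",
            "update-ca-certificates",
        ]
    )
    return ["sh", "-c", script]


def ca_was_added(output: str) -> bool:
    """Return True only when update-ca-certificates reports one new cert."""
    return "1 added" in output
```

A caller would run the returned argv through one exec round trip and raise a clear error when `ca_was_added` is false, rather than continuing with an untrusted chain.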
|
|
4f136a9932 |
fix(smolmachines): agent dials bundle via host loopback ports, not docker bridge IP
Claude hung on outbound network calls under CLAUDE_BOTTLE_BACKEND=smolmachines:
Unable to connect to API (FailedToOpenSocket)
Root cause: the PRD-0023 design pinned the bundle at a docker bridge IP (192.168.X.2) and set the smolvm guest's TSI allowlist to `<bundle-ip>/32`. On native Linux this works: host shares the docker bridge's network namespace, TSI's syscall impersonation reaches the bridge IP directly. On Docker Desktop (macOS), the daemon runs in its own Linux VM and docker bridge IPs aren't reachable from macOS networking, so the smolvm guest's TSI requests die "Network is unreachable" before they hit pipelock.
Fix: publish each agent-facing bundle daemon's port on host loopback (-p 127.0.0.1::PORT), discover the random host-side ports after start, and route the agent through `127.0.0.1:<host port>` instead of the bridge IP. macOS loopback is the surface Docker Desktop's gvproxy forwards into the daemon's VM, so the chain (guest TSI -> macOS loopback -> daemon VM port-forward -> bundle container) works on both Docker Desktop and native Linux.
Concrete changes:
- BundleLaunchSpec: add `ports_to_publish` so start_bundle adds `-p 127.0.0.1::PORT` for the agent-facing ports (pipelock always; git-gate when upstreams declared; supervise when enabled). Egress's port stays bundle-internal.
- sidecar_bundle.bundle_host_port(): wrap `docker port <bundle> <container_port>/tcp` so launch can look up the random host-side mapping after start.
- launch.py: discover the host ports, build URLs of the form `http://127.0.0.1:<host port>` / `git://127.0.0.1:<host port>`, stamp onto guest_env + new agent_*_url fields on the plan.
- launch.py: TSI allow_cidrs flips to `["127.0.0.1/32"]`. The bundle IP is no longer the agent's target.
- prepare.py: stop synthesizing HTTPS_PROXY / GIT_GATE_URL / MCP_SUPERVISE_URL at prepare time; launch owns those now (the values depend on a port docker hasn't assigned yet).
- provision_git: gate_host from plan.agent_git_gate_host.
- provision_supervise: URL from plan.agent_supervise_url.
End-to-end verified on Docker Desktop / macOS: guest dials pipelock through TSI, pipelock forwards to api.anthropic.com, the API responds with 401 (i.e. it received the request).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
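The host-port discovery in the commit above could look roughly like this in Python. The helper names are hypothetical, and the sketch only parses text of the `host:port` form that `docker port <container> <port>/tcp` prints; it does not run docker itself.

```python
def parse_docker_port_output(output: str) -> int:
    """Extract the host-side port from `docker port` output.

    Each non-empty line looks like `127.0.0.1:49153`; the first line
    with a numeric port wins.
    """
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # rpartition keeps this safe for bracketed IPv6 hosts like [::]:49153.
        _, _, port = line.rpartition(":")
        if port.isdigit():
            return int(port)
    raise ValueError(f"no host port found in docker port output: {output!r}")


def agent_url(scheme: str, host_port: int) -> str:
    """Build the loopback URL the agent dials instead of the bridge IP."""
    return f"{scheme}://127.0.0.1:{host_port}"
```

Because the port is only known after the bundle starts, a helper like this has to run at launch time, which matches the move of the URL synthesis out of prepare.py.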
|
|
da1e5e1ba8 |
fix(smolmachines): pass --net explicitly when allow_cidrs is set
smolvm 0.8.0 docs say `--allow-cidr` implies `--net`, but empirically the implication only fires when no `--from` is set. `--from PATH --allow-cidr X/32` silently produces a machine with network: false and no routes in the guest, so claude lands inside with HTTPS_PROXY pointing at the bundle's pinned IP but every connect fails with "Network is unreachable" / FailedToOpenSocket in claude's UI.
Reproduce + verify:
$ smolvm machine create --from <pack> --allow-cidr X/32 nettest
$ smolvm machine ls --json | jq '.[].network' # false
$ smolvm machine create --from <pack> --net --allow-cidr X/32 nettest2
$ smolvm machine ls --json | jq '.[].network' # true
Add `--net` whenever `allow_cidrs` is non-empty. No change to the no-allow-cidr code path. Test added to lock down both branches.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
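A minimal Python sketch of the `--net` rule from the commit above. The function name `machine_create_argv` and the one-`--allow-cidr`-flag-per-entry layout are assumptions for illustration; only the flags themselves come from the commit message.

```python
def machine_create_argv(name: str, pack_path: str, allow_cidrs: list[str]) -> list[str]:
    """Build a `smolvm machine create` argv.

    `--net` is passed explicitly whenever any CIDR is allowlisted,
    because `--allow-cidr` alone did not enable networking with `--from`.
    """
    argv = ["smolvm", "machine", "create", "--from", pack_path]
    if allow_cidrs:
        argv.append("--net")
        for cidr in allow_cidrs:
            argv += ["--allow-cidr", cidr]
    argv.append(name)
    return argv
```

With an empty `allow_cidrs` the argv is unchanged from the no-network path, which is the second branch the commit's test locks down.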
|
|
91955ec59f |
fix(smolmachines): forward guest env on every exec + chown /home/node
Two issues kept claude's TUI from drawing after launch:
1. smolvm pack remaps OCI-layer ownership to the host invoker's uid (501 on macOS) instead of preserving the image's USER node (uid 1000). /home/node ends up owned by some uid that doesn't exist in the VM, so when claude runs as node it can't appendFileSync to ~/.claude.json on startup; it fails with ENOENT and the TUI hangs.
Fix: chown -R node:node /home/node after machine_start, before provision.
2. smolvm machine_create -e sets env on PID 1 but it doesn't propagate to fresh exec process trees (verified empirically: `smolvm machine exec -- printenv` shows none of the machine_create env vars). Claude was running with no HTTPS_PROXY / CLAUDE_CODE_OAUTH_TOKEN / NODE_EXTRA_CA_CERTS, so even the auth-validation step bailed silently.
Fix: thread `guest_env` through to the SmolmachinesBottle handle and re-pass every entry via `-e K=V` on every machine_exec call (interactive claude and shell exec both).
Also fills in the same `CLAUDE_CODE_OAUTH_TOKEN=egress-placeholder` + telemetry-off env the docker backend's forwarded_env carries, plus the NODE_EXTRA_CA_CERTS / SSL_CERT_FILE / REQUESTS_CA_BUNDLE trust trio.
Verified end-to-end on Docker Desktop / macOS: claude's TUI renders cleanly with the bypass-permissions banner.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
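The `-e K=V` forwarding from the commit above can be sketched as follows. This is an illustration under assumptions: `machine_exec_argv` is a hypothetical name, and the argv layout follows the `smolvm machine exec --name <M> -e ... -- <cmd>` form quoted elsewhere in this log.

```python
def machine_exec_argv(machine: str, guest_env: dict[str, str], command: list[str]) -> list[str]:
    """Build a `smolvm machine exec` argv that re-passes every guest env var.

    Env set at machine create time does not reach fresh exec process
    trees, so each entry is forwarded again as `-e KEY=VALUE`.
    """
    argv = ["smolvm", "machine", "exec", "--name", machine]
    for key, value in guest_env.items():
        argv += ["-e", f"{key}={value}"]
    argv.append("--")
    argv += command
    return argv
```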
|
|
35edf50f21 |
fix(smolmachines): drop runuser -l in favor of UID switch + explicit HOME/USER
Interactive claude session hung silently after `attaching interactive claude session...`: `runuser -l` invokes a login shell that triggers PAM session setup / /etc/profile sourcing, and the minimal Debian agent VM doesn't have the PAM config files for that to complete cleanly. claude never got to draw its TUI.
Switch UID via plain `runuser -u <user> --` (no `-l`) and inject HOME / USER through `smolvm machine exec -e` so the child process sees them. Avoids login-shell wiring entirely. Same pattern in `exec_claude` and `exec(script)`.
`_HOME_FOR` maps the two users the codebase currently asks for (`node`, `root`); anything else falls back to `/home/<user>`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
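A small Python sketch of the `_HOME_FOR` lookup and the `runuser -u` prefix described in the commit above. The mapping values and the `user_switch` helper are assumptions inferred from the message (`/home/node` appears elsewhere in this log; `/root` is the conventional root home), not the project's actual code.

```python
# Assumed contents: the commit names the two users but not the paths.
_HOME_FOR = {"node": "/home/node", "root": "/root"}


def home_for(user: str) -> str:
    """Return the home directory for `user`, falling back to /home/<user>."""
    return _HOME_FOR.get(user, f"/home/{user}")


def user_switch(user: str, command: list[str]) -> tuple[dict[str, str], list[str]]:
    """Return (env to inject via `-e`, argv wrapped in a non-login runuser)."""
    env = {"HOME": home_for(user), "USER": user}
    return env, ["runuser", "-u", user, "--", *command]
```

Returning the environment separately reflects the design in the message: HOME and USER travel through the exec's `-e` flags rather than through a login shell.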
|
|
af65c10361 |
refactor: Bottle.exec takes a user= kwarg, default node
Promote the user-switch from a hardcoded `node` to a keyword arg so callers can opt into root (or any other user) when needed. Default stays `node`, matching the docker image's USER and the smolmachines runuser default.
Lifts the change through the base ABC, docker, and smolmachines backends:
- Base: `def exec(self, script, *, user="node")`.
- Docker: adds `-u <user>` to `docker exec` (no-op when user is node, the image's default).
- Smolmachines: `runuser -l <user> -c <script>`; `runuser -l root` is the trivial no-op form when the caller asked for root.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
e26d459a97 |
fix(smolmachines): run claude + shell exec as the node user
`smolvm machine exec` runs commands as root in the VM, but the
agent image's USER is `node`. claude-code refuses
`--dangerously-skip-permissions` when invoked as root, killing
the interactive session right after `attaching interactive claude
session...`:
--dangerously-skip-permissions cannot be used with root/sudo
privileges for security reasons
Wrap both `exec_claude` and `exec(script)` in
`runuser -l node -c ...` so commands run as the node user with
node's $HOME / $USER (login shell). The docker backend gets
this behavior for free via the image's USER directive; this
restores parity.
shlex-quote each claude argv element when stitching the runuser
-c shell command so paths / flags with shell-special chars
survive the parse.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
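The shlex-quoting step in the commit above can be illustrated with a short Python sketch. `runuser_login_argv` is a hypothetical helper name; the `runuser -l <user> -c <script>` shape is the one the commit message describes.

```python
import shlex


def runuser_login_argv(user: str, argv: list[str]) -> list[str]:
    """Wrap `argv` in `runuser -l <user> -c '<script>'`.

    Each element is shlex-quoted before joining so spaces and other
    shell-special characters survive the inner shell's parse.
    """
    script = " ".join(shlex.quote(arg) for arg in argv)
    return ["runuser", "-l", user, "-c", script]
```

Splitting the generated script with `shlex.split` gives back the original argv, which is a convenient property to assert in a unit test.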
|
|
906c9fd1bb |
fix(smolmachines): preflight print uses plan-level egress routes
`SmolmachinesBottlePlan.print` iterated over
`bottle.egress.routes` (the manifest's capitalized-attribute form
on `manifest.EgressRoute`) but accessed `r.host` (lowercase).
Worked when no egress routes were declared; AttributeError
("EgressRoute has no attribute 'host'") on the first bottle with
a route.
Switch to `self.egress_plan.routes` — the resolved plan-level
EgressRoute (lowercase `host`), same source the docker backend's
print uses.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
47eb56bd10 |
fix(smolmachines): use containerized crane to push, bypassing docker daemon's HTTPS preference
The previous fix (`host.docker.internal:<port>` for daemon-side push) still failed:
Get "https://host.docker.internal:53958/v2/": http: server gave HTTP response to HTTPS client
`host.docker.internal` is reachable from Docker Desktop's daemon VM but isn't in the daemon's default insecure-registries CIDRs (only `::1/128` and `127.0.0.0/8` are), so docker push tries HTTPS, hits a plain-HTTP registry, and refuses. The daemon.json fix (`"insecure-registries": ["host.docker.internal"]`) works but is a one-time manual step in Docker Desktop's UI, not something we can do for the user.
Sidestep the daemon push entirely:
1. docker build (as before); local layer cache makes no-change rebuilds cheap.
2. docker save the image to a per-digest tarball alongside the cached `.smolmachine`.
3. Start an ephemeral registry container on a per-session docker network, with `-p :5000` so the host can also reach it for the pack step.
4. docker run a one-shot crane container on the SAME network, mount the tarball, `crane push --insecure /img.tar <registry-container>:5000/...`. Container DNS resolves the registry on the network; `--insecure` forces plain HTTP.
5. `smolvm pack create --image localhost:<host port>/...` from the host. smolvm's bundled crane auto-falls-back to HTTP for localhost addresses, so no insecure-registries config is needed on that side.
6. Tear down everything; reap the tarball (registries hold the same bytes, no need to keep both around).
Net effect: the docker daemon never does an HTTP/HTTPS-policy decision on our behalf. `docker push` is gone from the prepare path; `docker save`, `docker network create`, `docker run` (for registry + crane) replace it.
Tested end-to-end on Docker Desktop / macOS: `_ensure_smolmachine("claude-bottle:latest")` produces a 204MB `.smolmachine.smolmachine` artifact.
Adds:
- backend/docker/util.py:save(): thin docker save wrapper.
- local_registry.crane_push_tarball(): one-shot crane run on the registry's network.
- CRANE_IMAGE constant pinned by digest (gcr.io/go-containerregistry/crane@sha256:0ae17ecb...).
Removes:
- backend/docker/util.py:tag() / push(): unused without daemon push.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
f4026ea3ae |
fix(smolmachines): docker push fails on Docker Desktop — daemon-side route differs from host loopback
`./cli.py start <agent>` under CLAUDE_BOTTLE_BACKEND=smolmachines died at `docker push localhost:<port>/claude-bottle:<id>` with `Get "http://localhost:<port>/v2/": context deadline exceeded`.
Cause: chunk 4c bound the ephemeral registry to `127.0.0.1::5000` and used `localhost:<port>` as the only image-ref hostname. On Docker Desktop the daemon runs inside its own Linux VM; its `localhost` is the VM's loopback, not the host's, so the daemon cannot reach a registry bound to the host's 127.0.0.1.
Fix: bind the registry to all interfaces (`-p :5000`) so it's reachable from both sides, and yield two endpoints:
- `daemon_endpoint`: `host.docker.internal:<port>` on Docker Desktop (daemon-side hostname for the host VM gateway), `localhost:<port>` on a native Linux daemon that shares the host's network namespace. Used for `docker tag` + `docker push`.
- `host_endpoint`: always `localhost:<port>`. Used for `smolvm pack create`, which runs as a host process.
The registry stores images by repo+tag, so a push to `host.docker.internal:<port>/cb:<id>` and a pull from `localhost:<port>/cb:<id>` resolve to the same blob; the hostname in a ref is just routing.
Detection uses `docker info --format '{{.OperatingSystem}}'`, which returns "Docker Desktop" on macOS/Windows Desktop and the host's OS name on native daemons.
Trade-off: all-interface binding briefly publishes the registry on every interface (~5-10s during prepare). The pushed image is built from the public repo Dockerfile (no secrets), the port is random, and the window is short; acceptable for v1 of a personal dev tool.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
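The two-endpoint selection in the commit above could be sketched in Python like this. `registry_endpoints` is a hypothetical name, and the sketch takes the `docker info --format '{{.OperatingSystem}}'` output as a plain string argument instead of invoking docker.

```python
def registry_endpoints(operating_system: str, port: int) -> tuple[str, str]:
    """Return (daemon_endpoint, host_endpoint) for the ephemeral registry.

    `operating_system` is the text printed by
    `docker info --format '{{.OperatingSystem}}'`.
    """
    host_endpoint = f"localhost:{port}"
    if operating_system.strip() == "Docker Desktop":
        # The daemon lives in its own VM, so its localhost is not the host's.
        daemon_endpoint = f"host.docker.internal:{port}"
    else:
        daemon_endpoint = host_endpoint
    return daemon_endpoint, host_endpoint
```

Keeping detection as a pure function of the reported OS string makes both branches easy to unit-test without a running daemon.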
|
|
ac8c7ba696 |
feat(smolmachines): provision_ca + provision_git + provision_supervise (PRD 0023 chunk 4d)
End-to-end provisioning parity with the docker backend. After this
chunk a smolmachines bottle has a working trust store, git-gate
gitconfig, and supervise MCP registration — same shape as docker,
dispatched via `smolvm machine cp` / `smolvm machine exec` instead
of `docker cp` / `docker exec`.
Adds three new provision modules:
- ca.py: select egress vs pipelock CA (same logic as
docker), machine cp + update-ca-certificates,
log sha256 fingerprint.
- git.py: copy host .git when --cwd was passed; render
~/.gitconfig with insteadOf URLs. URL prefix is
`git://<bundle_ip>:9418/...` (no DNS in the
TSI-allowlisted guest) vs docker's
`git://git-gate/...`.
- supervise.py: `claude mcp add` via machine_exec; URL is
`http://<bundle_ip>:9100/`. Failure is logged but
non-fatal (matches docker).
Shared render: `render_git_gate_gitconfig` moves out of
backend/docker/provision/git.py into the platform-neutral
claude_bottle/git_gate.py (renamed to git_gate_render_gitconfig
for consistency with the existing git_gate_render_* helpers),
parameterized on a `gate_host` argument so both backends use the
same logic with different addresses.
Path/user fixups for the post-chunk-4c agent image (real
claude-bottle image, USER node, $HOME=/home/node):
- prompt.py default path moves from /root/... to
/home/node/.claude-bottle-prompt.txt; chown + chmod after
machine cp.
- skills.py default skills dir moves from /root/.claude/skills to
/home/node/.claude/skills; chown -R per skill.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
1fa17d1822 |
feat(smolmachines): build agent image from repo Dockerfile (PRD 0023 chunk 4c)
Replaces the alpine:latest placeholder with a real claude-bottle agent image, converted into a .smolmachine artifact via an ephemeral local OCI registry.
Why the registry hop: smolvm pack create only accepts OCI registry refs. Empirically it rejects docker-daemon://, oci-layout://, docker-archive: tarballs, and every other transport tested; the crane backend treats anything with a scheme prefix as a registry hostname. To convert a locally-built docker image into a .smolmachine we have to push it somewhere smolvm can pull from. Smallest path: bring up registry:2.8.3 bound to 127.0.0.1:<random>, docker tag + docker push into it, smolvm pack create --image localhost:<port>/claude-bottle:<id>, tear down the registry.
The .smolmachine is cached under ~/.cache/claude-bottle/smolmachines/ keyed by the docker image ID (first 16 hex chars of the sha256), so a Dockerfile change picks up a new image ID and invalidates the cache. Unchanged rebuilds skip the whole build → registry → pack pipeline.
This puts `docker build` in smolmachines prepare (the docker backend defers it to launch). Necessary because pack_create needs the image ID to derive the cache key, and prepare is the only hook ahead of launch that runs once per slug.
Adds:
- claude_bottle/backend/docker/util.py: image_id / tag / push helpers (thin docker CLI wrappers).
- claude_bottle/backend/smolmachines/local_registry.py: ephemeral_registry() context manager; pins registry:2.8.3 by digest, binds 127.0.0.1::5000 (loopback-only), force-removes on exit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
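A hedged Python sketch of the image-ID cache key described in the commit above. The function name and the `<key>.smolmachine` file naming are assumptions; the commit only states the cache directory and that the key is the first 16 hex characters of the image's sha256.

```python
from pathlib import Path


def smolmachine_cache_path(image_id: str, cache_root: Path) -> Path:
    """Map a docker image ID to a cached artifact path.

    The key is the first 16 hex chars of the sha256, so any Dockerfile
    change that yields a new image ID also yields a new cache entry.
    """
    digest = image_id.removeprefix("sha256:")
    return cache_root / f"{digest[:16]}.smolmachine"
```

A caller would check whether the returned path exists and skip the build, registry, and pack steps on a hit.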
|
|
4ac61a563b | Merge pull request 'feat(smolmachines): thread inner Plans + bundle daemons run (PRD 0023 chunk 4b)' (#70) from prd-0023-chunk-4b-inner-plans into main | ||
|
|
519a71f2e7 |
refactor(docker): drop legacy names from capability_apply teardown
Last of the per-sidecar legacy names. `_per_bottle_container_names` used to list the four pre-bundle sidecars (cred-proxy, pipelock, git-gate, supervise) so capability-apply's teardown would force-rm them on remediation. None of those containers exist anymore; the four daemons run in the sidecar bundle (PRD 0024), so the list collapses to the agent + the bundle.
Integration test follows: the fake supervise-sidecar setup, which existed to give teardown an extra container to clean up, switches to a fake sidecar bundle with the current name.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
727f30d422 |
refactor(docker): drop legacy per-sidecar container_name functions
Same line of cleanup as the supervise rename: the per-sidecar container names (`claude-bottle-pipelock-<slug>`, `claude-bottle-egress-<slug>`, `claude-bottle-git-gate-<slug>`) were docker-network aliases pointing at the bundle, kept so legacy URLs would keep resolving. Replaces them with short hostnames (`pipelock`, `egress`, `git-gate`) matching the existing `EGRESS_HOSTNAME` pattern, and inlines the bundle-loopback URL (`http://127.0.0.1:8888`) for the in-bundle egress→pipelock hop, matching what smolmachines already does.
Drops the three `*_container_name` functions, `pipelock_proxy_url`, and `git_gate_host`. Their callers move to the new constants:
- `PIPELOCK_HOSTNAME = "pipelock"` (claude_bottle/pipelock.py)
- `GIT_GATE_HOSTNAME = "git-gate"` (claude_bottle/git_gate.py)
- `BUNDLE_LOCAL_PIPELOCK_URL` (backend/docker/pipelock.py)
The agent's HTTP_PROXY now reads `http://pipelock:8888` (vs the old `http://claude-bottle-pipelock-<slug>:8888`); the gitconfig insteadOf rewrites become `git://git-gate/<repo>.git`. The prepare-time orphan probe is collapsed onto the bundle container name (`claude-bottle-sidecars-<slug>`) instead of the four legacy per-sidecar names that no backend creates anymore.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
8ecba2b458 |
refactor(docker): drop legacy supervise_container_name alias
Supervise runs inside the sidecar bundle (PRD 0024), not in its own container. The `claude-bottle-supervise-<slug>` per-sidecar name only existed as a docker-network alias on the bundle so legacy code paths that referenced the old name would still resolve. Nothing inside the project relies on that resolution anymore (the short `supervise` alias is the one all consumers use), so the legacy long-form is dead.
Drops the function entirely, plus its registration as a network alias and as an orphan probe in prepare.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
73dc0d4a40 |
refactor(sidecars): instantiate sidecar ABCs directly from any backend
The four sidecar prepare-time helpers (PipelockProxy, Egress, GitGate, Supervise) had docker-flavored subclasses that existed only as instantiation shims for ABCs that already had no abstract methods. PipelockProxy.prepare() reached for class-level CA path constants that were only defined on the docker subclass, so smolmachines had to import DockerPipelockProxy to render pipelock yaml, reaching across the backend boundary for what's actually a platform-neutral operation.
This moves the universal in-container CA paths (PIPELOCK_CA_CERT_IN_CONTAINER / PIPELOCK_CA_KEY_IN_CONTAINER) to claude_bottle/pipelock.py, drops the class-attr indirection on the ABC, and deletes the four empty docker subclasses. Both backends now instantiate the ABCs directly; the docker-side modules keep the docker-flavored helpers (image pin, container naming, host CA mint) and re-export the moved pipelock constants for compat.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
1dfc359141 |
feat(smolmachines): thread inner Plans + bundle daemons run (PRD 0023 chunk 4b)
Bundle daemons (pipelock, egress, optionally git-gate + supervise)
now actually start with their config files bind-mounted from the
inner Plans the docker backend already produces. Chunks 2d + 3
ran with daemons_csv="" so the bundle's init supervisor idled;
chunk 4b wires up the real path: agent → pipelock → egress →
internet (when routes declared) is now functional, modulo agent-
image gaps (claude-code / TLS-trust-store / git in the guest)
that chunk 4c addresses.
bottle_plan.py — added the four inner Plan fields:
proxy_plan: PipelockProxyPlan
git_gate_plan: GitGatePlan
egress_plan: EgressPlan
supervise_plan: SupervisePlan | None
Same shape the docker backend's plan uses. Docker-network-only
fields (internal_network, egress_network) stay at dataclass
defaults — the smolmachines bundle is on a per-bottle bridge
with a pinned IP, not docker's --internal + egress topology.
prepare.py — instantiates DockerPipelockProxy / DockerEgress /
DockerGitGate / DockerSupervise and calls their .prepare()
methods to write the per-bottle config files (pipelock.yaml,
routes.yaml, git-gate entrypoint/hooks, supervise queue dir)
under the per-bottle state dir. (The "Docker" prefix on the
class names is a misnomer here — .prepare() is platform-neutral,
inherited from each sidecar's ABC. A future cleanup could factor
the prepare logic out of the docker subpackage.)
launch.py — major rewrite:
- pipelock_tls_init at launch (always); egress_tls_init only
when the bottle declares routes (otherwise the CA files
aren't bind-mounted and openssl runs would be wasted).
- Inner Plans updated in place with launch-time CA paths +
EGRESS_UPSTREAM_PROXY = http://127.0.0.1:8888 (egress's
upstream is pipelock on the bundle's own loopback; same
container's network namespace).
- BundleLaunchSpec env + volumes built from the inner Plans:
pipelock.yaml + CA + key (always); egress routes + CAs +
upstream env + token-slot bare names (when routes); git-gate
entrypoint + hooks + per-upstream identity files (when
upstreams); supervise queue dir + env (when enabled).
- daemons_csv = ["egress", "pipelock"] + ["git-gate"] (if
upstreams) + ["supervise"] (if enabled).
- Token env values resolved from host env via
`egress_resolve_token_values` and threaded into the
docker-run subprocess env (bare-name -e entries in spec
inherit from there — values never land on argv).
Tests:
- 552 unit passing (no new unit cases; fixture updated to
populate the new plan fields).
- 5 integration cases passing locally (Darwin + smolvm + docker
+ not GITEA_ACTIONS):
* test_smoke_exec_echo — still works.
* test_localhost_reach_probe — host loopback still refused.
* test_egress_port_bypass_probe — <bundle-ip>:9099 still
refused, NOW WITH EGRESS ACTUALLY RUNNING (chunk 3's
127.0.0.1 bind-address is doing its job).
* test_prompt_file_lands_in_guest — still works.
* test_pipelock_answers_on_bundle_ip — NEW. From inside the
guest, wget to <bundle-ip>:8888 gets an HTTP response
(not "connection refused") — proves pipelock is actually
listening and the bind-mount + CA generation path works.
What's left in chunk 4:
- 4c: agent-image-conversion (claude-code + git + curl +
ca-certificates in the guest). Chunk 2d's alpine placeholder
stays for now.
- 4d: provision_ca + provision_git + provision_supervise once
the agent image has the required tools.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
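The `daemons_csv` composition listed in the commit above can be sketched in a few lines of Python. `bundle_daemons_csv` and its two boolean parameters are hypothetical names; the daemon names and their order come from the commit message.

```python
def bundle_daemons_csv(has_upstreams: bool, supervise_enabled: bool) -> str:
    """Compose the comma-separated daemon list for the sidecar bundle.

    egress and pipelock always run; git-gate and supervise are added
    only when the bottle declares upstreams or enables supervise.
    """
    daemons = ["egress", "pipelock"]
    if has_upstreams:
        daemons.append("git-gate")
    if supervise_enabled:
        daemons.append("supervise")
    return ",".join(daemons)
```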
|
|
085a0c1923 |
style(smolmachines): provision_git stub uses pass not del
Addresses PR #69 review comment: `del plan, target` was just a silence-the-unused-arg gesture but reads oddly for a stub. `pass` is the standard "this is a stub" sentinel. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
9e3b7e441e |
feat(smolmachines): provision_prompt + provision_skills (PRD 0023 chunk 4a)
First slice of chunk 4: implement the two provisioning methods
that don't depend on agent-image tooling beyond `cp` and
`mkdir`. provision_ca / provision_git / provision_supervise
land once the agent-image gap is solved (chunk 4b+) — they need
update-ca-certificates, git, and the claude binary respectively,
none of which the chunk-2d alpine placeholder provides.
What this PR ships:
- `claude_bottle/backend/smolmachines/provision/` subpackage
with `prompt.py` + `skills.py`. Each routes through
`smolvm.machine_cp` / `machine_exec`. provision_prompt mirrors
the docker contract (file always copied; return value drives
--append-system-prompt-file iff the agent has a non-empty
prompt). provision_skills mkdir + cp per skill, matching
the docker backend's loop.
- prepare.py now writes the prompt file under
agent_state_dir(slug) with the agent's `prompt` body, mode
0o600. The in-guest path is `/root/.claude-bottle-prompt.txt`
(alpine has no `node` user; will become `/home/node/...` once
the real claude-bottle image lands).
- launch.py calls `provision(plan, machine_name)` after
machine_start. The returned prompt path threads to
SmolmachinesBottle so exec_claude can add
--append-system-prompt-file when the agent has a prompt.
- backend.py: provision_prompt / provision_skills now real;
provision_git is a deliberate stub (waiting on the git-gate
inner Plan + git in the agent image). provision_supervise
stays the chunk-2d stub.
Tests:
- 7 new unit cases (test_smolmachines_provision.py): argv
shape (mocked smolvm.machine_cp / .machine_exec),
prompt return-value contract, no-op-with-no-skills,
CLAUDE_BOTTLE_GUEST_SKILLS_DIR override, fail-on-missing-skill.
- 1 new integration case in test_smolmachines_launch.py:
end-to-end verification that the prompt file lands in the
alpine guest at /root/.claude-bottle-prompt.txt with the
expected content (via `bottle.exec("cat ...")`). The smoke +
the two TSI probes stay green.
552 unit + 4 integration (Darwin+smolvm+docker gated) passing.
What's left in chunk 4:
- 4b: thread the inner Plans (PipelockProxyPlan / EgressPlan /
GitGatePlan / SupervisePlan) through prepare + launch so the
bundle daemons actually run (currently daemons_csv="").
- 4c: the agent-image-conversion gap — get claude-code + git +
curl + ca-certificates into the guest image (build a
.smolmachine via `pack create --from-vm` after manual setup,
or push the docker image to a registry smolvm can pull).
- 4d: provision_ca + provision_git + provision_supervise once
4b + 4c land.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
554d60324d | Merge pull request 'feat(sidecars): egress binds 127.0.0.1 when EGRESS_LISTEN_HOST is set (PRD 0023 chunk 3)' (#68) from prd-0023-chunk-3-egress-bind-localhost into main |