`./cli.py cleanup` previously called only the env-var-selected
backend's `prepare_cleanup` / `cleanup` — so a leftover smolvm
machine + bundle container + bundle network from a crashed
smolmachines bottle would survive a default `docker`-mode cleanup
indefinitely.
Smolmachines now has a real `cleanup` module (alongside
`enumerate.py` from issue #77) that walks:
- smolvm machines named `claude-bottle-*` (via
`smolvm machine ls --json`)
- bundle containers `claude-bottle-sidecars-*`
- bundle networks `claude-bottle-bundle-*`
Cleanup runs stop+delete on the machines, force-rm on the
containers, network rm on the networks. Each step is best-effort
so a failed rm doesn't block the rest.
`cli.py cleanup` walks every backend in `known_backend_names()`
and runs each backend's `cleanup` after a single y/N prompt that
shows a combined plan.
State dirs (`~/.claude-bottle/state/<slug>/`) are shared layout
with the docker backend, which still owns the orphan-state-dir
bucket. It now consults `enumerate_active_bottles()` for the
cross-backend live identity set so a running smolmachines
bottle's state dir isn't reaped during a cleanup.
Tests: smolmachines cleanup (prepare + cleanup ordering + failure
handling); cross-backend orphan protection on the docker
state-dir check; CLI cmd_cleanup walks both backends, short-
circuits on all-empty, aborts on N. 617 unit tests pass.
End-to-end verified on this host:
$ smolvm machine ls --json | jq '.[].name'
"claude-bottle-researcher-m3hxd"
$ ./cli.py cleanup
--- smolmachines backend ---
smolvm machine: claude-bottle-researcher-m3hxd
remove all of the above? [y/N]
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Launching a smolmachines agent from the dashboard inside tmux
crashed with
AttributeError: 'SmolmachinesBottle' object has no attribute
'claude_docker_argv'
because the tmux pane-respawn path called
`bottle.claude_docker_argv(...)` directly — a method that only
existed on DockerBottle. The foreground-handoff path (curses
endwin → subprocess.run → restore) doesn't hit it; it goes
through `bottle.exec_claude` which is on the ABC.
- Move the argv builder onto the `Bottle` ABC as
`claude_argv(argv, *, tty=True) -> list[str]`. Both backends
implement it; both `exec_claude` impls collapse to
`subprocess.run(self.claude_argv(argv, tty=tty), check=False)`.
- DockerBottle: rename `claude_docker_argv` → `claude_argv`,
body unchanged.
- SmolmachinesBottle: extract the argv-building from
`exec_claude` into `claude_argv`; the new method returns the
full `smolvm machine exec --name … -- runuser -u node --
claude …` argv. The `runuser` switch lives on the
exec-framing prefix so the dashboard's
`_build_resume_argv_with_fallback` split-at-"claude" trick
keeps the UID switch when wrapping the claude tail in
`sh -c "… --continue || …"`.
- Dashboard: drop the docker-specific wording — local + helper
arg names `docker_argv` → `claude_argv`; docstrings on
`_build_resume_argv_with_fallback`, `_build_split_pane_argv`,
`_build_respawn_pane_argv` now say "backend-exec argv". The
shell-fallback wrap is unchanged; the existing logic works
for smolmachines because `claude` is still the marker token.
Tests:
- `tests/unit/test_smolmachines_bottle.py` (new): locks down
the smolmachines argv shape — prompt-file flag injection,
guest-env `-e K=V` forwarding, TTY toggle, runuser-precedes-
claude invariant.
- `test_docker_bottle.py`: TestClaudeDockerArgv →
TestClaudeArgv; method renames follow.
- `test_dashboard_active_agents.py`: docstring follow.
615 unit tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses PR #78 review feedback:
- New `has_backend(name)` on the backend package + abstract
`BottleBackend.is_available()` on each concrete subclass.
Replaces inline `shutil.which("docker") is None` checks in
docker/cleanup.py:178 and smolmachines/enumerate.py:73.
Docker → `shutil.which("docker") is not None`; smolmachines
→ `smolvm.is_available()`. Cross-backend `enumerate_active_
agents()` skips backends whose `is_available()` is False so a
docker-only host doesn't fail when iterating past
smolmachines (and vice versa).
- Move docker `enumerate_active` + parser helpers out of
cleanup.py into a new `backend/docker/enumerate.py`, mirroring
the smolmachines/enumerate.py layout. cleanup.py is now
purely about prepare_cleanup / cleanup; the active-listing
concern owns its own file.
- Drop the `ActiveAgent = ActiveBottle` alias in dashboard.py.
The canonical name is `ActiveAgent` (the thing running inside
a bottle is always called "agent" in this codebase; the bottle
is the container). Renamed `enumerate_active_bottles` →
`enumerate_active_agents` to match.
Tests:
- `test_backend_selection.TestEnumerateActiveAgents
.test_skips_unavailable_backends` locks down the
`is_available()`-gated iteration.
- New `TestHasBackend` covers `has_backend("docker")` consulting
the backend's `is_available`, and unknown-name → False.
- Existing tests follow the rename; the docker-availability-
side-effect test in `test_docker_enumerate_active` moves up
to the cross-backend layer (where the gate lives now).
607 unit tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CLI and dashboard now share one cross-backend abstraction for
listing + launching bottles, so adding a backend (docker /
smolmachines) lights up in both places without separate wiring.
Backend abstraction:
- New `ActiveBottle` dataclass (`backend_name`, `slug`,
`agent_name`, `started_at`, `services`) replaces the
docker-specific `ActiveAgent`. Same field surface for the
existing dashboard consumers; `ActiveAgent` becomes a typed
alias for source-compat.
- New `BottleBackend.enumerate_active() -> Sequence[ActiveBottle]`
replaces the old `list_active() -> None` (which printed and
returned nothing). Docker implements it via the existing
compose query; smolmachines implements it via `smolvm machine
ls --json` cross-referenced with each bundle container's
`CLAUDE_BOTTLE_SIDECAR_DAEMONS` env (`backend/smolmachines/
enumerate.py`).
- New `enumerate_active_bottles()` and `known_backend_names()`
module-level helpers fold every backend into one call.
- `get_bottle_backend(name=None)` takes an optional explicit
name (precedence: arg > $CLAUDE_BOTTLE_BACKEND > "docker").
CLI:
- `./cli.py list active` enumerates every backend, prints
tab-separated `<backend>\t<slug>\t<agent>\t<services>`. The
smolmachines bottle the user was looking for now shows up.
- `./cli.py start` grows `--backend=<docker|smolmachines>`
(choices pulled live from `known_backend_names()`). Threaded
through `prepare_with_preflight(backend_name=...)` so the
resume path picks up the flag too.
Dashboard:
- Active agents pane lists both backends (the row formatter now
prefixes `[docker]` / `[smolmachines]`).
- New-agent flow inserts a backend picker modal between agent
pick and preflight (`_backend_picker_modal`). Short-circuits
when only one backend is registered.
- `discover_active_agents()` collapses to
`enumerate_active_bottles()`; `_parse_services_by_project` and
`_query_services_by_project` move to
`backend/docker/cleanup.py` where the docker enumerator owns
them.
Tests: parser + enumerate-active tests relocated to
`test_docker_enumerate_active.py`. New
`test_backend_selection.py` covers `get_bottle_backend`,
`known_backend_names`, `enumerate_active_bottles`. New
`test_cli_start_backend_flag.py` covers `--backend`'s argparse
shape + the explicit-over-env precedence.
605 unit tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When a fresh proposal arrives, the dashboard now also:
- Runs `tmux select-pane -t \$TMUX_PANE` (the dashboard's own
pane id, captured at startup) so tmux focus jumps to the
dashboard from wherever the operator was (typically claude
in the right pane).
- Flips internal focus to PANE_PROPOSALS so j/k navigates the
queued items immediately.
- Lands the selected cursor on the first new proposal —
proposals are sorted by arrival ascending, so the earliest
new arrival in the batch gets the cursor.
Stacks with the bell + label highlight from the previous
commit. The operator gets:
1. Audible bell (or tmux activity marker)
2. Tmux focus on the dashboard pane
3. Dashboard's internal focus on the proposals list
4. Cursor on the actual new proposal
5. Pane label flashing `(new!)` in bold green
— all without leaving the keyboard.
When a fresh proposal lands in the supervise queue, the
dashboard:
1. Rings the terminal bell via `curses.beep()` so tmux's
`monitor-bell` (or the terminal's own bell-on-activity)
surfaces a notice in the dashboard pane even when the
operator is focused on claude in the right pane.
2. Bolds + green-attrs the `proposals:` pane label and
suffixes it with `(new!)` so a glance at the dashboard
screen catches the alert at a glance.
The highlight tracks the existing per-row green-highlight
window (`_NEW_PROPOSAL_HIGHLIGHT_SEC`). The bell only fires for
NEWLY arrived proposals after the first tick — pre-existing
queue entries on dashboard startup don't ring.
The new-agent (`n`) flow's tmux branch was leaving keyboard
focus in the dashboard pane after compose-up + provision
finished and claude landed in the right pane — same situation
as Enter re-attach before its `focus_right_pane` fix. The
operator just spun an agent up; they want to type at it.
Pass `focus_right_pane=True` to `_attach_in_tmux` from the
new-agent flow. `tmux select-pane` runs after the respawn.
`--continue` exits non-zero when an agent has been spun up but
never typed at — there's no transcript to resume. Re-attaching
to such an agent via Enter (tmux mode) was crashing the pane.
Wrap the resume invocation in `sh -c '<cmd> --continue || <cmd>'`
so a failed `--continue` cleanly falls through to a fresh
claude. The shell adds microseconds and the fallback only
kicks in when --continue would have failed anyway.
New `_build_resume_argv_with_fallback(bottle)` builds the
shell-wrapped docker exec argv with proper shlex quoting (so
paths-with-spaces in `--append-system-prompt-file` survive).
Only the tmux re-attach path uses it; first-attach + foreground
handoff are unchanged.
489 unit tests pass (4 new for the fallback builder).
The Enter key on a focused agents-pane row is the operator's
explicit "I want to interact with this agent" signal — after
respawning the right pane with claude, move tmux's keyboard
focus to that pane so the operator can start typing
immediately. Without this, every Enter required a manual tmux
nav (C-b →) to actually use the session.
Mechanics:
- `_attach_in_tmux` gains `focus_right_pane: bool = False`.
- When True, runs `tmux select-pane -t <pane_id>` after the
respawn.
- `_attach_to_bottle` (the Enter handler's helper) passes
True.
- Other callers (new-agent flow, stop's auto-attach) leave
it False so the operator stays in the dashboard for
follow-up navigation.
`_tmux_select_pane` is a small subprocess wrapper, best-effort
on failure.
After `x` stops a dashboard-owned bottle, slide focus to the
next agent in the agents pane (the one filling the stopped
row, or the new last row if the stopped was last) and respawn
the right pane with that agent's claude session via `--continue`.
If no agents remain, close the right pane via `tmux kill-pane`.
Two new helpers:
- `_tmux_close_right_pane(tmux_state)` — kills the tracked
pane (if it exists) and clears pane_id / slug.
- `_pick_next_after_stop(agents_before, selected_index,
stopped_slug)` — pure chooser returning (new_index, agent)
or None. Tested directly.
Outside tmux, only the selected_agent index slides; no
auto-attach (foreground handoff would take over the terminal,
disruptive). 485 unit tests pass (6 new for the pick helper).
The dashboard is primarily an agent-management surface
(PRD 0020 + 0021); landing on the proposals pane was a holdover
from when proposals were the only thing the dashboard showed.
Default focus is now `PANE_AGENTS`, so j/k navigates the agents
list immediately on launch — the operator Tabs to proposals
when something queues. Focus choice still persists across
operations.
Both `_new_agent_flow` (bringup) and `_stop_bottle_flow`
(teardown) were doing the same five-step dance: open the log
path, mkdir parents, empty the file, ensure the right pane is
tailing it, redirect fd 2 to the same file. Extract into a
context manager:
with _route_op_to_right_pane(tmux_state, slug, log_name) as routed:
if routed:
<run op>
Yields True when routing succeeded (fd 2 redirected, pane
tailing), False on fallback conditions (not in tmux, no
tmux_state, or tmux failed to spawn a pane). The fallback
paths still differ between callers — bringup follows up
with `_attach_in_tmux`, teardown does the curses-endwin
compose-down — so the helper stops at "is stderr routed
or not" and lets callers branch from there. Net diff:
~60 lines deleted, the routing-to-right-pane concept now
lives in one place.
PRD 0021 follow-up. Mirrors the bringup-into-right-pane fix
on the explicit-stop path: when `\$TMUX` is set, the stop
flow respawns the right pane with `tail -F
state/<slug>/teardown.log` (via `_ensure_right_pane` —
reuses the existing right pane if it's the agent's claude
session) and redirects fd 2 to that log for the duration of
`capture_session_state` + `cm.__exit__`. compose-down +
network-remove messages stream into the right pane.
After `settle_state` removes the state dir, the tail keeps
its buffered output visible (tail -F handles file removal
gracefully); the next attach respawns the pane with claude.
Falls back to the existing curses-endwin path on tmux
failure, or when the dashboard isn't in tmux at all.
After the operator pressed `y` on the preflight modal (or
picked an agent in the picker), the modal's curses sub-window
stayed on screen until the dashboard's main loop ticked again
— which during a 5-10s launch made it look like the
confirmation never registered.
Add `_erase_modal` (touchwin + refresh on stdscr) and call it
at every exit from `_preflight_modal` and `_picker_modal`.
The pre-modal frame buffered on stdscr immediately overwrites
the sub-window's area; the launch proceeds with a clean
dashboard underneath.
PRD 0021 follow-up. The new-agent flow was calling a dedicated
`_tmux_split_pane_tail` that ALWAYS created a new pane —
so every `n` start spawned a fresh right pane next to any
existing one, accumulating panes instead of reusing them.
Replace with a generic `_ensure_right_pane(tmux_state, argv)`
that respawns the dashboard's tracked right pane if one is
alive, splits a new one only when none is tracked or the
tracked pane was closed. Both the new-agent tail-during-
bringup path AND the existing claude-attach path now route
through this helper.
Net effect: starting a second agent reuses the same right
pane — bringup tail replaces the prior claude session,
then claude (for the new agent) replaces the tail. Closing
the right pane manually via `C-b x` still triggers a fresh
split on the next attach.
PRD 0021 follow-up. When starting a new agent via `n` while
in tmux, the dashboard now:
1. Pre-creates the right pane with `tail -F
state/<slug>/bringup.log`.
2. Redirects fd 2 (stderr) to that log file via dup2 — affects
both Python `info()` calls AND subprocess inheritors'
stderr (docker compose up, network creates, provision).
3. Runs `backend.launch().__enter__()` with the redirect in
place; everything streams into the right pane via tail.
4. Restores stderr.
5. Respawns the right pane (tail → claude session).
Net effect: dashboard pane stays uncluttered during bringup,
and the operator watches the compose-up + provision output in
the same pane that's about to hold the claude session — no
visual handoff between "starting" and "started."
Curses never needs to come down on the tmux path (the pane is
already created in the dashboard's neighbor pane, and stderr
is redirected away from the terminal entirely).
If `_tmux_split_pane_tail` fails (tmux missing, server died),
falls through to the existing curses-endwin handoff so the
operator still gets a session.
PRD 0021 chunk 4 (final). Two adjustments to close the
split-pane loop:
1. `_stop_bottle_flow` clears `tmux_state['slug']` when the
stopped bottle was the right-pane occupant. The pane itself
stays in place (claude exits with "container not found");
the operator presses Enter on a different agent to
repurpose it via respawn-pane.
2. `_render` accepts `right_pane_slug` and marks the matching
agents-pane row with a `*` prefix + A_BOLD (when it's not
also the focused row — focused selection still wins for
visibility). Gives the operator a clear visual link
between which agent the dashboard says is "active right
now" and which one is visible to their right.
Wired through `_main_loop`: passes `tmux_state` to
`_stop_bottle_flow` on `x`, and `tmux_state.get('slug')` to
`_render` on every tick.
479 unit tests pass (1 new for the tmux_state-preservation
on non-owned stop). PRD 0021 implementation complete pending
merge.
PRD 0021 chunk 3. The `n` flow (PRD 0020 chunk 2) now routes
the first claude session of a freshly-started bottle into the
right tmux pane when `\$TMUX` is set — same `_attach_in_tmux`
state machine the Enter re-attach uses, just with
`resume=False` so claude starts fresh.
Outside tmux the existing foreground handoff is unchanged.
The compose-up phase (`backend.launch.__enter__`) still drops
curses for its stderr output; we restore curses BEFORE
spawning into the right pane so the dashboard re-renders
alongside the new claude session instead of waiting for
attach to return.
PRD 0021 chunk 2. New tmux integration: when `\$TMUX` is set
and the operator presses Enter on a focused agent row, the
dashboard spawns / respawns the right pane with that bottle's
claude session instead of taking over the terminal via
curses.endwin.
Mechanics:
- `_in_tmux()` — true when `\$TMUX` is set.
- `_tmux_split_pane_create` — first attach: `tmux split-window
-h -P -F '#{pane_id}'` opens a right pane and prints its id
for tracking.
- `_tmux_respawn_pane` — subsequent attaches: `tmux
respawn-pane -k -t <id>` swaps the content without
re-splitting.
- `_tmux_pane_exists` — `tmux list-panes` check before
respawn so a manually-closed pane gracefully falls back to
a fresh split.
- `_attach_in_tmux` — owns the create-or-respawn state
machine, mutates `tmux_state` ({pane_id, slug}) so the
main loop tracks the right-pane occupant.
- `_attach_via_handoff` — the previous curses-endwin path,
extracted as the fallback when tmux is missing or fails.
- `_attach_to_bottle` dispatches: in tmux + state available →
`_attach_in_tmux`; otherwise → handoff.
Main loop gets `tmux_state: dict = {"pane_id": None, "slug":
None}`. Chunks 3 + 4 wire it through the new-agent flow and
the stop hook.
`FileNotFoundError`-safe `subprocess.run` calls around every
tmux invocation — a missing tmux binary cleanly falls back to
the handoff for that keypress. 478 unit tests pass (10 new
for the pure argv builders + `_claude_runtime_args`).
The `bottles` dict held `@contextmanager`-wrapped launch contexts.
On normal Python interpreter shutdown those context managers'
generators got GC'd, which raised GeneratorExit at the yield
point and ran the `finally` block — invoking each bottle's
teardown and tearing down the compose project. Net effect: `q`
WAS implicitly stopping every dashboard-launched bottle even
though the keypress handler just `return`'d.
`os._exit(0)` skips all Python-level cleanup (GC, atexit, etc.),
so the docker compose projects survive the dashboard exit
untouched. Curses gets explicit `endwin()` first because the
brutal exit skips curses.wrapper's normal terminal restoration.
Matches PRD 0020's resolved-question answer (`q` does NOT tear
down bottles; teardown is always explicit via `x` or
`./cli.py cleanup`).
`--resume` alone surfaces claude's session picker even when only
one session exists. `--continue` jumps to the most recent session
non-interactively, which is the actual behavior the dashboard's
Enter re-attach wants for typical bottle-with-one-session cases.
Re-entering a running bottle from the dashboard (Enter on the
agents pane) now invokes claude with `--resume` so the session
picks up the prior conversation history rather than starting a
fresh transcript. The first-attach paths (`./cli.py start` and
the dashboard's new-agent `n` flow) leave it off — the
transcript doesn't exist yet there.
`attach_claude` gains a `resume: bool = False` kwarg;
`_attach_to_bottle` in the dashboard passes `True`.
Final PRD 0020 chunk. `x` on a focused agents-pane row tears
down the selected bottle if the dashboard owns it (started via
the chunk-2 `n` flow): pops `(cm, bottle, identity)` from the
main loop's bottles map, snapshots the transcript best-effort,
calls `cm.__exit__(None, None, None)` to drive the existing
compose-down + network-remove sequence, then `settle_state` to
honor any pre-existing preserve marker.
On a non-owned slug (discovered via `list_active_slugs` but not
in the dashboard's bottles dict — i.e., previous-dashboard or
external `./cli.py start` bottle), `x` is a no-op with a status
hint pointing at `./cli.py cleanup`. Matches the PRD's
cross-dashboard re-attach model: the dashboard can re-attach
either kind, but can only tear down its own.
The PRD's chunk 5 ("quit-cleanup") is satisfied by the existing
no-op behavior of `q` — per the user's resolved-question
answer, quit leaves bottles running unchanged. No code change
needed for that.
Footer surfaces `[x] stop`. 465 unit tests pass (1 new for the
non-owned no-op path; the owned path is integration territory
because it drives a real compose-down).
PRD 0020 chunk 3. Enter on a focused agents-pane row drops to a
claude session inside the selected bottle. Works for both
dashboard-owned bottles (looks up the stored Bottle handle in
the main loop's `bottles` dict) and externally-discovered ones
(synthesizes a DockerBottle from the slug → `claude-bottle-<slug>`
container name).
For the synthesized path, the `--append-system-prompt-file`
target resolves via metadata.json + the manifest's agent prompt
if both can be read; otherwise the re-attach runs without the
flag (claude defaults to no system prompt, the bottle's other
state is untouched).
Shares the curses.endwin → attach → refresh handoff with the
chunk-2 new-agent flow via a new `_attach_to_bottle` helper.
Footer reshuffled to advertise `[Enter] view/attach`. 464 unit
tests pass (3 new for `_bottle_for_slug`).
PRD 0020 chunk 2. Pressing `n` opens a modal that lists every
agent from the manifest with `(N running)` suffixes for ones
that already have bottles up. Type to filter (substring,
case-insensitive); j/k or arrows to navigate; Enter to confirm;
Esc clears the filter on first press, exits the picker on the
second.
On confirmation, the dashboard runs:
- `prepare_with_preflight` from chunk 1 with curses-modal
render + prompt callables (the preflight modal centers the
plan summary + captures [y/N]).
- `backend.launch(plan).__enter__()` — enters but doesn't bind
the context to a `with`. The (cm, bottle, identity) tuple
lands in the main loop's `bottles` dict keyed by slug.
- `curses.endwin()` → `attach_claude(bottle)` → `stdscr.refresh()`
handoff. The agent's claude session takes over the terminal;
on exit the dashboard re-renders with the bottle now visible
in the agents pane.
Crucially the context manager is held alive in `bottles` — never
`__exit__`'d at quit. Chunk 4 will wire `x` to that exit; for
now bottles started from the dashboard stay running until
explicit cleanup. Matches the PRD's "q does not tear down"
decision.
Footer surfaces `[n] new agent`. 461 unit tests pass (8 new for
`_filter_agents` and `_running_counts`).
PRD 0020 chunk 1. `cli/start.py`'s `_launch_bottle` did three
things in one function: prepare + preflight, attach claude, and
settle state on teardown. Split them so the dashboard (PRD 0020
chunk 2+) can reuse the prepare + attach pieces piecewise
without going through the CLI's one-shot orchestrator:
- `prepare_with_preflight(spec, *, stage_dir, render_preflight,
prompt_yes, dry_run)` — injects render + prompt callables so
the CLI binds them to stderr/stdin while the dashboard binds
them to a curses modal. Returns `(plan, identity)`; identity
is set after `backend.prepare` returns so callers can reap
the prepare-time state dir on abort via `settle_state` in
their finally — preserving today's preflight-N cleanup.
- `attach_claude(bottle, *, remote_control)` — runs claude
inside the bottle and returns its exit code. The dashboard
calls this from inside a `curses.endwin` → … →
`stdscr.refresh()` handoff.
- `capture_session_state` / `settle_state` lose their leading
underscore; the dashboard will call them on
session-end + explicit-stop respectively.
`_launch_bottle` becomes a thin orchestrator over those helpers.
No behavior change; all 453 unit tests pass and `./cli.py start
implementer --dry-run` produces identical preflight output.
PRD 0019 chunk 4 (final). The `e` (routes edit) and `p` (pipelock
edit) keys now require an agent selection in the agents pane.
Pressing them with the proposals pane focused, with no active
agents, or with an out-of-range selection is a no-op with a
status hint ("no agent selected; Tab into the agents pane first").
The discover-and-prompt scaffolding inside
`_operator_edit_routes_flow` / `_operator_edit_allowlist_flow` /
`_operator_edit_flow` is gone. The flows now take an `ActiveAgent`
+ required-service name; they refuse with a clear message when
the bottle lacks the requested sidecar (e.g., `routes edit`
against a bottle with no `bottle.egress.routes` declared). The
`discover_egress_slugs` + `discover_pipelock_slugs` +
`_discover_active_with_service` helpers come out — they had no
remaining callers.
Footer now reads `[e/p] edit selected agent`.
PRD 0019 chunk 3. The TUI now has two focusable panes — proposals
and agents — and `Tab` toggles which one the `j/k`/arrow keys
move through.
Each pane keeps its own selection index. Switching panes doesn't
lose the position in the other; the cursor (`>` + reverse-video
row) appears only in the focused pane. The label line on each
pane shows "(focused)" when active.
Footer reshuffled: `[Tab] switch pane [j/k] move [Enter] view
[a/m/r] proposal [e/p] edit [q] quit`. When the agents pane is
focused and there's no status message to display, the idle
status line surfaces the currently-selected agent (or "[no
active agents]" / "[no agent selected]" fallbacks) so the
operator knows what an agent-scoped edit verb will target after
chunk 4 wires them up.
Proposal action keys (a/m/r/Enter) are gated on the proposals
pane being focused — pressing them with the agents pane focused
is a no-op. e/p still use the global discover-and-prompt flow
for one more chunk; chunk 4 swaps them to read the agents-pane
selection.
PRD 0019 chunk 2. The TUI's main render now draws two panes:
proposals on top (existing), active agents on the bottom (new).
Header counts both totals. The agents pane refreshes on the
same 1s tick — agents starting/stopping reflect without
operator action.
Each agent row shows slug, agent name, started-time (HH:MM:SS
of the metadata.json timestamp), and the bracketed list of
sidecars currently up. The `agent` service is filtered out of
the displayed list — it's always present so it'd be noise; the
sidecars are the differentiator. A bottle whose only running
service is `agent` (sidecars still warming up) renders as
`(starting)`.
No selection model yet — that's chunk 3. The cursor stays in
the proposals pane; `j/k`/arrow nav and the proposal action
keys are unchanged.
PRD 0019 chunk 1. New `discover_active_agents()` in dashboard.py
returns one `ActiveAgent(slug, agent_name, started_at, services)`
per currently-running compose project:
- Slugs come from `list_active_slugs()` (chunk-5 shared helper).
- The service set per project comes from ONE label-filtered
`docker ps` call (PRD open question #1: avoids N per-bottle
`compose ps` invocations on each 1s refresh tick).
- agent_name + started_at come from each bottle's
metadata.json; "?" / "" fallbacks when the file is missing
so the row renders rather than vanishes.
Not wired into the TUI yet — chunk 2 renders the agents pane.
The parser (`_parse_services_by_project`) is split out as a pure
function so the conditional-input shape can be unit-tested
without docker.
PRD 0018 chunk 5. The dashboard's operator-edit verbs
(`routes edit`, `pipelock edit`) enumerated running sidecars
via `docker ps --filter name=...` prefix scans. Switch to
`docker compose ls`-based discovery so the dashboard, cleanup
CLI, and launch step all agree on what's running.
Mechanics:
- `claude_bottle/backend/docker/compose.py` grows three shared
helpers: `list_compose_projects` (the JSON parse moved out
of cleanup), `slug_from_compose_project` (inverse of
`compose_project_name`), and `list_active_slugs` (sugar over
the first two for the common "what's running?" question).
- cleanup.py drops its private `_list_compose_projects` +
`_PROJECT_PREFIX` in favor of the shared ones; `list_active`
simplifies (one compose-ls call, not two).
- dashboard.py's `_discover_sidecar_slugs` becomes
`_discover_active_with_service`: cross-references the active
slug list with a label-filtered `docker ps` so only bottles
whose given service container is actually up surface in the
edit menu. Bottles without an egress sidecar (no
bottle.egress.routes) no longer appear for `routes edit`.
3 new unit tests cover the slug ↔ compose-project naming
contract; manual probe with a fake compose project confirms
both `discover_egress_slugs` and `discover_pipelock_slugs`
return the expected slug.
PRD 0018 chunk 4. `claude-bottle cleanup` now derives its work
from `docker compose ls --all --format json`, filtered to projects
whose name starts with `claude-bottle-`. Per project: one `compose
down --volumes` removes the containers + the compose-managed
networks atomically.
The plan also enumerates three fallback buckets:
- Stray containers — `claude-bottle-*` containers with no
`com.docker.compose.project` label (left over from pre-compose
code paths). Cleared via `docker rm -f`.
- Stray networks — `claude-bottle-*` networks with no compose
project label. Cleared via `docker network rm`.
- Orphan state dirs — per-bottle `~/.claude-bottle/state/<id>/`
dirs with no live project AND no `.preserve` marker. The
`.preserve` marker (capability-block or auto-preserve-on-crash)
explicitly opts-out of reaping; manual `rm -rf` is the only
path for preserved state.
cli/cleanup.py collapses to a single y/N prompt — backend.prepare_cleanup
returns everything in one plan, backend.cleanup processes everything,
no more double-prompt for state. The CLI-side state-dir enumeration
+ `_state_summary` flags from PR #25 are gone; the backend's
orphan-detection rules subsume them.
PRD 0018 chunk 2. Each sidecar's prepare-time output (pipelock yaml +
CAs, egress routes.yaml + CAs, git-gate entrypoint + hooks, supervise
current-config, agent env + prompt) now lands in
~/.claude-bottle/state/<slug>/<service>/ instead of an ephemeral
mktemp dir. The state subdirs become the stable bind-mount sources
that chunk 3's docker compose project will reference.
The SDK launch path is unchanged — `docker cp` still copies from the
plan-held paths into containers, just from new locations. start.py's
session-end cleanup is now in `finally`, which also reaps state dirs
left behind by dry-run / preflight-N / prepare-exception paths
(previously only the post-launch path settled state).
The manifest key is `egress:` now; finish the rename so the rest of
the codebase matches. Files (Dockerfile.egress, claude_bottle/egress.py
etc.), classes (Egress, EgressConfig, EgressRoute, EgressPlan,
DockerEgress), constants (EGRESS_HOSTNAME, EGRESS_ROUTES, ...),
container name prefix (claude-bottle-egress-*), docker network alias
(egress), the introspection host (_egress.local), the MCP tool IDs
(egress-block, list-egress-routes), and the preflight label all drop
the `-proxy` suffix.
Companion to the compact preflight in #31 — the JSON format was
the structured alternative to the verbose text summary. With the
new compact text already on screen, no consumer was using the
JSON shape, and the abstract `BottlePlan.to_dict` was the
biggest piece of API surface no one is implementing against.
Removed:
- `--format` CLI flag from `start` and `resume`.
- `output_format` kwarg from `_launch_bottle`.
- `BottlePlan.to_dict` abstract method.
- `DockerBottlePlan.to_dict` (60-line dict builder).
- The `_PlanView` dataclass — `print` was the only remaining
caller, so the env-name computation is inlined.
- `tests/integration/test_dry_run_plan.py` (JSON-shape
integration test).
- `tests/unit/test_cli_start_format.py` (flag-conflict unit).
Plan-introspection is still possible by reading the
`DockerBottlePlan` dataclass directly — fields like `image`,
`container_name`, `stage_dir`, `use_runsc` are all there. Tooling
that needs a stable wire shape can JSON-serialize the dataclass
themselves.
411 unit + integration tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Instead of asking the agent to compose and submit a full routes
file, the tool now takes ONE proposed route — host + optional
path_allowlist + optional auth — and the supervisor merges it
into the live routes table at approval time. The agent no longer
needs to fetch / reproduce / extend the existing allowlist; it
just describes the host it wants reachable.
Tool input (new):
- `host` (required)
- `path_allowlist` (optional, array of absolute path prefixes)
- `auth` (optional, {scheme, token_ref})
- `justification` (required)
Merge semantics (in `egress_proxy_apply._merge_single_route`):
- Host NOT in current routes → append the proposed route as a
new entry. If `auth` is set, assign the next EGRESS_PROXY_TOKEN_N
slot.
- Host already present → union the proposed `path_allowlist`
with the existing one (proposed entries appended after
existing, deduped). Existing `auth_scheme` / `token_env`
preserved; proposed `auth` ignored (operator-controlled, not
agent-controlled).
- Hostname comparison is case-insensitive.
Dashboard wiring: `approve()` on an egress-proxy-block proposal
now calls `add_route(slug, proposed_route_json)` instead of
`apply_routes_change(slug, full_file)`. add_route fetches the
current routes from the running egress-proxy, merges, and calls
apply_routes_change with the merged content — so the
pipelock-mirror + SIGHUP plumbing from chunk 3 still runs
end-to-end. Audit diff still captures the full-file before/after.
Tool description rewritten to make the new shape obvious and to
stop pointing the agent at the routes file. The
`list-egress-proxy-routes` tool stays available for agents that
want to see what's currently allowed.
Tests: 9 new `_merge_single_route` cases (host absent/present,
path-allowlist union+dedup, auth-slot indexing, case-insensitive
match, existing-auth preservation, missing-host rejection,
malformed-current rejection). 407 unit + integration pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Finishes PRD 0017. The `cred-proxy-block` MCP tool is renamed and
its remediation apply path is repointed at egress-proxy.
- `claude_bottle/supervise.py` — `TOOL_CRED_PROXY_BLOCK` →
`TOOL_EGRESS_PROXY_BLOCK`; `COMPONENT_FOR_TOOL` maps the new
tool ID to `egress-proxy` for audit-log routing.
- `claude_bottle/supervise_server.py` — tool definition renamed
+ description rewritten: "Call when egress-proxy refused your
HTTPS request ... Read the current routes.yaml from /etc/
claude-bottle/current-config/routes.yaml, compose a modified
version, pass the full new file plus a justification." The
syntactic validator dispatches on the new tool ID.
- `claude_bottle/backend/docker/egress_proxy_apply.py` — renamed
from `cred_proxy_apply.py`. Reads routes.yaml from
/etc/egress-proxy/routes.yaml via `docker exec cat`; validates
via `egress_proxy_addon_core.load_routes` (so both sides use
the same parser); writes via `docker cp`; SIGHUPs egress-proxy
with `docker kill --signal HUP`. `EgressProxyApplyError`
replaces `CredProxyApplyError`.
- `claude_bottle/cli/dashboard.py` — wires the new apply +
`discover_egress_proxy_slugs` helper; the operator-initiated
`routes edit <bottle>` verb now writes to egress-proxy with
`.yaml` suffix. Stale follow-up comment about path-aware
filtering removed — PRD 0017 settled that question.
- `tests/integration/test_supervise_sidecar.py` — restores the
approval round-trip test (chunk 2 had switched it to a reject
path because no cred-proxy existed). Approval stubs
`apply_routes_change` so the test focuses on the supervise
queue/response plumbing rather than docker-exec into a real
egress-proxy sidecar (that's covered separately).
- `tests/unit/test_egress_proxy_apply.py` — rewritten against
the new validator; covers JSON shape, missing routes key,
partial-auth-pair rejection (the addon-core parser catches
these before SIGHUP).
- PRDs 0010 + 0014 — status headers updated to
Superseded / Retargeted with a callout block pointing at PRD
0017's migration section. Historical text preserved.
384 unit + integration tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The "→ would allow host: api.github.com" framing added narration
where none was needed. Just render the host on its own line in
green — that's literally the text that gets appended to pipelock's
allowlist on approve, and the green color carries "what's about to
change". The URL (with path) is still right above for context.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When the operator opens a pipelock-block proposal in the detail
view (Enter / 'v'), append a green-coloured line:
→ would allow host: api.github.com
so what's actually about to change is obvious at a glance. The
full failed URL stays above the new line (the path is operator
context — pipelock can't enforce it, just records intent).
- _detail_lines now returns (text, attr) tuples; pipelock-block
appends the host-extract line tagged with the green color pair.
- _detail_view threaded the green_attr through from the main loop
(matches the new-proposal highlight pattern from earlier in this
PR).
- Best-effort URL parsing; unparseable payloads skip the highlight
line rather than render a misleading blank host.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PR #25's pipelock-block tool sends a full URL and the supervisor
extracts just the hostname for pipelock's allowlist — pipelock
2.3.0's api_allowlist is hostname-only (verified by inspecting the
binary's strict preset). The path component is operator context,
not enforced.
Document the follow-up shape inline at the apply site so a future
reader looking at why we're throwing away the path lands on the
plan: adding `auth_scheme: none` + `path_allowlist` to cred-proxy,
and rewiring pipelock-block to propose cred-proxy routes instead
of pipelock hostnames. Multi-touch change, its own PRD.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reshape the pipelock-block MCP tool around what the agent actually
knows at the moment of failure (the URL pipelock just refused), not
what the operator needs (a full allowlist file).
Before: agent had to read /etc/claude-bottle/current-config/allowlist,
copy the whole file, append their host, send back. Lots of work,
easy to get wrong, and the operator's diff was noisy because the
proposal contained every host the agent saw — most of which weren't
the change.
After: agent calls
pipelock-block(failed_url="https://api.github.com/repos/foo/bar",
justification="...")
supervisor extracts api.github.com, fetches the running allowlist,
adds the host if not already present, applies the merged content.
Path is captured as operator context (the detail view labels it
"failed URL" instead of "proposed file") but isn't enforced —
pipelock's api_allowlist is hostname-only, so the path can't
become an allow rule.
- supervise_server: pipelock-block input schema gains `failed_url`
(replaces `allowlist`); validate_proposed_file checks for
http/https + hostname.
- PROPOSED_FILE_FIELD updated; tool description rewritten.
- dashboard._apply_pipelock_url: extract host, fetch current,
merge, apply.
- _proposed_payload_label: detail view renders "failed URL" for
pipelock-block, "proposed file" otherwise.
- Tests updated end-to-end; new url-host-merge + idempotent-merge +
invalid-url cases added.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When a new proposal lands in the dashboard's list, the operator
shouldn't have to compare the list to a mental snapshot to spot
what's new. Render newly-arrived proposals in green for the first
five seconds after they show up.
- _try_init_green: initialise a green color pair; returns 0 if the
terminal lacks color so the highlight degrades to no-op.
- _main_loop tracks first_seen[proposal_id] across refresh ticks,
pruning entries when a proposal leaves the queue.
- _render ORs green into the existing attr (composes with selection
reverse-video — terminal handles the mix).
Applies to all tool types (cred-proxy-block, pipelock-block,
capability-block). If a tool-specific highlight is wanted later,
filter on qp.proposal.tool in _is_recent.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The main loop blocked on stdscr.getch() until the operator hit a
key — a tool call landing in the queue while the operator was just
watching wouldn't appear on the screen. The operator had to press
any key to trigger a re-render and see the new proposal.
Switch to stdscr.timeout(1000): getch returns -1 after 1s if no
key was pressed, and the loop re-renders with the latest
discover_pending() result. CPU cost is trivial; the loop body is
~one filesystem scan + curses draw per second.
Also restructure status_line lifecycle: was cleared right after
every render, which meant a timeout-driven re-render would wipe
the message ~1s after the operator's keystroke set it. Now
status_line is cleared only on actual key press, so messages
like "approved cred-proxy-block for [dev-xyz]" persist until the
operator does something else.
Detail view + prompt view are unchanged — they're modal, the
underlying proposal data doesn't move, and getstr can't tolerate
a re-render mid-input.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extends the preserve-on-capability-block design to also preserve
state on agent crash, and snapshots the transcript on every
teardown so any resume (crash or capability-block) gets a warm
claude session — not a cold start.
- capability_apply: rename _snapshot_transcript → snapshot_transcript
(public; reused below). No behavior change in the capability path.
- cli/start.py: capture bottle.exec_claude's exit code; while the
container is still alive (inside the launch context):
* always snapshot_transcript(identity)
* if exit_code != 0, mark_preserved(identity)
Then the existing _settle_state runs after teardown.
Now the preservation matrix is:
exit 0 (clean) → snapshot + cleanup state
exit ≠0 (crash, Ctrl-C) → snapshot + preserve + show resume hint
capability-block → (already snapshotted/preserved by apply
before teardown; this path is a no-op
because the container is already gone
by the time exec_claude returns)
snapshot_transcript is best-effort — capability-block's earlier
snapshot is not clobbered when the container is already torn down,
and a missing /home/node/.claude is a warn + skip.
Tested behavior: clean exit doesn't preserve, non-zero exit
(including SIGINT/130 and SIGKILL/137) preserves; empty identity
no-ops both helpers.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`cli.py cleanup` already enumerated orphan containers + networks
and asked for confirmation before nuking them. Per-bottle state
under ~/.claude-bottle/state/ wasn't touched — accumulated forever,
including orphans from old code paths.
Add state to the cleanup flow with its own prompt: the trade-off is
different from containers (which are pure debris) because a state
dir may carry a resumable bottle (capability-block rebuild +
transcript snapshot) the operator still wants.
Output shows the resumable / orphan / rebuilt-Dockerfile / transcript /
preserve-marker flags for each state dir so the operator sees what
they'd lose. Both sections are skippable independently — answering
"n" to containers doesn't skip the state prompt.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previously every bottle launch left ~/.claude-bottle/state/<identity>/
behind forever — metadata.json on every run, plus per-bottle
Dockerfile + transcript snapshot on capability-block rebuilds. The
metadata accumulated debris across launches; the only state worth
keeping was the capability-block rebuild bundle.
Make cleanup the default; preserve only on capability-block.
- bottle_state.py: .preserve marker helpers (mark_preserved,
is_preserved, clear_preserve_marker, preserve_marker_path) +
cleanup_state(identity) that rm -rf's the per-bottle dir.
- capability_apply.apply_capability_change writes mark_preserved
before teardown so cli.py's session-end cleanup keeps the dir.
- prepare.py clears any leftover marker at launch (start or resume),
so a marker from a prior capability-block doesn't keep state
alive past a subsequent normal session-end.
- cli/start.py runs the cleanup decision AFTER the launch context
closes: if is_preserved → print resume hint; else cleanup_state.
The resume hint moves out of the launch with-block (was previously
printed unconditionally — would have misled the operator about
whether state was actually kept).
Future-proof: cli.py never persists state speculatively. If the
agent wants to be resumable, it has to go through capability-block.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the cwd-hash identity with a random 5-char base36 suffix
per launch, so two simultaneous `start <agent>` invocations against
the same cwd no longer collide on container names. Each launch is
its own bottle.
State carries metadata: every prepare step writes
~/.claude-bottle/state/<identity>/metadata.json with the
(agent_name, cwd, copy_cwd, started_at) the bottle was launched
with. The new `cli.py resume <identity>` reads this metadata and
re-launches a bottle pinned to the same identity — picking up the
per-bottle Dockerfile (from a prior capability-block apply) and
the transcript snapshot under the same state dir.
- bottle_state.py: bottle_identity(agent_name) drops the cwd param
and gains a random suffix; BottleMetadata dataclass +
read/write/metadata_path helpers.
- BottleSpec gains an optional identity field — resume sets it to
pin the identity; start leaves it empty so prepare mints fresh.
- prepare.py: writes metadata at launch time; uses spec.identity if
provided (resume) else bottle_identity(agent_name) (fresh start).
- start.py: extracted _launch_bottle from cmd_start so resume can
share the launch core; prints `./cli.py resume <identity>` hint
at session end.
- cli/resume.py (new): reads metadata, reconstructs BottleSpec
with the recorded identity + cwd, delegates to _launch_bottle.
Errors clearly when no state exists for the given identity.
- cli/__init__.py: registers `resume` in COMMANDS + usage.
- dashboard.py: capability-block approval status line now appends
the `resume <identity>` hint so the operator can copy-paste the
rebuild command without leaving the TUI.
Closes the rebuild loop in PRD 0016: agent calls capability-block →
operator approves → bottle torn down with state preserved → status
line shows resume command → operator runs it → replacement bottle
boots with the new Dockerfile and prior transcript.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 3 of PRD 0016. dashboard.approve() now dispatches to
apply_capability_change when the proposal is a capability-block:
cred-proxy-block → apply_routes_change
pipelock-block → apply_allowlist_change
capability-block → apply_capability_change (new in PRD 0016)
CapabilityApplyError joins the ApplyError tuple, so the TUI's key
handlers catch it the same way and surface failures in the status
line.
After a successful capability-block apply, dashboard archives the
proposal+response itself — the supervise sidecar was torn down by
apply_capability_change and can't archive its own queue file.
Without this, dashboard.discover_pending would keep surfacing the
resolved proposal forever.
No audit log for capability-block per PRD 0013 — its record lives
in the per-bottle Dockerfile state + transcript snapshot.
Tests stub apply_capability_change at the dashboard module level,
add TestCapabilityApplyWiring (call wiring, failure-keeps-pending,
no-audit invariant, archive-after-apply), and update TestApproveReject
to stub the capability path too so it stays docker-independent.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 3 of PRD 0015. Adds the proactive `pipelock edit` path,
mirroring routes edit from PRD 0014:
- discover_pipelock_slugs() lists running pipelock sidecars.
- operator_edit_allowlist(slug, new) wraps apply_allowlist_change
and writes an audit entry tagged ACTION_OPERATOR_EDIT.
- New 'p' keybinding in the main TUI: discover slugs, prompt if
multiple, fetch current allowlist, open in $EDITOR, apply on
save.
- Extracts shared scaffolding into _operator_edit_flow used by
both routes-edit and pipelock-edit — DRY without sacrificing
the per-verb status-line copy.
- Footer updated.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 2 of PRD 0015. dashboard.approve() now dispatches on the
proposal's tool:
cred-proxy-block → apply_routes_change (from PRD 0014)
pipelock-block → apply_allowlist_change (new in PRD 0015)
capability-block → no-op (lands in PRD 0016)
PipelockApplyError joins CredProxyApplyError under the ApplyError
tuple the TUI catches: failures keep the proposal pending and the
status line surfaces the message; no response is written and no
audit entry is appended.
Tests: existing TestApproveReject stubs both apply paths; new
TestPipelockApplyWiring covers the call wiring, failure-propagation,
and real-diff-in-audit invariants.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>