didericis/bot-bottle

didericis ec20293c0a

docs(prd-0020): start + attach to agents from the dashboard

Draft a PRD that turns the dashboard into the operator's single
surface — collapses today's two-terminal workflow (one for
`./cli.py start`, one for `./cli.py dashboard`) into a single
dashboard invocation that can spin up new agents, re-attach to
ones it already spun up, and explicitly stop them.

Picks the "handoff" mechanism from `docs/research/claude-code-
pane-in-dashboard.md` (curses.endwin → docker exec -it claude
→ stdscr.refresh) and crucially decouples the bottle's lifetime
from any single claude session: exit claude → back to dashboard
with the bottle still running; quit dashboard → tear down every
bottle the dashboard owns.

Sized into 5 chunks (refactor → picker + new-agent → re-attach
→ explicit stop → quit-cleanup). Seven open questions called
out, the biggest being modal-vs-drop-and-resume for the
preflight Y/N inside curses.

2026-05-26 02:59:42 -04:00

PRD 0020: Start and attach to agents from inside the dashboard

Status: Draft
Author: didericis
Created: 2026-05-26

Summary

Today the dashboard is read-only: it surfaces pending proposals and active agents (PRD 0019) but can't start an agent or re-enter one. The operator's path is split — they launch agents from one terminal (./cli.py start <name>), and watch them from another (./cli.py dashboard).

This PRD collapses that split. The dashboard becomes the operator's single surface: pressing a key opens an agent picker, selecting one runs the existing prepare → preflight → launch flow inside a curses-friendly variant, and on yield drops to a full-screen docker exec -it … claude session (the "handoff" shape from docs/research/claude-code-pane-in-dashboard.md). When the operator exits claude, the dashboard re-renders with the now-running bottle visible in the agents pane.

Crucially, the bottle's lifetime is owned by the dashboard process, not by the individual claude session. Exit claude → back to dashboard, bottle still running. Start another agent → two bottles up at once. Quit the dashboard → all dashboard-launched bottles tear down.

Problem

Two real frictions today:

Two terminals for one workflow. The dashboard is the right shape to watch agents — proposals queue, status updates, operator-edit verbs — but it's the wrong shape to start them. Today you open a second terminal for that. In parallel use (3–5 bottles), the operator has 5+ terminals open and the dashboard's "active agents" pane is hopelessly behind reality because they just spawned three in a row.
./cli.py start ties the bottle to a single claude session. The start command's ExitStack brings the bottle up, runs claude, and tears down on Ctrl-D — fine for a one-shot session, wrong for "let me bounce in and out of this bottle a few times while triaging proposals." Today the only way to re-enter a bottle after exiting claude is to start a fresh one and lose all in-bottle state.

The dashboard already discovers active bottles, scopes operator-edit verbs to a selected agent (PRD 0019), and captures full-merged logs per bottle (PRD 0018). It already wants to be the primary surface. This PRD finishes that.

Goals / Success Criteria

From inside ./cli.py dashboard, pressing n (new) opens an agent picker listing every agent defined in the manifest. Selecting one runs prepare → preflight → launch.
The preflight Y/N summary renders cleanly — either as a curses modal or via curses.endwin() → text-mode prompt → restore, matching the existing editor-flow pattern.
On launch success, the dashboard performs a handoff (option 1 from the research doc): curses.endwin() → docker exec -it claude-bottle-<slug> claude --dangerously-skip-permissions → on exit, stdscr.refresh() and re-render with the new bottle in the agents pane.
The bottle's lifetime is owned by the dashboard process, NOT by any single claude session. Exiting claude (Ctrl-D, /exit) returns to the dashboard with the bottle still running. The operator can start more agents and re-enter previous ones.
Pressing Enter on a selected row in the agents pane re-attaches to that agent's bottle via the same handoff — drops to full-screen claude, returns on exit.
Pressing x (or similar — keybinding decided in design) on a selected agent stops just that bottle (compose down + state cleanup) without quitting the dashboard.
Quitting the dashboard (q) tears down every bottle the dashboard started, unless something has explicitly preserved the state (capability-block, crash). Matches today's start.py teardown semantics.

Non-goals

A pane that hosts the claude TUI alongside proposals. The embedded-emulator option from the research doc is out of scope. The handoff (option 1) is the v1; option 2 is a separate PRD if and when handoff is observably insufficient.
Adopting bottles started by an out-of-dashboard ./cli.py start invocation. Those have their own ExitStack-owner and the dashboard treats them as read-only-watch (already does today). Re-attach only applies to bottles the current dashboard process started.
Persisting a "bottle pool" across dashboard runs. When the dashboard quits, its bottles go. Resume across dashboard invocations is ./cli.py resume <identity>, which is unchanged.
Multi-window UI. Single curses window, two existing panes (proposals + agents); the agent picker is a modal, not a third pane.
Removing ./cli.py start. Stays as the script-friendly / legacy entry point. The dashboard is the new default.

Scope

In scope

Manifest-driven agent picker (curses modal): list view with j/k navigation + Enter to confirm, Esc to abort.
Preflight rendering inside the dashboard's curses surface (modal or drop-and-resume — picked in design).
A new _dashboard_start_flow that wraps prepare + preflight + launch and returns a DockerBottle handle the dashboard retains alongside its pending and agents lists.
A bottles: dict[slug, DockerBottle] map on the main loop that owns every dashboard-launched handle. ExitStack tears them all down on dashboard exit.
Enter on an agents-pane row → re-attach handoff (docker exec -it claude into the existing container).
x (or similar) on an agents-pane row → explicit per-bottle stop without quitting.
q (existing quit key) → tear down all dashboard-launched bottles before returning.

Out of scope

Changes to ./cli.py start itself. It keeps its current shape; the dashboard reuses its internal pieces (backend.prepare / backend.launch) without reaching through the CLI layer.
Changes to backend.launch's context-manager contract; the dashboard's bottle map just holds the Bottle yielded by the context manager, and the dashboard exits that context on quit / explicit stop.
New manifest fields. The picker reads what's already there.
Adopting non-dashboard bottles into the dashboard's owned set.

Proposed design

Bottle ownership

Today's flow:

./cli.py start agent
  └─ with backend.launch(plan) as bottle:        ← bottle alive while inside `with`
       bottle.exec_claude([...], tty=True)       ← blocks until claude exits
     # context exits → compose down → state cleanup

The proposed dashboard-owned flow:

./cli.py dashboard
  └─ stack = ExitStack()
     bottles: dict[str, DockerBottle] = {}

     # operator presses `n`, picks agent
     ctx = backend.launch(plan)
     bottle = stack.enter_context(ctx)           ← bottle stays alive
     bottles[plan.slug] = bottle

     # operator interacts via:
     curses.endwin()
     bottle.exec_claude([...], tty=True)         ← blocks; returns on Ctrl-D
     stdscr.refresh()
     # bottle is STILL ALIVE here — only the claude process exited

     # ... operator does other things, eventually `q`:
     stack.close()                               ← tears down every bottle

The code change is small: the launch context is entered on a long-lived ExitStack instead of a with block. The change in operator experience is real, though: bottles outlive any single claude session.
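The ownership shift can be illustrated without Docker. The following is a minimal sketch, assuming a hypothetical fake_launch context manager standing in for backend.launch(plan); it only records events and is not the project's actual backend.

```python
from contextlib import ExitStack, contextmanager


@contextmanager
def fake_launch(slug, events):
    """Hypothetical stand-in for backend.launch(plan); records up/down events."""
    events.append(f"up:{slug}")
    try:
        yield slug  # the real context would yield a DockerBottle handle
    finally:
        events.append(f"down:{slug}")


def run_dashboard(slugs):
    """Start one bottle per slug, end a claude session in each, then quit."""
    events = []
    bottles = {}
    with ExitStack() as stack:
        for slug in slugs:
            # enter_context keeps the bottle up until the stack unwinds
            bottles[slug] = stack.enter_context(fake_launch(slug, events))
            # a claude session would run and exit here; no teardown follows
            events.append(f"session-ended:{slug}")
        events.append(f"owned:{len(bottles)}")
    # leaving the with block plays the role of stack.close() on `q`
    return events
```

Note that ExitStack unwinds in reverse order of entry, so the most recently started bottle is torn down first.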

Agent picker

Pressing n opens a centered modal listing every agent name from spec.manifest.agents. j/k navigates; Enter selects; Esc aborts. The modal is as wide as the longest agent name plus the bottle name plus an "already running?" column, so the operator can see at a glance whether picking an agent would start a second bottle next to one that is already up.

┌─ start agent ───────────────────────────┐
│   implementer       dev      (running)  │
│ > researcher        dev                 │
│   triage-bot        sandbox             │
└─ Enter: start  Esc: cancel ─────────────┘

Starting an agent that already has a running bottle is allowed — each start mints a fresh slug — but the picker surfaces the already-running state so the operator doesn't accidentally double-launch.
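The picker's selection and row layout can be kept as pure functions, separate from curses drawing. This is a rough sketch under assumptions: the (agent name, bottle name) pairs and the running-count dict are hypothetical inputs, not the manifest's real shape, and the "x2" count display is one possible answer to the double-start question rather than a decided design.

```python
def move_selection(index, key, count):
    """Return the new selected row for a j/k keypress, clamped to the list."""
    if count == 0:
        return 0
    if key == "j":
        return min(index + 1, count - 1)
    if key == "k":
        return max(index - 1, 0)
    return index


def picker_rows(agents, running, selected):
    """Format picker rows.

    agents:  list of (agent_name, bottle_name) pairs (hypothetical shape)
    running: dict mapping agent_name -> number of bottles already running
    """
    name_w = max((len(a) for a, _ in agents), default=0)
    bottle_w = max((len(b) for _, b in agents), default=0)
    rows = []
    for i, (agent, bottle) in enumerate(agents):
        marker = ">" if i == selected else " "
        n = running.get(agent, 0)
        status = "(running)" if n == 1 else f"(running x{n})" if n > 1 else ""
        row = f"{marker} {agent:<{name_w}}  {bottle:<{bottle_w}}  {status}"
        rows.append(row.rstrip())
    return rows
```

Keeping these free of curses calls means the modal's drawing code only has to place the returned strings inside a window.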

Preflight Y/N

Two viable shapes:

Modal — render the preflight summary lines (agent / env / skills / bottle / git gate / egress) in a centered curses modal with [y/N] at the bottom. Capture the next keypress.

Drop-and-resume — curses.endwin(), print the preflight to stderr, read y/N from stdin, restore curses. Matches the editor-flow + handoff pattern; lower implementation cost.

Lean toward modal for the y/N because it doesn't flash the terminal between dashboard frames. Drop-and-resume is acceptable if modal proves fiddly.
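For comparison, the drop-and-resume shape might look like the sketch below. The function names and the read_line / out parameters are illustrative assumptions, not existing project code; only curses.endwin() and stdscr.refresh() are real curses calls.

```python
import curses
import sys


def parse_yes(answer):
    """Default-No parsing for a [y/N] prompt: only y / yes confirm."""
    return answer.strip().lower() in ("y", "yes")


def prompt_preflight_text(lines, read_line=input, out=sys.stderr):
    """Print the preflight summary and read one answer in plain text mode."""
    for line in lines:
        print(line, file=out)
    print("proceed? [y/N] ", end="", file=out, flush=True)
    return parse_yes(read_line())


def drop_and_resume(stdscr, lines):
    """Leave curses mode, prompt in text mode, then restore the dashboard."""
    curses.endwin()  # terminal returns to normal (cooked) mode
    try:
        return prompt_preflight_text(lines)
    finally:
        stdscr.refresh()  # the first refresh after endwin() resumes curses mode
```

Injecting read_line and out keeps the prompt testable without a terminal; the modal variant could reuse parse_yes on a single keypress.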

Re-attach (Enter on agent)

Same handoff pattern the new-agent flow uses. The dashboard already holds the DockerBottle for any slug it started — bottle.exec_claude([...], tty=True) does the right docker exec -it claude … and returns on session exit. Re-attach is "already-running" + the same exec call; the agent picker isn't involved.

For agents the dashboard didn't start (read-only watch), Enter is a no-op with a status hint ("dashboard didn't start this bottle; resume with ./cli.py resume <identity> outside the dashboard"). PRD-0019's selection model already differentiates focus; this layer just gates the action.
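The gating described above reduces to a lookup in the dashboard's bottles map. A minimal sketch, assuming a hypothetical handle_enter helper and a handoff callable that wraps the endwin → exec → refresh sequence:

```python
NOT_OWNED_HINT = (
    "dashboard didn't start this bottle; resume with "
    "./cli.py resume <identity> outside the dashboard"
)


def handle_enter(slug, bottles, handoff):
    """Re-attach to an owned bottle, or return a status-line hint.

    bottles: dict of slug -> bottle handle for dashboard-started bottles
    handoff: callable taking the bottle (hypothetical; blocks until claude exits)
    Returns None after a handoff, or the hint text for read-only-watch rows.
    """
    bottle = bottles.get(slug)
    if bottle is None:
        return NOT_OWNED_HINT
    handoff(bottle)
    return None
```

Returning the hint instead of drawing it leaves the status-line rendering to the main loop.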

Explicit per-bottle stop

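x on a selected dashboard-owned agent performs a targeted teardown of just that bottle: take it out of the map, call its close() to tear down compose + state, update the agents pane on the next refresh. contextlib.ExitStack has no pop_callback or other way to unregister a single entry, so each bottle needs its own closable handle (for example a per-bottle ExitStack that the dashboard's stack also closes on quit). Bottles the dashboard didn't start (x on a read-only-watch row) → no-op with a status hint.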
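One way to get per-bottle teardown out of ExitStack is to give every bottle its own child stack and register the child's close on the parent. The sketch below is an assumption about how this could be structured; BottleRegistry and fake_launch are hypothetical names, and fake_launch stands in for backend.launch(plan).

```python
from contextlib import ExitStack, contextmanager


class BottleRegistry:
    """Owns dashboard-started bottles; supports stopping one or all."""

    def __init__(self):
        self._parent = ExitStack()
        self._stacks = {}  # slug -> per-bottle ExitStack
        self.bottles = {}  # slug -> handle yielded by the launch context

    def start(self, slug, ctx):
        child = ExitStack()
        bottle = child.enter_context(ctx)
        self._stacks[slug] = child
        self.bottles[slug] = bottle
        # closing an already-closed ExitStack is a no-op, so registering
        # the child here stays safe after an explicit stop()
        self._parent.callback(child.close)
        return bottle

    def stop(self, slug):
        """Tear down one bottle; False means it is not dashboard-owned."""
        child = self._stacks.pop(slug, None)
        if child is None:
            return False
        del self.bottles[slug]
        child.close()
        return True

    def close(self):
        """Tear down every remaining bottle (the `q` path)."""
        self._parent.close()
        self._stacks.clear()
        self.bottles.clear()


@contextmanager
def fake_launch(slug, log):
    """Hypothetical stand-in for backend.launch(plan)."""
    log.append(f"up:{slug}")
    try:
        yield slug
    finally:
        log.append(f"down:{slug}")
```

The design choice here is that quit and explicit stop share one teardown path (the child stack's close), so a bottle is never torn down twice.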

Dashboard quit

q (existing) calls stack.close() before exit; every dashboard-launched bottle goes through its normal teardown (compose down + state settle). Preserve markers (capability-block, crash) still keep state across teardown. The dashboard process itself returns 0.

If the operator wants to keep bottles alive past dashboard exit, the existing path is unchanged: launch them via ./cli.py start in a separate terminal. That ownership stays out-of-band.

Implementation chunks

Sized for one PR each.

Refactor _launch_bottle so the launch + exec_claude pieces are separable. Today's cli/start.py runs both inside one function. Extract prepare_with_preflight(spec, *, render_preflight, prompt_yes) and attach_claude(bottle, *, remote_control). The CLI's existing one-shot use binds them as before; the dashboard binds them with curses-aware render + prompt callables. No behavior change.
Agent picker modal + new-agent flow. New key n opens the picker; prepare_with_preflight runs against the selected agent; on Y, backend.launch(plan) enters the dashboard's ExitStack; handoff invokes attach_claude.
Re-attach via Enter on owned agents-pane row. Looks up the slug in the dashboard's bottles map; if present → handoff; else → status-line hint pointing at ./cli.py resume.
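Explicit per-bottle stop (x keybinding). Close just that bottle's context, drop it from the bottles map, refresh. ExitStack cannot pop a single callback, so the bottle needs its own closable handle (for example a per-bottle ExitStack) that the dashboard's stack also closes on quit.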
Quit-cleanup (q). Hook stack.close() into the normal return path. Document the "exiting dashboard tears down every bottle it started" contract in dashboard.py's module docstring.
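The chunk 1 split could look roughly like the sketch below. The keyword names render_preflight and prompt_yes come from the signature above; the extra prepare parameter is an assumption added here so the example runs without the real backend.prepare, and the dict-shaped plan is purely illustrative.

```python
def prepare_with_preflight(spec, *, render_preflight, prompt_yes, prepare):
    """Build a plan, show the preflight summary, and ask for confirmation.

    prepare is injected only for this sketch (hypothetical); the real
    function would presumably call backend.prepare directly.
    Returns the plan on yes, or None if the operator declines.
    """
    plan = prepare(spec)
    render_preflight(plan)
    return plan if prompt_yes() else None


def fake_prepare(spec):
    """Hypothetical stand-in for backend.prepare."""
    return {"agent": spec, "slug": f"{spec}-1"}
```

The CLI would bind render_preflight and prompt_yes to stderr / stdin callables, while the dashboard would bind them to its modal or drop-and-resume variants; the flow in between stays identical.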

Open questions

Modal vs. drop-and-resume for preflight Y/N. Both work; modal is nicer if the curses geometry handling is straightforward. Pick during chunk 2 by prototyping the modal in ~30 lines and seeing if it looks right.
Agent picker: text-filter typing? v1 is j/k navigation only. If the manifest has 20+ agents the picker gets noisy; add fzf-style filter input later if needed.
What happens if attach_claude exits because the container died (not a clean claude exit — e.g., OOM, panic)? Today's _settle_state marks the bottle preserved for non-zero exit codes. The dashboard's re-render needs to notice the bottle is gone (compose down or container-not-running state) and surface a status line. Probably: transcript snapshot + mark preserved + remove from bottles map + status line "claude session for [slug] ended with exit N; preserved for resume".
Double-start of the same agent. Allowed by design — slugs are unique per launch — but the picker should make it clear this is a "start a SECOND bottle" decision, not a "re-enter the first." Probably handled by showing the running-count in the picker row.
Should q confirm before tearing down N running bottles? A 5-bottle dashboard with 5 in-flight sessions loses non-trivial state on accidental q. Probably yes: curses modal "quit and tear down N bottles? [y/N]". Skip confirmation when there are zero owned bottles.
Race between handoff and 1s refresh tick. While the dashboard's stdscr.timeout is set, a key press fires the handoff and the dashboard sits in docker exec for minutes. discover_active_agents / discover_pending don't poll during that window, which is fine — the moment we stdscr.refresh() after exec returns, the next loop iter runs discovery and the panes reflect reality. Worth calling out in the design but no special handling needed.
Multi-bottle resource use. Five bottles up means five compose projects: 5×(agent + pipelock + egress optional + git-gate optional + supervise optional) containers, plus 5×2 networks. On a 16-GiB host this is fine; on something smaller the operator might want a soft cap or a warning. Out of v1; flag for follow-up if it bites.
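The container arithmetic in the last question can be made concrete. This is a small worked sketch, assuming agent and pipelock are always present and the other three services are toggled per bottle, as the list above implies; the constant and function names are hypothetical.

```python
REQUIRED = ("agent", "pipelock")
OPTIONAL = ("egress", "git-gate", "supervise")
NETWORKS_PER_BOTTLE = 2


def footprint(n_bottles, optional_enabled=()):
    """Count containers and networks for n identical bottles."""
    enabled = set(optional_enabled)
    unknown = enabled - set(OPTIONAL)
    if unknown:
        raise ValueError(f"unknown optional services: {sorted(unknown)}")
    per_bottle = len(REQUIRED) + len(enabled)
    return {
        "containers": n_bottles * per_bottle,
        "networks": n_bottles * NETWORKS_PER_BOTTLE,
    }
```

Under these assumptions five bottles range from 10 containers (no optional services) to 25 containers (all three enabled), with 10 networks either way, which is the range a soft cap or warning would need to reason about.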

References

PRD 0018 — compose-per-instance lifecycle (the backend.launch context-manager contract this PRD layers against)
PRD 0019 — active-agents pane + selection model (the agents-pane row the re-attach + stop verbs hook into)
docs/research/claude-code-pane-in-dashboard.md — option 1 (handoff) is what attach_claude implements here; options 2 / 3 are out of scope for this PRD
claude_bottle/cli/start.py:_launch_bottle — the function chunk 1 extracts the prepare + attach pieces out of
