diff --git a/docs/prds/0020-start-and-attach-from-dashboard.md b/docs/prds/0020-start-and-attach-from-dashboard.md new file mode 100644 index 0000000..251cde0 --- /dev/null +++ b/docs/prds/0020-start-and-attach-from-dashboard.md @@ -0,0 +1,336 @@ +# PRD 0020: Start and attach to agents from inside the dashboard + +- **Status:** Draft +- **Author:** didericis +- **Created:** 2026-05-26 + +## Summary + +Today the dashboard is read-only: it surfaces pending proposals +and active agents (PRD 0019) but can't *start* an agent or +*re-enter* one. The operator's path is split — they launch +agents from one terminal (`./cli.py start `), and watch +them from another (`./cli.py dashboard`). + +This PRD collapses that split. The dashboard becomes the +operator's single surface: pressing a key opens an agent picker, +selecting one runs the existing prepare → preflight → launch +flow inside a curses-friendly variant, and on yield drops to a +full-screen `docker exec -it … claude` session (the "handoff" +shape from `docs/research/claude-code-pane-in-dashboard.md`). +When the operator exits claude, the dashboard re-renders with +the now-running bottle visible in the agents pane. + +Crucially, the bottle's lifetime is owned by the *dashboard +process*, not by the individual claude session. Exit claude → +back to dashboard, bottle still running. Start another agent → +two bottles up at once. Quit the dashboard → all dashboard- +launched bottles tear down. + +## Problem + +Two real frictions today: + +1. **Two terminals for one workflow.** The dashboard is the + right shape to *watch* agents — proposals queue, status + updates, operator-edit verbs — but it's the wrong shape to + *start* them. Today you open a second terminal for that. In + parallel use (3–5 bottles), the operator has 5+ terminals + open and the dashboard's "active agents" pane is hopelessly + behind reality because they just spawned three in a row. + +2. **`./cli.py start` ties the bottle to a single claude + session.** The start command's `ExitStack` brings the bottle + up, runs claude, and tears down on Ctrl-D — fine for a one- + shot session, wrong for "let me bounce in and out of this + bottle a few times while triaging proposals." Today the only + way to re-enter a bottle after exiting claude is to start a + fresh one and lose all in-bottle state. + +The dashboard already discovers active bottles, scopes +operator-edit verbs to a selected agent (PRD 0019), and +captures full-merged logs per bottle (PRD 0018). It already +*wants* to be the primary surface. This PRD finishes that. + +## Goals / Success Criteria + +1. From inside `./cli.py dashboard`, pressing `n` (new) opens + an agent picker listing every agent defined in the manifest. + Selecting one runs `prepare → preflight → launch`. +2. The preflight Y/N summary renders cleanly — either as a + curses modal or via `curses.endwin() → text-mode prompt + → restore`, matching the existing editor-flow pattern. +3. On launch success, the dashboard performs a handoff (option + 1 from the research doc): `curses.endwin()` → `docker exec + -it claude-bottle- claude --dangerously-skip-permissions` + → on exit, `stdscr.refresh()` and re-render with the new + bottle in the agents pane. +4. The bottle's lifetime is owned by the dashboard process, NOT + by any single claude session. Exiting claude (Ctrl-D, `/exit`) + returns to the dashboard with the bottle still running. The + operator can start more agents and re-enter previous ones. +5. Pressing Enter on a selected row in the agents pane re- + attaches to that agent's bottle via the same handoff — drops + to full-screen claude, returns on exit. +6. Pressing `x` (or similar — keybinding decided in design) + on a selected agent stops just that bottle (compose down + + state cleanup) without quitting the dashboard. +7. Quitting the dashboard (`q`) tears down every bottle the + dashboard started, unless something has explicitly preserved + the state (capability-block, crash). Matches today's + start.py teardown semantics. + +## Non-goals + +- **A pane that hosts the claude TUI alongside proposals.** The + embedded-emulator option from the research doc is out of + scope. The handoff (option 1) is the v1; option 2 is a + separate PRD if and when handoff is observably insufficient. +- **Adopting bottles started by an out-of-dashboard `./cli.py + start` invocation.** Those have their own ExitStack-owner and + the dashboard treats them as read-only-watch (already does + today). Re-attach only applies to bottles the *current + dashboard process* started. +- **Persisting a "bottle pool" across dashboard runs.** When + the dashboard quits, its bottles go. Resume across dashboard + invocations is `./cli.py resume `, which is + unchanged. +- **Multi-window UI.** Single curses window, two existing + panes (proposals + agents); the agent picker is a modal, not + a third pane. +- **Removing `./cli.py start`.** Stays as the script-friendly / + legacy entry point. The dashboard is the new default. + +## Scope + +### In scope + +- Manifest-driven agent picker (curses modal): list view with + j/k navigation + Enter to confirm, Esc to abort. +- Preflight rendering inside the dashboard's curses surface + (modal or drop-and-resume — picked in design). +- A new `_dashboard_start_flow` that wraps prepare + preflight + + launch and returns a `DockerBottle` handle the dashboard + retains alongside its `pending` and `agents` lists. +- A `bottles: dict[slug, DockerBottle]` map on the main loop + that owns every dashboard-launched handle. ExitStack tears + them all down on dashboard exit. +- `Enter` on an agents-pane row → re-attach handoff (docker + exec -it claude into the existing container). +- `x` (or similar) on an agents-pane row → explicit per-bottle + stop without quitting. +- `q` (existing quit key) → tear down all dashboard-launched + bottles before returning. + +### Out of scope + +- Changes to `./cli.py start` itself. It keeps its current + shape; the dashboard reuses its internal pieces (backend. + prepare / backend.launch) without reaching through the CLI + layer. +- Changes to `backend.launch`'s context-manager contract; the + dashboard's bottle map just holds the context-manager-yielded + Bottle and calls `__exit__` on quit / explicit stop. +- New manifest fields. The picker reads what's already there. +- Adopting non-dashboard bottles into the dashboard's owned set. + +## Proposed design + +### Bottle ownership + +Today's flow: + +``` +./cli.py start agent + └─ with backend.launch(plan) as bottle: ← bottle alive while inside `with` + bottle.exec_claude([...], tty=True) ← blocks until claude exits + # context exits → compose down → state cleanup +``` + +The proposed dashboard-owned flow: + +``` +./cli.py dashboard + └─ stack = ExitStack() + bottles: dict[str, DockerBottle] = {} + + # operator presses `n`, picks agent + ctx = backend.launch(plan) + bottle = stack.enter_context(ctx) ← bottle stays alive + bottles[plan.slug] = bottle + + # operator interacts via: + curses.endwin() + bottle.exec_claude([...], tty=True) ← blocks; returns on Ctrl-D + stdscr.refresh() + # bottle is STILL ALIVE here — only the claude process exited + + # ... operator does other things, eventually `q`: + stack.close() ← tears down every bottle +``` + +The shift is one line of code semantically but the change in +operator experience is real: bottles outlive any single claude +session. + +### Agent picker + +Pressing `n` opens a centered modal listing every agent name +from `spec.manifest.agents`. j/k navigates; Enter selects; Esc +aborts. Width is the longest name + bottle name + a column for +"already running?" so the operator can see at a glance whether +picking an agent starts a fresh one (different slug suffix) or +not. + +``` +┌─ start agent ───────────────────────────┐ +│ implementer dev (running) │ +│ > researcher dev │ +│ triage-bot sandbox │ +└─ Enter: start Esc: cancel ─────────────┘ +``` + +Starting an agent that already has a running bottle is allowed +— each `start` mints a fresh slug — but the picker surfaces the +already-running state so the operator doesn't accidentally +double-launch. + +### Preflight Y/N + +Two viable shapes: + +**Modal** — render the preflight summary lines (`agent / env / +skills / bottle / git gate / egress`) in a centered curses +modal with `[y/N]` at the bottom. Capture the next keypress. + +**Drop-and-resume** — `curses.endwin()`, print the preflight to +stderr, read y/N from stdin, restore curses. Matches the +editor-flow + handoff pattern; lower implementation cost. + +Lean toward **modal** for the y/N because it doesn't flash the +terminal between dashboard frames. Drop-and-resume is acceptable +if modal proves fiddly. + +### Re-attach (Enter on agent) + +Same handoff pattern the new-agent flow uses. The dashboard +already holds the `DockerBottle` for any slug it started — +`bottle.exec_claude([...], tty=True)` does the right `docker +exec -it claude …` and returns on session exit. Re-attach is +"already-running" + the same exec call; the agent picker isn't +involved. + +For agents the dashboard didn't start (read-only watch), Enter +is a no-op with a status hint ("dashboard didn't start this +bottle; resume with `./cli.py resume ` outside the +dashboard"). PRD-0019's selection model already differentiates +focus; this layer just gates the action. + +### Explicit per-bottle stop + +`x` on a selected dashboard-owned agent invokes +`stack.pop_callback`-style targeted teardown: take that bottle +out of the map, call its `close()` to tear down compose + state, +update the agents pane on the next refresh. Bottles the +dashboard didn't start (`x` on a read-only-watch row) → no-op +with a status hint. + +### Dashboard quit + +`q` (existing) calls `stack.close()` before exit; every +dashboard-launched bottle goes through its normal teardown +(`compose down` + state settle). Preserve markers (capability- +block, crash) still keep state across teardown. The dashboard +process itself returns 0. + +If the operator wants to keep bottles alive past dashboard +exit, the existing path is unchanged: launch them via +`./cli.py start` in a separate terminal. That ownership stays +out-of-band. + +## Implementation chunks + +Sized for one PR each. + +1. **Refactor `_launch_bottle` so the launch + exec_claude + pieces are separable.** Today's `cli/start.py` runs both + inside one function. Extract `prepare_with_preflight(spec, + *, render_preflight, prompt_yes)` and `attach_claude(bottle, + *, remote_control)`. The CLI's existing one-shot use binds + them as before; the dashboard binds them with curses-aware + render + prompt callables. No behavior change. +2. **Agent picker modal + new-agent flow.** New key `n` opens + the picker; `prepare_with_preflight` runs against the + selected agent; on Y, `backend.launch(plan)` enters the + dashboard's ExitStack; handoff invokes `attach_claude`. +3. **Re-attach via Enter on owned agents-pane row.** Looks up + the slug in the dashboard's `bottles` map; if present → + handoff; else → status-line hint pointing at `./cli.py + resume`. +4. **Explicit per-bottle stop (`x` keybinding).** Pop the + bottle's `close` callback off the stack, call it, refresh. +5. **Quit-cleanup (`q`).** Hook `stack.close()` into the + normal return path. Document the "exiting dashboard tears + down every bottle it started" contract in `dashboard.py`'s + module docstring. + +## Open questions + +1. **Modal vs. drop-and-resume for preflight Y/N.** Both work; + modal is nicer if the curses geometry handling is + straightforward. Pick during chunk 2 by prototyping the + modal in ~30 lines and seeing if it looks right. + +2. **Agent picker: text-filter typing?** v1 is j/k navigation + only. If the manifest has 20+ agents the picker gets noisy; + add fzf-style filter input later if needed. + +3. **What happens if `attach_claude` exits because the + container died** (not a clean claude exit — e.g., OOM, + panic)? Today's `_settle_state` marks the bottle preserved + for non-zero exit codes. The dashboard's re-render needs to + notice the bottle is gone (compose down or container-not- + running state) and surface a status line. Probably: + transcript snapshot + mark preserved + remove from + `bottles` map + status line "claude session for [slug] + ended with exit N; preserved for resume". + +4. **Double-start of the same agent.** Allowed by design — slugs + are unique per launch — but the picker should make it clear + this is a "start a SECOND bottle" decision, not a "re-enter + the first." Probably handled by showing the running-count in + the picker row. + +5. **Should `q` confirm before tearing down N running + bottles?** A 5-bottle dashboard with 5 in-flight sessions + loses non-trivial state on accidental `q`. Probably yes: + curses modal "quit and tear down N bottles? [y/N]". Skip + confirmation when there are zero owned bottles. + +6. **Race between handoff and 1s refresh tick.** While the + dashboard's `stdscr.timeout` is set, a key press fires the + handoff and the dashboard sits in `docker exec` for minutes. + `discover_active_agents` / `discover_pending` don't poll + during that window, which is fine — the moment we + `stdscr.refresh()` after exec returns, the next loop iter + runs discovery and the panes reflect reality. Worth calling + out in the design but no special handling needed. + +7. **Multi-bottle resource use.** Five bottles up means five + compose projects: 5×(agent + pipelock + egress optional + + git-gate optional + supervise optional) containers, plus 5×2 + networks. On a 16-GiB host this is fine; on something + smaller the operator might want a soft cap or a warning. + Out of v1; flag for follow-up if it bites. + +## References + +- PRD 0018 — compose-per-instance lifecycle (the `backend. + launch` context-manager contract this PRD layers against) +- PRD 0019 — active-agents pane + selection model (the + agents-pane row the re-attach + stop verbs hook into) +- `docs/research/claude-code-pane-in-dashboard.md` — option 1 + (handoff) is what `attach_claude` implements here; options 2 + / 3 are out of scope for this PRD +- `claude_bottle/cli/start.py:_launch_bottle` — the function + chunk 1 extracts the prepare + attach pieces out of