docs(prd-0020): start + attach to agents from the dashboard
Draft a PRD that turns the dashboard into the operator's single
surface — collapses today's two-terminal workflow (one for
`./cli.py start`, one for `./cli.py dashboard`) into a single
dashboard invocation that can spin up new agents, re-attach to
ones it already spun up, and explicitly stop them.

Picks the "handoff" mechanism from `docs/research/claude-code-
pane-in-dashboard.md` (curses.endwin → docker exec -it claude
→ stdscr.refresh) and crucially decouples the bottle's lifetime
from any single claude session: exit claude → back to dashboard
with the bottle still running; quit dashboard → tear down every
bottle the dashboard owns.

Sized into 5 chunks (refactor → picker + new-agent → re-attach
→ explicit stop → quit-cleanup). Seven open questions called
out, the biggest being modal-vs-drop-and-resume for the
preflight Y/N inside curses.
2026-05-26 02:59:42 -04:00


PRD 0020: Start and attach to agents from inside the dashboard

  • Status: Draft
  • Author: didericis
  • Created: 2026-05-26

Summary

Today the dashboard is read-only: it surfaces pending proposals and active agents (PRD 0019) but can't start an agent or re-enter one. The operator's path is split — they launch agents from one terminal (./cli.py start <name>), and watch them from another (./cli.py dashboard).

This PRD collapses that split. The dashboard becomes the operator's single surface: pressing a key opens an agent picker, selecting one runs the existing prepare → preflight → launch flow inside a curses-friendly variant, and on yield drops to a full-screen docker exec -it … claude session (the "handoff" shape from docs/research/claude-code-pane-in-dashboard.md). When the operator exits claude, the dashboard re-renders with the now-running bottle visible in the agents pane.

Crucially, the bottle's lifetime is owned by the dashboard process, not by the individual claude session. Exit claude → back to dashboard, bottle still running. Start another agent → two bottles up at once. Quit the dashboard → all dashboard-launched bottles tear down.

Problem

Two real frictions today:

  1. Two terminals for one workflow. The dashboard is the right shape to watch agents (proposals queue, status updates, operator-edit verbs) but the wrong shape to start them. Today you open a second terminal for that. In parallel use (3–5 bottles), the operator has 5+ terminals open and the dashboard's "active agents" pane is hopelessly behind reality because they just spawned three in a row.

  2. ./cli.py start ties the bottle to a single claude session. The start command's ExitStack brings the bottle up, runs claude, and tears down on Ctrl-D: fine for a one-shot session, wrong for "let me bounce in and out of this bottle a few times while triaging proposals." Today the only way to re-enter a bottle after exiting claude is to start a fresh one and lose all in-bottle state.

The dashboard already discovers active bottles, scopes operator-edit verbs to a selected agent (PRD 0019), and captures full-merged logs per bottle (PRD 0018). It already wants to be the primary surface. This PRD finishes that.

Goals / Success Criteria

  1. From inside ./cli.py dashboard, pressing n (new) opens an agent picker listing every agent defined in the manifest. Selecting one runs prepare → preflight → launch.
  2. The preflight Y/N summary renders cleanly — either as a curses modal or via curses.endwin() → text-mode prompt → restore, matching the existing editor-flow pattern.
  3. On launch success, the dashboard performs a handoff (option 1 from the research doc): curses.endwin() → docker exec -it claude-bottle-<slug> claude --dangerously-skip-permissions → on exit, stdscr.refresh() and re-render with the new bottle in the agents pane.
  4. The bottle's lifetime is owned by the dashboard process, NOT by any single claude session. Exiting claude (Ctrl-D, /exit) returns to the dashboard with the bottle still running. The operator can start more agents and re-enter previous ones.
  5. Pressing Enter on a selected row in the agents pane re-attaches to that agent's bottle via the same handoff: it drops to full-screen claude and returns on exit.
  6. Pressing x (or similar — keybinding decided in design) on a selected agent stops just that bottle (compose down + state cleanup) without quitting the dashboard.
  7. Quitting the dashboard (q) tears down every bottle the dashboard started, unless something has explicitly preserved the state (capability-block, crash). Matches today's start.py teardown semantics.

Non-goals

  • A pane that hosts the claude TUI alongside proposals. The embedded-emulator option from the research doc is out of scope. The handoff (option 1) is the v1; option 2 is a separate PRD if and when handoff is observably insufficient.
  • Adopting bottles started by an out-of-dashboard ./cli.py start invocation. Those have their own ExitStack-owner and the dashboard treats them as read-only-watch (already does today). Re-attach only applies to bottles the current dashboard process started.
  • Persisting a "bottle pool" across dashboard runs. When the dashboard quits, its bottles go. Resume across dashboard invocations is ./cli.py resume <identity>, which is unchanged.
  • Multi-window UI. Single curses window, two existing panes (proposals + agents); the agent picker is a modal, not a third pane.
  • Removing ./cli.py start. Stays as the script-friendly / legacy entry point. The dashboard is the new default.

Scope

In scope

  • Manifest-driven agent picker (curses modal): list view with j/k navigation + Enter to confirm, Esc to abort.
  • Preflight rendering inside the dashboard's curses surface (modal or drop-and-resume — picked in design).
  • A new _dashboard_start_flow that wraps prepare + preflight + launch and returns a DockerBottle handle the dashboard retains alongside its pending and agents lists.
  • A bottles: dict[slug, DockerBottle] map on the main loop that owns every dashboard-launched handle. ExitStack tears them all down on dashboard exit.
  • Enter on an agents-pane row → re-attach handoff (docker exec -it claude into the existing container).
  • x (or similar) on an agents-pane row → explicit per-bottle stop without quitting.
  • q (existing quit key) → tear down all dashboard-launched bottles before returning.

Out of scope

  • Changes to ./cli.py start itself. It keeps its current shape; the dashboard reuses its internal pieces (backend.prepare / backend.launch) without reaching through the CLI layer.
  • Changes to backend.launch's context-manager contract; the dashboard's bottle map just holds the context-manager-yielded Bottle and calls __exit__ on quit / explicit stop.
  • New manifest fields. The picker reads what's already there.
  • Adopting non-dashboard bottles into the dashboard's owned set.

Proposed design

Bottle ownership

Today's flow:

./cli.py start agent
  └─ with backend.launch(plan) as bottle:        ← bottle alive while inside `with`
       bottle.exec_claude([...], tty=True)       ← blocks until claude exits
     # context exits → compose down → state cleanup

The proposed dashboard-owned flow:

./cli.py dashboard
  └─ stack = ExitStack()
     bottles: dict[str, DockerBottle] = {}

     # operator presses `n`, picks agent
     ctx = backend.launch(plan)
     bottle = stack.enter_context(ctx)           ← bottle stays alive
     bottles[plan.slug] = bottle

     # operator interacts via:
     curses.endwin()
     bottle.exec_claude([...], tty=True)         ← blocks; returns on Ctrl-D
     stdscr.refresh()
     # bottle is STILL ALIVE here — only the claude process exited

     # ... operator does other things, eventually `q`:
     stack.close()                               ← tears down every bottle

The shift is one line of code semantically but the change in operator experience is real: bottles outlive any single claude session.
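The ownership shift can be illustrated with a small runnable sketch. `fake_launch` and `fake_exec_claude` are hypothetical stand-ins for `backend.launch(plan)` and `bottle.exec_claude(...)`; the slugs are made up, and the only real API used is `contextlib.ExitStack`.

```python
from contextlib import ExitStack, contextmanager

events: list[str] = []  # records lifecycle events so the ordering is visible


@contextmanager
def fake_launch(slug: str):
    """Hypothetical stand-in for backend.launch(plan)."""
    events.append(f"up:{slug}")
    try:
        yield slug  # the real call would yield a bottle handle
    finally:
        events.append(f"down:{slug}")


def fake_exec_claude(bottle: str) -> None:
    """Hypothetical stand-in for bottle.exec_claude([...], tty=True)."""
    events.append(f"session:{bottle}")


stack = ExitStack()
bottles: dict[str, str] = {}

# Operator starts two agents; each bottle is entered on the shared stack.
for slug in ("researcher-a1", "implementer-b2"):
    bottles[slug] = stack.enter_context(fake_launch(slug))

# Two separate claude sessions against the same bottle: no teardown between them.
fake_exec_claude(bottles["researcher-a1"])
fake_exec_claude(bottles["researcher-a1"])

# Operator presses `q`: every bottle tears down, most recent first.
stack.close()
```

Note that `ExitStack` unwinds in reverse order of entry, so the last bottle started is the first one torn down.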

Agent picker

Pressing n opens a centered modal listing every agent name from spec.manifest.agents. j/k navigates; Enter selects; Esc aborts. Width is the longest name + bottle name + a column for "already running?" so the operator can see at a glance whether picking an agent starts a fresh one (different slug suffix) or not.

┌─ start agent ───────────────────────────┐
│   implementer       dev      (running)  │
│ > researcher        dev                 │
│   triage-bot        sandbox             │
└─ Enter: start  Esc: cancel ─────────────┘

Starting an agent that already has a running bottle is allowed — each start mints a fresh slug — but the picker surfaces the already-running state so the operator doesn't accidentally double-launch.
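The picker's row model and navigation can be kept free of curses so they are easy to test. The sketch below assumes the manifest can be reduced to a mapping of agent name to bottle name; `PickerRow`, `build_rows`, `move`, and `format_row` are hypothetical names, not existing code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PickerRow:
    agent: str
    bottle: str
    running: int  # dashboard-owned bottles already up for this agent


def build_rows(agents: dict[str, str], running_agents: list[str]) -> list[PickerRow]:
    """One row per manifest agent, annotated with how many are already running."""
    counts = Counter(running_agents)
    return [PickerRow(name, bottle, counts[name]) for name, bottle in agents.items()]


def move(selected: int, key: str, n_rows: int) -> int:
    """j moves down, k moves up; the selection clamps at both ends."""
    if key == "j":
        return min(selected + 1, n_rows - 1)
    if key == "k":
        return max(selected - 1, 0)
    return selected


def format_row(row: PickerRow, is_selected: bool) -> str:
    """Render one row: selection marker, agent, bottle, running flag."""
    marker = ">" if is_selected else " "
    status = "(running)" if row.running else ""
    return f"{marker} {row.agent:<17} {row.bottle:<8} {status}".rstrip()
```

Keeping `running` as a count rather than a boolean leaves room to show "a second bottle" explicitly if the picker later displays the number.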

Preflight Y/N

Two viable shapes:

Modal: render the preflight summary lines (agent / env / skills / bottle / git gate / egress) in a centered curses modal with [y/N] at the bottom. Capture the next keypress.

Drop-and-resume: curses.endwin(), print the preflight to stderr, read y/N from stdin, restore curses. Matches the editor-flow + handoff pattern; lower implementation cost.

Lean toward modal for the y/N because it doesn't flash the terminal between dashboard frames. Drop-and-resume is acceptable if modal proves fiddly.
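For comparison, the drop-and-resume shape is small enough to sketch in full. This is a minimal illustration, not the existing editor-flow code: `parse_yes` and `drop_and_resume_prompt` are hypothetical names, and the prompt wording is assumed.

```python
import curses
import sys
from typing import Callable, Iterable


def parse_yes(answer: str) -> bool:
    """Interpret a [y/N] answer; anything other than y/yes counts as no."""
    return answer.strip().lower() in ("y", "yes")


def drop_and_resume_prompt(
    stdscr,  # the curses window the dashboard already holds
    summary: Iterable[str],
    read_line: Callable[[], str] = sys.stdin.readline,
) -> bool:
    """Leave curses mode, ask on the plain terminal, then restore the screen."""
    curses.endwin()  # temporary escape back to normal terminal modes
    try:
        for line in summary:
            print(line, file=sys.stderr)
        print("Proceed? [y/N] ", end="", file=sys.stderr, flush=True)
        return parse_yes(read_line())
    finally:
        stdscr.refresh()  # a refresh after endwin() resumes curses mode
```

The `try`/`finally` means the dashboard is restored even if reading the answer raises, which is the main thing a modal gets for free.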

Re-attach (Enter on agent)

Same handoff pattern the new-agent flow uses. The dashboard already holds the DockerBottle for any slug it started — bottle.exec_claude([...], tty=True) does the right docker exec -it claude … and returns on session exit. Re-attach is "already-running" + the same exec call; the agent picker isn't involved.

For agents the dashboard didn't start (read-only watch), Enter is a no-op with a status hint ("dashboard didn't start this bottle; resume with ./cli.py resume <identity> outside the dashboard"). PRD-0019's selection model already differentiates focus; this layer just gates the action.
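The gate and the handoff can be sketched together. All names here (`make_handoff`, `on_enter`, `NOT_OWNED_HINT`) are hypothetical; in the dashboard, `suspend` would be `curses.endwin`, `resume` would be `stdscr.refresh`, and `attach` would call `bottle.exec_claude([...], tty=True)`.

```python
from typing import Callable, Mapping, Optional

NOT_OWNED_HINT = (
    "dashboard didn't start this bottle; "
    "resume with ./cli.py resume <identity> outside the dashboard"
)


def make_handoff(
    suspend: Callable[[], None],
    resume: Callable[[], None],
    attach: Callable[[object], None],
) -> Callable[[object], None]:
    """Build a handoff: leave curses, run the session, always restore curses."""

    def handoff(bottle: object) -> None:
        suspend()
        try:
            attach(bottle)  # blocks until the claude session exits
        finally:
            resume()

    return handoff


def on_enter(
    slug: str,
    bottles: Mapping[str, object],
    handoff: Callable[[object], None],
) -> Optional[str]:
    """Re-attach if the dashboard owns the bottle; otherwise return a status hint."""
    bottle = bottles.get(slug)
    if bottle is None:
        return NOT_OWNED_HINT
    handoff(bottle)
    return None
```

Injecting `suspend`, `resume`, and `attach` keeps the gating logic testable without a terminal or a running container.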

Explicit per-bottle stop

x on a selected dashboard-owned agent performs a targeted teardown of just that bottle: take it out of the map, call its close() to tear down compose + state, and update the agents pane on the next refresh. Note that contextlib.ExitStack has no per-entry removal (there is no stack.pop_callback), so each bottle needs its own teardown handle for this to work. Bottles the dashboard didn't start (x on a read-only-watch row) → no-op with a status hint.
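One possible way to get a targeted teardown, sketched under assumptions: give every bottle its own child `ExitStack` and register that child's `close` on the dashboard's root stack. `BottleRegistry` and `fake_launch` are hypothetical; the sketch relies only on the fact that closing an already-closed `ExitStack` does nothing.

```python
from contextlib import ExitStack, contextmanager

log: list[str] = []


@contextmanager
def fake_launch(slug: str):
    """Hypothetical stand-in for backend.launch(plan)."""
    log.append(f"up:{slug}")
    try:
        yield slug
    finally:
        log.append(f"down:{slug}")


class BottleRegistry:
    """Owns dashboard-launched bottles; supports stopping one or all."""

    def __init__(self) -> None:
        self._root = ExitStack()
        self._children: dict[str, ExitStack] = {}
        self.bottles: dict[str, object] = {}

    def start(self, slug: str, launch_ctx) -> object:
        child = ExitStack()
        bottle = child.enter_context(launch_ctx)
        # The root closes the child on quit; if the child was already
        # closed by stop(), that second close is a no-op.
        self._root.callback(child.close)
        self._children[slug] = child
        self.bottles[slug] = bottle
        return bottle

    def stop(self, slug: str) -> bool:
        """Tear down one bottle; False means the dashboard doesn't own it."""
        child = self._children.pop(slug, None)
        if child is None:
            return False
        self.bottles.pop(slug, None)
        child.close()
        return True

    def close(self) -> None:
        """Tear down every bottle that is still up."""
        self._root.close()
        self._children.clear()
        self.bottles.clear()
```

The `False` return from `stop` maps onto the read-only-watch case, where the dashboard shows a status hint instead of acting.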

Dashboard quit

q (existing) calls stack.close() before exit; every dashboard-launched bottle goes through its normal teardown (compose down + state settle). Preserve markers (capability-block, crash) still keep state across teardown. The dashboard process itself returns 0.

If the operator wants to keep bottles alive past dashboard exit, the existing path is unchanged: launch them via ./cli.py start in a separate terminal. That ownership stays out-of-band.

Implementation chunks

Sized for one PR each.

  1. Refactor _launch_bottle so the launch + exec_claude pieces are separable. Today's cli/start.py runs both inside one function. Extract prepare_with_preflight(spec, *, render_preflight, prompt_yes) and attach_claude(bottle, *, remote_control). The CLI's existing one-shot use binds them as before; the dashboard binds them with curses-aware render + prompt callables. No behavior change.
  2. Agent picker modal + new-agent flow. New key n opens the picker; prepare_with_preflight runs against the selected agent; on Y, backend.launch(plan) enters the dashboard's ExitStack; handoff invokes attach_claude.
  3. Re-attach via Enter on owned agents-pane row. Looks up the slug in the dashboard's bottles map; if present → handoff; else → status-line hint pointing at ./cli.py resume.
  4. Explicit per-bottle stop (x keybinding). Take the bottle's teardown handle out of the dashboard's ownership, call it, refresh.
  5. Quit-cleanup (q). Hook stack.close() into the normal return path. Document the "exiting dashboard tears down every bottle it started" contract in dashboard.py's module docstring.
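The chunk 1 split might look roughly like the sketch below. Only the `prepare_with_preflight(spec, *, render_preflight, prompt_yes)` signature comes from this PRD; `Plan`, `_prepare`, and `_summary_lines` are hypothetical stand-ins for whatever `backend.prepare` and the existing preflight summary actually produce.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Plan:
    """Hypothetical stand-in for the object backend.prepare returns."""
    agent: str
    slug: str


def _prepare(spec: dict) -> Plan:
    """Hypothetical stand-in for backend.prepare; the slug format is made up."""
    return Plan(agent=spec["agent"], slug=f"{spec['agent']}-0001")


def _summary_lines(plan: Plan) -> list[str]:
    """Hypothetical preflight summary; the real one lists env, skills, etc."""
    return [f"agent: {plan.agent}", f"slug: {plan.slug}"]


def prepare_with_preflight(
    spec: dict,
    *,
    render_preflight: Callable[[Sequence[str]], None],
    prompt_yes: Callable[[], bool],
) -> Optional[Plan]:
    """Prepare a plan, show the summary, and return the plan only on a yes."""
    plan = _prepare(spec)
    render_preflight(_summary_lines(plan))
    return plan if prompt_yes() else None
```

The CLI would bind `render_preflight` and `prompt_yes` to plain terminal I/O, while the dashboard would bind them to its modal or drop-and-resume callables; neither caller needs to know which is in use.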

Open questions

  1. Modal vs. drop-and-resume for preflight Y/N. Both work; modal is nicer if the curses geometry handling is straightforward. Pick during chunk 2 by prototyping the modal in ~30 lines and seeing if it looks right.

  2. Agent picker: text-filter typing? v1 is j/k navigation only. If the manifest has 20+ agents the picker gets noisy; add fzf-style filter input later if needed.

  3. What happens if attach_claude exits because the container died (not a clean claude exit, e.g. OOM or panic)? Today's _settle_state marks the bottle preserved for non-zero exit codes. The dashboard's re-render needs to notice the bottle is gone (compose down or container-not-running state) and surface a status line. Probably: transcript snapshot + mark preserved + remove from bottles map + status line "claude session for [slug] ended with exit N; preserved for resume".

  4. Double-start of the same agent. Allowed by design — slugs are unique per launch — but the picker should make it clear this is a "start a SECOND bottle" decision, not a "re-enter the first." Probably handled by showing the running-count in the picker row.

  5. Should q confirm before tearing down N running bottles? A 5-bottle dashboard with 5 in-flight sessions loses non-trivial state on accidental q. Probably yes: curses modal "quit and tear down N bottles? [y/N]". Skip confirmation when there are zero owned bottles.

  6. Race between handoff and 1s refresh tick. While the dashboard's stdscr.timeout is set, a key press fires the handoff and the dashboard sits in docker exec for minutes. discover_active_agents / discover_pending don't poll during that window, which is fine — the moment we stdscr.refresh() after exec returns, the next loop iter runs discovery and the panes reflect reality. Worth calling out in the design but no special handling needed.

  7. Multi-bottle resource use. Five bottles up means five compose projects: 5×(agent + pipelock + egress optional + git-gate optional + supervise optional) containers, plus 5×2 networks. On a 16-GiB host this is fine; on something smaller the operator might want a soft cap or a warning. Out of v1; flag for follow-up if it bites.
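The resource arithmetic in question 7 can be made concrete. The sketch assumes agent and pipelock are always present (only the other three are marked optional above) and that each bottle gets two networks; `footprint` is a hypothetical helper.

```python
REQUIRED = ("agent", "pipelock")
OPTIONAL = ("egress", "git-gate", "supervise")
NETWORKS_PER_BOTTLE = 2


def footprint(n_bottles: int, enabled_optional=OPTIONAL) -> tuple[int, int]:
    """Return (containers, networks) for n_bottles compose projects."""
    per_bottle = len(REQUIRED) + len(set(enabled_optional) & set(OPTIONAL))
    return n_bottles * per_bottle, n_bottles * NETWORKS_PER_BOTTLE


print(footprint(5))      # all optional services on: (25, 10)
print(footprint(5, ()))  # none on: (10, 10)
```

So five bottles land somewhere between 10 and 25 containers plus 10 networks, which is the range a soft cap or warning would need to reason about.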

References

  • PRD 0018: compose-per-instance lifecycle (the backend.launch context-manager contract this PRD layers against)
  • PRD 0019: active-agents pane + selection model (the agents-pane row the re-attach + stop verbs hook into)
  • docs/research/claude-code-pane-in-dashboard.md: option 1 (handoff) is what attach_claude implements here; options 2 / 3 are out of scope for this PRD
  • claude_bottle/cli/start.py:_launch_bottle, the function from which chunk 1 extracts the prepare + attach pieces