bot-bottle/docs/prds/0020-start-and-attach-from-dashboard.md
didericis ec20293c0a
docs(prd-0020): start + attach to agents from the dashboard
Draft a PRD that turns the dashboard into the operator's single
surface — collapses today's two-terminal workflow (one for
`./cli.py start`, one for `./cli.py dashboard`) into a single
dashboard invocation that can spin up new agents, re-attach to
ones it already spun up, and explicitly stop them.

Picks the "handoff" mechanism from `docs/research/claude-code-
pane-in-dashboard.md` (curses.endwin → docker exec -it claude
→ stdscr.refresh) and crucially decouples the bottle's lifetime
from any single claude session: exit claude → back to dashboard
with the bottle still running; quit dashboard → tear down every
bottle the dashboard owns.

Sized into 5 chunks (refactor → picker + new-agent → re-attach
→ explicit stop → quit-cleanup). Seven open questions called
out, the biggest being modal-vs-drop-and-resume for the
preflight Y/N inside curses.
2026-05-26 02:59:42 -04:00

# PRD 0020: Start and attach to agents from inside the dashboard
- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-26
## Summary
Today the dashboard is read-only: it surfaces pending proposals
and active agents (PRD 0019) but can't *start* an agent or
*re-enter* one. The operator's path is split — they launch
agents from one terminal (`./cli.py start <name>`), and watch
them from another (`./cli.py dashboard`).

This PRD collapses that split. The dashboard becomes the
operator's single surface: pressing a key opens an agent picker,
selecting one runs the existing prepare → preflight → launch
flow inside a curses-friendly variant, and on yield drops to a
full-screen `docker exec -it … claude` session (the "handoff"
shape from `docs/research/claude-code-pane-in-dashboard.md`).
When the operator exits claude, the dashboard re-renders with
the now-running bottle visible in the agents pane.

Crucially, the bottle's lifetime is owned by the *dashboard
process*, not by the individual claude session. Exit claude →
back to dashboard, bottle still running. Start another agent →
two bottles up at once. Quit the dashboard → all dashboard-
launched bottles tear down.
## Problem
Two real frictions today:
1. **Two terminals for one workflow.** The dashboard is the
right shape to *watch* agents — proposals queue, status
updates, operator-edit verbs — but it's the wrong shape to
*start* them. Today you open a second terminal for that. In
parallel use (3–5 bottles), the operator has 5+ terminals
open and the dashboard's "active agents" pane is hopelessly
behind reality because they just spawned three in a row.
2. **`./cli.py start` ties the bottle to a single claude
session.** The start command's `ExitStack` brings the bottle
up, runs claude, and tears down on Ctrl-D — fine for a one-
shot session, wrong for "let me bounce in and out of this
bottle a few times while triaging proposals." Today the only
way to re-enter a bottle after exiting claude is to start a
fresh one and lose all in-bottle state.

The dashboard already discovers active bottles, scopes
operator-edit verbs to a selected agent (PRD 0019), and
captures full-merged logs per bottle (PRD 0018). It already
*wants* to be the primary surface. This PRD finishes that.
## Goals / Success Criteria
1. From inside `./cli.py dashboard`, pressing `n` (new) opens
an agent picker listing every agent defined in the manifest.
Selecting one runs `prepare → preflight → launch`.
2. The preflight Y/N summary renders cleanly — either as a
curses modal or via `curses.endwin() → text-mode prompt
→ restore`, matching the existing editor-flow pattern.
3. On launch success, the dashboard performs a handoff (option
1 from the research doc): `curses.endwin()` →
`docker exec -it claude-bottle-<slug> claude --dangerously-skip-permissions`
→ on exit, `stdscr.refresh()` and re-render with the new
bottle in the agents pane.
4. The bottle's lifetime is owned by the dashboard process, NOT
by any single claude session. Exiting claude (Ctrl-D, `/exit`)
returns to the dashboard with the bottle still running. The
operator can start more agents and re-enter previous ones.
5. Pressing Enter on a selected row in the agents pane re-
attaches to that agent's bottle via the same handoff — drops
to full-screen claude, returns on exit.
6. Pressing `x` (or similar — keybinding decided in design)
on a selected agent stops just that bottle (compose down +
state cleanup) without quitting the dashboard.
7. Quitting the dashboard (`q`) tears down every bottle the
dashboard started. State is kept only where a preserve marker
(capability-block, crash) is set, matching today's `start.py`
teardown semantics.
## Non-goals
- **A pane that hosts the claude TUI alongside proposals.** The
embedded-emulator option from the research doc is out of
scope. The handoff (option 1) is the v1; option 2 is a
separate PRD if and when handoff is observably insufficient.
- **Adopting bottles started by an out-of-dashboard `./cli.py
start` invocation.** Those have their own ExitStack-owner and
the dashboard treats them as read-only-watch (already does
today). Re-attach only applies to bottles the *current
dashboard process* started.
- **Persisting a "bottle pool" across dashboard runs.** When
the dashboard quits, its bottles go. Resume across dashboard
invocations is `./cli.py resume <identity>`, which is
unchanged.
- **Multi-window UI.** Single curses window, two existing
panes (proposals + agents); the agent picker is a modal, not
a third pane.
- **Removing `./cli.py start`.** Stays as the script-friendly /
legacy entry point. The dashboard is the new default.
## Scope
### In scope
- Manifest-driven agent picker (curses modal): list view with
j/k navigation + Enter to confirm, Esc to abort.
- Preflight rendering inside the dashboard's curses surface
(modal or drop-and-resume — picked in design).
- A new `_dashboard_start_flow` that wraps prepare + preflight
+ launch and returns a `DockerBottle` handle the dashboard
retains alongside its `pending` and `agents` lists.
- A `bottles: dict[str, DockerBottle]` map (keyed by slug) on the main loop
that owns every dashboard-launched handle. ExitStack tears
them all down on dashboard exit.
- `Enter` on an agents-pane row → re-attach handoff (docker
exec -it claude into the existing container).
- `x` (or similar) on an agents-pane row → explicit per-bottle
stop without quitting.
- `q` (existing quit key) → tear down all dashboard-launched
bottles before returning.
### Out of scope
- Changes to `./cli.py start` itself. It keeps its current
shape; the dashboard reuses its internal pieces
(`backend.prepare` / `backend.launch`) without reaching
through the CLI layer.
- Changes to `backend.launch`'s context-manager contract; the
dashboard's bottle map just holds the context-manager-yielded
Bottle and calls `__exit__` on quit / explicit stop.
- New manifest fields. The picker reads what's already there.
- Adopting non-dashboard bottles into the dashboard's owned set.
## Proposed design
### Bottle ownership
Today's flow:
```
./cli.py start agent
  └─ with backend.launch(plan) as bottle:       ← bottle alive while inside `with`
         bottle.exec_claude([...], tty=True)    ← blocks until claude exits
     # context exits → compose down → state cleanup
```
The proposed dashboard-owned flow:
```
./cli.py dashboard
  └─ stack = ExitStack()
     bottles: dict[str, DockerBottle] = {}
     # operator presses `n`, picks agent
     ctx = backend.launch(plan)
     bottle = stack.enter_context(ctx)          ← bottle stays alive
     bottles[plan.slug] = bottle
     # operator interacts via:
     curses.endwin()
     bottle.exec_claude([...], tty=True)        ← blocks; returns on Ctrl-D
     stdscr.refresh()
     # bottle is STILL ALIVE here: only the claude process exited
     # ... operator does other things, eventually `q`:
     stack.close()                              ← tears down every bottle
```
Semantically the shift is one line of code: the launch context
is entered on a dashboard-owned `ExitStack` instead of a `with`
block. The change in operator experience is real, though:
bottles outlive any single claude session.
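The ownership shift can be sketched in isolation. This is a minimal illustration, not the dashboard's code: `fake_launch` is a hypothetical stand-in for `backend.launch(plan)`, and the yielded string stands in for a `DockerBottle`. It only shows that contexts entered on an `ExitStack` stay open until `close()`, and then exit in reverse order.

```python
from contextlib import ExitStack, contextmanager

events: list[str] = []  # records lifecycle events so the ordering is visible


@contextmanager
def fake_launch(slug: str):
    """Hypothetical stand-in for backend.launch(plan)."""
    events.append(f"up:{slug}")
    try:
        yield slug  # the real context manager would yield a DockerBottle
    finally:
        # Stands in for compose down + state cleanup.
        events.append(f"down:{slug}")


stack = ExitStack()
bottles: dict[str, str] = {}

for slug in ("researcher-1", "implementer-1"):
    # enter_context() runs __enter__ now and defers __exit__ to stack.close().
    bottles[slug] = stack.enter_context(fake_launch(slug))

# Claude sessions would start and end here; nothing is torn down yet.
assert events == ["up:researcher-1", "up:implementer-1"]

stack.close()  # dashboard quit: contexts exit in reverse (LIFO) order
```

Because teardown is LIFO, the most recently started bottle goes down first on quit.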
### Agent picker
Pressing `n` opens a centered modal listing every agent name
from `spec.manifest.agents`. j/k navigates; Enter selects; Esc
aborts. Width is the longest name + bottle name + a column for
"already running?" so the operator can see at a glance whether
picking an agent starts a fresh one (different slug suffix) or
not.
```
┌─ start agent ───────────────────────────┐
│   implementer  dev      (running)       │
│ > researcher   dev                      │
│   triage-bot   sandbox                  │
└─ Enter: start  Esc: cancel ─────────────┘
```
Starting an agent that already has a running bottle is allowed
— each `start` mints a fresh slug — but the picker surfaces the
already-running state so the operator doesn't accidentally
double-launch.
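The picker's key handling is easy to keep separate from curses drawing. The sketch below assumes that split; `PickerState`, `handle_key`, and `render_row` are hypothetical names, and the key strings (`"enter"`, `"esc"`) stand in for whatever the dashboard's input layer produces.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class PickerState:
    names: Tuple[str, ...]  # agent names, in manifest order
    index: int = 0          # currently highlighted row


def handle_key(state: PickerState, key: str) -> Tuple[PickerState, Optional[str]]:
    """Apply one keypress.

    The second element is "select", "cancel", or None (keep the modal open).
    """
    last = max(len(state.names) - 1, 0)
    if key == "j":  # move down, clamped to the last row
        return replace(state, index=min(state.index + 1, last)), None
    if key == "k":  # move up, clamped to the first row
        return replace(state, index=max(state.index - 1, 0)), None
    if key == "enter" and state.names:
        return state, "select"
    if key == "esc":
        return state, "cancel"
    return state, None  # unknown keys are ignored


def render_row(name: str, bottle: str, running: bool, selected: bool) -> str:
    """Format one picker row, flagging agents that already have a bottle up."""
    marker = ">" if selected else " "
    suffix = " (running)" if running else ""
    return f"{marker} {name:<12} {bottle}{suffix}"
```

Keeping navigation as a pure function means the clamp and cancel behavior can be unit-tested without a terminal.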
### Preflight Y/N
Two viable shapes:

**Modal** — render the preflight summary lines (`agent / env /
skills / bottle / git gate / egress`) in a centered curses
modal with `[y/N]` at the bottom. Capture the next keypress.

**Drop-and-resume** — `curses.endwin()`, print the preflight to
stderr, read y/N from stdin, restore curses. Matches the
editor-flow + handoff pattern; lower implementation cost.

Lean toward **modal** for the y/N because it doesn't flash the
terminal between dashboard frames. Drop-and-resume is acceptable
if modal proves fiddly.
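For comparison, the drop-and-resume shape can be sketched as below. The function names are hypothetical, and the suspend/resume steps are passed in as callables so the sketch runs without a terminal; in the dashboard they would presumably be `curses.endwin` and `stdscr.refresh`.

```python
from typing import Callable


def parse_yes(answer: str) -> bool:
    """Interpret a [y/N] reply: only an explicit y/yes confirms.

    Anything else, including an empty line, declines (N is the default).
    """
    return answer.strip().lower() in ("y", "yes")


def drop_and_resume(
    suspend: Callable[[], None],
    resume: Callable[[], None],
    ask: Callable[[], str],
) -> bool:
    """Leave curses mode, ask a text-mode question, then restore the screen."""
    suspend()  # in the dashboard: curses.endwin
    try:
        return parse_yes(ask())
    finally:
        # Always restore the screen, even if reading the answer fails.
        resume()  # in the dashboard: stdscr.refresh
```

The `try`/`finally` is the one piece of handling this shape needs: an interrupted prompt must not leave the terminal outside curses mode.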
### Re-attach (Enter on agent)
Same handoff pattern the new-agent flow uses. The dashboard
already holds the `DockerBottle` for any slug it started —
`bottle.exec_claude([...], tty=True)` does the right `docker
exec -it claude …` and returns on session exit. Re-attach is
"already-running" + the same exec call; the agent picker isn't
involved.

For agents the dashboard didn't start (read-only watch), Enter
is a no-op with a status hint ("dashboard didn't start this
bottle; resume with `./cli.py resume <identity>` outside the
dashboard"). PRD 0019's selection model already differentiates
focus; this layer just gates the action.
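The gate itself is small. A minimal sketch, assuming the `bottles` map described above and a hypothetical `attach` callable that performs the handoff (the real one would wrap `bottle.exec_claude([...], tty=True)`):

```python
from typing import Callable, Dict, Optional

NOT_OWNED_HINT = (
    "dashboard didn't start this bottle; "
    "resume with `./cli.py resume <identity>` outside the dashboard"
)


def on_enter(
    slug: str,
    bottles: Dict[str, object],
    attach: Callable[[object], None],
) -> Optional[str]:
    """Handle Enter on an agents-pane row.

    Returns a status-line hint for read-only-watch rows, or None after a
    successful handoff into a dashboard-owned bottle.
    """
    bottle = bottles.get(slug)
    if bottle is None:
        return NOT_OWNED_HINT  # not ours: no-op plus hint
    attach(bottle)  # blocks until the claude session exits
    return None
```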
### Explicit per-bottle stop
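`x` on a selected dashboard-owned agent runs a targeted
teardown in the spirit of a `stack.pop_callback`. Note that
`contextlib.ExitStack` has no such method and cannot unregister
a single entry, so the implementation needs its own per-bottle
handle. The steps: take that bottle out of the map, call its
`close()` to tear down compose + state, and update the agents
pane on the next refresh. `x` on a read-only-watch row (a
bottle the dashboard didn't start) is a no-op with a status
hint.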
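One way to get targeted teardown out of `contextlib.ExitStack`, which can only close everything at once, is to give each bottle its own small stack and register that stack's `close` on the dashboard-wide one. This is a sketch of that idea, not a settled design; `OwnedBottles` and `fake_launch` are hypothetical names.

```python
from contextlib import ExitStack, contextmanager
from typing import Dict

log = []  # lifecycle events, for illustration only


@contextmanager
def fake_launch(slug: str):
    """Hypothetical stand-in for backend.launch(plan)."""
    log.append(f"up:{slug}")
    try:
        yield slug
    finally:
        log.append(f"down:{slug}")


class OwnedBottles:
    """Tracks dashboard-launched bottles with per-bottle teardown."""

    def __init__(self) -> None:
        self._root = ExitStack()
        self._stacks: Dict[str, ExitStack] = {}
        self.bottles: Dict[str, object] = {}

    def start(self, slug: str, ctx) -> object:
        sub = ExitStack()
        bottle = sub.enter_context(ctx)
        # The quit path still covers this bottle via the root stack.
        self._root.callback(sub.close)
        self._stacks[slug] = sub
        self.bottles[slug] = bottle
        return bottle

    def stop(self, slug: str) -> bool:
        """Explicit per-bottle stop; False means the bottle is not ours."""
        sub = self._stacks.pop(slug, None)
        if sub is None:
            return False
        self.bottles.pop(slug, None)
        sub.close()  # closing again later from the root stack is a no-op
        return True

    def close(self) -> None:
        """Dashboard quit: tear down whatever is still running."""
        self._root.close()
        self._stacks.clear()
        self.bottles.clear()
```

The design leans on `ExitStack.close()` being safe to call twice: once a sub-stack has run its callbacks it is empty, so the root stack's later call does nothing.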
### Dashboard quit
`q` (existing) calls `stack.close()` before exit; every
dashboard-launched bottle goes through its normal teardown
(`compose down` + state settle). Preserve markers (capability-
block, crash) still keep state across teardown. The dashboard
process itself returns 0.

If the operator wants to keep bottles alive past dashboard
exit, the existing path is unchanged: launch them via
`./cli.py start` in a separate terminal. That ownership stays
out-of-band.
## Implementation chunks
Sized for one PR each.
1. **Refactor `_launch_bottle` so the launch + exec_claude
pieces are separable.** Today's `cli/start.py` runs both
inside one function. Extract `prepare_with_preflight(spec,
*, render_preflight, prompt_yes)` and `attach_claude(bottle,
*, remote_control)`. The CLI's existing one-shot use binds
them as before; the dashboard binds them with curses-aware
render + prompt callables. No behavior change.
2. **Agent picker modal + new-agent flow.** New key `n` opens
the picker; `prepare_with_preflight` runs against the
selected agent; on Y, `backend.launch(plan)` enters the
dashboard's ExitStack; handoff invokes `attach_claude`.
3. **Re-attach via Enter on owned agents-pane row.** Looks up
the slug in the dashboard's `bottles` map; if present →
handoff; else → status-line hint pointing at `./cli.py
resume`.
4. **Explicit per-bottle stop (`x` keybinding).** Take the
bottle out of the `bottles` map, run its teardown (`close`),
refresh.
5. **Quit-cleanup (`q`).** Hook `stack.close()` into the
normal return path. Document the "exiting dashboard tears
down every bottle it started" contract in `dashboard.py`'s
module docstring.
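The callable-injection idea in chunk 1 might look like the sketch below. Only the `prepare_with_preflight(spec, *, render_preflight, prompt_yes)` signature comes from this PRD; the dict-shaped spec and plan, the `fake_prepare` helper, and the slug format are assumptions made so the example runs on its own.

```python
from typing import Callable, List, Optional


def fake_prepare(spec: dict) -> dict:
    """Hypothetical stand-in for backend.prepare; the plan shape is assumed."""
    return {
        "slug": f"{spec['agent']}-1",  # assumed slug format, illustration only
        "summary": [f"agent: {spec['agent']}", f"bottle: {spec['bottle']}"],
    }


def prepare_with_preflight(
    spec: dict,
    *,
    render_preflight: Callable[[List[str]], None],
    prompt_yes: Callable[[], bool],
) -> Optional[dict]:
    """Build a plan, show the preflight summary, and return the plan on yes.

    The caller decides how to render and how to prompt, so the same function
    can serve the one-shot CLI and the curses dashboard.
    """
    plan = fake_prepare(spec)
    render_preflight(plan["summary"])
    return plan if prompt_yes() else None


def cli_render(lines: List[str]) -> None:
    """CLI-style binding: plain text output."""
    for line in lines:
        print(line)
```

A dashboard binding would pass a modal renderer and a keypress-based `prompt_yes` instead of `cli_render` and a stdin prompt.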
## Open questions
1. **Modal vs. drop-and-resume for preflight Y/N.** Both work;
modal is nicer if the curses geometry handling is
straightforward. Pick during chunk 2 by prototyping the
modal in ~30 lines and seeing if it looks right.
2. **Agent picker: text-filter typing?** v1 is j/k navigation
only. If the manifest has 20+ agents the picker gets noisy;
add fzf-style filter input later if needed.
3. **What happens if `attach_claude` exits because the
container died** (not a clean claude exit — e.g., OOM,
panic)? Today's `_settle_state` marks the bottle preserved
for non-zero exit codes. The dashboard's re-render needs to
notice the bottle is gone (compose down or container-not-
running state) and surface a status line. Probably:
transcript snapshot + mark preserved + remove from
`bottles` map + status line "claude session for [slug]
ended with exit N; preserved for resume".
4. **Double-start of the same agent.** Allowed by design — slugs
are unique per launch — but the picker should make it clear
this is a "start a SECOND bottle" decision, not a "re-enter
the first." Probably handled by showing the running-count in
the picker row.
5. **Should `q` confirm before tearing down N running
bottles?** A 5-bottle dashboard with 5 in-flight sessions
loses non-trivial state on accidental `q`. Probably yes:
curses modal "quit and tear down N bottles? [y/N]". Skip
confirmation when there are zero owned bottles.
6. **Race between handoff and 1s refresh tick.** While the
dashboard's `stdscr.timeout` is set, a key press fires the
handoff and the dashboard sits in `docker exec` for minutes.
`discover_active_agents` / `discover_pending` don't poll
during that window, which is fine — the moment we
`stdscr.refresh()` after exec returns, the next loop iter
runs discovery and the panes reflect reality. Worth calling
out in the design but no special handling needed.
7. **Multi-bottle resource use.** Five bottles up means five
compose projects: 5×(agent + pipelock + egress optional +
git-gate optional + supervise optional) containers, plus 5×2
networks. On a 16-GiB host this is fine; on something
smaller the operator might want a soft cap or a warning.
Out of v1; flag for follow-up if it bites.
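The arithmetic in question 7 can be made concrete. The sketch assumes `agent` and `pipelock` are always present, since only the other three containers are marked optional above; `footprint` is a hypothetical helper.

```python
from typing import Dict

REQUIRED = ("agent", "pipelock")                  # assumed always present
OPTIONAL = ("egress", "git-gate", "supervise")    # marked optional above
NETWORKS_PER_BOTTLE = 2


def footprint(bottles: int) -> Dict[str, int]:
    """Container and network counts for a number of running bottles."""
    return {
        "min_containers": bottles * len(REQUIRED),
        "max_containers": bottles * (len(REQUIRED) + len(OPTIONAL)),
        "networks": bottles * NETWORKS_PER_BOTTLE,
    }


five = footprint(5)
```

For five bottles this gives between 10 and 25 containers and 10 networks, which is the range a soft cap or warning would need to reason about.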
## References
- PRD 0018 — compose-per-instance lifecycle (the
`backend.launch` context-manager contract this PRD layers against)
- PRD 0019 — active-agents pane + selection model (the
agents-pane row the re-attach + stop verbs hook into)
- `docs/research/claude-code-pane-in-dashboard.md` — option 1
(handoff) is what `attach_claude` implements here; options 2
/ 3 are out of scope for this PRD
- `claude_bottle/cli/start.py:_launch_bottle` — the function
chunk 1 extracts the prepare + attach pieces out of