didericis/bot-bottle

didericis-codex c0e1f5fd70

docs(prd): supersede dashboard agent PRDs

2026-06-03 17:25:32 +00:00

PRD 0020: Start and attach to agents from inside the dashboard

Status: Superseded by PRD 0049
Author: didericis
Created: 2026-05-26

Summary

Today the dashboard is read-only: it surfaces pending proposals and active agents (PRD 0019) but can't start an agent or re-enter one. The operator's path is split — they launch agents from one terminal (./cli.py start <name>), and watch them from another (./cli.py dashboard).

This PRD collapses that split. The dashboard becomes the operator's single surface: pressing a key opens an agent picker, selecting one runs the existing prepare → preflight → launch flow inside a curses-friendly variant, and on yield drops to a full-screen docker exec -it … claude session (the "handoff" shape from docs/research/claude-code-pane-in-dashboard.md). When the operator exits claude, the dashboard re-renders with the now-running bottle visible in the agents pane.

Crucially, the bottle's lifetime is decoupled from both the claude session AND the dashboard process. Exit claude → back to dashboard, bottle still running. Start another agent → two bottles up at once. Quit the dashboard → bottles continue running. Teardown is always explicit: the operator presses x on an agent, or runs ./cli.py cleanup later.

Problem

Two real frictions today:

Two terminals for one workflow. The dashboard is the right shape to watch agents — proposals queue, status updates, operator-edit verbs — but it's the wrong shape to start them. Today you open a second terminal for that. In parallel use (3–5 bottles), the operator has 5+ terminals open and the dashboard's "active agents" pane is hopelessly behind reality because they just spawned three in a row.
./cli.py start ties the bottle to a single claude session. The start command's ExitStack brings the bottle up, runs claude, and tears down on Ctrl-D. That is fine for a one-shot session, but wrong for "let me bounce in and out of this bottle a few times while triaging proposals." Today the only way to re-enter a bottle after exiting claude is to start a fresh one and lose all in-bottle state.

The dashboard already discovers active bottles, scopes operator-edit verbs to a selected agent (PRD 0019), and captures full-merged logs per bottle (PRD 0018). It already wants to be the primary surface. This PRD finishes that.

Goals / Success Criteria

From inside ./cli.py dashboard, pressing n (new) opens an agent picker listing every agent defined in the manifest. Selecting one runs prepare → preflight → launch.
The preflight Y/N summary renders cleanly — either as a curses modal or via curses.endwin() → text-mode prompt → restore, matching the existing editor-flow pattern.
On launch success, the dashboard performs a handoff (option 1 from the research doc): curses.endwin() → docker exec -it bot-bottle-<slug> claude --dangerously-skip-permissions → on exit, stdscr.refresh() and re-render with the new bottle in the agents pane.
The bottle's lifetime is NOT owned by any single claude session. Exiting claude (Ctrl-D, /exit) returns to the dashboard with the bottle still running. The operator can start more agents and re-enter previous ones.
Pressing Enter on a selected row in the agents pane re-attaches to that agent's bottle via the same handoff: it drops to full-screen claude and returns on exit.
Pressing x (or similar — keybinding decided in design) on a selected agent stops just that bottle (compose down + state cleanup) without quitting the dashboard.
Quitting the dashboard (q) leaves every running bottle running. Bottle teardown is always explicit (per-bottle x or ./cli.py cleanup). The next ./cli.py dashboard invocation re-discovers them via list_active_slugs() and surfaces re-attach for any it can reconstruct context for (see "Cross-dashboard re-attach" below).

Non-goals

A pane that hosts the claude TUI alongside proposals. The embedded-emulator option from the research doc is out of scope. The handoff (option 1) is the v1; option 2 is a separate PRD if and when handoff is observably insufficient.
Adopting bottles started by an out-of-dashboard ./cli.py start invocation. Those have their own ExitStack owner, and the dashboard never takes over their teardown. It lists them (as it does today) and can re-attach to them, but the x stop verb applies only to bottles the current dashboard process started.
Resurrecting an out-of-process bottle into a new dashboard with full re-attach. A bottle started by ./cli.py start in another terminal, or by a previous dashboard run that has since exited, appears in the agents pane (already does, PRD 0019) and can be re-attached via docker exec -it claude because the agent container is still running sleep infinity. That's in scope. What's out is anything that requires the launch-context object to drive teardown, e.g., the ExitStack-tracked CA + state cleanup _settle_state performs today. Cross-dashboard re-attach uses the existing ./cli.py cleanup for teardown, not an x keypress (see open questions).
Multi-window UI. Single curses window, two existing panes (proposals + agents); the agent picker is a modal, not a third pane.
Removing ./cli.py start. Stays as the script-friendly / legacy entry point. The dashboard is the new default.

Scope

In scope

Manifest-driven agent picker (curses modal): list view with j/k navigation + Enter to confirm, Esc to abort.
Preflight rendering inside the dashboard's curses surface (modal or drop-and-resume — picked in design).
A new _dashboard_start_flow that wraps prepare + preflight + launch and returns a DockerBottle handle the dashboard retains alongside its pending and agents lists.
A bottles: dict[str, tuple[ContextManager, DockerBottle]] map on the main loop that holds every dashboard-launched handle together with its context manager. Nothing tears these down on dashboard exit; teardown happens only on explicit stop.
Enter on an agents-pane row → re-attach handoff (docker exec -it claude into the existing container).
x (or similar) on an agents-pane row → explicit per-bottle stop without quitting.
q (existing quit key) → exit the dashboard and leave every running bottle running.

Out of scope

Changes to ./cli.py start itself. It keeps its current shape; the dashboard reuses its internal pieces (backend.prepare / backend.launch) without reaching through the CLI layer.
Changes to backend.launch's context-manager contract; the dashboard's bottle map just holds the context-manager-yielded Bottle and calls __exit__ only on explicit stop.
New manifest fields. The picker reads what's already there.
Adopting non-dashboard bottles into the dashboard's owned set.

Proposed design

Bottle ownership

Today's flow:

./cli.py start agent
  └─ with backend.launch(plan) as bottle:        ← bottle alive while inside `with`
       bottle.exec_agent([...], tty=True)       ← blocks until claude exits
     # context exits → compose down → state cleanup

The proposed dashboard-driven flow:

./cli.py dashboard
  └─ bottles: dict[str, tuple[ContextManager, DockerBottle]] = {}

     # operator presses `n`, picks agent
     cm = backend.launch(plan)
     bottle = cm.__enter__()                     ← enter but don't bind to a `with`
     bottles[plan.slug] = (cm, bottle)

     # operator interacts via:
     curses.endwin()
     bottle.exec_agent([...], tty=True)         ← blocks; returns on Ctrl-D
     stdscr.refresh()
     # bottle is STILL ALIVE — only the claude process exited

     # ... operator presses `x` on selected agent:
     cm, _ = bottles.pop(slug)
     cm.__exit__(None, None, None)               ← tears down just that one

     # ... operator presses `q`:
     return  # bottles dict still populated; no teardown

Two shifts:

Bottles outlive any single claude session — the dashboard manages enter/exit per bottle, not per attach. Exit claude → still in the dashboard with the bottle running.
Bottles outlive the dashboard process itself. Quitting the dashboard does NOT close the context managers; the docker compose project keeps running with the agent container in sleep infinity. A subsequent dashboard invocation re-discovers it via docker compose ls (PRD 0019's list_active_slugs) and surfaces re-attach.

The trade-off: state cleanup that today runs in _settle_state (transcript snapshot, preserve-marker evaluation, state-dir reap) doesn't fire on a quit-while-running bottle. It DOES fire when the operator explicitly stops via x, because that calls cm.__exit__. For bottles a previous dashboard quit on, ./cli.py cleanup is the path: its compose-down + state-reap logic already covers the case.
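The ownership model above can be sketched in a few lines. This is a minimal illustration, not the project's code: FakeLaunch and BottleRegistry are hypothetical stand-ins for backend.launch(plan) and the dashboard's bottles map, and the events list exists only to make the lifecycle observable.

```python
events = []  # lifecycle log, so the example's behaviour is observable


class FakeLaunch:
    """Hypothetical stand-in for the context manager backend.launch(plan) returns."""

    def __init__(self, slug):
        self.slug = slug

    def __enter__(self):
        events.append(f"up:{self.slug}")
        return {"slug": self.slug}  # stand-in for the DockerBottle handle

    def __exit__(self, exc_type, exc, tb):
        events.append(f"down:{self.slug}")  # compose down + state settle would run here
        return False


class BottleRegistry:
    """Hypothetical wrapper around the dashboard's bottles map."""

    def __init__(self):
        self._owned = {}

    def start(self, slug):
        cm = FakeLaunch(slug)
        bottle = cm.__enter__()  # entered manually, not bound to a `with`
        self._owned[slug] = (cm, bottle)
        return bottle

    def stop(self, slug):
        cm, _bottle = self._owned.pop(slug)
        cm.__exit__(None, None, None)  # explicit per-bottle teardown (`x`)

    def owned_slugs(self):
        return sorted(self._owned)


registry = BottleRegistry()
registry.start("implementer-1")
registry.start("researcher-1")
registry.stop("implementer-1")
# Quitting (`q`) just returns: nothing calls __exit__ on "researcher-1".
```

The stand-in is a class rather than a generator-based @contextmanager on purpose. An un-exited generator may have its finally block run when it is garbage-collected, so whether skipping __exit__ really skips teardown depends on how backend.launch is implemented.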

Cross-dashboard re-attach

When the dashboard discovers a bottle in discover_active_agents that it didn't itself start (a previous-dashboard or external ./cli.py start bottle), Enter still attaches via docker exec -it … claude — the agent container is running sleep infinity exactly the same way regardless of who started it. The only thing the current dashboard lacks for those bottles is the launch-context object needed to drive a clean teardown via x.

For v1 we surface this honestly: pressing x on a non-owned agent shows a status hint pointing at ./cli.py cleanup (or ./cli.py cleanup targeted at the slug if we add that flag later). The agent stays alive; the operator handles teardown out-of-band. Enter (re-attach) works for both owned and non-owned bottles.

Agent picker

Pressing n opens a centered modal listing every agent name from spec.manifest.agents. j/k navigates; Enter selects; Esc aborts. Width is the longest name + bottle name + a column for "already running?" so the operator can see at a glance whether picking an agent starts a fresh one (different slug suffix) or not.

┌─ start agent ───────────────────────────┐
│   implementer       dev      (running)  │
│ > researcher        dev                 │
│   triage-bot        sandbox             │
└─ Enter: start  Esc: cancel ─────────────┘

Starting an agent that already has a running bottle is allowed — each start mints a fresh slug — but the picker surfaces the already-running state so the operator doesn't accidentally double-launch.
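As a rough sketch of the picker's row logic, the helper below builds the display rows, pads the columns to the longest entry, and marks agents that already have live bottles. It also applies the case-insensitive substring filter and two-stage Esc described under "Resolved questions". The names picker_rows and on_escape, and the shape of the inputs, are assumptions for illustration only.

```python
def picker_rows(agents, running_counts, query=""):
    """Return display rows for the picker.

    agents: list of (agent_name, bottle_name) pairs (assumed shape).
    running_counts: dict of agent_name -> number of live bottles.
    query: filter text; case-insensitive substring match on the agent name.
    """
    needle = query.lower()
    visible = [(a, b) for a, b in agents if needle in a.lower()]
    if not visible:
        return []
    name_w = max(len(a) for a, _ in visible)
    bottle_w = max(len(b) for _, b in visible)
    rows = []
    for agent, bottle in visible:
        n = running_counts.get(agent, 0)
        marker = f"({n} running)" if n else ""
        rows.append(f"{agent:<{name_w}}  {bottle:<{bottle_w}}  {marker}".rstrip())
    return rows


def on_escape(query):
    """First Esc clears a non-empty filter; Esc on an empty filter closes the picker.

    Returns (new_query, should_close).
    """
    return ("", False) if query else ("", True)


agents = [("implementer", "dev"), ("researcher", "dev"), ("triage-bot", "sandbox")]
rows = picker_rows(agents, {"implementer": 1})
filtered = picker_rows(agents, {"implementer": 1}, query="RE")
```

Keeping the row builder free of curses calls means the modal only has to draw the returned strings and track a cursor index.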

Preflight Y/N

Two viable shapes:

Modal — render the preflight summary lines (agent / env / skills / bottle / git gate / egress) in a centered curses modal with [y/N] at the bottom. Capture the next keypress.

Drop-and-resume — curses.endwin(), print the preflight to stderr, read y/N from stdin, restore curses. Matches the editor-flow + handoff pattern; lower implementation cost.

Lean toward modal for the y/N because it doesn't flash the terminal between dashboard frames. Drop-and-resume is acceptable if modal proves fiddly.
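For the modal shape, the fiddly part is mostly geometry. The sketch below computes a centered, screen-clamped window size that could be handed to curses.newwin(height, width, top, left), and treats only an explicit y as confirmation. Both function names are hypothetical, and "any key other than y means No" is an assumed reading of the [y/N] prompt.

```python
def modal_geometry(lines, screen_h, screen_w, footer="[y/N]"):
    """Return (height, width, top, left) for a bordered, centered modal.

    Sizes are clamped to the screen so the window never exceeds the terminal.
    """
    inner_w = max([len(footer)] + [len(line) for line in lines])
    height = min(len(lines) + 3, screen_h)  # body rows + footer row + 2 border rows
    width = min(inner_w + 4, screen_w)      # border + one column of padding per side
    top = (screen_h - height) // 2
    left = (screen_w - width) // 2
    return height, width, top, left


def confirmed(key):
    """Only an explicit y/Y confirms; every other key (including Enter) means No."""
    return key in (ord("y"), ord("Y"))


summary = ["agent: researcher", "bottle: dev"]
geometry = modal_geometry(summary, screen_h=24, screen_w=80)
```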

Re-attach (Enter on agent)

Same handoff pattern the new-agent flow uses. For an agent the dashboard started this session, the dashboard holds the DockerBottle handle in its bottles dict and calls bottle.exec_agent(...). For an agent it discovered via list_active_slugs (previous-dashboard or external start), the dashboard synthesizes a one-shot DockerBottle from the slug and runs the same exec: the container name is bot-bottle-<slug>, and no prompt path is needed because the agent's claude config already has --append-system-prompt-file baked in from the original launch. Either way, Enter drops to full-screen claude; on exit the dashboard re-renders.
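A minimal sketch of the handoff itself, assuming the docker exec command line quoted in the goals section. attach_argv and handoff are hypothetical names; the real code goes through bottle.exec_agent. The run and endwin parameters are injected here only so the function can be exercised without a terminal.

```python
import curses
import subprocess


def attach_argv(slug):
    """Build the re-attach command for a bottle, given only its slug."""
    return ["docker", "exec", "-it", f"bot-bottle-{slug}",
            "claude", "--dangerously-skip-permissions"]


def handoff(stdscr, argv, run=subprocess.call, endwin=curses.endwin):
    """Leave curses, run a full-screen child, then restore the dashboard.

    Returns the child's exit code.
    """
    endwin()              # hand the terminal to the child process
    try:
        return run(argv)  # blocks until claude exits
    finally:
        stdscr.refresh()  # a refresh after endwin() resumes curses mode
```

Because the command depends only on the slug, the same path serves both dashboard-owned bottles and ones discovered from an earlier run.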

Explicit per-bottle stop

x on a dashboard-owned agent: pop the (cm, bottle) from the dict, call cm.__exit__(None, None, None) which drives the existing compose-down + state-settle logic. Refresh the agents pane.

x on a non-owned agent (discovered via list_active_slugs but not in bottles dict): no-op with status hint pointing at ./cli.py cleanup (the existing path that tears down ANY bot-bottle compose project plus reaps state dirs).
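The owned / non-owned split for x might look like the following. stop_selected and FakeLaunch are hypothetical, and the status-line wording is illustrative rather than taken from the codebase.

```python
class FakeLaunch:
    """Hypothetical stand-in for an entered backend.launch(plan) context manager."""

    def __init__(self):
        self.closed = False

    def __exit__(self, exc_type, exc, tb):
        self.closed = True  # compose down + state settle would run here
        return False


def stop_selected(slug, owned):
    """Handle `x` on the selected agent and return a status-line message.

    owned maps slug -> (context manager, bottle) for dashboard-launched bottles.
    """
    entry = owned.pop(slug, None)
    if entry is None:
        # Discovered but not launched by this dashboard: no launch context to exit.
        return f"{slug}: not started by this dashboard; use ./cli.py cleanup"
    cm, _bottle = entry
    cm.__exit__(None, None, None)
    return f"{slug}: stopped"


cm = FakeLaunch()
owned = {"researcher-1": (cm, object())}
owned_msg = stop_selected("researcher-1", owned)
foreign_msg = stop_selected("implementer-9", owned)
```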

Dashboard quit

q exits the dashboard process with status 0 without touching any running bottles. The bottles dict goes out of scope, but because the context managers' __exit__ is never invoked, the docker compose project keeps running. The next dashboard invocation discovers the bottles via list_active_slugs and surfaces re-attach.

This is a real departure from today's ./cli.py start semantics (which couples bottle lifetime to the process via ExitStack). It's intentional: the dashboard is a watching + acting surface, not a lifetime owner.

Implementation chunks

Sized for one PR each.

Refactor _launch_bottle so the launch + exec_agent pieces are separable. Today's cli/start.py runs both inside one function. Extract prepare_with_preflight(spec, *, render_preflight, prompt_yes) and attach_agent(bottle, *, remote_control). The CLI's existing one-shot use binds them as before; the dashboard binds them with curses-aware render + prompt callables. No behavior change.
Agent picker modal + new-agent flow. New key n opens the picker; prepare_with_preflight runs against the selected agent; on Y, the dashboard enters backend.launch(plan) and stores the (cm, bottle) pair in its bottles map; handoff invokes attach_agent.
Re-attach via Enter on an agents-pane row. Looks up the slug in the dashboard's bottles map; if present → handoff with the held handle; else → handoff with a one-shot DockerBottle synthesized from the slug.
Explicit per-bottle stop (x keybinding). Pop the (cm, bottle) entry from the bottles map, call cm.__exit__(None, None, None), refresh. For a non-owned agent, show the ./cli.py cleanup status hint instead.
Quit (q). The normal return path leaves the bottles map untouched and never calls __exit__. Document the "exiting the dashboard leaves every bottle running; teardown is x or ./cli.py cleanup" contract in dashboard.py's module docstring.
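The callback-injection idea in chunk 1 can be sketched as below. The render_preflight and prompt_yes keywords come from the signature named above. The prepare keyword and the Plan dataclass are hypothetical additions so the sketch runs without the real backend.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical stand-in for whatever backend.prepare returns."""
    agent: str
    summary: list = field(default_factory=list)


def prepare_with_preflight(spec, *, render_preflight, prompt_yes, prepare):
    """Prepare a plan, show its preflight summary, and ask for confirmation.

    Returns the plan on yes, or None when the operator declines.
    `prepare` is injected here only for illustration.
    """
    plan = prepare(spec)
    render_preflight(plan.summary)
    return plan if prompt_yes() else None


def fake_prepare(spec):
    return Plan(agent=spec, summary=[f"agent: {spec}", "bottle: dev"])


shown = []
accepted = prepare_with_preflight(
    "researcher", render_preflight=shown.extend, prompt_yes=lambda: True,
    prepare=fake_prepare)
declined = prepare_with_preflight(
    "researcher", render_preflight=shown.extend, prompt_yes=lambda: False,
    prepare=fake_prepare)
```

The CLI would bind the two callables to plain print and input wrappers, while the dashboard would bind them to its curses modal, leaving the prepare logic shared.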

Resolved questions

Modal vs. drop-and-resume for preflight Y/N. Resolved: modal. Render the preflight lines centered in a curses sub-window with [y/N] at the bottom; capture the next keypress. If geometry proves fiddly during implementation we'll fall back to drop-and-resume, but modal is the target.
Agent picker: text-filter typing. Resolved: yes, include filter typing. As the operator types, the list filters to agents whose name matches (substring, case-insensitive). j/k still navigates within the filtered set; Esc clears the filter on first press, exits the picker on the second.
Container-died-during-claude handling. Keep the design as drafted: transcript snapshot (snapshot_transcript) + mark_preserved if exit code is non-zero + remove from the bottles dict + status line "claude session for [slug] ended with exit N; preserved for resume". The bottle's cm.__exit__ would normally run on stop; here it runs as part of the death-handling (the container is already gone, but compose-down + state-settle still sequence the network removal + state cleanup correctly).
Double-start of the same agent. Allowed. The picker surfaces a (N running) annotation next to any agent name that already has live bottles in this dashboard's bottles dict OR in list_active_slugs(), so the operator sees the running-count before picking. Selecting an already-running agent name mints a fresh slug for the new bottle as normal.
Quit behavior. Resolved: q does NOT tear down any bottles. Dashboard exit is purely a UI exit; the bottles dict goes out of scope without invoking __exit__, so the docker compose projects keep running. Bottle teardown is always explicit: per-bottle x (for dashboard-owned), or ./cli.py cleanup (for everything).
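One possible reading of the container-died sequence, as a sketch. The ordering (snapshot, then preserve on non-zero exit, then teardown), the slug-only signatures of the injected snapshot_transcript and mark_preserved callables, and the conditional status suffix are all assumptions, not confirmed behaviour.

```python
def handle_container_death(slug, exit_code, owned, *, snapshot_transcript, mark_preserved):
    """Settle a bottle whose container died during a claude session.

    owned maps slug -> (context manager, bottle). Returns a status-line message.
    """
    snapshot_transcript(slug)        # capture the transcript before state is reaped
    preserved = exit_code != 0
    if preserved:
        mark_preserved(slug)         # keep state around for a later resume
    entry = owned.pop(slug, None)
    if entry is not None:
        cm, _bottle = entry
        cm.__exit__(None, None, None)  # compose down + state settle
    suffix = "; preserved for resume" if preserved else ""
    return f"claude session for {slug} ended with exit {exit_code}{suffix}"


class FakeLaunch:
    """Hypothetical stand-in for an entered backend.launch(plan) context manager."""

    def __init__(self, log):
        self.log = log

    def __exit__(self, exc_type, exc, tb):
        self.log.append("teardown")
        return False


log = []
owned = {"researcher-1": (FakeLaunch(log), object())}
status = handle_container_death(
    "researcher-1", 137, owned,
    snapshot_transcript=lambda s: log.append("snapshot"),
    mark_preserved=lambda s: log.append("preserve"))
```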

Open questions

Race between handoff and 1s refresh tick. While the dashboard's stdscr.timeout is set, a key press fires the handoff and the dashboard sits in docker exec for minutes. discover_active_agents / discover_pending don't poll during that window — that's harmless on its own (the moment we stdscr.refresh() after exec returns, the next loop iter runs discovery and the panes reflect reality), but it does mean: (a) proposals queued during the claude session won't fire any operator notification until the handoff ends, and (b) a bottle that died mid-claude won't be detectable until the operator exits back to the dashboard. Not blocking v1 — flagging as a known limitation to revisit alongside the option-2 embedded-emulator path from the research doc.

References

PRD 0018: compose-per-instance lifecycle (the backend.launch context-manager contract this PRD layers against)
PRD 0019: active-agents pane + selection model (the agents-pane row the re-attach + stop verbs hook into)
docs/research/claude-code-pane-in-dashboard.md: option 1 (handoff) is what attach_agent implements here; options 2 / 3 are out of scope for this PRD
bot_bottle/cli/start.py:_launch_bottle: the function chunk 1 extracts the prepare + attach pieces out of
