docs(prd-0020): record answers to open questions, switch to no-teardown-on-quit

This commit is contained in:
2026-05-26 03:10:26 -04:00
parent ec20293c0a
commit 26322bdfd5
+154 -94
View File
@@ -21,11 +21,12 @@ shape from `docs/research/claude-code-pane-in-dashboard.md`).
When the operator exits claude, the dashboard re-renders with When the operator exits claude, the dashboard re-renders with
the now-running bottle visible in the agents pane. the now-running bottle visible in the agents pane.
Crucially, the bottle's lifetime is owned by the *dashboard Crucially, the bottle's lifetime is decoupled from both the
process*, not by the individual claude session. Exit claude → claude session AND the dashboard process. Exit claude → back to
back to dashboard, bottle still running. Start another agent → dashboard, bottle still running. Start another agent → two
two bottles up at once. Quit the dashboard → all dashboard- bottles up at once. Quit the dashboard → bottles continue
launched bottles tear down. running. Teardown is **always explicit**: the operator presses
`x` on an agent, or runs `./cli.py cleanup` later.
## Problem ## Problem
@@ -75,10 +76,12 @@ captures full-merged logs per bottle (PRD 0018). It already
6. Pressing `x` (or similar — keybinding decided in design) 6. Pressing `x` (or similar — keybinding decided in design)
on a selected agent stops just that bottle (compose down + on a selected agent stops just that bottle (compose down +
state cleanup) without quitting the dashboard. state cleanup) without quitting the dashboard.
7. Quitting the dashboard (`q`) tears down every bottle the 7. Quitting the dashboard (`q`) leaves every running bottle
dashboard started, unless something has explicitly preserved running. Bottle teardown is always explicit (per-bottle `x`
the state (capability-block, crash). Matches today's or `./cli.py cleanup`). The next `./cli.py dashboard`
start.py teardown semantics. invocation re-discovers them via `list_active_slugs()` and
surfaces re-attach for any it can reconstruct context for
(see "Cross-dashboard re-attach" below).
## Non-goals ## Non-goals
@@ -91,10 +94,18 @@ captures full-merged logs per bottle (PRD 0018). It already
the dashboard treats them as read-only-watch (already does the dashboard treats them as read-only-watch (already does
today). Re-attach only applies to bottles the *current today). Re-attach only applies to bottles the *current
dashboard process* started. dashboard process* started.
- **Persisting a "bottle pool" across dashboard runs.** When - **Resurrecting an out-of-process bottle into a new dashboard
the dashboard quits, its bottles go. Resume across dashboard with full re-attach.** A bottle started by `./cli.py start`
invocations is `./cli.py resume <identity>`, which is in another terminal — or by a previous dashboard run, now
unchanged. exited — appears in the agents pane (already does, PRD 0019)
and can be re-attached via `docker exec -it claude` because
the agent container is still running `sleep infinity`. That's
in scope. What's *out* is anything that requires the launch-
context object to drive teardown — e.g., the
ExitStack-tracked CA + state cleanup `_settle_state` performs
today. Cross-dashboard re-attach uses the existing
`./cli.py cleanup` for teardown, not an `x` keypress (see
open questions).
- **Multi-window UI.** Single curses window, two existing - **Multi-window UI.** Single curses window, two existing
panes (proposals + agents); the agent picker is a modal, not panes (proposals + agents); the agent picker is a modal, not
a third pane. a third pane.
@@ -147,31 +158,69 @@ Today's flow:
# context exits → compose down → state cleanup # context exits → compose down → state cleanup
``` ```
The proposed dashboard-owned flow: The proposed dashboard-driven flow:
``` ```
./cli.py dashboard ./cli.py dashboard
└─ stack = ExitStack() └─ bottles: dict[str, tuple[ContextManager, DockerBottle]] = {}
bottles: dict[str, DockerBottle] = {}
# operator presses `n`, picks agent # operator presses `n`, picks agent
ctx = backend.launch(plan) cm = backend.launch(plan)
bottle = stack.enter_context(ctx) ← bottle stays alive bottle = cm.__enter__() ← enter but don't bind to a `with`
bottles[plan.slug] = bottle bottles[plan.slug] = (cm, bottle)
# operator interacts via: # operator interacts via:
curses.endwin() curses.endwin()
bottle.exec_claude([...], tty=True) ← blocks; returns on Ctrl-D bottle.exec_claude([...], tty=True) ← blocks; returns on Ctrl-D
stdscr.refresh() stdscr.refresh()
# bottle is STILL ALIVE here — only the claude process exited # bottle is STILL ALIVE — only the claude process exited
# ... operator does other things, eventually `q`: # ... operator presses `x` on selected agent:
stack.close() ← tears down every bottle cm, _ = bottles.pop(slug)
cm.__exit__(None, None, None) ← tears down just that one
# ... operator presses `q`:
return # bottles dict still populated; no teardown
``` ```
The shift is one line of code semantically but the change in Two shifts:
operator experience is real: bottles outlive any single claude
session. 1. Bottles outlive any single claude session — the dashboard
manages enter/exit per bottle, not per attach. Exit claude
→ still in the dashboard with the bottle running.
2. Bottles outlive the dashboard process itself. Quitting the
dashboard does NOT close the context managers; the docker
compose project keeps running with the agent container in
`sleep infinity`. A subsequent dashboard invocation
re-discovers it via `docker compose ls` (PRD 0019's
`list_active_slugs`) and surfaces re-attach.
The trade-off: state cleanup that today runs in
`_settle_state` (transcript snapshot, preserve-marker
evaluation, state-dir reap) doesn't fire on a quit-while-
running bottle. It DOES fire when the operator explicitly
stops via `x`, because that calls `cm.__exit__`. For
bottles a previous dashboard quit on, `./cli.py cleanup`
is the path — its compose-down + state-reap logic
already covers the case.
### Cross-dashboard re-attach
When the dashboard discovers a bottle in `discover_active_agents`
that it didn't itself start (a previous-dashboard or external
`./cli.py start` bottle), Enter still attaches via `docker exec
-it … claude` — the agent container is running `sleep infinity`
exactly the same way regardless of who started it. The only
thing the current dashboard lacks for those bottles is the
launch-context object needed to drive a clean teardown via
`x`.
For v1 we surface this honestly: pressing `x` on a non-owned
agent shows a status hint pointing at `./cli.py cleanup` (or
`./cli.py cleanup` targeted at the slug if we add that flag
later). The agent stays alive; the operator handles teardown
out-of-band. Enter (re-attach) works for both owned and
non-owned bottles.
### Agent picker ### Agent picker
@@ -213,40 +262,43 @@ if modal proves fiddly.
### Re-attach (Enter on agent) ### Re-attach (Enter on agent)
Same handoff pattern the new-agent flow uses. The dashboard Same handoff pattern the new-agent flow uses. For an agent the
already holds the `DockerBottle` for any slug it started — dashboard started this session, the dashboard holds the
`bottle.exec_claude([...], tty=True)` does the right `docker `DockerBottle` handle in its `bottles` dict and calls
exec -it claude …` and returns on session exit. Re-attach is `bottle.exec_claude(...)`. For an agent it discovered via
"already-running" + the same exec call; the agent picker isn't `list_active_slugs` (previous-dashboard or external start),
involved. the dashboard synthesizes a one-shot `DockerBottle` from the
slug — container name is `claude-bottle-<slug>`, no prompt
For agents the dashboard didn't start (read-only watch), Enter path because the agent's claude config already has `--append-
is a no-op with a status hint ("dashboard didn't start this system-prompt-file` baked in from the original launch —
bottle; resume with `./cli.py resume <identity>` outside the and runs the same exec. Either way, Enter drops to
dashboard"). PRD-0019's selection model already differentiates full-screen claude; on exit the dashboard re-renders.
focus; this layer just gates the action.
### Explicit per-bottle stop ### Explicit per-bottle stop
`x` on a selected dashboard-owned agent invokes `x` on a dashboard-owned agent: pop the `(cm, bottle)` from
`stack.pop_callback`-style targeted teardown: take that bottle the dict, call `cm.__exit__(None, None, None)` which drives
out of the map, call its `close()` to tear down compose + state, the existing compose-down + state-settle logic. Refresh the
update the agents pane on the next refresh. Bottles the agents pane.
dashboard didn't start (`x` on a read-only-watch row) → no-op
with a status hint. `x` on a non-owned agent (discovered via `list_active_slugs`
but not in `bottles` dict): no-op with status hint pointing
at `./cli.py cleanup` (the existing path that tears down
ANY claude-bottle compose project plus reaps state dirs).
### Dashboard quit ### Dashboard quit
`q` (existing) calls `stack.close()` before exit; every `q` returns the dashboard process to 0 without touching any
dashboard-launched bottle goes through its normal teardown running bottles. The `bottles` dict goes out of scope but
(`compose down` + state settle). Preserve markers (capability- because the context managers' `__exit__` is never invoked,
block, crash) still keep state across teardown. The dashboard the `docker compose` project keeps running. The next dashboard
process itself returns 0. invocation discovers the bottles via `list_active_slugs` and
surfaces re-attach.
If the operator wants to keep bottles alive past dashboard This is a real departure from today's `./cli.py start`
exit, the existing path is unchanged: launch them via semantics (which couples bottle lifetime to the process via
`./cli.py start` in a separate terminal. That ownership stays ExitStack). It's intentional: the dashboard is a watching +
out-of-band. acting surface, not a lifetime owner.
## Implementation chunks ## Implementation chunks
@@ -274,54 +326,62 @@ Sized for one PR each.
down every bottle it started" contract in `dashboard.py`'s down every bottle it started" contract in `dashboard.py`'s
module docstring. module docstring.
## Resolved questions
1. **Modal vs. drop-and-resume for preflight Y/N.** Resolved:
**modal.** Render the preflight lines centered in a curses
sub-window with `[y/N]` at the bottom; capture the next
keypress. If geometry proves fiddly during implementation
we'll fall back to drop-and-resume, but modal is the target.
2. **Agent picker: text-filter typing.** Resolved: **yes,
include filter typing.** As the operator types, the list
filters to agents whose name matches (substring,
case-insensitive). j/k still navigates within the filtered
set; Esc clears the filter on first press, exits the picker
on the second.
3. **Container-died-during-claude handling.** Keep the design
as drafted: transcript snapshot (`snapshot_transcript`) +
`mark_preserved` if exit code is non-zero + remove from
the `bottles` dict + status line `"claude session for
[slug] ended with exit N; preserved for resume"`. The
bottle's `cm.__exit__` would normally run on stop; here it
runs as part of the death-handling (the container is
already gone, but compose-down + state-settle still
sequence the network removal + state cleanup correctly).
4. **Double-start of the same agent.** Allowed. The picker
surfaces a `(N running)` annotation next to any agent name
that already has live bottles in this dashboard's `bottles`
dict OR in `list_active_slugs()`, so the operator sees the
running-count before picking. Selecting an already-running
agent name mints a fresh slug for the new bottle as
normal.
5. **Quit behavior.** Resolved: **`q` does NOT tear down any
bottles.** Dashboard exit is purely a UI exit; the
bottles dict goes out of scope without invoking `__exit__`,
so the `docker compose` projects keep running. Bottle
teardown is always explicit: per-bottle `x` (for
dashboard-owned), or `./cli.py cleanup` (for everything).
## Open questions ## Open questions
1. **Modal vs. drop-and-resume for preflight Y/N.** Both work;
modal is nicer if the curses geometry handling is
straightforward. Pick during chunk 2 by prototyping the
modal in ~30 lines and seeing if it looks right.
2. **Agent picker: text-filter typing?** v1 is j/k navigation
only. If the manifest has 20+ agents the picker gets noisy;
add fzf-style filter input later if needed.
3. **What happens if `attach_claude` exits because the
container died** (not a clean claude exit — e.g., OOM,
panic)? Today's `_settle_state` marks the bottle preserved
for non-zero exit codes. The dashboard's re-render needs to
notice the bottle is gone (compose down or container-not-
running state) and surface a status line. Probably:
transcript snapshot + mark preserved + remove from
`bottles` map + status line "claude session for [slug]
ended with exit N; preserved for resume".
4. **Double-start of the same agent.** Allowed by design — slugs
are unique per launch — but the picker should make it clear
this is a "start a SECOND bottle" decision, not a "re-enter
the first." Probably handled by showing the running-count in
the picker row.
5. **Should `q` confirm before tearing down N running
bottles?** A 5-bottle dashboard with 5 in-flight sessions
loses non-trivial state on accidental `q`. Probably yes:
curses modal "quit and tear down N bottles? [y/N]". Skip
confirmation when there are zero owned bottles.
6. **Race between handoff and 1s refresh tick.** While the 6. **Race between handoff and 1s refresh tick.** While the
dashboard's `stdscr.timeout` is set, a key press fires the dashboard's `stdscr.timeout` is set, a key press fires the
handoff and the dashboard sits in `docker exec` for minutes. handoff and the dashboard sits in `docker exec` for minutes.
`discover_active_agents` / `discover_pending` don't poll `discover_active_agents` / `discover_pending` don't poll
during that window, which is fine — the moment we during that window — that's harmless on its own (the moment
`stdscr.refresh()` after exec returns, the next loop iter we `stdscr.refresh()` after exec returns, the next loop
runs discovery and the panes reflect reality. Worth calling iter runs discovery and the panes reflect reality), but
out in the design but no special handling needed. it does mean: (a) proposals queued during the claude
session won't fire any operator notification until the
7. **Multi-bottle resource use.** Five bottles up means five handoff ends, and (b) a bottle that died mid-claude won't
compose projects: 5×(agent + pipelock + egress optional + be detectable until the operator exits back to the
git-gate optional + supervise optional) containers, plus 5×2 dashboard. Not blocking v1 — flagging as a known limitation
networks. On a 16-GiB host this is fine; on something to revisit alongside the option-2 embedded-emulator path
smaller the operator might want a soft cap or a warning. from the research doc.
Out of v1; flag for follow-up if it bites.
## References ## References