2026-05-26 02:53:30 -04:00
1 changed files with 285 additions and 0 deletions
@@ -0,0 +1,285 @@
+# Claude-code pane in the dashboard
+
+## Question
+
+The dashboard today shows pending proposals (top pane) and active
+agents (bottom pane, PRD 0019). Selecting an agent and pressing
+`e` / `p` invokes operator-scoped edits. The next move the user
+wants is **a way to interact with the claude-code session inside
+the selected bottle without leaving the dashboard** — type at it,
+read its output, return focus to the dashboard.
+
+What's the cheapest path to that, and where does it bottom out?
+
+## What "interact" means
+
+Today the flow is bimodal. `./cli.py start <agent>` brings the
+bottle up and immediately drops you into an interactive
+`docker exec -it claude-bottle-<slug> claude ...` — claude-code
+owns the whole terminal until you Ctrl-D out, at which point the
+bottle tears down. The dashboard (`./cli.py dashboard`) is a
+*separate* invocation that watches across bottles but never
+exposes the claude TUI itself.
+
+The user wants the dashboard to *also* be a claude-code session
+host: one of the dashboard's panes (or a press-key-to-focus
+mode) is a live claude-code terminal connected to the agent
+container the operator has selected in the agents pane.
+
+That changes the dashboard's job from "screen of metadata" to
+"terminal multiplexer that also draws metadata." The interesting
+question is whether that change is small or unbounded.
+
+## The core problem
+
+claude-code is a TUI in its own right. It runs as an
+interactive Node process, expects a real PTY, drives its own
+cursor positioning, color, mouse events, and key bindings. The
+dashboard is *also* a TUI (curses), and curses owns the
+terminal's input + output stream while it's active.
+
+Two TUIs sharing one terminal can't both be "running" without
+one of them giving up screen control to the other. The design
+decision is which one yields, and where the boundary lives.
+
+There are exactly three realistic ways to resolve this:
+
+  1. **Handoff** — the dashboard releases the terminal when the
+     user wants to talk to claude-code, claude-code takes over
+     full-screen, and the dashboard re-takes control when
+     claude-code exits or is detached. Like how `e` (routes
+     edit) already shells to `$EDITOR` today
+     (`curses.endwin()` → run editor → `stdscr.refresh()`).
+  2. **Embedded emulator** — the dashboard runs claude-code in a
+     PTY it owns, parses claude-code's ANSI escape stream
+     itself, and paints the resulting cell grid into a
+     curses pane. Keypresses inside the pane get routed to the
+     PTY's stdin; the dashboard renders metadata in the other
+     panes alongside.
+  3. **External multiplexer** — the dashboard doesn't render the
+     claude-code session at all. It asks tmux / screen / a
+     terminal emulator to open it in a real adjacent pane (split
+     window, new tab), and treats the multiplexer as the
+     coordinator instead of trying to be one.
+
+Below are the actual costs.
+
+## Option 1: Handoff
+
+The dashboard sees a key (say Enter on a selected agent in the
+agents pane). It calls `curses.endwin()`, then `subprocess.run(
+["docker", "exec", "-it", "claude-bottle-<slug>", "claude",
+"--dangerously-skip-permissions"])`. claude-code takes the
+terminal full-screen. When the operator exits claude-code
+(Ctrl-D, `/exit`), the subprocess returns; the dashboard calls
+`stdscr.refresh()` to redraw and resume.
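
The flow above can be sketched as follows. This is a minimal sketch, not the project's code: `build_exec_argv` and `handoff_to_claude` are hypothetical names, the `run` and `suspend` parameters exist only so the flow can be exercised without a terminal or Docker, and the `clearok(True)` call is an extra precaution to force a full repaint on return.

```python
import curses
import subprocess


def build_exec_argv(slug):
    """Argv for an interactive claude session inside the bottle's container."""
    return [
        "docker", "exec", "-it", f"claude-bottle-{slug}",
        "claude", "--dangerously-skip-permissions",
    ]


def handoff_to_claude(stdscr, slug, run=subprocess.run, suspend=curses.endwin):
    """Release the terminal, run claude in the foreground, then resume curses.

    `run` and `suspend` are injectable (hypothetical parameters) so the flow
    can be tested without a real TTY.
    """
    suspend()  # leave curses mode so the child owns the terminal
    try:
        result = run(build_exec_argv(slug))
        return result.returncode
    finally:
        # Back in the dashboard: mark the screen dirty and repaint everything.
        stdscr.clearok(True)
        stdscr.refresh()
```

Inside the dashboard's key loop this would be called as `handoff_to_claude(stdscr, selected_slug)`, where `selected_slug` stands for whatever the agents pane's selection model exposes.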
+
+What's good:
+
+- It's ~20 lines of code. The plumbing (`curses.endwin` /
+  refresh + a `docker exec`) already exists for the editor flow.
+- Zero new dependencies. claude-code runs in its real PTY exactly
+  the way it does today.
+- No "embedded TUI inside another TUI" weirdness. Keybinding
+  collisions, terminal-resize stories, scrollback are all
+  whatever claude-code already does.
+- It fits the existing container model, with one caveat around
+  session reuse. A bottle's agent container runs
+  `sleep infinity`, and claude is `docker exec`ed in on demand
+  (PRD 0018 chunk 3), so entering from the dashboard is just
+  another `exec`. Re-entering with another `exec` would start a
+  *second* claude process, though. We'd want to either attach
+  to the first one (tricky: `docker exec` doesn't have an
+  "attach to existing exec" verb) or treat first-time entry as
+  "start the session" and stash a marker so re-entry is a
+  resume rather than a fresh process.
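
The "stash a marker" idea from the last bullet could look roughly like this. It is a sketch under assumptions: `SessionMarkers` is a hypothetical name, the marker lives only in the dashboard's memory, and `--continue` is assumed to be the claude CLI flag that resumes the most recent conversation (worth verifying against the installed version, which is why it is a parameter).

```python
from dataclasses import dataclass, field


@dataclass
class SessionMarkers:
    """Remembers which bottle slugs already had a claude session started."""

    started: set = field(default_factory=set)

    def argv_for(self, slug, resume_flag="--continue"):
        """Return the docker exec argv: fresh on first entry, resume afterwards.

        `resume_flag` is an assumption about the claude CLI, not something
        this design note confirms.
        """
        argv = ["docker", "exec", "-it", f"claude-bottle-{slug}", "claude"]
        if slug in self.started:
            # Re-entry: ask claude to pick up the earlier conversation.
            argv.append(resume_flag)
        self.started.add(slug)
        return argv

    def forget(self, slug):
        """Drop the marker, e.g. when the bottle is torn down."""
        self.started.discard(slug)
```

Note that this resumes a conversation in a new process; it does not attach to a claude process that is still running from an earlier `exec`.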
+
+What's not good:
+
+- It's not really "a pane in the dashboard." It's "press Enter
+  to leave the dashboard, talk to claude, come back." The user
+  wanted side-by-side; this is side-by-time.
+- The dashboard can't auto-refresh while claude-code has the
+  terminal. If a new proposal lands while you're in the claude
+  session, you won't see it until you exit.
+- Notifications during the claude session need a separate
+  channel (sound? OS notification?). Otherwise the operator's
+  reason for using the dashboard — "watch everything in one
+  place" — partially evaporates.
+
+This is the v1 the project's existing code-shape strongly
+prefers. It clears the bar of "let me talk to claude-code
+without quitting `./cli.py dashboard`."
+
+## Option 2: Embedded emulator
+
+The dashboard opens a PTY (stdlib `pty` module), spawns
+`docker exec -it … claude` attached to it, and runs a terminal
+emulator in-process that consumes claude-code's output stream
+and maintains a virtual screen buffer. The buffer's current
+state gets painted into a curses pane every refresh tick.
+Keypresses received inside the focused pane get written to the
+PTY's input fd.
+
+This is what tmux does. It is also what every "terminal in
+a TUI" demo does. The challenge is everything between "run a
+PTY" and "render its output correctly."
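
The "run a PTY" half fits in the stdlib. A minimal sketch, using a throwaway Python child as a stand-in for `docker exec -it … claude`; `run_in_pty` is a hypothetical helper, and it drains the PTY until the child exits rather than feeding an emulator:

```python
import os
import pty
import subprocess
import sys


def run_in_pty(argv):
    """Run argv with its stdio attached to a fresh PTY; return raw output bytes."""
    master_fd, slave_fd = pty.openpty()
    try:
        proc = subprocess.Popen(
            argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd
        )
    except BaseException:
        os.close(master_fd)
        raise
    finally:
        # The parent keeps only the master side; the child holds the slave.
        os.close(slave_fd)

    chunks = []
    try:
        while True:
            try:
                data = os.read(master_fd, 4096)
            except OSError:
                break  # Linux reports EIO once the child has closed the slave
            if not data:
                break  # other platforms signal EOF with an empty read
            chunks.append(data)
    finally:
        os.close(master_fd)
    proc.wait()
    return b"".join(chunks)


if __name__ == "__main__":
    # Stand-in child; the real dashboard would pass the docker exec argv.
    print(run_in_pty([sys.executable, "-c", "print('hello from pty')"]))
```

Everything the child writes arrives as raw bytes, escape sequences included, which is exactly the stream the emulator side would have to parse.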
+
+What you need to implement (or take as a dep):
+
+- **ANSI/VT escape parsing.** claude-code uses xterm-class
+  escape sequences for cursor positioning, color, scroll
+  regions, alternate screen buffer (for the prompt UI), mouse
+  reporting, and so on. The full xterm spec is dozens of pages.
+  Sloppy parsing produces a corrupted display the user will
+  hate.
+- **A screen buffer model.** Cells with attributes
+  (foreground, background, bold, underline, italic, inverse).
+  Cursor position. Saved cursor. Alternate screen. Scrollback.
+- **Resize protocol.** claude-code asks the PTY its size via
+  `TIOCGWINSZ` and re-layouts on `SIGWINCH`. The dashboard has
+  to size the PTY to the pane it's rendering into and resize it
+  again (`TIOCSWINSZ`) when curses says the terminal resized,
+  which is what raises `SIGWINCH` on the other side of the PTY.
+- **Input routing.** When the pane has focus, keypresses are
+  written to the PTY. When the dashboard has focus, keypresses
+  are consumed by the dashboard. Define a prefix key (like
+  tmux's `Ctrl-B`) that toggles focus, and document that
+  claude-code's own use of that key is now intercepted.
+- **Output throttling.** claude-code can emit megabytes of
+  output in a streaming response. The dashboard's 1s refresh
+  tick is too slow to keep up with streaming output; you want
+  the PTY reader to coalesce reads and the renderer to repaint
+  at a shorter interval than the main loop's `getch` timeout.
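
Of the items above, input routing is the easiest to pin down in code. A minimal sketch, assuming keys arrive as the integers `getch` returns; `InputRouter`, the `Ctrl-B` choice, and the small arrow-key table are illustrative, not part of the existing dashboard:

```python
import curses

PREFIX = 0x02  # Ctrl-B, mirroring tmux (hypothetical choice)

# curses reports special keys as ints above 255; the PTY wants byte sequences.
# These are the normal-mode xterm arrow sequences; an application that enables
# application cursor mode expects ESC O A etc. instead.
SPECIAL_KEYS = {
    curses.KEY_UP: b"\x1b[A",
    curses.KEY_DOWN: b"\x1b[B",
    curses.KEY_RIGHT: b"\x1b[C",
    curses.KEY_LEFT: b"\x1b[D",
}


class InputRouter:
    """Sends each key either to the embedded PTY or to the dashboard."""

    def __init__(self, write_to_pty, handle_dashboard_key, prefix=PREFIX):
        self.write_to_pty = write_to_pty
        self.handle_dashboard_key = handle_dashboard_key
        self.prefix = prefix
        self.pane_focused = False

    def feed(self, key):
        if key == self.prefix:
            # The prefix never reaches claude-code; it only toggles focus.
            self.pane_focused = not self.pane_focused
        elif not self.pane_focused:
            self.handle_dashboard_key(key)
        elif key in SPECIAL_KEYS:
            self.write_to_pty(SPECIAL_KEYS[key])
        elif 0 <= key <= 255:
            self.write_to_pty(bytes([key]))
        # Unmapped special keys are dropped in this sketch.
```

In the real loop `write_to_pty` would wrap `os.write(master_fd, …)` and `handle_dashboard_key` would be the dashboard's existing key dispatch.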
+
+The stdlib has `pty` (the spawn side) and you can read/write
+the master fd by hand. It does **not** have an ANSI parser; the
+established Python library for this is `pyte`
+([pyte.readthedocs.io](https://pyte.readthedocs.io/)): pure
+Python, LGPL-licensed, one small dependency (`wcwidth`). ~3k
+lines. It would be the project's first runtime dependency
+beyond stdlib.
+
+Even with `pyte`, the integration is non-trivial: you're
+re-rendering a 24×80-ish (or whatever fits) screen buffer into
+curses cells on every tick, dealing with attribute mapping
+(pyte's color names → curses color pairs), and handling mouse
+events through the pane. Plan on ~600–1200 lines, not 200.
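
The attribute-mapping piece might look like the sketch below. It assumes pyte reports each cell's foreground and background as strings such as `"red"` or `"default"`, with `"brown"` standing for ANSI yellow; that naming should be checked against the pyte version in use. `PairAllocator` and `max_pairs` are hypothetical, and only the eight basic colors are handled.

```python
import curses

# Assumed pyte color names mapped to curses color constants.
PYTE_TO_CURSES = {
    "black": curses.COLOR_BLACK,
    "red": curses.COLOR_RED,
    "green": curses.COLOR_GREEN,
    "brown": curses.COLOR_YELLOW,  # assumed pyte name for ANSI yellow
    "blue": curses.COLOR_BLUE,
    "magenta": curses.COLOR_MAGENTA,
    "cyan": curses.COLOR_CYAN,
    "white": curses.COLOR_WHITE,
}
DEFAULT = -1  # terminal default; valid after curses.use_default_colors()


class PairAllocator:
    """Hands out curses color-pair numbers for (fg, bg) names on demand."""

    def __init__(self, init_pair=curses.init_pair, max_pairs=64):
        self._init_pair = init_pair  # injectable so this runs without a TTY
        self._max_pairs = max_pairs
        self._pairs = {}

    def pair_for(self, fg_name, bg_name):
        key = (
            PYTE_TO_CURSES.get(fg_name, DEFAULT),
            PYTE_TO_CURSES.get(bg_name, DEFAULT),
        )
        if key == (DEFAULT, DEFAULT):
            return 0  # pair 0 is the terminal's default colors
        if key not in self._pairs:
            number = len(self._pairs) + 1
            if number >= self._max_pairs:
                return 0  # table full: fall back to default colors
            self._init_pair(number, *key)
            self._pairs[key] = number
        return self._pairs[key]
```

The renderer would pass the result to `curses.color_pair(n)` when painting each cell; unknown names (bright variants, hex colors) fall back to the default in this sketch.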
+
+Open trap-doors:
+
+- **Claude-code uses bracketed paste, alternate screen, and
+  occasionally raw terminal control for its prompt input.**
+  Some of these features stress the emulator harder than `vim`
+  does — alt-screen has to be supported or claude's
+  command-prompt UI corrupts the line above it. `pyte` claims to
+  handle alt-screen; verify before committing.
+- **Scrollback in claude is `/transcript`-driven, not terminal
+  scrollback.** A small pane height means you only see the last
+  10–20 lines of output without leaving the dashboard, which is
+  the wrong shape for a 200-line streaming response. You'd
+  want to make the pane resizable or open a full-height
+  "expand" mode (which is just option 1, the handoff, with
+  extra steps).
+- **Multiple agents = multiple PTYs running concurrently.** If
+  the user wants to monitor 3 bottles, the dashboard is now
+  holding 3 PTYs open and parsing 3 ANSI streams in parallel.
+  Memory + CPU costs are bounded but nonzero; design the
+  PTY-per-agent state machine carefully.
+
+This is the option that delivers the "pane in the dashboard"
+literal request. It's the right answer if the user's day-to-day
+involves watching multiple bottles' output simultaneously
+without context-switching. It's the wrong answer if they mostly
+want one focused session at a time with proposals visible.
+
+## Option 3: External multiplexer
+
+The dashboard binds a key (e.g. `Enter` on the selected agent)
+to `tmux split-window -h 'docker exec -it claude-bottle-<slug>
+claude'` when run inside a tmux session, or to `osascript`-
+driven iTerm pane spawning on macOS, or to `wezterm cli
+spawn` if the user is on wezterm.
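
A sketch of how that dispatch could be built, covering only the tmux and wezterm cases named above (the iTerm `osascript` path is left out). `split_command` is a hypothetical helper, and detection by the `TMUX` and `WEZTERM_PANE` environment variables is the assumption:

```python
import os
import shlex


def split_command(slug, env=None):
    """Return the argv that opens claude in an adjacent pane, or None.

    None means no supported multiplexer was detected, so the caller should
    fall back to another option (or report that the key is unavailable).
    """
    env = os.environ if env is None else env
    inner = ["docker", "exec", "-it", f"claude-bottle-{slug}", "claude"]
    if env.get("TMUX"):
        # tmux takes the pane's command as a single shell string.
        return ["tmux", "split-window", "-h", shlex.join(inner)]
    if env.get("WEZTERM_PANE"):
        # wezterm takes the program and its arguments after `--`.
        return ["wezterm", "cli", "spawn", "--", *inner]
    return None
```

The dashboard would hand the result to `subprocess.run` without touching curses state, since the new pane is drawn by the multiplexer rather than by the dashboard.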
+
+What's good:
+
+- The "real terminal in a real pane" is solved by tools the
+  user already trusts. tmux's terminal emulation is correct;
+  iTerm's is correct; wezterm's is correct. We're not
+  reimplementing any of them.
+- Multi-bottle parallelism is automatic — the user opens one
+  pane per agent, the multiplexer renders them.
+- Implementation cost is tiny: ~50 lines of "if `TMUX` env is
+  set, shell out to `tmux split-window`."
+
+What's not good:
+
+- It requires the user to be running in a multiplexer. Outside
+  one (plain Terminal.app, vscode integrated terminal, etc.)
+  the verb either falls back to handoff or just fails.
+- It splits the operator's mental model. The dashboard is one
+  window, claude-code panes are other windows; the dashboard's
+  "agents pane" no longer matches the visible reality (some
+  agents have an attached pane, others don't, and the dashboard
+  doesn't know which).
+- We don't actually own the layout. tmux's pane sizing rules
+  are tmux's; the dashboard can't enforce "agent pane on the
+  right, dashboard on the left."
+
+This is the right answer if the project decides "we shouldn't
+be a multiplexer; let tmux be a multiplexer." It's the wrong
+answer if the dashboard wants to *be* the operator's primary
+surface — which the user's question implies it does.
+
+## Recommendation
+
+**Build Option 1 first** (handoff). It clears the bar of "let me
+talk to claude without quitting the dashboard," it's a
+~20-line patch over the existing `e` / `p` infrastructure, and
+it carries no new dependencies. Critically it lets the user
+*confirm whether the workflow actually wants embedded panes* —
+if option 1 turns out to be enough in practice (Enter to drop
+in, Ctrl-D back out, watch proposals between sessions), the
+embedded-emulator complexity is unnecessary.
+
+**Treat Option 2 as a six-week project to plan only if Option 1
+is observably insufficient.** The pyte dep is acceptable but
+the integration is real engineering — terminal emulator
+correctness, pane sizing, input routing, multi-PTY state
+management. Schedule it; don't bolt it on. If the user lives in
+3-bottle days and needs simultaneous output, this is the option;
+otherwise it's premature optimization.
+
+**Treat Option 3 as a niche augmentation, not a primary path.**
+Detect `TMUX` / `WEZTERM_PANE` / `ITERM_SESSION_ID` at startup;
+if present, add a `split` keybinding that delegates pane
+creation to the multiplexer. The dashboard remains the primary
+interface; the multiplexer is convenience for power users.
+
+## Followups worth checking before committing
+
+- **What does claude-code's PTY-resize behavior look like at
+  small heights?** Drive it at 24×40, 10×80, 8×40 and see if it
+  blows up. The dashboard's bottom pane is going to be small.
+- **Is there a way to `docker exec` into an *already-running
+  exec* rather than spawning a new claude process per attach?**
+  If not, the dashboard needs its own session-state model: "is
+  there an exec running for this slug? attach to it. otherwise
+  start one."
+- **How does claude-code handle SIGWINCH at the moment?** If it
+  re-layouts cleanly the embedded-emulator story gets easier;
+  if it corrupts the display, Option 2 needs special-case handling.
+- **Does `docker exec -i` (no `-t`) preserve enough of the TTY
+  contract for claude-code to start at all?** Some apps refuse
+  to launch without a real TTY. If claude-code is one of them,
+  the embedded-emulator option needs the PTY allocated on the
+  host side and the exec attached to it, which is the harder of
+  the two setups.
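
For the small-height follow-up at the top of this list, the resize plumbing itself can be checked in isolation before involving claude-code. A minimal sketch, assuming a POSIX host: it sets each size on a bare PTY's master side and reads it back from the slave side, which is what a child calling `TIOCGWINSZ` would see. `set_pty_size`, `get_pty_size`, and `probe_sizes` are hypothetical helpers; a real probe would spawn claude-code on the slave and watch what it draws.

```python
import fcntl
import os
import pty
import struct
import termios


def set_pty_size(fd, rows, cols):
    """Set the window size; the kernel then raises SIGWINCH for the PTY's
    foreground process group, if there is one."""
    winsize = struct.pack("HHHH", rows, cols, 0, 0)
    fcntl.ioctl(fd, termios.TIOCSWINSZ, winsize)


def get_pty_size(fd):
    """Read the window size the way a child process would."""
    packed = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8)
    rows, cols, _xpixel, _ypixel = struct.unpack("HHHH", packed)
    return rows, cols


def probe_sizes(sizes=((24, 40), (10, 80), (8, 40))):
    """Apply each (rows, cols) to a fresh PTY and report what the slave sees."""
    master_fd, slave_fd = pty.openpty()
    try:
        seen = []
        for rows, cols in sizes:
            set_pty_size(master_fd, rows, cols)
            seen.append(get_pty_size(slave_fd))
        return seen
    finally:
        os.close(master_fd)
        os.close(slave_fd)
```

The same `set_pty_size` call is what the embedded-emulator option would issue whenever the pane's dimensions change.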
+
+## References
+
+- PRD 0019 — active agents pane + selection model (the
+  selection-source for whichever option lands)
+- PRD 0018 chunk 3 — agent container runs `sleep infinity`;
+  claude is invoked via `docker exec -it` (the
+  attachment-point this doc is layering against)
+- `claude_bottle/cli/dashboard.py:_operator_edit_flow` — the
+  existing `curses.endwin` → shell out → `stdscr.refresh()`
+  pattern Option 1 would clone
+- pyte: <https://pyte.readthedocs.io/> — the candidate
+  terminal-emulator library if Option 2 is picked