From 8cd867f3d2de48da3d5697177d7b0687a213e7d1 Mon Sep 17 00:00:00 2001
From: didericis
Date: Tue, 26 May 2026 02:51:08 -0400
Subject: [PATCH] docs(research): claude-code pane in the dashboard
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Survey the three realistic ways to surface a claude-code session
inside the dashboard TUI:

  1. Handoff — drop curses, foreground claude, restore on exit
     (the existing `e`/`p` pattern, extended). Minimal code,
     side-by-time rather than side-by-side.
  2. Embedded emulator — own a PTY, parse claude-code's ANSI
     stream via `pyte`, paint it into a curses pane. Real "pane
     in the dashboard" but a six-week build with one new dep and
     several integration trap-doors (alt-screen, resize, input
     routing, multi-PTY state).
  3. External multiplexer — delegate pane creation to tmux /
     iTerm / wezterm when detected. Tiny code, but splits the
     operator's mental model and gives up layout control.

Recommendation: ship Option 1 first; defer Option 2 to "only if
Option 1 is observably insufficient"; treat Option 3 as a niche
augmentation for power users. Calls out four followups worth
verifying before committing (PTY behavior at small sizes,
attach-to-existing-exec, SIGWINCH handling, `-it` vs `-i` for the
embedded path).
---
 .../research/claude-code-pane-in-dashboard.md | 285 ++++++++++++++++++
 1 file changed, 285 insertions(+)
 create mode 100644 docs/research/claude-code-pane-in-dashboard.md

diff --git a/docs/research/claude-code-pane-in-dashboard.md b/docs/research/claude-code-pane-in-dashboard.md
new file mode 100644
index 0000000..84ea8fd
--- /dev/null
+++ b/docs/research/claude-code-pane-in-dashboard.md
@@ -0,0 +1,285 @@
+# Claude-code pane in the dashboard
+
+## Question
+
+The dashboard today shows pending proposals (top pane) and active
+agents (bottom pane, PRD 0019). Selecting an agent and pressing
+`e` / `p` invokes operator-scoped edits.
+The next move the user
+wants is **a way to interact with the claude-code session inside
+the selected bottle without leaving the dashboard** — type at it,
+read its output, return focus to the dashboard.
+
+What's the cheapest path to that, and where does it bottom out?
+
+## What "interact" means
+
+Today the flow is bimodal. `./cli.py start ` brings the
+bottle up and immediately drops you into an interactive
+`docker exec -it claude-bottle- claude ...` — claude-code
+owns the whole terminal until you Ctrl-D out, at which point the
+bottle tears down. The dashboard (`./cli.py dashboard`) is a
+*separate* invocation that watches across bottles but never
+exposes the claude TUI itself.
+
+The user wants the dashboard to *also* be a claude-code session
+host: one of the dashboard's panes (or a press-key-to-focus
+mode) is a live claude-code terminal connected to the agent
+container the operator is sitting on in the agents pane.
+
+That changes the dashboard's job from "screen of metadata" to
+"terminal multiplexer that also draws metadata." The interesting
+question is whether that change is small or unbounded.
+
+## The core problem
+
+claude-code is a TUI in its own right. It runs as an
+interactive Node process, expects a real PTY, drives its own
+cursor positioning, color, mouse events, and key bindings. The
+dashboard is *also* a TUI (curses), and curses owns the
+terminal's input + output stream while it's active.
+
+Two TUIs sharing one terminal can't both be "running" without
+one of them giving up screen control to the other. The decision
+shape is which one yields, and where the boundary lives.
+
+There are exactly three realistic ways to resolve this:
+
+  1. **Handoff** — the dashboard releases the terminal when the
+     user wants to talk to claude-code, claude-code takes over
+     full-screen, and the dashboard re-takes control when
+     claude-code exits or is detached.
+     Like how `e` (routes edit) already shells to `$EDITOR` today
+     (`curses.endwin()` → run editor → `stdscr.refresh()`).
+  2. **Embedded emulator** — the dashboard runs claude-code in a
+     PTY it owns, parses claude-code's ANSI escape stream
+     itself, and paints the resulting cell grid into a
+     curses pane. Keypresses inside the pane get routed to the
+     PTY's stdin; the dashboard renders metadata in the other
+     panes alongside.
+  3. **External multiplexer** — the dashboard doesn't render the
+     claude-code session at all. It asks tmux / screen / a
+     terminal emulator to open it in a real adjacent pane (split
+     window, new tab), and treats the multiplexer as the
+     coordinator instead of trying to be one.
+
+Below are the actual costs.
+
+## Option 1: Handoff
+
+The dashboard sees a key (say Enter on a selected agent in the
+agents pane). It calls `curses.endwin()`, then `subprocess.run(
+["docker", "exec", "-it", "claude-bottle-", "claude",
+"--dangerously-skip-permissions"])`. claude-code takes the
+terminal full-screen. When the operator exits claude-code
+(Ctrl-D, `/exit`), the subprocess returns; the dashboard calls
+`stdscr.refresh()` to redraw and resume.
+
+What's good:
+
+- It's ~20 lines of code. The plumbing (`curses.endwin` /
+  refresh + a `docker exec`) already exists for the editor flow.
+- Zero new dependencies. claude-code runs in its real PTY exactly
+  the way it does today.
+- No "embedded TUI inside another TUI" weirdness. Keybinding
+  collisions, terminal-resize stories, scrollback are all
+  whatever claude-code already does.
+- Already-running session reuse: a bottle's agent container
+  runs `sleep infinity` and `docker exec`s claude in on-demand
+  (PRD 0018 chunk 3).
+  Re-entering with another `exec` would
+  start a *second* claude process; we'd want to either attach
+  to the first one (tricky — `docker exec` doesn't have an
+  "attach to existing exec" verb) or treat first-time entry as
+  "start the session" and stash a marker so re-entry is a
+  resume rather than a fresh process.
+
+What's not good:
+
+- It's not really "a pane in the dashboard." It's "press Enter
+  to leave the dashboard, talk to claude, come back." The user
+  wanted side-by-side; this is side-by-time.
+- The dashboard can't auto-refresh while claude-code has the
+  terminal. If a new proposal lands while you're in the claude
+  session, you won't see it until you exit.
+- Notifications during the claude session need a separate
+  channel (sound? OS notification?). Otherwise the operator's
+  reason for using the dashboard — "watch everything in one
+  place" — partially evaporates.
+
+This is the v1 the project's existing code-shape strongly
+prefers. It clears the bar of "let me talk to claude-code
+without quitting `./cli.py dashboard`."
+
+## Option 2: Embedded emulator
+
+The dashboard opens a PTY (stdlib `pty` module), spawns
+`docker exec -it … claude` attached to it, and runs a terminal
+emulator in-process that consumes claude-code's output stream
+and maintains a virtual screen buffer. The buffer's current
+state gets painted into a curses pane every refresh tick.
+Keypresses received inside the focused pane get written to the
+PTY's input fd.
+
+This is what tmux does. It is also what every "terminal in
+a TUI" demo does. The challenge is everything between "run a
+PTY" and "render its output correctly."
+
+What you need to implement (or take as a dep):
+
+- **ANSI/VT escape parsing.** claude-code uses xterm-class
+  escape sequences for cursor positioning, color, scroll
+  regions, alternate screen buffer (for the prompt UI), mouse
+  reporting, and so on. The full xterm spec is dozens of pages.
+  Sloppy parsing produces a corrupted display the user will
+  hate.
+- **A screen buffer model.** Cells with attributes
+  (foreground, background, bold, underline, italic, inverse).
+  Cursor position. Saved cursor. Alternate screen. Scrollback.
+- **Resize protocol.** claude-code asks the PTY its size via
+  `TIOCGWINSZ` and re-layouts on `SIGWINCH`. The dashboard has
+  to size the PTY to the pane it's rendering into and propagate
+  SIGWINCH when curses says the terminal resized.
+- **Input routing.** When the pane has focus, keypresses are
+  written to the PTY. When the dashboard has focus, keypresses
+  are consumed by the dashboard. Define an escape sequence (like
+  tmux's `Ctrl-B`) that toggles focus, and document that
+  claude-code's own use of that key sequence is now intercepted.
+- **Output throttling.** claude-code can emit megabytes of
+  tokens in a streaming response. The dashboard's 1s refresh
+  tick is too slow to render character-at-a-time; you want the
+  PTY reader to coalesce and the renderer to render on a
+  shorter cadence than the main loop's `getch` timeout.
+
+The stdlib has `pty` (the spawn side) and you can read/write
+the master fd by hand. It does **not** have an ANSI parser; the
+established Python library for this is `pyte`
+([pyte.readthedocs.io](https://pyte.readthedocs.io/)) — pure
+Python, LGPLv3-licensed, with one small dependency (`wcwidth`).
+~3k lines. It would be the project's first runtime dependency
+beyond stdlib.
+
+Even with `pyte`, the integration is non-trivial: you're
+re-rendering a 24x80-ish (or whatever fits) screen buffer into
+curses cells on every tick, dealing with attribute mapping
+(pyte's color names → curses color pair), and handling mouse
+events through the pane. Plan on ~600–1200 lines, not 200.
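
The spawn side described above (stdlib `pty`, reading the master fd by hand, sizing the PTY to the pane) can be sketched as follows. This is a minimal illustration rather than the dashboard's code: `set_pty_size`, `spawn_in_pty`, and `read_until_quiet` are hypothetical names, and the sketch assumes a POSIX host. It sets the window size in the child before `exec`, so the program sees the pane's dimensions on its first size query.

```python
import fcntl
import os
import pty
import select
import struct
import termios


def set_pty_size(fd: int, rows: int, cols: int) -> None:
    """Set the window size of the PTY behind fd (TIOCSWINSZ)."""
    # struct winsize: rows, cols, xpixel, ypixel (pixel fields unused here).
    fcntl.ioctl(fd, termios.TIOCSWINSZ, struct.pack("HHHH", rows, cols, 0, 0))


def spawn_in_pty(argv: list[str], rows: int, cols: int) -> tuple[int, int]:
    """Fork argv onto a new PTY sized rows x cols; return (pid, master_fd)."""
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: fds 0/1/2 are the PTY slave. Size it before exec so the
        # program sees the pane's dimensions on its first TIOCGWINSZ query.
        try:
            set_pty_size(0, rows, cols)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if setup or exec failed
    return pid, master_fd


def read_until_quiet(master_fd: int, timeout: float = 2.0) -> bytes:
    """Collect output until the child closes the PTY or goes quiet."""
    chunks = []
    while True:
        ready, _, _ = select.select([master_fd], [], [], timeout)
        if not ready:
            break
        try:
            data = os.read(master_fd, 65536)
        except OSError:
            break  # Linux reports EIO once the slave side is closed
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)
```

In the dashboard, `argv` would presumably be the `docker exec -it … claude` command from the text, the bytes read from `master_fd` would be fed to whichever emulator is chosen, and a later pane resize would call `set_pty_size(master_fd, rows, cols)` again. Whether that resize reaches claude-code cleanly through `docker exec` is one of the open followups, not something this sketch establishes.
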
+
+Open trap-doors:
+
+- **Claude-code uses bracketed paste, alternate screen, and
+  occasionally raw terminal control for its prompt input.**
+  Some of these features stress the emulator harder than `vim`
+  does — alt-screen has to be supported or claude's
+  command-prompt UI corrupts the line above it. `pyte` claims to
+  handle alt-screen; verify before committing.
+- **Scrollback in claude is `/transcript`-driven, not terminal
+  scrollback.** A small pane height means you only see the last
+  10–20 lines of output without leaving the dashboard, which is
+  the wrong shape for a 200-line streaming response. You'd
+  want to make the pane resizable or open a full-height
+  "expand" mode (which is just option 1, the handoff, with
+  extra steps).
+- **Multiple agents = multiple PTYs running concurrently.** If
+  the user wants to monitor 3 bottles, the dashboard is now
+  holding 3 PTYs open and parsing 3 ANSI streams in parallel.
+  Memory + CPU costs are bounded but nonzero; design the
+  PTY-per-agent state machine carefully.
+
+This is the option that delivers the "pane in the dashboard"
+literal request. It's the right answer if the user's day-to-day
+involves watching multiple bottles' output simultaneously
+without context-switching. It's the wrong answer if they mostly
+want one focused session at a time with proposals visible.
+
+## Option 3: External multiplexer
+
+The dashboard binds a key (e.g. `Enter` on agent) to
+`tmux split-window -h 'docker exec -it claude-bottle-
+claude'` when run inside a tmux session, or to `osascript`-
+driven iTerm pane spawning on macOS, or to `wezterm cli
+spawn` if the user is on wezterm.
+
+What's good:
+
+- The "real terminal in a real pane" is solved by tools the
+  user already trusts. tmux's terminal emulation is correct;
+  iTerm's is correct; wezterm's is correct. We're not
+  reimplementing any of them.
+- Multi-bottle parallelism is automatic — the user opens one
+  pane per agent, the multiplexer renders them.
+- Implementation cost is tiny: ~50 lines of "if `TMUX` env is
+  set, shell out to `tmux split-window`."
+
+What's not good:
+
+- It requires the user to be running in a multiplexer. Outside
+  one (plain Terminal.app, vscode integrated terminal, etc.)
+  the verb either falls back to handoff or just fails.
+- It splits the operator's mental model. The dashboard is one
+  window, claude-code panes are other windows; the dashboard's
+  "agents pane" no longer matches the visible reality (some
+  agents have an attached pane, others don't, and the dashboard
+  doesn't know which).
+- We don't actually own the layout. tmux's pane sizing rules
+  are tmux's; the dashboard can't enforce "agent pane on the
+  right, dashboard on the left."
+
+This is the right answer if the project decides "we shouldn't
+be a multiplexer; let tmux be a multiplexer." It's the wrong
+answer if the dashboard wants to *be* the operator's primary
+surface — which the user's question implies it does.
+
+## Recommendation
+
+**Build Option 1 first** (handoff). It clears the bar of "let me
+talk to claude without quitting the dashboard," it's a
+~30-line patch over the existing `e` / `p` infrastructure, and
+it carries no new dependencies. Critically it lets the user
+*confirm whether the workflow actually wants embedded panes* —
+if option 1 turns out to be enough in practice (Enter to drop
+in, Ctrl-D back out, watch proposals between sessions), the
+embedded-emulator complexity is unnecessary.
+
+**Treat Option 2 as a six-week project to plan only if Option 1
+is observably insufficient.** The pyte dep is acceptable but
+the integration is real engineering — terminal emulator
+correctness, pane sizing, input routing, multi-PTY state
+management. Schedule it; don't bolt it on. If the user lives in
+3-bottle days and needs simultaneous output, this is the option;
+otherwise it's premature optimization.
+
+**Treat Option 3 as a niche augmentation, not a primary path.**
+Detect `TMUX` / `WEZTERM_PANE` / `ITERM_SESSION_ID` at startup;
+if present, add a `split` keybinding that delegates pane
+creation to the multiplexer. The dashboard remains the primary
+interface; the multiplexer is a convenience for power users.
+
+## Followups worth checking before committing
+
+- **What does claude-code's PTY-resize behavior look like at
+  small heights?** Drive it at 24×40, 10×80, 8×40 and see if it
+  blows up. The dashboard's bottom pane is going to be small.
+- **Is there a way to `docker exec` into an *already-running
+  exec* rather than spawning a new claude process per attach?**
+  If not, the dashboard needs its own session-state model: "is
+  there an exec running for this slug? attach to it. otherwise
+  start one."
+- **How does claude-code handle SIGWINCH at the moment?** If it
+  re-layouts cleanly the embedded-emulator story gets easier;
+  if it corrupts, Option 2 needs special-case handling.
+- **Does `docker exec -i` (no `-t`) preserve enough of the TTY
+  contract for claude-code to start at all?** Some apps refuse
+  to launch without a real TTY; the embedded emulator option
+  needs the PTY allocated on the host side and the exec
+  re-attached to it, which is the harder of the two approaches.
+
+## References
+
+- PRD 0019 — active agents pane + selection model (the
+  selection-source for whichever option lands)
+- PRD 0018 chunk 3 — agent container runs `sleep infinity`;
+  claude is invoked via `docker exec -it` (the
+  attachment-point this doc is layering against)
+- `claude_bottle/cli/dashboard.py:_operator_edit_flow` — the
+  existing `curses.endwin` → shell out → `stdscr.refresh()`
+  pattern Option 1 would clone
+- pyte: — the candidate
+  terminal-emulator library if Option 2 is picked
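
For orientation, the `curses.endwin` → shell out → `stdscr.refresh()` pattern that Option 1 would clone might look roughly like the sketch below. It is an illustration under assumptions, not the code in `_operator_edit_flow`: `build_claude_argv` and `handoff` are hypothetical names, the container name is passed in rather than derived from a slug, and the `endwin` / `run` parameters exist only so the flow can be exercised without a real terminal.

```python
import curses
import subprocess


def build_claude_argv(container: str) -> list[str]:
    """argv for an interactive claude-code session in a running container."""
    # Flags are the ones quoted in the Option 1 description.
    return ["docker", "exec", "-it", container,
            "claude", "--dangerously-skip-permissions"]


def handoff(stdscr, argv, *, endwin=curses.endwin, run=subprocess.run) -> int:
    """Release the terminal, run argv in the foreground, then redraw."""
    endwin()                  # leave curses mode; the child owns the terminal
    try:
        result = run(argv)    # blocks until the operator exits claude-code
    finally:
        stdscr.clearok(True)  # force a full repaint on the next refresh
        stdscr.refresh()      # re-enter curses mode and redraw
    return result.returncode
```

The `finally` block is a design choice of this sketch: the dashboard gets its screen back even if launching `docker` fails, at the cost of repainting before the error surfaces.
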