docs(research): claude-code pane in the dashboard

Survey the three realistic ways to surface a claude-code session
inside the dashboard TUI:

  1. Handoff — drop curses, foreground claude, restore on exit
     (the existing `e`/`p` pattern, extended). Minimal code,
     side-by-time rather than side-by-side.
  2. Embedded emulator — own a PTY, parse claude-code's ANSI
     stream via `pyte`, paint it into a curses pane. Real
     "pane in the dashboard" but a six-week build with one new
     dep and several integration trap-doors (alt-screen, resize,
     input routing, multi-PTY state).
  3. External multiplexer — delegate pane creation to tmux /
     iTerm / wezterm when detected. Tiny code, but splits the
     operator's mental model and gives up layout control.

Recommendation: ship Option 1 first; defer Option 2 to "only if
Option 1 is observably insufficient"; treat Option 3 as a
niche augmentation for power users.

Calls out four followups worth verifying before committing
(PTY behavior at small sizes, attach-to-existing-exec, SIGWINCH
handling, `-it` vs `-i` for the embedded path).
This commit was merged in pull request #43.
2026-05-26 02:51:08 -04:00
parent 942d3a387a
commit 8cd867f3d2
@@ -0,0 +1,285 @@
# Claude-code pane in the dashboard
## Question
The dashboard today shows pending proposals (top pane) and active
agents (bottom pane, PRD 0019). Selecting an agent and pressing
`e` / `p` invokes operator-scoped edits. The next move the user
wants is **a way to interact with the claude-code session inside
the selected bottle without leaving the dashboard** — type at it,
read its output, return focus to the dashboard.
What's the cheapest path to that, and where does it bottom out?
## What "interact" means
Today the flow is bimodal. `./cli.py start <agent>` brings the
bottle up and immediately drops you into an interactive
`docker exec -it claude-bottle-<slug> claude ...` — claude-code
owns the whole terminal until you Ctrl-D out, at which point the
bottle tears down. The dashboard (`./cli.py dashboard`) is a
*separate* invocation that watches across bottles but never
exposes the claude TUI itself.
The user wants the dashboard to *also* be a claude-code session
host: one of the dashboard's panes (or a press-key-to-focus
mode) is a live claude-code terminal connected to the agent
container the operator is sitting on in the agents pane.
That changes the dashboard's job from "screen of metadata" to
"terminal multiplexer that also draws metadata." The interesting
question is whether that change is small or unbounded.
## The core problem
claude-code is a TUI in its own right. It runs as an
interactive Node process, expects a real PTY, and drives its own
cursor positioning, color, mouse events, and key bindings. The
dashboard is *also* a TUI (curses), and curses owns the
terminal's input + output stream while it's active.
Two TUIs sharing one terminal can't both be "running" without
one of them giving up screen control to the other. The decision
shape is which one yields, and where the boundary lives.
There are exactly three realistic ways to resolve this:
1. **Handoff** — the dashboard releases the terminal when the
user wants to talk to claude-code, claude-code takes over
full-screen, and the dashboard re-takes control when
claude-code exits or is detached. This mirrors how `e` (routes
edit) already shells out to `$EDITOR` today
(`curses.endwin()` → run editor → `stdscr.refresh()`).
2. **Embedded emulator** — the dashboard runs claude-code in a
PTY it owns, parses claude-code's ANSI escape stream
itself, and paints the resulting cell grid into a
curses pane. Keypresses inside the pane get routed to the
PTY's stdin; the dashboard renders metadata in the other
panes alongside.
3. **External multiplexer** — the dashboard doesn't render the
claude-code session at all. It asks tmux / screen / a
terminal emulator to open it in a real adjacent pane (split
window, new tab), and treats the multiplexer as the
coordinator instead of trying to be one.
Below are the actual costs.
## Option 1: Handoff
The dashboard sees a key (say Enter on a selected agent in the
agents pane). It calls `curses.endwin()`, then `subprocess.run(
["docker", "exec", "-it", "claude-bottle-<slug>", "claude",
"--dangerously-skip-permissions"])`. claude-code takes the
terminal full-screen. When the operator exits claude-code
(Ctrl-D, `/exit`), the subprocess returns; the dashboard calls
`stdscr.refresh()` to redraw and resume.
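A minimal sketch of that flow, assuming the container naming and flags quoted above. `claude_exec_argv` and `handoff_to_claude` are hypothetical names, not functions in the existing dashboard; the `run` and `endwin` parameters exist only so the sequence can be exercised without a terminal or a Docker daemon.

```python
import curses
import subprocess


def claude_exec_argv(slug):
    """Build the docker exec command quoted in the text for one bottle."""
    return [
        "docker", "exec", "-it", f"claude-bottle-{slug}",
        "claude", "--dangerously-skip-permissions",
    ]


def handoff_to_claude(stdscr, slug, run=subprocess.run, endwin=curses.endwin):
    """Release the terminal, run claude-code in the foreground, then redraw.

    `run` and `endwin` are injectable (an assumption of this sketch) so the
    flow can be tested with fakes.
    """
    endwin()  # leave curses mode so claude-code owns the terminal
    try:
        completed = run(claude_exec_argv(slug))
    finally:
        # The first refresh after endwin() puts the terminal back in curses mode.
        stdscr.refresh()
    return completed.returncode
```

The `try`/`finally` is the one design choice worth keeping: if the subprocess call raises (for example, `docker` is not on `PATH`), the dashboard still gets its screen back.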
What's good:
- It's ~20 lines of code. The plumbing (`curses.endwin` /
refresh + a `docker exec`) already exists for the editor flow.
- Zero new dependencies. claude-code runs in its real PTY exactly
the way it does today.
- No "embedded TUI inside another TUI" weirdness. Keybinding
collisions, terminal-resize stories, scrollback are all
whatever claude-code already does.
What's not good:
- Re-entry needs a session story. A bottle's agent container
runs `sleep infinity`, and claude is `docker exec`ed in on demand
(PRD 0018 chunk 3). Re-entering with another `exec` would
start a *second* claude process; we'd want to either attach
to the first one (tricky: `docker exec` doesn't have an
"attach to existing exec" verb) or treat first-time entry as
"start the session" and stash a marker so re-entry is a
resume rather than a fresh process.
- It's not really "a pane in the dashboard." It's "press Enter
to leave the dashboard, talk to claude, come back." The user
wanted side-by-side; this is side-by-time.
- The dashboard can't auto-refresh while claude-code has the
terminal. If a new proposal lands while you're in the claude
session, you won't see it until you exit.
- Notifications during the claude session need a separate
channel (sound? OS notification?). Otherwise the operator's
reason for using the dashboard — "watch everything in one
place" — partially evaporates.
This is the v1 the project's existing code-shape strongly
prefers. It clears the bar of "let me talk to claude-code
without quitting `./cli.py dashboard`."
## Option 2: Embedded emulator
The dashboard opens a PTY (stdlib `pty` module), spawns
`docker exec -it … claude` attached to it, and runs a terminal
emulator in-process that consumes claude-code's output stream
and maintains a virtual screen buffer. The buffer's current
state gets painted into a curses pane every refresh tick.
Keypresses received inside the focused pane get written to the
PTY's input fd.
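The PTY half of this can be sketched with the stdlib alone. This is a rough outline under assumptions: the function names are hypothetical, the child is whatever `argv` you pass (the `docker exec -it … claude` command in the real case), and the EIO behavior noted in the comment is Linux-specific. Parsing and painting are not shown.

```python
import os
import pty
import select
import subprocess


def spawn_in_pty(argv):
    """Start argv with its stdio attached to a fresh PTY; return (proc, master_fd)."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd)
    os.close(slave_fd)  # the child keeps its own copy of the slave end
    return proc, master_fd


def read_available(master_fd, timeout=0.05, chunk_size=65536):
    """Return output that becomes ready within `timeout` seconds, else b""."""
    ready, _, _ = select.select([master_fd], [], [], timeout)
    if not ready:
        return b""
    try:
        return os.read(master_fd, chunk_size)
    except OSError:
        # On Linux, reading the master after the child side has closed raises EIO.
        return b""


def send_keys(master_fd, data):
    """Forward keypress bytes to the child's stdin via the PTY master."""
    os.write(master_fd, data)
```

The bytes returned by `read_available` are the raw escape stream; everything listed next is about turning them into a correct screen.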
This is what tmux does. It is also what every "terminal in
a TUI" demo does. The challenge is everything between "run a
PTY" and "render its output correctly."
What you need to implement (or take as a dep):
- **ANSI/VT escape parsing.** claude-code uses xterm-class
escape sequences for cursor positioning, color, scroll
regions, alternate screen buffer (for the prompt UI), mouse
reporting, and so on. The full xterm spec is dozens of pages.
Sloppy parsing produces a corrupted display the user will
hate.
- **A screen buffer model.** Cells with attributes
(foreground, background, bold, underline, italic, inverse).
Cursor position. Saved cursor. Alternate screen. Scrollback.
- **Resize protocol.** claude-code asks the PTY its size via
`TIOCGWINSZ` and re-layouts on `SIGWINCH`. The dashboard has
to size the PTY to the pane it's rendering into and propagate
SIGWINCH when curses says the terminal resized.
- **Input routing.** When the pane has focus, keypresses are
written to the PTY. When the dashboard has focus, keypresses are
consumed by the dashboard. Define a prefix key (like
tmux's `Ctrl-B`) that toggles focus, and document that
claude-code's own use of that key is now intercepted.
- **Output throttling.** claude-code can emit megabytes of
tokens in a streaming response. The dashboard's 1s refresh
tick is too slow to render character-at-a-time; you want the
PTY reader to coalesce and the renderer to repaint on a
shorter interval than the main loop's `getch` timeout.
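Two of the items above, resize and input routing, are small enough to sketch. The names (`set_pty_size`, `get_pty_size`, `KeyRouter`, `PREFIX_KEY`) are hypothetical, and Ctrl-B is only an example prefix. One assumption to verify: the kernel delivers `SIGWINCH` to the PTY's foreground process group if there is one, and the `docker` CLI is expected to relay the new size into the container.

```python
import fcntl
import struct
import termios

PREFIX_KEY = 0x02  # Ctrl-B, mirroring tmux; an illustrative choice


def set_pty_size(fd, rows, cols):
    """Set the PTY's window size; a size change triggers SIGWINCH for its foreground group."""
    fcntl.ioctl(fd, termios.TIOCSWINSZ, struct.pack("HHHH", rows, cols, 0, 0))


def get_pty_size(fd):
    """Read back (rows, cols) the way a child would via TIOCGWINSZ."""
    packed = fcntl.ioctl(fd, termios.TIOCGWINSZ, struct.pack("HHHH", 0, 0, 0, 0))
    rows, cols, _, _ = struct.unpack("HHHH", packed)
    return rows, cols


class KeyRouter:
    """Send keys to the PTY or the dashboard depending on which has focus."""

    def __init__(self, write_to_pty):
        self.pane_focused = False
        self._write = write_to_pty  # callable taking bytes

    def handle(self, key):
        """Return the key if the dashboard should act on it, else None."""
        if key == PREFIX_KEY:
            self.pane_focused = not self.pane_focused
            return None
        if self.pane_focused:
            if 0 <= key < 256:
                self._write(bytes([key]))
            # curses KEY_* codes (> 255) would need translating into escape
            # sequences before forwarding; they are dropped in this sketch.
            return None
        return key
```

In a real loop `handle` would receive the integer from `stdscr.getch()`, and `set_pty_size` would be called with the pane's inner dimensions whenever curses reports `KEY_RESIZE`.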
The stdlib has `pty` (the spawn side) and you can read/write
the master fd by hand. It does **not** have an ANSI parser; the
established Python library for this is `pyte`
([pyte.readthedocs.io](https://pyte.readthedocs.io/)): pure
Python, LGPL-licensed, with `wcwidth` as its only runtime
dependency. ~3k lines. It would be the project's first runtime
dependency beyond stdlib.
Even with `pyte`, the integration is non-trivial: you're
re-rendering a 24x80-ish (or whatever fits) screen buffer into
curses cells on every tick, dealing with attribute mapping
(pyte's color enum → curses color pair), and handling mouse
events through the pane. Plan on ~600-1200 lines, not 200.
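To give a feel for the attribute-mapping part, here is a sketch that does not import `pyte` at all. It assumes pyte reports named ANSI colors as strings such as `"red"`, `"brown"` (its name for yellow), `"brightred"`, and `"default"`; check that against the pyte version you pin. `to_curses_color` and `PairCache` are hypothetical helpers.

```python
import curses

_NAMED = {
    "black": curses.COLOR_BLACK,
    "red": curses.COLOR_RED,
    "green": curses.COLOR_GREEN,
    "brown": curses.COLOR_YELLOW,  # assumed pyte spelling of ANSI yellow
    "blue": curses.COLOR_BLUE,
    "magenta": curses.COLOR_MAGENTA,
    "cyan": curses.COLOR_CYAN,
    "white": curses.COLOR_WHITE,
}


def to_curses_color(name):
    """Map a color name to a curses color number, or -1 for default/unknown."""
    base = name[len("bright"):] if name.startswith("bright") else name
    # Hex strings (256-color or truecolor values) also fall through to -1 here.
    return _NAMED.get(base, -1)


class PairCache:
    """Allocate curses color pairs lazily, one per distinct (fg, bg) combination."""

    def __init__(self, init_pair=curses.init_pair):
        self._init_pair = init_pair
        self._pairs = {}

    def pair_number(self, fg_name, bg_name):
        key = (to_curses_color(fg_name), to_curses_color(bg_name))
        if key not in self._pairs:
            number = len(self._pairs) + 1  # pair 0 is reserved by curses
            self._init_pair(number, key[0], key[1])
            self._pairs[key] = number
        return self._pairs[key]
```

Using -1 for "default" only works after `curses.use_default_colors()` has been called. The renderer would pass `curses.color_pair(cache.pair_number(fg, bg))` as the attribute when writing each cell.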
Open trap-doors:
- **Claude-code uses bracketed paste, alternate screen, and
occasionally raw terminal control for its prompt input.**
Some of these features stress the emulator harder than `vim`
does — alt-screen has to be supported or claude's
command-prompt UI corrupts the line above it. `pyte` claims to
handle alt-screen; verify before committing.
- **Scrollback in claude is `/transcript`-driven, not terminal
scrollback.** A small pane height means you only see the last
10-20 lines of output without leaving the dashboard, which is
the wrong shape for a 200-line streaming response. You'd
want to make the pane resizable or open a full-height
"expand" mode (which is just option 1, the handoff, with
extra steps).
- **Multiple agents = multiple PTYs running concurrently.** If
the user wants to monitor 3 bottles, the dashboard is now
holding 3 PTYs open and parsing 3 ANSI streams in parallel.
Memory + CPU costs are bounded but nonzero; design the
PTY-per-agent state machine carefully.
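For the multi-PTY case, one plausible shape is a single `selectors` loop over all master fds rather than a thread per agent. `PtyPool` is a hypothetical name; the sketch only collects raw bytes per slug and leaves parsing to whatever emulator sits behind it.

```python
import os
import selectors


class PtyPool:
    """Track one master fd per agent slug and drain whichever has output."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def add(self, slug, master_fd):
        self._selector.register(master_fd, selectors.EVENT_READ, data=slug)

    def remove(self, master_fd):
        self._selector.unregister(master_fd)

    def drain(self, timeout=0.05, chunk_size=65536):
        """Return {slug: bytes} for every fd that produced output within timeout."""
        output = {}
        for key, _events in self._selector.select(timeout):
            try:
                chunk = os.read(key.fd, chunk_size)
            except OSError:
                chunk = b""  # e.g. EIO once the child has exited
            if chunk:
                output[key.data] = output.get(key.data, b"") + chunk
        return output
```

An fd whose child has exited stays readable and keeps yielding nothing, so the caller should `remove` it once the process is known to be gone.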
This is the option that delivers the "pane in the dashboard"
literal request. It's the right answer if the user's day-to-day
involves watching multiple bottles' output simultaneously
without context-switching. It's the wrong answer if they mostly
want one focused session at a time with proposals visible.
## Option 3: External multiplexer
The dashboard binds a key (e.g. `Enter` on agent) to
`tmux split-window -h 'docker exec -it claude-bottle-<slug>
claude'` when run inside a tmux session, or to `osascript`-
driven iTerm pane spawning on macOS, or to `wezterm cli
spawn` if the user is on wezterm.
What's good:
- The "real terminal in a real pane" is solved by tools the
user already trusts. tmux's terminal emulation is correct;
iTerm's is correct; wezterm's is correct. We're not
reimplementing any of them.
- Multi-bottle parallelism is automatic — the user opens one
pane per agent, the multiplexer renders them.
- Implementation cost is tiny: ~50 lines of "if `TMUX` env is
set, shell out to `tmux split-window`."
What's not good:
- It requires the user to be running in a multiplexer. Outside
one (plain Terminal.app, vscode integrated terminal, etc.)
the verb either falls back to handoff or just fails.
- It splits the operator's mental model. The dashboard is one
window, claude-code panes are other windows; the dashboard's
"agents pane" no longer matches the visible reality (some
agents have an attached pane, others don't, and the dashboard
doesn't know which).
- We don't actually own the layout. tmux's pane sizing rules
are tmux's; the dashboard can't enforce "agent pane on the
right, dashboard on the left."
This is the right answer if the project decides "we shouldn't
be a multiplexer; let tmux be a multiplexer." It's the wrong
answer if the dashboard wants to *be* the operator's primary
surface — which the user's question implies it does.
## Recommendation
**Build Option 1 first** (handoff). It clears the bar of "let me
talk to claude without quitting the dashboard," it's a
~20-line patch over the existing `e` / `p` infrastructure, and
it carries no new dependencies. Critically, it lets the user
*confirm whether the workflow actually wants embedded panes*:
if Option 1 turns out to be enough in practice (Enter to drop
in, Ctrl-D back out, watch proposals between sessions), the
embedded-emulator complexity is unnecessary.
**Treat Option 2 as a six-week project to plan only if Option 1
is observably insufficient.** The pyte dep is acceptable but
the integration is real engineering — terminal emulator
correctness, pane sizing, input routing, multi-PTY state
management. Schedule it; don't bolt it on. If the user lives in
3-bottle days and needs simultaneous output, this is the option;
otherwise it's premature optimization.
**Treat Option 3 as a niche augmentation, not a primary path.**
Detect `TMUX` / `WEZTERM_PANE` / `ITERM_SESSION_ID` at startup;
if present, add a `split` keybinding that delegates pane
creation to the multiplexer. The dashboard remains the primary
interface; the multiplexer is convenience for power users.
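A sketch of that detection, covering only the tmux command quoted earlier. `detect_multiplexer` and `split_argv` are hypothetical names; the iTerm and wezterm branches are deliberately left unimplemented, so the caller would fall back to the handoff path for them.

```python
import os
import shlex


def detect_multiplexer(env=None):
    """Return "tmux", "wezterm", "iterm", or None based on environment variables."""
    env = os.environ if env is None else env
    # tmux is checked first because it can run inside either terminal emulator.
    for name, var in (
        ("tmux", "TMUX"),
        ("wezterm", "WEZTERM_PANE"),
        ("iterm", "ITERM_SESSION_ID"),
    ):
        if env.get(var):
            return name
    return None


def split_argv(multiplexer, slug):
    """Return the argv that opens claude in an adjacent pane, or None if unsupported."""
    claude = ["docker", "exec", "-it", f"claude-bottle-{slug}", "claude"]
    if multiplexer == "tmux":
        # tmux takes the pane's command as a single shell string.
        return ["tmux", "split-window", "-h", shlex.join(claude)]
    return None  # wezterm / iTerm spawning is not sketched here
```

The `split` keybinding would only be registered when `split_argv(detect_multiplexer(), slug)` is not `None`.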
## Followups worth checking before committing
- **What does claude-code's PTY-resize behavior look like at
small heights?** Drive it at 24×40, 10×80, and 8×40 (rows ×
columns) and see if it blows up. The dashboard's bottom pane
is going to be small.
- **Is there a way to `docker exec` into an *already-running
exec* rather than spawning a new claude process per attach?**
If not, the dashboard needs its own session-state model: "is
there an exec running for this slug? attach to it. otherwise
start one."
- **How does claude-code handle SIGWINCH at the moment?** If it
re-layouts cleanly the embedded-emulator story gets easier;
if it corrupts, Option 2 needs special-case handling.
- **Does `docker exec -i` (no `-t`) preserve enough of the TTY
contract for claude-code to start at all?** Some apps refuse
to launch without a real TTY; if claude-code is one of them,
the embedded emulator option needs the PTY allocated on the
host side and the exec attached to it, which is the harder of
the two paths.
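If the answer to the attach question is "no", the session-state model could be as small as the sketch below. `SessionRegistry` is a hypothetical name, and the approach only applies when the dashboard itself keeps the claude process alive (the embedded path); under plain handoff the process ends when the operator exits.

```python
class SessionRegistry:
    """Remember one live claude process per slug so re-entry can reuse it."""

    def __init__(self, spawn):
        # `spawn` is a callable: slug -> process-like object exposing poll(),
        # for example a function that wraps subprocess.Popen.
        self._spawn = spawn
        self._sessions = {}

    def attach_or_start(self, slug):
        """Return (process, started); started is True when a new process was spawned."""
        proc = self._sessions.get(slug)
        if proc is not None and proc.poll() is None:
            return proc, False  # still running: reuse it
        proc = self._spawn(slug)
        self._sessions[slug] = proc
        return proc, True
```

Relying on `poll()` means a crashed or exited session is replaced on the next entry without any separate cleanup step.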
## References
- PRD 0019 — active agents pane + selection model (the
selection-source for whichever option lands)
- PRD 0018 chunk 3 — agent container runs `sleep infinity`;
claude is invoked via `docker exec -it` (the
attachment-point this doc is layering against)
- `claude_bottle/cli/dashboard.py:_operator_edit_flow` — the
existing `curses.endwin` → shell out → `stdscr.refresh()`
pattern Option 1 would clone
- pyte: <https://pyte.readthedocs.io/> — the candidate
terminal-emulator library if Option 2 is picked