docs(research): claude-code pane in the dashboard #43
@@ -0,0 +1,285 @@
|
||||
# Claude-code pane in the dashboard
|
||||
|
||||
## Question
|
||||
|
||||
The dashboard today shows pending proposals (top pane) and active
|
||||
agents (bottom pane, PRD 0019). Selecting an agent and pressing
|
||||
`e` / `p` invokes operator-scoped edits. The next move the user
|
||||
wants is **a way to interact with the claude-code session inside
|
||||
the selected bottle without leaving the dashboard** — type at it,
|
||||
read its output, return focus to the dashboard.
|
||||
|
||||
What's the cheapest path to that, and where does it bottom out?
|
||||
|
||||
## What "interact" means
|
||||
|
||||
Today the flow is bimodal. `./cli.py start <agent>` brings the
|
||||
bottle up and immediately drops you into an interactive
|
||||
`docker exec -it claude-bottle-<slug> claude ...` — claude-code
|
||||
owns the whole terminal until you Ctrl-D out, at which point the
|
||||
bottle tears down. The dashboard (`./cli.py dashboard`) is a
|
||||
*separate* invocation that watches across bottles but never
|
||||
exposes the claude TUI itself.
|
||||
|
||||
The user wants the dashboard to *also* be a claude-code session
|
||||
host: one of the dashboard's panes (or a press-key-to-focus
|
||||
mode) is a live claude-code terminal connected to the agent
|
||||
container the operator is sitting on in the agents pane.
|
||||
|
||||
That changes the dashboard's job from "screen of metadata" to
|
||||
"terminal multiplexer that also draws metadata." The interesting
|
||||
question is whether that change is small or unbounded.
|
||||
|
||||
## The core problem
|
||||
|
||||
claude-code is a TUI in its own right. It runs as an
|
||||
interactive Node process, expects a real PTY, drives its own
|
||||
cursor positioning, color, mouse events, and key bindings. The
|
||||
dashboard is *also* a TUI (curses), and curses owns the
|
||||
terminal's input + output stream while it's active.
|
||||
|
||||
Two TUIs sharing one terminal can't both be "running" without
|
||||
one of them giving up screen control to the other. The decision
|
||||
shape is which one yields, and where the boundary lives.
|
||||
|
||||
There are exactly three realistic ways to resolve this:
|
||||
|
||||
1. **Handoff** — the dashboard releases the terminal when the
|
||||
user wants to talk to claude-code, claude-code takes over
|
||||
full-screen, and the dashboard re-takes control when
|
||||
claude-code exits or is detached. Like how `e` (routes
|
||||
edit) already shells to `$EDITOR` today
|
||||
(`curses.endwin()` → run editor → `stdscr.refresh()`).
|
||||
2. **Embedded emulator** — the dashboard runs claude-code in a
|
||||
PTY it owns, parses claude-code's ANSI escape stream
|
||||
ourselves, and paints the resulting cell grid into a
|
||||
curses pane. Keypresses inside the pane get routed to the
|
||||
PTY's stdin; the dashboard renders metadata in the other
|
||||
panes alongside.
|
||||
3. **External multiplexer** — the dashboard doesn't render the
|
||||
claude-code session at all. It asks tmux / screen / a
|
||||
terminal emulator to open it in a real adjacent pane (split
|
||||
window, new tab), and treats the multiplexer as the
|
||||
coordinator instead of trying to be one.
|
||||
|
||||
Below are the actual costs.
|
||||
|
||||
## Option 1: Handoff
|
||||
|
||||
The dashboard sees a key (say Enter on a selected agent in the
|
||||
agents pane). It calls `curses.endwin()`, then `subprocess.run(
|
||||
["docker", "exec", "-it", "claude-bottle-<slug>", "claude",
|
||||
"--dangerously-skip-permissions"])`. claude-code takes the
|
||||
terminal full-screen. When the operator exits claude-code
|
||||
(Ctrl-D, `/exit`), the subprocess returns; the dashboard calls
|
||||
`stdscr.refresh()` to redraw and resume.
|
||||
|
||||
What's good:
|
||||
|
||||
- It's ~20 lines of code. The plumbing (`curses.endwin` /
|
||||
refresh + a `docker exec`) already exists for the editor flow.
|
||||
- Zero new dependencies. claude-code runs in its real PTY exactly
|
||||
the way it does today.
|
||||
- No "embedded TUI inside another TUI" weirdness. Keybinding
|
||||
collisions, terminal-resize stories, scrollback are all
|
||||
whatever claude-code already does.
|
||||
- Already-running session reuse: a bottle's agent container
|
||||
runs `sleep infinity` and `docker exec`s claude in on-demand
|
||||
(PRD 0018 chunk 3). Re-entering with another `exec` would
|
||||
start a *second* claude process; we'd want to either attach
|
||||
to the first one (tricky — `docker exec` doesn't have an
|
||||
"attach to existing exec" verb) or treat first-time entry as
|
||||
"start the session" and stash a marker so re-entry is a
|
||||
resume rather than a fresh process.
|
||||
|
||||
What's not good:
|
||||
|
||||
- It's not really "a pane in the dashboard." It's "press Enter
|
||||
to leave the dashboard, talk to claude, come back." The user
|
||||
wanted side-by-side; this is side-by-time.
|
||||
- The dashboard can't auto-refresh while claude-code has the
|
||||
terminal. If a new proposal lands while you're in the claude
|
||||
session, you won't see it until you exit.
|
||||
- Notifications during the claude session need a separate
|
||||
channel (sound? OS notification?). Otherwise the operator's
|
||||
reason for using the dashboard — "watch everything in one
|
||||
place" — partially evaporates.
|
||||
|
||||
This is the v1 the project's existing code-shape strongly
|
||||
prefers. It clears the bar of "let me talk to claude-code
|
||||
without quitting `./cli.py dashboard`."
|
||||
|
||||
## Option 2: Embedded emulator
|
||||
|
||||
The dashboard opens a PTY (stdlib `pty` module), spawns
|
||||
`docker exec -it … claude` attached to it, and runs a terminal
|
||||
emulator in-process that consumes claude-code's output stream
|
||||
and maintains a virtual screen buffer. The buffer's current
|
||||
state gets painted into a curses pane every refresh tick.
|
||||
Keypresses received inside the focused pane get written to the
|
||||
PTY's input fd.
|
||||
|
||||
This is what tmux does. It is also what every "terminal in
|
||||
a TUI" demo does. The challenge is everything between "run a
|
||||
PTY" and "render its output correctly."
|
||||
|
||||
What you need to implement (or take as a dep):
|
||||
|
||||
- **ANSI/VT escape parsing.** claude-code uses xterm-class
|
||||
escape sequences for cursor positioning, color, scroll
|
||||
regions, alternate screen buffer (for the prompt UI), mouse
|
||||
reporting, and so on. The full xterm spec is dozens of pages.
|
||||
Sloppy parsing produces a corrupted display the user will
|
||||
hate.
|
||||
- **A screen buffer model.** Cells with attributes
|
||||
(foreground, background, bold, underline, italic, inverse).
|
||||
Cursor position. Saved cursor. Alternate screen. Scrollback.
|
||||
- **Resize protocol.** claude-code asks the PTY its size via
|
||||
`TIOCGWINSZ` and re-layouts on `SIGWINCH`. The dashboard has
|
||||
to size the PTY to the pane it's rendering into and propagate
|
||||
SIGWINCH when curses says the terminal resized.
|
||||
- **Input routing.** When the pane has focus, keypresses
|
||||
written to the PTY. When the dashboard has focus, keypresses
|
||||
consumed by the dashboard. Define an escape sequence (like
|
||||
tmux's `Ctrl-B`) that toggles focus, and document that
|
||||
claude-code's own use of that key sequence is now intercepted.
|
||||
- **Output throttling.** claude-code can emit megabytes of
|
||||
tokens in a streaming response. The dashboard's 1s refresh
|
||||
tick is too slow to render character-at-a-time; you want the
|
||||
PTY reader to coalesce and the renderer to render on a
|
||||
smaller cadence than the main loop's `getch` timeout.
|
||||
|
||||
The stdlib has `pty` (the spawn side) and you can read/write
|
||||
the master fd by hand. It does **not** have an ANSI parser; the
|
||||
established Python library for this is `pyte`
|
||||
([pyte.readthedocs.io](https://pyte.readthedocs.io/)) — pure
|
||||
Python, MIT-licensed, no transitive deps. ~3k lines. It would
|
||||
be the project's first runtime dependency beyond stdlib.
|
||||
|
||||
Even with `pyte`, the integration is non-trivial: you're
|
||||
re-rendering a 24x80-ish (or whatever fits) screen buffer into
|
||||
curses cells on every tick, dealing with attribute mapping
|
||||
(pyte's color enum → curses color pair), and handling mouse
|
||||
events through the pane. Plan on ~600–1200 lines, not 200.
|
||||
|
||||
Open trap-doors:
|
||||
|
||||
- **Claude-code uses bracketed paste, alternate screen, and
|
||||
occasionally raw terminal control for its prompt input.**
|
||||
Some of these features stress the emulator harder than `vim`
|
||||
does — alt-screen has to be supported or claude's
|
||||
command-prompt UI corrupts the line above it. `pyte` claims to
|
||||
handle alt-screen; verify before committing.
|
||||
- **Scrollback in claude is `/transcript`-driven, not terminal
|
||||
scrollback.** A small pane height means you only see the last
|
||||
10–20 lines of output without leaving the dashboard, which is
|
||||
the wrong shape for a 200-line streaming response. You'd
|
||||
want to make the pane resizable or open a full-height
|
||||
"expand" mode (which is just option 1, the handoff, with
|
||||
extra steps).
|
||||
- **Multiple agents = multiple PTYs running concurrently.** If
|
||||
the user wants to monitor 3 bottles, the dashboard is now
|
||||
holding 3 PTYs open and parsing 3 ANSI streams in parallel.
|
||||
Memory + CPU costs are bounded but nonzero; design the
|
||||
PTY-per-agent state machine carefully.
|
||||
|
||||
This is the option that delivers the "pane in the dashboard"
|
||||
literal request. It's the right answer if the user's day-to-day
|
||||
involves watching multiple bottles' output simultaneously
|
||||
without context-switching. It's the wrong answer if they mostly
|
||||
want one focused session at a time with proposals visible.
|
||||
|
||||
## Option 3: External multiplexer
|
||||
|
||||
The dashboard binds a key (e.g. `Enter` on agent) to
|
||||
`tmux split-window -h 'docker exec -it claude-bottle-<slug>
|
||||
claude'` when run inside a tmux session, or to `osascript`-
|
||||
driven iTerm pane spawning on macOS, or to `wezterm cli
|
||||
spawn` if the user is on wezterm.
|
||||
|
||||
What's good:
|
||||
|
||||
- The "real terminal in a real pane" is solved by tools the
|
||||
user already trusts. tmux's terminal emulation is correct;
|
||||
iTerm's is correct; wezterm's is correct. We're not
|
||||
reimplementing any of them.
|
||||
- Multi-bottle parallelism is automatic — the user opens one
|
||||
pane per agent, the multiplexer renders them.
|
||||
- Implementation cost is tiny: ~50 lines of "if `TMUX` env is
|
||||
set, shell out to `tmux split-window`."
|
||||
|
||||
What's not good:
|
||||
|
||||
- It requires the user to be running in a multiplexer. Outside
|
||||
one (plain Terminal.app, vscode integrated terminal, etc.)
|
||||
the verb either falls back to handoff or just fails.
|
||||
- It splits the operator's mental model. The dashboard is one
|
||||
window, claude-code panes are other windows; the dashboard's
|
||||
"agents pane" no longer matches the visible reality (some
|
||||
agents have an attached pane, others don't, and the dashboard
|
||||
doesn't know which).
|
||||
- We don't actually own the layout. tmux's pane sizing rules
|
||||
are tmux's; the dashboard can't enforce "agent pane on the
|
||||
right, dashboard on the left."
|
||||
|
||||
This is the right answer if the project decides "we shouldn't
|
||||
be a multiplexer; let tmux be a multiplexer." It's the wrong
|
||||
answer if the dashboard wants to *be* the operator's primary
|
||||
surface — which the user's question implies it does.
|
||||
|
||||
## Recommendation
|
||||
|
||||
**Build Option 1 first** (handoff). It clears the bar of "let me
|
||||
talk to claude without quitting the dashboard," it's a
|
||||
~30-line patch over the existing `e` / `p` infrastructure, and
|
||||
it carries no new dependencies. Critically it lets the user
|
||||
*confirm whether the workflow actually wants embedded panes* —
|
||||
if option 1 turns out to be enough in practice (Enter to drop
|
||||
in, Ctrl-D back out, watch proposals between sessions), the
|
||||
embedded-emulator complexity is unnecessary.
|
||||
|
||||
**Treat Option 2 as a six-week project to plan only if Option 1
|
||||
is observably insufficient.** The pyte dep is acceptable but
|
||||
the integration is real engineering — terminal emulator
|
||||
correctness, pane sizing, input routing, multi-PTY state
|
||||
management. Schedule it; don't bolt it on. If the user lives in
|
||||
3-bottle days and needs simultaneous output, this is the option;
|
||||
otherwise it's premature optimization.
|
||||
|
||||
**Treat Option 3 as a niche augmentation, not a primary path.**
|
||||
Detect `TMUX` / `WEZTERM_PANE` / `ITERM_SESSION_ID` at startup;
|
||||
if present, add a `split` keybinding that delegates pane
|
||||
creation to the multiplexer. The dashboard remains the primary
|
||||
interface; the multiplexer is convenience for power users.
|
||||
|
||||
## Followups worth checking before committing
|
||||
|
||||
- **What does claude-code's PTY-resize behavior look like at
|
||||
small heights?** Drive it at 24×40, 10×80, 8×40 and see if it
|
||||
blows up. The dashboard's bottom pane is going to be small.
|
||||
- **Is there a way to `docker exec` into an *already-running
|
||||
exec* rather than spawning a new claude process per attach?**
|
||||
If not, the dashboard needs its own session-state model: "is
|
||||
there an exec running for this slug? attach to it. otherwise
|
||||
start one."
|
||||
- **How does claude-code handle SIGWINCH at the moment?** If it
|
||||
re-layouts cleanly the embedded-emulator story gets easier;
|
||||
if it corrupts, layer 2 needs special-case handling.
|
||||
- **Does `docker exec -i` (no `-t`) preserve enough of the TTY
|
||||
contract for claude-code to start at all?** Some apps refuse
|
||||
to launch without a real TTY; the embedded emulator option
|
||||
needs the PTY allocated on the host side and the exec
|
||||
re-attached to it, which is the harder of the two options.
|
||||
|
||||
## References
|
||||
|
||||
- PRD 0019 — active agents pane + selection model (the
|
||||
selection-source for whichever option lands)
|
||||
- PRD 0018 chunk 3 — agent container runs `sleep infinity`;
|
||||
claude is invoked via `docker exec -it` (the
|
||||
attachment-point this doc is layering against)
|
||||
- `claude_bottle/cli/dashboard.py:_operator_edit_flow` — the
|
||||
existing `curses.endwin` → shell out → `stdscr.refresh()`
|
||||
pattern Option 1 would clone
|
||||
- pyte: <https://pyte.readthedocs.io/> — the candidate
|
||||
terminal-emulator library if Option 2 is picked
|
||||
Reference in New Issue
Block a user