
# Claude-code pane in the dashboard

## Question

The dashboard today shows pending proposals (top pane) and active
agents (bottom pane, PRD 0019). Selecting an agent and pressing
`e` / `p` invokes operator-scoped edits. The next move the user
wants is **a way to interact with the claude-code session inside
the selected bottle without leaving the dashboard** — type at it,
read its output, return focus to the dashboard.

What's the cheapest path to that, and where does it bottom out?

## What "interact" means

Today the flow is bimodal. `./cli.py start <agent>` brings the
bottle up and immediately drops you into an interactive
`docker exec -it claude-bottle-<slug> claude ...` — claude-code
owns the whole terminal until you Ctrl-D out, at which point the
bottle tears down. The dashboard (`./cli.py dashboard`) is a
*separate* invocation that watches across bottles but never
exposes the claude TUI itself.

The user wants the dashboard to *also* be a claude-code session
host: one of the dashboard's panes (or a press-key-to-focus
mode) is a live claude-code terminal connected to the agent
container the operator is sitting on in the agents pane.

That changes the dashboard's job from "screen of metadata" to
"terminal multiplexer that also draws metadata." The interesting
question is whether that change is small or unbounded.

## The core problem

claude-code is a TUI in its own right. It runs as an
interactive Node process, expects a real PTY, drives its own
cursor positioning, color, mouse events, and key bindings. The
dashboard is *also* a TUI (curses), and curses owns the
terminal's input + output stream while it's active.

Two TUIs sharing one terminal can't both be "running" without
one of them giving up screen control to the other. The decision
shape is which one yields, and where the boundary lives.

There are exactly three realistic ways to resolve this:

  1. **Handoff** — the dashboard releases the terminal when the
     user wants to talk to claude-code, claude-code takes over
     full-screen, and the dashboard re-takes control when
     claude-code exits or is detached. This is the same pattern `e`
     (routes edit) already uses to shell out to `$EDITOR` today
     (`curses.endwin()` → run editor → `stdscr.refresh()`).
  2. **Embedded emulator** — the dashboard runs claude-code in a
     PTY it owns, parses claude-code's ANSI escape stream
     itself, and paints the resulting cell grid into a
     curses pane. Keypresses inside the pane get routed to the
     PTY's stdin; the dashboard renders metadata in the other
     panes alongside.
  3. **External multiplexer** — the dashboard doesn't render the
     claude-code session at all. It asks tmux / screen / a
     terminal emulator to open it in a real adjacent pane (split
     window, new tab), and treats the multiplexer as the
     coordinator instead of trying to be one.

Below are the actual costs.

## Option 1: Handoff

The dashboard sees a key (say Enter on a selected agent in the
agents pane). It calls `curses.endwin()`, then `subprocess.run(
["docker", "exec", "-it", "claude-bottle-<slug>", "claude",
"--dangerously-skip-permissions"])`. claude-code takes the
terminal full-screen. When the operator exits claude-code
(Ctrl-D, `/exit`), the subprocess returns; the dashboard calls
`stdscr.refresh()` to redraw and resume.
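A minimal sketch of that handoff, assuming the container naming (`claude-bottle-<slug>`) and claude flags shown above. `handoff` and `handoff_argv` are hypothetical names, not the project's `_operator_edit_flow`; the `run` and `endwin` parameters exist only so the function can be exercised without a real terminal or Docker daemon.

```python
import curses
import subprocess


def handoff_argv(slug: str) -> list[str]:
    # Same command `./cli.py start` drops the operator into today.
    return [
        "docker", "exec", "-it", f"claude-bottle-{slug}",
        "claude", "--dangerously-skip-permissions",
    ]


def handoff(stdscr, slug: str, run=subprocess.run, endwin=curses.endwin) -> int:
    """Yield the terminal to claude-code, then take it back when it exits."""
    endwin()  # leave curses mode so claude-code gets a normal terminal
    try:
        result = run(handoff_argv(slug))  # blocks until claude-code exits
    finally:
        stdscr.refresh()  # re-enter curses mode and repaint, even if exec failed
    return result.returncode
```

The `try`/`finally` matters: if `docker exec` fails (container gone, Docker not running), the dashboard still gets its screen back instead of leaving the terminal in a half-restored state.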

What's good:

- It's ~20 lines of code. The plumbing (`curses.endwin` /
  refresh + a `docker exec`) already exists for the editor flow.
- Zero new dependencies. claude-code runs in its real PTY exactly
  the way it does today.
- No "embedded TUI inside another TUI" weirdness. Keybinding
  collisions, terminal resizing, and scrollback all behave
  exactly as they do in claude-code today.
- Already-running session reuse is possible, with one wrinkle. A
  bottle's agent container runs `sleep infinity`, and claude is
  started in it on demand via `docker exec` (PRD 0018 chunk 3).
  Re-entering with another `exec` would start a *second* claude
  process; we'd want to either attach to the first one (tricky:
  `docker exec` has no "attach to an existing exec" verb) or
  treat first-time entry as "start the session" and stash a
  marker so re-entry is a resume rather than a fresh process.
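One way to realize the "stash a marker" idea, sketched under two assumptions: the marker lives in memory for the lifetime of the dashboard process, and claude-code's `--continue` flag (resume the most recent conversation) is an acceptable meaning of "resume". This starts a fresh claude process that reloads the conversation; it does not re-attach to a live one. `SessionTracker` is a hypothetical name.

```python
class SessionTracker:
    """Hypothetical per-slug marker: first entry starts claude, later entries resume."""

    def __init__(self) -> None:
        self._started: set[str] = set()

    def argv_for(self, slug: str) -> list[str]:
        argv = ["docker", "exec", "-it", f"claude-bottle-{slug}",
                "claude", "--dangerously-skip-permissions"]
        if slug in self._started:
            # In the handoff flow the previous claude process has already exited
            # (subprocess.run only returns then), so this never runs two at once.
            argv.append("--continue")
        return argv

    def mark_started(self, slug: str) -> None:
        # Called only after an exec actually ran, so a failed first attempt
        # does not turn the next attempt into a resume of nothing.
        self._started.add(slug)
```

Because the marker is in memory, restarting the dashboard forgets it; persisting it (or asking the container) is the "session-state model" question listed in the followups.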

What's not good:

- It's not really "a pane in the dashboard." It's "press Enter
  to leave the dashboard, talk to claude, come back." The user
  wanted side-by-side; this is side-by-time.
- The dashboard can't auto-refresh while claude-code has the
  terminal. If a new proposal lands while you're in the claude
  session, you won't see it until you exit.
- Notifications during the claude session need a separate
  channel (sound? OS notification?). Otherwise the operator's
  reason for using the dashboard — "watch everything in one
  place" — partially evaporates.

This is the v1 that best fits the project's existing code shape.
It clears the bar of "let me talk to claude-code without quitting
`./cli.py dashboard`."

## Option 2: Embedded emulator

The dashboard opens a PTY (stdlib `pty` module), spawns
`docker exec -it … claude` attached to it, and runs a terminal
emulator in-process that consumes claude-code's output stream
and maintains a virtual screen buffer. The buffer's current
state gets painted into a curses pane every refresh tick.
Keypresses received inside the focused pane get written to the
PTY's input fd.
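The spawn side needs nothing beyond the stdlib. The sketch below runs an arbitrary `argv` on a fresh PTY sized to a pane (in the dashboard, `argv` would be the `docker exec -it … claude` command) and reads what it prints. The helper names are hypothetical, and the blocking reader is for illustration only; a UI loop would poll instead.

```python
import fcntl
import os
import pty
import struct
import termios


def set_winsize(fd: int, rows: int, cols: int) -> None:
    """Set a PTY's size (struct winsize: rows, cols, xpixel, ypixel)."""
    fcntl.ioctl(fd, termios.TIOCSWINSZ, struct.pack("HHHH", rows, cols, 0, 0))


def spawn_in_pty(argv: list[str], rows: int, cols: int) -> tuple[int, int]:
    """Run argv on a fresh PTY sized to the pane; return (pid, master_fd)."""
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: stdin/stdout/stderr are already the PTY slave. Size it before
        # exec so the program's first TIOCGWINSZ sees the pane dimensions.
        try:
            set_winsize(0, rows, cols)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    return pid, master_fd


def read_until_exit(master_fd: int) -> bytes:
    """Blocking read of everything the child writes (demo only, not a UI loop)."""
    chunks = []
    while True:
        try:
            data = os.read(master_fd, 4096)
        except OSError:  # Linux raises EIO once the child side is closed
            break
        if not data:  # BSD/macOS report plain EOF instead
            break
        chunks.append(data)
    return b"".join(chunks)
```

Resizing later is the same ioctl from the parent: `set_winsize(master_fd, new_rows, new_cols)` makes the kernel send SIGWINCH to the child's foreground process group, which is how the dashboard would propagate a curses resize.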

This is what tmux does. It is also what every "terminal in
a TUI" demo does. The challenge is everything between "run a
PTY" and "render its output correctly."

What you need to implement (or take as a dep):

- **ANSI/VT escape parsing.** claude-code uses xterm-class
  escape sequences for cursor positioning, color, scroll
  regions, alternate screen buffer (for the prompt UI), mouse
  reporting, and so on. The full xterm spec is dozens of pages.
  Sloppy parsing produces a corrupted display the user will
  hate.
- **A screen buffer model.** Cells with attributes
  (foreground, background, bold, underline, italic, inverse).
  Cursor position. Saved cursor. Alternate screen. Scrollback.
- **Resize protocol.** claude-code asks the PTY its size via
  `TIOCGWINSZ` and re-layouts on `SIGWINCH`. The dashboard has
  to size the PTY to the pane it's rendering into and propagate
  SIGWINCH when curses says the terminal resized.
- **Input routing.** When the pane has focus, keypresses are
  written to the PTY. When the dashboard has focus, the dashboard
  consumes them. Define an escape sequence (like tmux's `Ctrl-B`)
  that toggles focus, and document that claude-code's own use of
  that key sequence is now intercepted.
- **Output throttling.** claude-code can emit megabytes of
  output in a streaming response. Repainting once per character
  is too expensive, and the dashboard's 1s refresh tick is too
  slow for streamed output; you want the PTY reader to coalesce
  reads and the renderer to run on a shorter cadence than the
  main loop's `getch` timeout.
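The input-routing item can be sketched as a small state machine over curses `getch()` values. Everything here is an assumption for illustration: `Ctrl-B` as the prefix, `prefix d` to return focus to the dashboard, `prefix prefix` to send a literal `Ctrl-B` (tmux's convention), and a deliberately tiny keycode table.

```python
import curses
from typing import Optional, Tuple

PREFIX = 0x02  # Ctrl-B, borrowed from tmux's default prefix (a hypothetical choice)

# Minimal translation for non-ASCII curses keycodes. These are the normal-mode
# cursor sequences; an app that switches the terminal to application cursor
# mode expects ESC O A etc., which an embedded emulator would have to track.
_SPECIAL = {
    curses.KEY_UP: b"\x1b[A",
    curses.KEY_DOWN: b"\x1b[B",
    curses.KEY_RIGHT: b"\x1b[C",
    curses.KEY_LEFT: b"\x1b[D",
    curses.KEY_BACKSPACE: b"\x7f",
}


def key_to_bytes(key: int) -> Optional[bytes]:
    """Translate a curses getch() value into the bytes a terminal would send."""
    if key in (10, curses.KEY_ENTER):
        return b"\r"  # terminals send CR for Enter; curses may report it as LF
    if 0 <= key < 256:
        return bytes([key])
    return _SPECIAL.get(key)


class FocusRouter:
    """Decide whether each keypress belongs to the dashboard or the PTY."""

    def __init__(self) -> None:
        self.pane_focused = False
        self._armed = False  # True right after PREFIX while the pane has focus

    def focus_pane(self) -> None:
        self.pane_focused = True
        self._armed = False

    def feed(self, key: int) -> Tuple[str, Optional[bytes]]:
        """Return ("dashboard", None), ("pty", data), or ("swallow", None)."""
        if not self.pane_focused:
            return ("dashboard", None)
        if self._armed:
            self._armed = False
            if key == PREFIX:
                return ("pty", bytes([PREFIX]))  # PREFIX twice: send a literal Ctrl-B
            if key == ord("d"):
                self.pane_focused = False  # detach: the dashboard gets the keyboard back
            return ("swallow", None)  # any other prefix command is dropped
        if key == PREFIX:
            self._armed = True
            return ("swallow", None)
        data = key_to_bytes(key)
        return ("pty", data) if data is not None else ("swallow", None)
```

The caller writes any `("pty", data)` result to the master fd with `os.write`. The application-cursor-mode comment is a concrete example of why the escape-parsing and input-routing items are coupled: correct key translation depends on terminal state the emulator has to track.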

The stdlib has `pty` (the spawn side) and you can read/write
the master fd by hand. It does **not** have an ANSI parser; the
established Python library for this is `pyte`
([pyte.readthedocs.io](https://pyte.readthedocs.io/)): pure
Python, LGPL-licensed, with one small dependency (`wcwidth`).
~3k lines. It would be the project's first runtime dependency
beyond stdlib.

Even with `pyte`, the integration is non-trivial: you're
re-rendering a 24×80-ish (or whatever fits) screen buffer into
curses cells on every tick, dealing with attribute mapping
(pyte's color names → curses color pairs), and handling mouse
events through the pane. Plan on ~600–1200 lines, not 200.
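A sketch of the attribute-mapping piece, assuming pyte reports each cell's colors as strings (the eight base names, with SGR 33 spelled `brown`, plus `default` or a hex value) and carries per-cell flags such as `bold`, `underscore`, and `reverse`; verify both against pyte before relying on them. Curses only has a limited number of color pairs, so pairs are allocated lazily per (fg, bg) combination. `PairAllocator`, `to_curses_color`, and `attrs_for` are hypothetical names.

```python
import curses
from typing import Callable, Dict, Tuple

# Assumed mapping from pyte's color names to curses colors. pyte spells SGR 33
# as "brown"; "yellow" is accepted too in case that differs across versions.
_NAMED = {
    "black": curses.COLOR_BLACK, "red": curses.COLOR_RED,
    "green": curses.COLOR_GREEN, "brown": curses.COLOR_YELLOW,
    "yellow": curses.COLOR_YELLOW, "blue": curses.COLOR_BLUE,
    "magenta": curses.COLOR_MAGENTA, "cyan": curses.COLOR_CYAN,
    "white": curses.COLOR_WHITE,
}


def to_curses_color(name: str) -> int:
    # -1 is the terminal's default color once curses.use_default_colors() runs.
    # Hex (256-color / truecolor) values also fall back to -1 here; a real
    # renderer would quantize them to the nearest color the terminal supports.
    return _NAMED.get(name, -1)


def attrs_for(bold: bool, underline: bool, reverse: bool) -> int:
    """Fold per-cell flags into a curses attribute mask."""
    attr = curses.A_NORMAL
    if bold:
        attr |= curses.A_BOLD
    if underline:
        attr |= curses.A_UNDERLINE
    if reverse:
        attr |= curses.A_REVERSE
    return attr


class PairAllocator:
    """Assign curses color-pair numbers lazily, one per (fg, bg) combination."""

    def __init__(
        self,
        init_pair: Callable[[int, int, int], None] = curses.init_pair,
        max_pairs: int = 256,  # in real use, pass curses.COLOR_PAIRS
    ) -> None:
        self._init_pair = init_pair
        self._max = max_pairs
        self._pairs: Dict[Tuple[int, int], int] = {}

    def pair_for(self, fg: str, bg: str) -> int:
        key = (to_curses_color(fg), to_curses_color(bg))
        if key == (-1, -1):
            return 0  # pair 0 is the terminal default and is never redefined
        if key not in self._pairs:
            number = len(self._pairs) + 1
            if number >= self._max:
                return 0  # out of pairs: degrade to default colors
            self._init_pair(number, *key)
            self._pairs[key] = number
        return self._pairs[key]
```

Per cell, the renderer would then combine `curses.color_pair(alloc.pair_for(ch.fg, ch.bg))` with `attrs_for(ch.bold, ch.underscore, ch.reverse)`, where `ch` is assumed to be a pyte cell. The silent fallbacks (hex colors, exhausted pairs) are where "works" and "looks right" diverge.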

Open trap-doors:

- **Claude-code uses bracketed paste, alternate screen, and
  occasionally raw terminal control for its prompt input.**
  Some of these features stress the emulator harder than `vim`
  does — alt-screen has to be supported or claude's
  command-prompt UI corrupts the line above it. `pyte` claims to
  handle alt-screen; verify before committing.
- **Scrollback in claude is `/transcript`-driven, not terminal
  scrollback.** A small pane height means you only see the last
  10–20 lines of output without leaving the dashboard, which is
  the wrong shape for a 200-line streaming response. You'd
  want to make the pane resizable or open a full-height
  "expand" mode (which is just option 1, the handoff, with
  extra steps).
- **Multiple agents = multiple PTYs running concurrently.** If
  the user wants to monitor 3 bottles, the dashboard is now
  holding 3 PTYs open and parsing 3 ANSI streams in parallel.
  Memory + CPU costs are bounded but nonzero; design the
  PTY-per-agent state machine carefully.
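For the multi-PTY case (and the output-throttling item above), one plausible shape is a single `select` over every agent's master fd per frame, coalescing whatever is ready into one chunk per agent. This is a sketch under assumptions: `drain_ready` is a hypothetical name, and the per-frame byte cap is an arbitrary choice so one chatty agent cannot starve the UI.

```python
import os
import select


def drain_ready(
    fds: dict[str, int], timeout: float, max_bytes: int = 1 << 20
) -> tuple[dict[str, bytes], set[str]]:
    """Collect pending output from every agent's PTY master in one pass.

    Returns (data, closed): data maps slug -> all bytes that were ready,
    coalesced so the emulator is fed once per frame rather than per read;
    closed holds slugs whose PTY reached EOF (their claude process exited).
    """
    by_fd = {fd: slug for slug, fd in fds.items()}
    data: dict[str, bytes] = {}
    closed: set[str] = set()
    ready, _, _ = select.select(list(by_fd), [], [], timeout)
    for fd in ready:
        slug = by_fd[fd]
        buf = bytearray()
        while len(buf) < max_bytes:  # cap per frame
            try:
                chunk = os.read(fd, 65536)
            except OSError:  # Linux PTY masters raise EIO when the child side closes
                chunk = b""
            if not chunk:
                closed.add(slug)
                break
            buf += chunk
            # Stop as soon as nothing more is immediately available.
            if not select.select([fd], [], [], 0)[0]:
                break
        if buf:
            data[slug] = bytes(buf)
    return data, closed
```

The `timeout` doubles as the render cadence: the main loop can call this with a short timeout, feed each slug's bytes to that agent's emulator, and repaint only the panes that changed.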

This is the option that delivers the "pane in the dashboard"
literal request. It's the right answer if the user's day-to-day
involves watching multiple bottles' output simultaneously
without context-switching. It's the wrong answer if they mostly
want one focused session at a time with proposals visible.

## Option 3: External multiplexer

The dashboard binds a key (e.g. `Enter` on agent) to
`tmux split-window -h 'docker exec -it claude-bottle-<slug>
claude'` when run inside a tmux session, or to `osascript`-
driven iTerm pane spawning on macOS, or to `wezterm cli
spawn` if the user is on wezterm.

What's good:

- The "real terminal in a real pane" is solved by tools the
  user already trusts. tmux's terminal emulation is correct;
  iTerm's is correct; wezterm's is correct. We're not
  reimplementing any of them.
- Multi-bottle parallelism is automatic — the user opens one
  pane per agent, the multiplexer renders them.
- Implementation cost is tiny: ~50 lines of "if `TMUX` env is
  set, shell out to `tmux split-window`."
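A sketch of that tmux path, using the `split-window -h` command described above and falling back when `TMUX` is unset. `tmux_split_argv` and `open_agent_pane` are hypothetical names; the wezterm and iTerm variants are left out because their invocation details are not pinned down here.

```python
import os
import shlex
import subprocess
from typing import Callable, Mapping


def tmux_split_argv(slug: str) -> list[str]:
    # tmux hands the trailing shell-command to a shell, so quote it as one string.
    inner = shlex.join(["docker", "exec", "-it", f"claude-bottle-{slug}", "claude"])
    return ["tmux", "split-window", "-h", inner]


def open_agent_pane(
    slug: str,
    env: Mapping[str, str] = os.environ,
    run: Callable[..., object] = subprocess.run,
) -> bool:
    """Open the agent's claude-code session in an adjacent tmux pane.

    Returns False outside tmux so the caller can fall back to the handoff.
    """
    if not env.get("TMUX"):
        return False
    run(tmux_split_argv(slug), check=True)
    return True
```

Returning a boolean rather than raising keeps the "falls back to handoff" decision in the dashboard, which already knows how to do the handoff.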

What's not good:

- It requires the user to be running in a multiplexer. Outside
  one (plain Terminal.app, vscode integrated terminal, etc.)
  the verb either falls back to handoff or just fails.
- It splits the operator's mental model. The dashboard is one
  window, claude-code panes are other windows; the dashboard's
  "agents pane" no longer matches the visible reality (some
  agents have an attached pane, others don't, and the dashboard
  doesn't know which).
- We don't actually own the layout. tmux's pane sizing rules
  are tmux's; the dashboard can't enforce "agent pane on the
  right, dashboard on the left."

This is the right answer if the project decides "we shouldn't
be a multiplexer; let tmux be a multiplexer." It's the wrong
answer if the dashboard wants to *be* the operator's primary
surface — which the user's question implies it does.

## Recommendation

**Build Option 1 first** (handoff). It clears the bar of "let me
talk to claude without quitting the dashboard," it's a
~30-line patch over the existing `e` / `p` infrastructure, and
it carries no new dependencies. Critically it lets the user
*confirm whether the workflow actually wants embedded panes* —
if option 1 turns out to be enough in practice (Enter to drop
in, Ctrl-D back out, watch proposals between sessions), the
embedded-emulator complexity is unnecessary.

**Treat Option 2 as a six-week project to plan only if Option 1
is observably insufficient.** The pyte dep is acceptable but
the integration is real engineering — terminal emulator
correctness, pane sizing, input routing, multi-PTY state
management. Schedule it; don't bolt it on. If the user lives in
3-bottle days and needs simultaneous output, this is the option;
otherwise it's premature optimization.

**Treat Option 3 as a niche augmentation, not a primary path.**
Detect `TMUX` / `WEZTERM_PANE` / `ITERM_SESSION_ID` at startup;
if present, add a `split` keybinding that delegates pane
creation to the multiplexer. The dashboard remains the primary
interface; the multiplexer is convenience for power users.
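The startup detection could be as small as the sketch below, which checks the three variables named above. The precedence is an assumption: a tmux server started from iTerm or wezterm can carry those terminals' variables into its panes, but tmux is the layer that actually owns the pane, so it is checked first. Both function names are hypothetical.

```python
from typing import Mapping, Optional


def detect_multiplexer(env: Mapping[str, str]) -> Optional[str]:
    """Name the pane-capable host the dashboard is running under, if any."""
    if env.get("TMUX"):
        return "tmux"
    if env.get("WEZTERM_PANE"):
        return "wezterm"
    if env.get("ITERM_SESSION_ID"):
        return "iterm"
    return None


def available_actions(env: Mapping[str, str]) -> list[str]:
    """Handoff always works; `split` is offered only when a multiplexer is detected."""
    actions = ["handoff"]
    if detect_multiplexer(env) is not None:
        actions.append("split")
    return actions
```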

## Followups worth checking before committing

- **What does claude-code's PTY-resize behavior look like at
  small heights?** Drive it at 24×40, 10×80, 8×40 and see if it
  blows up. The dashboard's bottom pane is going to be small.
- **Is there a way to `docker exec` into an *already-running
  exec* rather than spawning a new claude process per attach?**
  If not, the dashboard needs its own session-state model: "is
  there an exec running for this slug? attach to it. otherwise
  start one."
- **How does claude-code handle SIGWINCH at the moment?** If it
  re-layouts cleanly, the embedded-emulator story gets easier;
  if it corrupts, Option 2 needs special-case handling.
- **Does `docker exec -i` (no `-t`) preserve enough of the TTY
  contract for claude-code to start at all?** Some apps refuse
  to launch without a real TTY. If claude-code is one of them,
  the embedded emulator needs the PTY allocated on the host side
  and the exec's terminal wired to it, which is the harder of
  the two setups.

## References

- PRD 0019 — active agents pane + selection model (the
  selection-source for whichever option lands)
- PRD 0018 chunk 3 — agent container runs `sleep infinity`;
  claude is invoked via `docker exec -it` (the
  attachment-point this doc is layering against)
- `claude_bottle/cli/dashboard.py:_operator_edit_flow` — the
  existing `curses.endwin` → shell out → `stdscr.refresh()`
  pattern Option 1 would clone
- pyte: <https://pyte.readthedocs.io/> — the candidate
  terminal-emulator library if Option 2 is picked