The Gitea CI runner shares the host docker socket but not its
filesystem, so pipelock_tls_init's host bind-mount path for CA
files is invisible to the runner container — the same constraint
that already gates the other bottle-bringup integration tests.
PRD 0022's test suite was missing this guard; it failed on the
post-merge main build with "pipelock tls init did not produce ca
files". Mirror the existing skipIf pattern at the class level.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two related changes the PRD 0022 sandbox-escape test surfaced:
1. `pipelock_build_config` now emits
`request_body_scanning.scan_headers: true` and
`header_mode: all`. Pipelock's default `header_mode:
sensitive` only checks Authorization / Cookie / X-Api-Key
/ X-Token / Proxy-Authorization / X-Goog-Api-Key — an
agent attempting exfil could trivially pick a
non-sensitive header (`X-Custom: $SECRET`) and slip
through. `all` closes the gap; pipelock caps it by the
same max_body_bytes the body scan uses.
2. Test 3 (HTTP exfil shapes) now targets
raw.githubusercontent.com instead of api.anthropic.com.
api.anthropic.com is in `DEFAULT_TLS_PASSTHROUGH` —
pipelock can't MITM it because real LLM conversation
bodies false-positive on DLP scanners (BIP-39 etc.). The
trade-off is documented in `pipelock.DEFAULT_TLS_PASSTHROUGH`;
the test now exercises a host where the sandbox is
actually supposed to block.
All 5 sandbox-escape attacks now produce HTTP 403 with the
expected sandbox marker (`egress:`, `pipelock`, or `blocked:`):
- Attack 1 (non-allowlisted host) ✓ egress
- Attack 2 (non-allowlisted IP + spoof) ✓ egress
- Attack 3a (URL path) ✓ pipelock DLP
- Attack 3b (URL query) ✓ pipelock DLP
- Attack 3c (request body) ✓ pipelock DLP
- Attack 3d (request header) ✓ pipelock DLP (scan_headers)
- Attack 4a (crafted subdomain) ✓ egress
- Attack 4b (direct dig @8.8.8.8) ✓ network isolation
- Attack 5 (README push, 3 secret shapes) ✓ gitleaks (pre-upstream)
489 unit tests pass (1 updated for the new request_body_scanning
shape). Full integration suite passes in ~6s.
End-to-end test that brings up a real bottle with allowlisted
egress + git-gate + three planted secrets, then runs five
attacks from inside the agent container.
Chunks 1-5 implemented in one pass against the Docker backend:
Attack 1 — non-allowlisted hostname (curl evil.example.com)
✓ blocked by egress
Attack 2 — non-allowlisted IP literal (198.51.100.1) + host-
header spoof via curl --resolve
✓ both blocked by egress
Attack 3 — HTTP exfil to allowlisted destination via path /
query / body / header
✗ ALL FOUR LEAK — request reaches api.anthropic.com
with the secret embedded. Pipelock's DLP doesn't
catch the anthropic-key shape in the body, and
nothing scans path / query / headers.
Attack 4 — DNS exfil via crafted subdomain + direct
dig @8.8.8.8 query
✓ both blocked (egress rejects subdomain, internal
network has no path to 8.8.8.8)
Attack 5 — README push through git-gate with secret-bearing
attacker URL (parameterized over anthropic / AWS /
generic shapes); ordering check that gitleaks fires
BEFORE any upstream attempt
✓ all three secret shapes blocked by gitleaks
Per PRD 0022 Q1 the assertion in attack 3 is authoritative —
HTTP 403 with an egress/pipelock marker in the body is the only
acceptable outcome. Any 4xx from upstream means the secret
reached the network. The four failing sub-tests are real
sandbox gaps that need their own remediation PRDs before this
test merges green.
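The attack-3 rule above can be sketched as a single predicate. This is a minimal illustration, not the suite's actual assertion: the function name is hypothetical, and the marker strings are assumed to be the three the suite treats as sandbox markers (`egress:`, `pipelock`, `blocked:`).

```python
# Assumed marker strings; the helper name is hypothetical.
SANDBOX_MARKERS = ("egress:", "pipelock", "blocked:")


def is_sandbox_block(status: int, body: str) -> bool:
    """True only for an HTTP 403 whose body carries a sandbox marker.

    Any other status, including a 4xx from the real upstream, means the
    request (and the secret inside it) left the sandbox.
    """
    return status == 403 and any(marker in body for marker in SANDBOX_MARKERS)
```

A bare 403 without a marker is rejected on purpose: an upstream can return 403 too, and that still counts as a leak.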
Also adds `dnsutils` (dig) to the base agent image so attack 4's
direct-DNS check has a tool to run.
CI: no changes needed — `.gitea/workflows/test.yml` already runs
`tests/integration/` and the suite skips cleanly (via its
skip_unless_docker guard) when the runner has no Docker socket.
All seven open questions now have decisions baked in:
- Q1 (HTTP-exfil scope): authoritative. Every shape MUST
block; chunk 3 expands into remediation sub-PRDs if
any of path/query/header leak today.
- Q3 (fake secret): multiple shapes, parameterized.
Three env vars (TEST_SECRET_ANTHROPIC, _AWS, _GENERIC);
test 5 loops via subTest. Resilient to gitleaks rule
renames.
- Q6 (missing backend): die. `get_bottle_backend()`'s
current behavior surfaces clearly; surprise-skips are
worse than loud failures for new-backend branches.
- Q7 (tool deps): preflight check. setUpClass runs
`which curl && which git && which dig`; SkipTest with
the missing list catches future backends shipping
thinner base images.
Updated implementation chunks + test-5 sketch to match.
No remaining open questions.
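A rough sketch of the Q7 preflight, assuming the check can be expressed as a lookup per tool. The helper names are hypothetical, and `shutil.which` on the host stands in for running `which` inside the agent container, which is what the real suite would do.

```python
import shutil
import unittest

REQUIRED_TOOLS = ("curl", "git", "dig")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools `which` cannot find, preserving order."""
    return [tool for tool in tools if which(tool) is None]


def require_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Raise SkipTest naming every missing tool; return None when all exist."""
    missing = missing_tools(tools, which)
    if missing:
        raise unittest.SkipTest("missing tools: " + ", ".join(missing))
```

Calling `require_tools()` from `setUpClass` skips the whole class with the full missing list, rather than failing on the first absent binary.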
User feedback:
- Q2 (direct DNS resolver test): yes — test 4 grows a
second sub-assertion verifying `dig @8.8.8.8` from the
agent has no path out, alongside the existing
crafted-subdomain check.
- Q4 (gitleaks ordering): test 5 grows an ordering check
— asserts the rejection mentions `gitleaks` AND does
NOT mention upstream-network-phase phrases (resolve /
refused / unreachable / upstream). Confirms gitleaks
rejects BEFORE git-gate tries any upstream push.
- Q5 (CI): try it, accept fallback. New chunk 6 adds a
Gitea Actions job marked `continue-on-error: true` —
runs the suite if the runner can host compose, doesn't
block the workflow if docker-in-docker prevents it.
Three open questions remain (1: pipelock's actual DLP
coverage for non-body shapes; 3: realistic fake secret
shape vs. gitleaks regex; 6+7: backend-agnostic invocation
+ required tools — for the smolmachines work).
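The Q4 ordering check can be sketched as a predicate over the push output. This is an illustration under assumptions: the function name is hypothetical, matching is case-insensitive by choice, and the phrase list is the one named in Q4.

```python
# Phrases that would indicate git-gate reached its upstream-network phase.
UPSTREAM_PHASE_PHRASES = ("resolve", "refused", "unreachable", "upstream")


def rejected_by_gitleaks_first(push_output: str) -> bool:
    """True when the rejection names gitleaks and shows no upstream attempt."""
    text = push_output.lower()
    if "gitleaks" not in text:
        return False
    return not any(phrase in text for phrase in UPSTREAM_PHASE_PHRASES)
```

The negative half matters as much as the positive one: a message that mentions both gitleaks and a network error would mean the push got further than it should.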
Draft a PRD for a composite integration test that brings up
a real bottle with a known allowlist + planted secret and
runs five attacks from inside the agent container:
1. Request to non-allowlisted hostname
2. Request to non-allowlisted IP (incl. host-header spoof)
3. Secret exfil via HTTP — path / query / body / headers
4. Secret exfil via crafted DNS subdomain
5. Secret exfil via README link pushed through git-gate
Each attack passes only when blocked with a permissions
error. The suite is backend-agnostic — runs against
whatever CLAUDE_BOTTLE_BACKEND selects — so it becomes the
gate the upcoming smolmachines spike has to pass before that
backend can substitute for Docker.
Sized into 5 chunks (fixture → attacks 1+2 → attack 3 →
attack 4 → attack 5). Seven open questions called out,
biggest being: today's pipelock probably leaks via header /
path / query because DLP only scans bodies — the test will
expose this as a real gap (chunk 3 lands with
`expectedFailure` markers if so).
When a fresh proposal arrives, the dashboard now also:
- Runs `tmux select-pane -t $TMUX_PANE` (the dashboard's own
pane id, captured at startup) so tmux focus jumps to the
dashboard from wherever the operator was (typically claude
in the right pane).
- Flips internal focus to PANE_PROPOSALS so j/k navigates the
queued items immediately.
- Lands the selected cursor on the first new proposal —
proposals are sorted by arrival ascending, so the earliest
new arrival in the batch gets the cursor.
Stacks with the bell + label highlight from the previous
commit. The operator gets:
1. Audible bell (or tmux activity marker)
2. Tmux focus on the dashboard pane
3. Dashboard's internal focus on the proposals list
4. Cursor on the actual new proposal
5. Pane label flashing `(new!)` in bold green
— all without leaving the keyboard.
When a fresh proposal lands in the supervise queue, the
dashboard:
1. Rings the terminal bell via `curses.beep()` so tmux's
`monitor-bell` (or the terminal's own bell-on-activity)
surfaces a notice in the dashboard pane even when the
operator is focused on claude in the right pane.
2. Bolds + green-attrs the `proposals:` pane label and
suffixes it with `(new!)` so a glance at the dashboard
catches the alert.
The highlight tracks the existing per-row green-highlight
window (`_NEW_PROPOSAL_HIGHLIGHT_SEC`). The bell only fires for
NEWLY arrived proposals after the first tick — pre-existing
queue entries on dashboard startup don't ring.
The new-agent (`n`) flow's tmux branch was leaving keyboard
focus in the dashboard pane after compose-up + provision
finished and claude landed in the right pane — same situation
as Enter re-attach before its `focus_right_pane` fix. The
operator just spun an agent up; they want to type at it.
Pass `focus_right_pane=True` to `_attach_in_tmux` from the
new-agent flow. `tmux select-pane` runs after the respawn.
`--continue` exits non-zero when an agent has been spun up but
never typed at — there's no transcript to resume. Re-attaching
to such an agent via Enter (tmux mode) was crashing the pane.
Wrap the resume invocation in `sh -c '<cmd> --continue || <cmd>'`
so a failed `--continue` cleanly falls through to a fresh
claude. The shell adds microseconds and the fallback only
kicks in when --continue would have failed anyway.
New `_build_resume_argv_with_fallback(bottle)` builds the
shell-wrapped docker exec argv with proper shlex quoting (so
paths-with-spaces in `--append-system-prompt-file` survive).
Only the tmux re-attach path uses it; first-attach + foreground
handoff are unchanged.
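A minimal sketch of the shell-wrapped fallback, assuming the base command is already available as an argv list. The function name and signature are hypothetical (the real builder takes a bottle and produces the docker exec argv); only the `sh -c '<cmd> --continue || <cmd>'` shape comes from the description above.

```python
import shlex


def build_resume_argv_with_fallback(base_argv):
    """Wrap `base_argv` so a failed `--continue` falls through to a fresh run.

    shlex.join quotes each argument, so paths containing spaces survive the
    trip through `sh -c`.
    """
    cmd = shlex.join(base_argv)
    return ["sh", "-c", f"{cmd} --continue || {cmd}"]


if __name__ == "__main__":
    print(build_resume_argv_with_fallback(
        ["claude", "--append-system-prompt-file", "/tmp/my prompt.md"]
    ))
```

Quoting once and reusing the same string on both sides of `||` keeps the two invocations identical apart from the `--continue` flag.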
489 unit tests pass (4 new for the fallback builder).
The Enter key on a focused agents-pane row is the operator's
explicit "I want to interact with this agent" signal — after
respawning the right pane with claude, move tmux's keyboard
focus to that pane so the operator can start typing
immediately. Without this, every Enter required a manual tmux
nav (C-b →) to actually use the session.
Mechanics:
- `_attach_in_tmux` gains `focus_right_pane: bool = False`.
- When True, runs `tmux select-pane -t <pane_id>` after the
respawn.
- `_attach_to_bottle` (the Enter handler's helper) passes
True.
- Other callers (new-agent flow, stop's auto-attach) leave
it False so the operator stays in the dashboard for
follow-up navigation.
`_tmux_select_pane` is a small subprocess wrapper, best-effort
on failure.
After `x` stops a dashboard-owned bottle, slide focus to the
next agent in the agents pane (the one filling the stopped
row, or the new last row if the stopped row was last) and respawn
the right pane with that agent's claude session via `--continue`.
If no agents remain, close the right pane via `tmux kill-pane`.
Two new helpers:
- `_tmux_close_right_pane(tmux_state)` — kills the tracked
pane (if it exists) and clears pane_id / slug.
- `_pick_next_after_stop(agents_before, selected_index,
stopped_slug)` — pure chooser returning (new_index, agent)
or None. Tested directly.
Outside tmux, only the selected_agent index slides; no
auto-attach (foreground handoff would take over the terminal,
disruptive). 485 unit tests pass (6 new for the pick helper).
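The chooser's behavior can be sketched as below. This is an assumption-laden illustration: agents are modeled as plain slug strings rather than the dashboard's agent objects, and the name drops the leading underscore to mark it as a stand-in.

```python
def pick_next_after_stop(agents_before, selected_index, stopped_slug):
    """Return (new_index, agent) for the row to focus next, or None.

    The agent that slides into the stopped row is chosen; if the stopped row
    was last, the new last row is chosen instead.
    """
    remaining = [slug for slug in agents_before if slug != stopped_slug]
    if not remaining:
        return None
    # Clamp so an index past the end lands on the new last row.
    new_index = min(max(selected_index, 0), len(remaining) - 1)
    return new_index, remaining[new_index]
```

Keeping the function pure (no tmux, no curses) is what makes it directly testable.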
The dashboard is primarily an agent-management surface
(PRD 0020 + 0021); landing on the proposals pane was a holdover
from when proposals were the only thing the dashboard showed.
Default focus is now `PANE_AGENTS`, so j/k navigates the agents
list immediately on launch — the operator Tabs to proposals
when something queues. Focus choice still persists across
operations.
Both `_new_agent_flow` (bringup) and `_stop_bottle_flow`
(teardown) were doing the same five-step dance: open the log
path, mkdir parents, empty the file, ensure the right pane is
tailing it, redirect fd 2 to the same file. Extract into a
context manager:
    with _route_op_to_right_pane(tmux_state, slug, log_name) as routed:
        if routed:
            <run op>
Yields True when routing succeeded (fd 2 redirected, pane
tailing), False on fallback conditions (not in tmux, no
tmux_state, or tmux failed to spawn a pane). The fallback
paths still differ between callers — bringup follows up
with `_attach_in_tmux`, teardown does the curses-endwin
compose-down — so the helper stops at "is stderr routed
or not" and lets callers branch from there. Net diff:
~60 lines deleted, the routing-to-right-pane concept now
lives in one place.
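The stderr half of that helper can be sketched as follows. This is a simplified stand-in, not the dashboard's code: the name and the `enabled` flag are hypothetical, and the tmux pane handling is left out entirely, so only the "mkdir parents, empty the file, redirect fd 2, restore" steps are shown.

```python
import contextlib
import os
import sys
from pathlib import Path


@contextlib.contextmanager
def route_stderr_to_log(log_path, enabled=True):
    """Yield True with fd 2 redirected to `log_path`, or False when disabled."""
    if not enabled:
        yield False  # fallback case: caller picks its own path
        return
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # O_TRUNC empties any previous log so the tail starts clean.
    log_fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    sys.stderr.flush()
    saved_fd = os.dup(2)
    try:
        os.dup2(log_fd, 2)  # subprocesses inherit the redirected fd 2
        yield True
    finally:
        sys.stderr.flush()
        os.dup2(saved_fd, 2)  # restore the original stderr
        os.close(log_fd)
        os.close(saved_fd)
```

Redirecting at the file-descriptor level, rather than swapping `sys.stderr`, is what lets child processes write into the same log.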
PRD 0021 follow-up. Mirrors the bringup-into-right-pane fix
on the explicit-stop path: when `$TMUX` is set, the stop
flow respawns the right pane with `tail -F
state/<slug>/teardown.log` (via `_ensure_right_pane` —
reuses the existing right pane if it's the agent's claude
session) and redirects fd 2 to that log for the duration of
`capture_session_state` + `cm.__exit__`. compose-down +
network-remove messages stream into the right pane.
After `settle_state` removes the state dir, the tail keeps
its buffered output visible (tail -F handles file removal
gracefully); the next attach respawns the pane with claude.
Falls back to the existing curses-endwin path on tmux
failure, or when the dashboard isn't in tmux at all.
After the operator pressed `y` on the preflight modal (or
picked an agent in the picker), the modal's curses sub-window
stayed on screen until the dashboard's main loop ticked again
— which during a 5-10s launch made it look like the
confirmation never registered.
Add `_erase_modal` (touchwin + refresh on stdscr) and call it
at every exit from `_preflight_modal` and `_picker_modal`.
The pre-modal frame buffered on stdscr immediately overwrites
the sub-window's area; the launch proceeds with a clean
dashboard underneath.
PRD 0021 follow-up. The new-agent flow was calling a dedicated
`_tmux_split_pane_tail` that ALWAYS created a new pane —
so every `n` start spawned a fresh right pane next to any
existing one, accumulating panes instead of reusing them.
Replace with a generic `_ensure_right_pane(tmux_state, argv)`
that respawns the dashboard's tracked right pane if one is
alive, splits a new one only when none is tracked or the
tracked pane was closed. Both the new-agent tail-during-
bringup path AND the existing claude-attach path now route
through this helper.
Net effect: starting a second agent reuses the same right
pane — bringup tail replaces the prior claude session,
then claude (for the new agent) replaces the tail. Closing
the right pane manually via `C-b x` still triggers a fresh
split on the next attach.
PRD 0021 follow-up. When starting a new agent via `n` while
in tmux, the dashboard now:
1. Pre-creates the right pane with `tail -F
state/<slug>/bringup.log`.
2. Redirects fd 2 (stderr) to that log file via dup2 — affects
both Python `info()` calls AND subprocess inheritors'
stderr (docker compose up, network creates, provision).
3. Runs `backend.launch().__enter__()` with the redirect in
place; everything streams into the right pane via tail.
4. Restores stderr.
5. Respawns the right pane (tail → claude session).
Net effect: dashboard pane stays uncluttered during bringup,
and the operator watches the compose-up + provision output in
the same pane that's about to hold the claude session — no
visual handoff between "starting" and "started."
Curses never needs to come down on the tmux path (the pane is
already created in the dashboard's neighbor pane, and stderr
is redirected away from the terminal entirely).
If `_tmux_split_pane_tail` fails (tmux missing, server died),
the flow falls through to the existing curses-endwin handoff so
the operator still gets a session.
PRD 0021 chunk 4 (final). Two adjustments to close the
split-pane loop:
1. `_stop_bottle_flow` clears `tmux_state['slug']` when the
stopped bottle was the right-pane occupant. The pane itself
stays in place (claude exits with "container not found");
the operator presses Enter on a different agent to
repurpose it via respawn-pane.
2. `_render` accepts `right_pane_slug` and marks the matching
agents-pane row with a `*` prefix + A_BOLD (when it's not
also the focused row — focused selection still wins for
visibility). Gives the operator a clear visual link
between which agent the dashboard says is "active right
now" and which one is visible to their right.
Wired through `_main_loop`: passes `tmux_state` to
`_stop_bottle_flow` on `x`, and `tmux_state.get('slug')` to
`_render` on every tick.
479 unit tests pass (1 new for the tmux_state-preservation
on non-owned stop). PRD 0021 implementation complete pending
merge.
PRD 0021 chunk 3. The `n` flow (PRD 0020 chunk 2) now routes
the first claude session of a freshly-started bottle into the
right tmux pane when `$TMUX` is set, using the same `_attach_in_tmux`
state machine the Enter re-attach uses, just with
`resume=False` so claude starts fresh.
Outside tmux the existing foreground handoff is unchanged.
The compose-up phase (`backend.launch.__enter__`) still drops
curses for its stderr output; we restore curses BEFORE
spawning into the right pane so the dashboard re-renders
alongside the new claude session instead of waiting for
attach to return.
PRD 0021 chunk 2. New tmux integration: when `$TMUX` is set
and the operator presses Enter on a focused agent row, the
dashboard spawns / respawns the right pane with that bottle's
claude session instead of taking over the terminal via
curses.endwin.
Mechanics:
- `_in_tmux()`: true when `$TMUX` is set.
- `_tmux_split_pane_create`: first attach. `tmux split-window
-h -P -F '#{pane_id}'` opens a right pane and prints its id
for tracking.
- `_tmux_respawn_pane`: subsequent attaches. `tmux
respawn-pane -k -t <id>` swaps the content without
re-splitting.
- `_tmux_pane_exists`: `tmux list-panes` check before
respawn so a manually-closed pane gracefully falls back to
a fresh split.
- `_attach_in_tmux`: owns the create-or-respawn state
machine, mutates `tmux_state` ({pane_id, slug}) so the
main loop tracks the right-pane occupant.
- `_attach_via_handoff`: the previous curses-endwin path,
extracted as the fallback when tmux is missing or fails.
- `_attach_to_bottle` dispatches: in tmux + state available →
`_attach_in_tmux`; otherwise → handoff.
Main loop gets `tmux_state: dict = {"pane_id": None, "slug":
None}`. Chunks 3 + 4 wire it through the new-agent flow and
the stop hook.
`FileNotFoundError`-safe `subprocess.run` calls around every
tmux invocation — a missing tmux binary cleanly falls back to
the handoff for that keypress. 478 unit tests pass (10 new
for the pure argv builders + `_claude_runtime_args`).
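The create-or-respawn decision can be sketched with pure argv builders. The function names here are hypothetical, and passing the command as one shell-quoted string is a choice made for this sketch; the tmux flags themselves are the ones listed above.

```python
import shlex


def split_pane_argv(command_argv):
    """argv that opens a right pane and prints the new pane id."""
    return ["tmux", "split-window", "-h", "-P", "-F", "#{pane_id}",
            shlex.join(command_argv)]


def respawn_pane_argv(pane_id, command_argv):
    """argv that replaces the contents of an existing pane."""
    return ["tmux", "respawn-pane", "-k", "-t", pane_id,
            shlex.join(command_argv)]


def plan_attach(tmux_state, live_pane_ids, command_argv):
    """Respawn the tracked pane if it is still alive, else split a new one."""
    pane_id = tmux_state.get("pane_id")
    if pane_id is not None and pane_id in live_pane_ids:
        return respawn_pane_argv(pane_id, command_argv)
    return split_pane_argv(command_argv)
```

Separating "which argv" from "run it" keeps the state machine testable without a tmux server; the caller would run the result through `subprocess.run` and handle `FileNotFoundError`.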
PRD 0021 chunk 1. The tmux split-pane helpers (chunk 2+) need
the same docker-exec argv that `exec_claude` builds — including
the `--append-system-prompt-file <path>` flag the bottle's
provisioner copies into place. Extract the argv construction
into a pure `claude_docker_argv(argv, *, tty)` method so both
foreground (`subprocess.run`) and tmux paths
(`tmux respawn-pane …`) build from the same source.
`exec_claude` becomes a one-liner that runs subprocess.run on
the argv. No behavior change; 472 unit tests pass (7 new for
the pure builder).
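A rough sketch of what a pure argv builder of this kind might look like. It is not the project's method: the free-function form, the `container` parameter, and the exact flag layout are assumptions made for illustration; only `docker exec` with `-i` / `-t` is standard Docker CLI.

```python
def docker_exec_claude_argv(container, claude_args, *, tty):
    """Build (but do not run) a docker exec argv for claude.

    `tty=True` adds `-t` for interactive terminals; `-i` keeps stdin open
    either way.
    """
    flags = ["-it"] if tty else ["-i"]
    return ["docker", "exec", *flags, container, "claude", *claude_args]
```

Because the function only returns a list, the foreground path can hand it to `subprocess.run` while a tmux path can embed the same list in a `respawn-pane` command.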
PR #48 closed; treat the implementation as starting from
main, where no tmux integration exists yet. The PRD now
describes the full design (including the `_in_tmux` detection
+ helper scaffolding) as fresh work. Sized into 4 chunks:
`claude_docker_argv` refactor → tmux helpers + pane state +
`_attach_to_bottle` dispatch → new-agent flow → stop +
indicator.
Same design as before: opt-in by `$TMUX`, split-window-then-respawn,
falls back to handoff on tmux failure or missing
binary. No external references to PR #48.
Draft a PRD that tightens PR #48's tmux integration from
"one new window per attach" to "one persistent right pane that
the dashboard's selection drives." Inside tmux (`$TMUX` set):
dashboard in the left pane; pressing Enter or `n` spawns
claude in the right pane via `tmux split-window` on first
attach, then `tmux respawn-pane` on subsequent attaches so the
operator-focused agent is always the visible one.
Outside tmux: falls back to today's handoff. Opt-in by
environment; no flag.
Sized into 4 chunks (pane state + create → respawn → stop
integration → supersede PR #48's new-window). Seven open
questions called out, the biggest being whether the dashboard
should auto-exec into a fresh tmux session when launched
outside one (v1 says no — operators start tmux themselves).
The `bottles` dict held `@contextmanager`-wrapped launch contexts.
On normal Python interpreter shutdown those context managers'
generators got GC'd, which raised GeneratorExit at the yield
point and ran the `finally` block — invoking each bottle's
teardown and tearing down the compose project. Net effect: `q`
WAS implicitly stopping every dashboard-launched bottle even
though the keypress handler just `return`'d.
`os._exit(0)` skips all Python-level cleanup (GC, atexit, etc.),
so the docker compose projects survive the dashboard exit
untouched. Curses gets explicit `endwin()` first because the
brutal exit skips curses.wrapper's normal terminal restoration.
Matches PRD 0020's resolved-question answer (`q` does NOT tear
down bottles; teardown is always explicit via `x` or
`./cli.py cleanup`).
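The implicit-teardown mechanism can be reproduced in a few lines. This is a standalone demonstration with hypothetical names (`launch`, `enter_and_drop`), relying on CPython finalizing a suspended generator once its last reference is dropped.

```python
import contextlib
import gc

events = []


@contextlib.contextmanager
def launch(slug):
    """Stand-in for a bottle launch context: bring up, yield, tear down."""
    events.append(f"up:{slug}")
    try:
        yield slug
    finally:
        # Runs on __exit__, but also when the suspended generator is
        # finalized (GeneratorExit is raised at the yield).
        events.append(f"down:{slug}")


def enter_and_drop(slug):
    cm = launch(slug)
    cm.__enter__()
    # `cm` goes out of scope here without __exit__ ever being called.


enter_and_drop("demo")
gc.collect()
print(events)
```

The teardown fires even though nothing called `__exit__`, which is the behavior `os._exit(0)` sidesteps by skipping interpreter cleanup altogether.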
`--resume` alone surfaces claude's session picker even when only
one session exists. `--continue` jumps to the most recent session
non-interactively, which is the actual behavior the dashboard's
Enter re-attach wants for typical bottle-with-one-session cases.
Re-entering a running bottle from the dashboard (Enter on the
agents pane) now invokes claude with `--resume` so the session
picks up the prior conversation history rather than starting a
fresh transcript. The first-attach paths (`./cli.py start` and
the dashboard's new-agent `n` flow) leave it off — the
transcript doesn't exist yet there.
`attach_claude` gains a `resume: bool = False` kwarg;
`_attach_to_bottle` in the dashboard passes `True`.
Final PRD 0020 chunk. `x` on a focused agents-pane row tears
down the selected bottle if the dashboard owns it (started via
the chunk-2 `n` flow): pops `(cm, bottle, identity)` from the
main loop's bottles map, snapshots the transcript best-effort,
calls `cm.__exit__(None, None, None)` to drive the existing
compose-down + network-remove sequence, then `settle_state` to
honor any pre-existing preserve marker.
On a non-owned slug (discovered via `list_active_slugs` but not
in the dashboard's bottles dict — i.e., previous-dashboard or
external `./cli.py start` bottle), `x` is a no-op with a status
hint pointing at `./cli.py cleanup`. Matches the PRD's
cross-dashboard re-attach model: the dashboard can re-attach
either kind, but can only tear down its own.
The PRD's chunk 5 ("quit-cleanup") is satisfied by the existing
no-op behavior of `q` — per the user's resolved-question
answer, quit leaves bottles running unchanged. No code change
needed for that.
Footer surfaces `[x] stop`. 465 unit tests pass (1 new for the
non-owned no-op path; the owned path is integration territory
because it drives a real compose-down).
PRD 0020 chunk 3. Enter on a focused agents-pane row drops to a
claude session inside the selected bottle. Works for both
dashboard-owned bottles (looks up the stored Bottle handle in
the main loop's `bottles` dict) and externally-discovered ones
(synthesizes a DockerBottle from the slug → `claude-bottle-<slug>`
container name).
For the synthesized path, the `--append-system-prompt-file`
target resolves via metadata.json + the manifest's agent prompt
if both can be read; otherwise the re-attach runs without the
flag (claude defaults to no system prompt, the bottle's other
state is untouched).
Shares the curses.endwin → attach → refresh handoff with the
chunk-2 new-agent flow via a new `_attach_to_bottle` helper.
Footer reshuffled to advertise `[Enter] view/attach`. 464 unit
tests pass (3 new for `_bottle_for_slug`).
PRD 0020 chunk 2. Pressing `n` opens a modal that lists every
agent from the manifest with `(N running)` suffixes for ones
that already have bottles up. Type to filter (substring,
case-insensitive); j/k or arrows to navigate; Enter to confirm;
Esc clears the filter on first press, exits the picker on the
second.
On confirmation, the dashboard runs:
- `prepare_with_preflight` from chunk 1 with curses-modal
render + prompt callables (the preflight modal centers the
plan summary + captures [y/N]).
- `backend.launch(plan).__enter__()` — enters but doesn't bind
the context to a `with`. The (cm, bottle, identity) tuple
lands in the main loop's `bottles` dict keyed by slug.
- `curses.endwin()` → `attach_claude(bottle)` → `stdscr.refresh()`
handoff. The agent's claude session takes over the terminal;
on exit the dashboard re-renders with the bottle now visible
in the agents pane.
Crucially the context manager is held alive in `bottles` — never
`__exit__`'d at quit. Chunk 4 will wire `x` to that exit; for
now bottles started from the dashboard stay running until
explicit cleanup. Matches the PRD's "q does not tear down"
decision.
Footer surfaces `[n] new agent`. 461 unit tests pass (8 new for
`_filter_agents` and `_running_counts`).
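The picker's filtering and `(N running)` suffix can be sketched as below. The names mirror the helpers mentioned above but the signatures are assumptions: agents are modeled as plain name strings, and `picker_label` is a hypothetical addition for illustration.

```python
from collections import Counter


def filter_agents(agent_names, query):
    """Case-insensitive substring filter; an empty query keeps everything."""
    needle = query.casefold()
    return [name for name in agent_names if needle in name.casefold()]


def running_counts(active_agent_names):
    """Count running bottles per agent name."""
    return Counter(active_agent_names)


def picker_label(name, counts):
    """Append `(N running)` only for agents that already have bottles up."""
    n = counts.get(name, 0)
    return f"{name} ({n} running)" if n else name
```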
PRD 0020 chunk 1. `cli/start.py`'s `_launch_bottle` did three
things in one function: prepare + preflight, attach claude, and
settle state on teardown. Split them so the dashboard (PRD 0020
chunk 2+) can reuse the prepare + attach pieces individually
without going through the CLI's one-shot orchestrator:
- `prepare_with_preflight(spec, *, stage_dir, render_preflight,
prompt_yes, dry_run)` — injects render + prompt callables so
the CLI binds them to stderr/stdin while the dashboard binds
them to a curses modal. Returns `(plan, identity)`; identity
is set after `backend.prepare` returns so callers can reap
the prepare-time state dir on abort via `settle_state` in
their finally — preserving today's preflight-N cleanup.
- `attach_claude(bottle, *, remote_control)` — runs claude
inside the bottle and returns its exit code. The dashboard
calls this from inside a `curses.endwin` → … →
`stdscr.refresh()` handoff.
- `capture_session_state` / `settle_state` lose their leading
underscore; the dashboard will call them on
session-end + explicit-stop respectively.
`_launch_bottle` becomes a thin orchestrator over those helpers.
No behavior change; all 453 unit tests pass and `./cli.py start
implementer --dry-run` produces identical preflight output.
Draft a PRD that turns the dashboard into the operator's single
surface — collapses today's two-terminal workflow (one for
`./cli.py start`, one for `./cli.py dashboard`) into a single
dashboard invocation that can spin up new agents, re-attach to
ones it already spun up, and explicitly stop them.
Picks the "handoff" mechanism from `docs/research/claude-code-
pane-in-dashboard.md` (curses.endwin → docker exec -it claude
→ stdscr.refresh) and crucially decouples the bottle's lifetime
from any single claude session: exit claude → back to dashboard
with the bottle still running; quit dashboard → tear down every
bottle the dashboard owns.
Sized into 5 chunks (refactor → picker + new-agent → re-attach
→ explicit stop → quit-cleanup). Seven open questions called
out, the biggest being modal-vs-drop-and-resume for the
preflight Y/N inside curses.
Survey the three realistic ways to surface a claude-code session
inside the dashboard TUI:
1. Handoff — drop curses, foreground claude, restore on exit
(the existing `e`/`p` pattern, extended). Minimal code,
side-by-time rather than side-by-side.
2. Embedded emulator — own a PTY, parse claude-code's ANSI
stream via `pyte`, paint it into a curses pane. Real
"pane in the dashboard" but a six-week build with one new
dep and several integration trap-doors (alt-screen, resize,
input routing, multi-PTY state).
3. External multiplexer — delegate pane creation to tmux /
iTerm / wezterm when detected. Tiny code, but splits the
operator's mental model and gives up layout control.
Recommendation: ship Option 1 first; defer Option 2 to "only if
Option 1 is observably insufficient"; treat Option 3 as a
niche augmentation for power users.
Calls out four followups worth verifying before committing
(PTY behavior at small sizes, attach-to-existing-exec, SIGWINCH
handling, `-it` vs `-i` for the embedded path).
PRD 0018 chunk 3's atomicity fix used write-temp-then-rename to
update bind-mounted config files. POSIX rename atomically swaps
the inode at the host path — but Docker single-file bind mounts
on Linux pin the source inode at mount time, so post-rename the
container's mount points at the now-orphaned old inode and never
sees the new content. The egress sidecar's SIGHUP-driven reload
re-reads the same stale file → "egress route updates aren't
updatable via the supervisor anymore".
Switch egress_apply + pipelock_apply to write in place (same
inode, truncated + rewritten). Lose file-level POSIX atomicity,
but:
- egress: SIGHUP fires only AFTER the write returns; the
addon's `load_routes` raises `ValueError` on a partial read
and keeps the previous in-memory routes, so the in-process
race window (already narrow) is non-disruptive.
- pipelock: applies via `docker restart` rather than SIGHUP;
restart serializes after the host write completes, so the
container reads the fully-written file on next boot.
macOS Docker Desktop's file-sharing layer (virtiofs / osxfs)
silently re-resolves the path on rename, which is why this bug
didn't surface in dev tests on macOS. Linux native Docker is
the strict reading; the fix works on both.
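The inode difference between the two write strategies is easy to observe on the host. This is a standalone demonstration with hypothetical helper names; it shows only the filesystem side, not Docker's bind-mount behavior.

```python
import os
import tempfile


def write_in_place(path, text):
    """Truncate and rewrite: the path keeps its inode."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)


def write_via_rename(path, text):
    """Write a temp file, then rename over `path`: the path gets a new inode."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(text)
    os.replace(tmp, path)


def inode(path):
    return os.stat(path).st_ino


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "routes.yaml")
    write_in_place(target, "v1\n")
    first = inode(target)
    write_in_place(target, "v2\n")
    print("in place, same inode:", inode(target) == first)
    write_via_rename(target, "v3\n")
    print("rename, same inode:", inode(target) == first)
```

A mount pinned to the first inode would keep seeing the in-place writes but never the renamed file.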
`egress_render_routes` now emits hand-rolled YAML in the same style
as `pipelock_render_yaml`. The egress addon parses it via
`yaml_subset.parse_yaml_subset` — the same parser the manifest
loader + pipelock_apply use.
Why bother: routes.yaml is bind-mounted into the egress sidecar
AND surfaced to operators through `routes edit` (PRD 0019). JSON-
in-yml renders ugly in $EDITOR and signals "this is data" rather
than "this is config you can read at a glance". Real YAML reads
cleanly.
Mechanics:
- `yaml_subset.py` drops its `claude_bottle.log` dependency.
Errors now raise `YamlSubsetError` (a `ValueError`); the
manifest loader + pipelock_apply catch it at the boundary
and forward to `die` / `PipelockApplyError` so callers see
the same behavior they did before.
- `Dockerfile.egress` adds one COPY line for `yaml_subset.py`
so it sits flat in `/app/` next to the addon. The addon
uses an absolute-import-with-fallback shim so the same file
works inside the container AND from the host's unit tests.
- `egress_apply._merge_single_route` round-trips current
routes.yaml through `parse_yaml_subset` + a new
`_render_routes_payload` helper instead of `json.loads` +
`json.dumps`.
End-to-end: rebuilt the egress image, ran `./cli.py start` to a
full bring-up, confirmed the addon's boot log shows `egress:
loaded 9 route(s)` — i.e., the YAML parses inside the container.
453 unit + 3 integration tests pass.
PRD 0019 chunk 4 (final). The `e` (routes edit) and `p` (pipelock
edit) keys now require an agent selection in the agents pane.
Pressing them with the proposals pane focused, with no active
agents, or with an out-of-range selection is a no-op with a
status hint ("no agent selected; Tab into the agents pane first").
The discover-and-prompt scaffolding inside
`_operator_edit_routes_flow` / `_operator_edit_allowlist_flow` /
`_operator_edit_flow` is gone. The flows now take an `ActiveAgent`
+ required-service name; they refuse with a clear message when
the bottle lacks the requested sidecar (e.g., `routes edit`
against a bottle with no `bottle.egress.routes` declared). The
`discover_egress_slugs` + `discover_pipelock_slugs` +
`_discover_active_with_service` helpers come out — they had no
remaining callers.
Footer now reads `[e/p] edit selected agent`.
PRD 0019 chunk 3. The TUI now has two focusable panes — proposals
and agents — and `Tab` toggles which one the `j/k`/arrow keys
move through.
Each pane keeps its own selection index. Switching panes doesn't
lose the position in the other; the cursor (`>` + reverse-video
row) appears only in the focused pane. The label line on each
pane shows "(focused)" when active.
Footer reshuffled: `[Tab] switch pane [j/k] move [Enter] view
[a/m/r] proposal [e/p] edit [q] quit`. When the agents pane is
focused and there's no status message to display, the idle
status line surfaces the currently-selected agent (or "[no
active agents]" / "[no agent selected]" fallbacks) so the
operator knows what an agent-scoped edit verb will target after
chunk 4 wires them up.
Proposal action keys (a/m/r/Enter) are gated on the proposals
pane being focused — pressing them with the agents pane focused
is a no-op. e/p still use the global discover-and-prompt flow
for one more chunk; chunk 4 swaps them to read the agents-pane
selection.
PRD 0019 chunk 2. The TUI's main render now draws two panes:
proposals on top (existing), active agents on the bottom (new).
Header counts both totals. The agents pane refreshes on the
same 1s tick — agents starting/stopping reflect without
operator action.
Each agent row shows slug, agent name, started-time (HH:MM:SS
of the metadata.json timestamp), and the bracketed list of
sidecars currently up. The `agent` service is filtered out of
the displayed list — it's always present so it'd be noise; the
sidecars are the differentiator. A bottle whose only running
service is `agent` (sidecars still warming up) renders as
`(starting)`.
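The sidecar column described above can be sketched as a small formatter. The function name, the sorting, and the comma separator are assumptions made for illustration; only the `agent` filtering and the `(starting)` fallback come from the description.

```python
def render_sidecars(running_services):
    """Format the bracketed sidecar list for one agent row.

    The always-present `agent` service is dropped; if nothing else is up yet
    the row shows `(starting)`.
    """
    sidecars = sorted(s for s in running_services if s != "agent")
    if not sidecars:
        return "(starting)"
    return "[" + ", ".join(sidecars) + "]"
```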
No selection model yet — that's chunk 3. The cursor stays in
the proposals pane; `j/k`/arrow nav and the proposal action
keys are unchanged.