Two related changes the PRD 0022 sandbox-escape test surfaced:
1. `pipelock_build_config` now emits
`request_body_scanning.scan_headers: true` and
`header_mode: all`. Pipelock's default `header_mode:
sensitive` only checks Authorization / Cookie / X-Api-Key
/ X-Token / Proxy-Authorization / X-Goog-Api-Key — an
agent attempting exfil could trivially pick a
non-sensitive header (`X-Custom: $SECRET`) and slip
through. `all` closes the gap; pipelock caps it by the
same max_body_bytes the body scan uses.
2. Test 3 (HTTP exfil shapes) now targets
raw.githubusercontent.com instead of api.anthropic.com.
api.anthropic.com is in `DEFAULT_TLS_PASSTHROUGH` —
pipelock can't MITM it because real LLM conversation
bodies false-positive on DLP scanners (BIP-39 etc.). The
trade-off is documented in `pipelock.DEFAULT_TLS_PASSTHROUGH`;
the test now exercises a host where the sandbox is
actually supposed to block.
All 5 sandbox-escape attacks now produce HTTP 403 with the
expected sandbox marker (`egress:`, `pipelock`, or `blocked:`):
- Attack 1 (non-allowlisted host) ✓ egress
- Attack 2 (non-allowlisted IP + spoof) ✓ egress
- Attack 3a (URL path) ✓ pipelock DLP
- Attack 3b (URL query) ✓ pipelock DLP
- Attack 3c (request body) ✓ pipelock DLP
- Attack 3d (request header) ✓ pipelock DLP (scan_headers)
- Attack 4a (crafted subdomain) ✓ egress
- Attack 4b (direct dig @8.8.8.8) ✓ network isolation
- Attack 5 (README push, 3 secret shapes) ✓ gitleaks (pre-upstream)
489 unit tests pass (1 updated for the new request_body_scanning
shape). Full integration suite passes in ~6s.
When a fresh proposal arrives, the dashboard now also:
- Runs `tmux select-pane -t \$TMUX_PANE` (the dashboard's own
pane id, captured at startup) so tmux focus jumps to the
dashboard from wherever the operator was (typically claude
in the right pane).
- Flips internal focus to PANE_PROPOSALS so j/k navigates the
queued items immediately.
- Lands the selected cursor on the first new proposal —
proposals are sorted by arrival ascending, so the earliest
new arrival in the batch gets the cursor.
Stacks with the bell + label highlight from the previous
commit. The operator gets:
1. Audible bell (or tmux activity marker)
2. Tmux focus on the dashboard pane
3. Dashboard's internal focus on the proposals list
4. Cursor on the actual new proposal
5. Pane label flashing `(new!)` in bold green
— all without leaving the keyboard.
When a fresh proposal lands in the supervise queue, the
dashboard:
1. Rings the terminal bell via `curses.beep()` so tmux's
`monitor-bell` (or the terminal's own bell-on-activity)
surfaces a notice in the dashboard pane even when the
operator is focused on claude in the right pane.
2. Bolds + green-attrs the `proposals:` pane label and
suffixes it with `(new!)` so a glance at the dashboard
screen catches the alert at a glance.
The highlight tracks the existing per-row green-highlight
window (`_NEW_PROPOSAL_HIGHLIGHT_SEC`). The bell only fires for
NEWLY arrived proposals after the first tick — pre-existing
queue entries on dashboard startup don't ring.
The new-agent (`n`) flow's tmux branch was leaving keyboard
focus in the dashboard pane after compose-up + provision
finished and claude landed in the right pane — same situation
as Enter re-attach before its `focus_right_pane` fix. The
operator just spun an agent up; they want to type at it.
Pass `focus_right_pane=True` to `_attach_in_tmux` from the
new-agent flow. `tmux select-pane` runs after the respawn.
`--continue` exits non-zero when an agent has been spun up but
never typed at — there's no transcript to resume. Re-attaching
to such an agent via Enter (tmux mode) was crashing the pane.
Wrap the resume invocation in `sh -c '<cmd> --continue || <cmd>'`
so a failed `--continue` cleanly falls through to a fresh
claude. The shell adds microseconds and the fallback only
kicks in when --continue would have failed anyway.
New `_build_resume_argv_with_fallback(bottle)` builds the
shell-wrapped docker exec argv with proper shlex quoting (so
paths-with-spaces in `--append-system-prompt-file` survive).
Only the tmux re-attach path uses it; first-attach + foreground
handoff are unchanged.
489 unit tests pass (4 new for the fallback builder).
The Enter key on a focused agents-pane row is the operator's
explicit "I want to interact with this agent" signal — after
respawning the right pane with claude, move tmux's keyboard
focus to that pane so the operator can start typing
immediately. Without this, every Enter required a manual tmux
nav (C-b →) to actually use the session.
Mechanics:
- `_attach_in_tmux` gains `focus_right_pane: bool = False`.
- When True, runs `tmux select-pane -t <pane_id>` after the
respawn.
- `_attach_to_bottle` (the Enter handler's helper) passes
True.
- Other callers (new-agent flow, stop's auto-attach) leave
it False so the operator stays in the dashboard for
follow-up navigation.
`_tmux_select_pane` is a small subprocess wrapper, best-effort
on failure.
After `x` stops a dashboard-owned bottle, slide focus to the
next agent in the agents pane (the one filling the stopped
row, or the new last row if the stopped was last) and respawn
the right pane with that agent's claude session via `--continue`.
If no agents remain, close the right pane via `tmux kill-pane`.
Two new helpers:
- `_tmux_close_right_pane(tmux_state)` — kills the tracked
pane (if it exists) and clears pane_id / slug.
- `_pick_next_after_stop(agents_before, selected_index,
stopped_slug)` — pure chooser returning (new_index, agent)
or None. Tested directly.
Outside tmux, only the selected_agent index slides; no
auto-attach (foreground handoff would take over the terminal,
disruptive). 485 unit tests pass (6 new for the pick helper).
The dashboard is primarily an agent-management surface
(PRD 0020 + 0021); landing on the proposals pane was a holdover
from when proposals were the only thing the dashboard showed.
Default focus is now `PANE_AGENTS`, so j/k navigates the agents
list immediately on launch — the operator Tabs to proposals
when something queues. Focus choice still persists across
operations.
Both `_new_agent_flow` (bringup) and `_stop_bottle_flow`
(teardown) were doing the same five-step dance: open the log
path, mkdir parents, empty the file, ensure the right pane is
tailing it, redirect fd 2 to the same file. Extract into a
context manager:
with _route_op_to_right_pane(tmux_state, slug, log_name) as routed:
if routed:
<run op>
Yields True when routing succeeded (fd 2 redirected, pane
tailing), False on fallback conditions (not in tmux, no
tmux_state, or tmux failed to spawn a pane). The fallback
paths still differ between callers — bringup follows up
with `_attach_in_tmux`, teardown does the curses-endwin
compose-down — so the helper stops at "is stderr routed
or not" and lets callers branch from there. Net diff:
~60 lines deleted, the routing-to-right-pane concept now
lives in one place.
PRD 0021 follow-up. Mirrors the bringup-into-right-pane fix
on the explicit-stop path: when `\$TMUX` is set, the stop
flow respawns the right pane with `tail -F
state/<slug>/teardown.log` (via `_ensure_right_pane` —
reuses the existing right pane if it's the agent's claude
session) and redirects fd 2 to that log for the duration of
`capture_session_state` + `cm.__exit__`. compose-down +
network-remove messages stream into the right pane.
After `settle_state` removes the state dir, the tail keeps
its buffered output visible (tail -F handles file removal
gracefully); the next attach respawns the pane with claude.
Falls back to the existing curses-endwin path on tmux
failure, or when the dashboard isn't in tmux at all.
After the operator pressed `y` on the preflight modal (or
picked an agent in the picker), the modal's curses sub-window
stayed on screen until the dashboard's main loop ticked again
— which during a 5-10s launch made it look like the
confirmation never registered.
Add `_erase_modal` (touchwin + refresh on stdscr) and call it
at every exit from `_preflight_modal` and `_picker_modal`.
The pre-modal frame buffered on stdscr immediately overwrites
the sub-window's area; the launch proceeds with a clean
dashboard underneath.
PRD 0021 follow-up. The new-agent flow was calling a dedicated
`_tmux_split_pane_tail` that ALWAYS created a new pane —
so every `n` start spawned a fresh right pane next to any
existing one, accumulating panes instead of reusing them.
Replace with a generic `_ensure_right_pane(tmux_state, argv)`
that respawns the dashboard's tracked right pane if one is
alive, splits a new one only when none is tracked or the
tracked pane was closed. Both the new-agent tail-during-
bringup path AND the existing claude-attach path now route
through this helper.
Net effect: starting a second agent reuses the same right
pane — bringup tail replaces the prior claude session,
then claude (for the new agent) replaces the tail. Closing
the right pane manually via `C-b x` still triggers a fresh
split on the next attach.
PRD 0021 follow-up. When starting a new agent via `n` while
in tmux, the dashboard now:
1. Pre-creates the right pane with `tail -F
state/<slug>/bringup.log`.
2. Redirects fd 2 (stderr) to that log file via dup2 — affects
both Python `info()` calls AND subprocess inheritors'
stderr (docker compose up, network creates, provision).
3. Runs `backend.launch().__enter__()` with the redirect in
place; everything streams into the right pane via tail.
4. Restores stderr.
5. Respawns the right pane (tail → claude session).
Net effect: dashboard pane stays uncluttered during bringup,
and the operator watches the compose-up + provision output in
the same pane that's about to hold the claude session — no
visual handoff between "starting" and "started."
Curses never needs to come down on the tmux path (the pane is
already created in the dashboard's neighbor pane, and stderr
is redirected away from the terminal entirely).
If `_tmux_split_pane_tail` fails (tmux missing, server died),
falls through to the existing curses-endwin handoff so the
operator still gets a session.
PRD 0021 chunk 4 (final). Two adjustments to close the
split-pane loop:
1. `_stop_bottle_flow` clears `tmux_state['slug']` when the
stopped bottle was the right-pane occupant. The pane itself
stays in place (claude exits with "container not found");
the operator presses Enter on a different agent to
repurpose it via respawn-pane.
2. `_render` accepts `right_pane_slug` and marks the matching
agents-pane row with a `*` prefix + A_BOLD (when it's not
also the focused row — focused selection still wins for
visibility). Gives the operator a clear visual link
between which agent the dashboard says is "active right
now" and which one is visible to their right.
Wired through `_main_loop`: passes `tmux_state` to
`_stop_bottle_flow` on `x`, and `tmux_state.get('slug')` to
`_render` on every tick.
479 unit tests pass (1 new for the tmux_state-preservation
on non-owned stop). PRD 0021 implementation complete pending
merge.
PRD 0021 chunk 3. The `n` flow (PRD 0020 chunk 2) now routes
the first claude session of a freshly-started bottle into the
right tmux pane when `\$TMUX` is set — same `_attach_in_tmux`
state machine the Enter re-attach uses, just with
`resume=False` so claude starts fresh.
Outside tmux the existing foreground handoff is unchanged.
The compose-up phase (`backend.launch.__enter__`) still drops
curses for its stderr output; we restore curses BEFORE
spawning into the right pane so the dashboard re-renders
alongside the new claude session instead of waiting for
attach to return.
PRD 0021 chunk 2. New tmux integration: when `\$TMUX` is set
and the operator presses Enter on a focused agent row, the
dashboard spawns / respawns the right pane with that bottle's
claude session instead of taking over the terminal via
curses.endwin.
Mechanics:
- `_in_tmux()` — true when `\$TMUX` is set.
- `_tmux_split_pane_create` — first attach: `tmux split-window
-h -P -F '#{pane_id}'` opens a right pane and prints its id
for tracking.
- `_tmux_respawn_pane` — subsequent attaches: `tmux
respawn-pane -k -t <id>` swaps the content without
re-splitting.
- `_tmux_pane_exists` — `tmux list-panes` check before
respawn so a manually-closed pane gracefully falls back to
a fresh split.
- `_attach_in_tmux` — owns the create-or-respawn state
machine, mutates `tmux_state` ({pane_id, slug}) so the
main loop tracks the right-pane occupant.
- `_attach_via_handoff` — the previous curses-endwin path,
extracted as the fallback when tmux is missing or fails.
- `_attach_to_bottle` dispatches: in tmux + state available →
`_attach_in_tmux`; otherwise → handoff.
Main loop gets `tmux_state: dict = {"pane_id": None, "slug":
None}`. Chunks 3 + 4 wire it through the new-agent flow and
the stop hook.
`FileNotFoundError`-safe `subprocess.run` calls around every
tmux invocation — a missing tmux binary cleanly falls back to
the handoff for that keypress. 478 unit tests pass (10 new
for the pure argv builders + `_claude_runtime_args`).
PRD 0021 chunk 1. The tmux split-pane helpers (chunk 2+) need
the same docker-exec argv that `exec_claude` builds — including
the `--append-system-prompt-file <path>` flag the bottle's
provisioner copies into place. Extract the argv construction
into a pure `claude_docker_argv(argv, *, tty)` method so both
foreground (`subprocess.run`) and tmux paths
(`tmux respawn-pane …`) build from the same source.
`exec_claude` becomes a one-liner that runs subprocess.run on
the argv. No behavior change; 472 unit tests pass (7 new for
the pure builder).
The `bottles` dict held `@contextmanager`-wrapped launch contexts.
On normal Python interpreter shutdown those context managers'
generators got GC'd, which raised GeneratorExit at the yield
point and ran the `finally` block — invoking each bottle's
teardown and tearing down the compose project. Net effect: `q`
WAS implicitly stopping every dashboard-launched bottle even
though the keypress handler just `return`'d.
`os._exit(0)` skips all Python-level cleanup (GC, atexit, etc.),
so the docker compose projects survive the dashboard exit
untouched. Curses gets explicit `endwin()` first because the
brutal exit skips curses.wrapper's normal terminal restoration.
Matches PRD 0020's resolved-question answer (`q` does NOT tear
down bottles; teardown is always explicit via `x` or
`./cli.py cleanup`).
`--resume` alone surfaces claude's session picker even when only
one session exists. `--continue` jumps to the most recent session
non-interactively, which is the actual behavior the dashboard's
Enter re-attach wants for typical bottle-with-one-session cases.
Re-entering a running bottle from the dashboard (Enter on the
agents pane) now invokes claude with `--resume` so the session
picks up the prior conversation history rather than starting a
fresh transcript. The first-attach paths (`./cli.py start` and
the dashboard's new-agent `n` flow) leave it off — the
transcript doesn't exist yet there.
`attach_claude` gains a `resume: bool = False` kwarg;
`_attach_to_bottle` in the dashboard passes `True`.
Final PRD 0020 chunk. `x` on a focused agents-pane row tears
down the selected bottle if the dashboard owns it (started via
the chunk-2 `n` flow): pops `(cm, bottle, identity)` from the
main loop's bottles map, snapshots the transcript best-effort,
calls `cm.__exit__(None, None, None)` to drive the existing
compose-down + network-remove sequence, then `settle_state` to
honor any pre-existing preserve marker.
On a non-owned slug (discovered via `list_active_slugs` but not
in the dashboard's bottles dict — i.e., previous-dashboard or
external `./cli.py start` bottle), `x` is a no-op with a status
hint pointing at `./cli.py cleanup`. Matches the PRD's
cross-dashboard re-attach model: the dashboard can re-attach
either kind, but can only tear down its own.
The PRD's chunk 5 ("quit-cleanup") is satisfied by the existing
no-op behavior of `q` — per the user's resolved-question
answer, quit leaves bottles running unchanged. No code change
needed for that.
Footer surfaces `[x] stop`. 465 unit tests pass (1 new for the
non-owned no-op path; the owned path is integration territory
because it drives a real compose-down).
PRD 0020 chunk 3. Enter on a focused agents-pane row drops to a
claude session inside the selected bottle. Works for both
dashboard-owned bottles (looks up the stored Bottle handle in
the main loop's `bottles` dict) and externally-discovered ones
(synthesizes a DockerBottle from the slug → `claude-bottle-<slug>`
container name).
For the synthesized path, the `--append-system-prompt-file`
target resolves via metadata.json + the manifest's agent prompt
if both can be read; otherwise the re-attach runs without the
flag (claude defaults to no system prompt, the bottle's other
state is untouched).
Shares the curses.endwin → attach → refresh handoff with the
chunk-2 new-agent flow via a new `_attach_to_bottle` helper.
Footer reshuffled to advertise `[Enter] view/attach`. 464 unit
tests pass (3 new for `_bottle_for_slug`).
PRD 0020 chunk 2. Pressing `n` opens a modal that lists every
agent from the manifest with `(N running)` suffixes for ones
that already have bottles up. Type to filter (substring,
case-insensitive); j/k or arrows to navigate; Enter to confirm;
Esc clears the filter on first press, exits the picker on the
second.
On confirmation, the dashboard runs:
- `prepare_with_preflight` from chunk 1 with curses-modal
render + prompt callables (the preflight modal centers the
plan summary + captures [y/N]).
- `backend.launch(plan).__enter__()` — enters but doesn't bind
the context to a `with`. The (cm, bottle, identity) tuple
lands in the main loop's `bottles` dict keyed by slug.
- `curses.endwin()` → `attach_claude(bottle)` → `stdscr.refresh()`
handoff. The agent's claude session takes over the terminal;
on exit the dashboard re-renders with the bottle now visible
in the agents pane.
Crucially the context manager is held alive in `bottles` — never
`__exit__`'d at quit. Chunk 4 will wire `x` to that exit; for
now bottles started from the dashboard stay running until
explicit cleanup. Matches the PRD's "q does not tear down"
decision.
Footer surfaces `[n] new agent`. 461 unit tests pass (8 new for
`_filter_agents` and `_running_counts`).
PRD 0020 chunk 1. `cli/start.py`'s `_launch_bottle` did three
things in one function: prepare + preflight, attach claude, and
settle state on teardown. Split them so the dashboard (PRD 0020
chunk 2+) can reuse the prepare + attach pieces piecewise
without going through the CLI's one-shot orchestrator:
- `prepare_with_preflight(spec, *, stage_dir, render_preflight,
prompt_yes, dry_run)` — injects render + prompt callables so
the CLI binds them to stderr/stdin while the dashboard binds
them to a curses modal. Returns `(plan, identity)`; identity
is set after `backend.prepare` returns so callers can reap
the prepare-time state dir on abort via `settle_state` in
their finally — preserving today's preflight-N cleanup.
- `attach_claude(bottle, *, remote_control)` — runs claude
inside the bottle and returns its exit code. The dashboard
calls this from inside a `curses.endwin` → … →
`stdscr.refresh()` handoff.
- `capture_session_state` / `settle_state` lose their leading
underscore; the dashboard will call them on
session-end + explicit-stop respectively.
`_launch_bottle` becomes a thin orchestrator over those helpers.
No behavior change; all 453 unit tests pass and `./cli.py start
implementer --dry-run` produces identical preflight output.
PRD 0018 chunk 3's atomicity fix used write-temp-then-rename to
update bind-mounted config files. POSIX rename atomically swaps
the inode at the host path — but Docker single-file bind mounts
on Linux pin the source inode at mount time, so post-rename the
container's mount points at the now-orphaned old inode and never
sees the new content. The egress sidecar's SIGHUP-driven reload
re-reads the same stale file → "egress route updates aren't
updatable via the supervisor anymore".
Switch egress_apply + pipelock_apply to write in place (same
inode, truncated + rewritten). Lose file-level POSIX atomicity,
but:
- egress: SIGHUP fires only AFTER the write returns; the
addon's `load_routes` raises `ValueError` on a partial read
and keeps the previous in-memory routes, so the in-process
race window (already narrow) is non-disruptive.
- pipelock: applies via `docker restart` rather than SIGHUP;
restart serializes after the host write completes, so the
container reads the fully-written file on next boot.
macOS Docker Desktop's file-sharing layer (virtiofs / osxfs)
silently re-resolves the path on rename, which is why this bug
didn't surface in dev tests on macOS. Linux native Docker is
the strict reading; the fix works on both.
`egress_render_routes` now emits hand-rolled YAML in the same style
as `pipelock_render_yaml`. The egress addon parses it via
`yaml_subset.parse_yaml_subset` — the same parser the manifest
loader + pipelock_apply use.
Why bother: routes.yaml is bind-mounted into the egress sidecar
AND surfaced to operators through `routes edit` (PRD 0019). JSON-
in-yml renders ugly in $EDITOR and signals "this is data" rather
than "this is config you can read at a glance". Real YAML reads
cleanly.
Mechanics:
- `yaml_subset.py` drops its `claude_bottle.log` dependency.
Errors now raise `YamlSubsetError` (a `ValueError`); the
manifest loader + pipelock_apply catch it at the boundary
and forward to `die` / `PipelockApplyError` so callers see
the same behavior they did before.
- `Dockerfile.egress` adds one COPY line for `yaml_subset.py`
so it sits flat in `/app/` next to the addon. The addon
uses an absolute-import-with-fallback shim so the same file
works inside the container AND from the host's unit tests.
- `egress_apply._merge_single_route` round-trips current
routes.yaml through `parse_yaml_subset` + a new
`_render_routes_payload` helper instead of `json.loads` +
`json.dumps`.
End-to-end: rebuilt the egress image, ran `./cli.py start` to a
full bring-up, confirmed the addon's boot log shows `egress:
loaded 9 route(s)` — i.e., the YAML parses inside the container.
453 unit + 3 integration tests pass.
PRD 0019 chunk 4 (final). The `e` (routes edit) and `p` (pipelock
edit) keys now require an agent selection in the agents pane.
Pressing them with the proposals pane focused, with no active
agents, or with an out-of-range selection is a no-op with a
status hint ("no agent selected; Tab into the agents pane first").
The discover-and-prompt scaffolding inside
`_operator_edit_routes_flow` / `_operator_edit_allowlist_flow` /
`_operator_edit_flow` is gone. The flows now take an `ActiveAgent`
+ required-service name; they refuse with a clear message when
the bottle lacks the requested sidecar (e.g., `routes edit`
against a bottle with no `bottle.egress.routes` declared). The
`discover_egress_slugs` + `discover_pipelock_slugs` +
`_discover_active_with_service` helpers come out — they had no
remaining callers.
Footer now reads `[e/p] edit selected agent`.
PRD 0019 chunk 3. The TUI now has two focusable panes — proposals
and agents — and `Tab` toggles which one the `j/k`/arrow keys
move through.
Each pane keeps its own selection index. Switching panes doesn't
lose the position in the other; the cursor (`>` + reverse-video
row) appears only in the focused pane. The label line on each
pane shows "(focused)" when active.
Footer reshuffled: `[Tab] switch pane [j/k] move [Enter] view
[a/m/r] proposal [e/p] edit [q] quit`. When the agents pane is
focused and there's no status message to display, the idle
status line surfaces the currently-selected agent (or "[no
active agents]" / "[no agent selected]" fallbacks) so the
operator knows what an agent-scoped edit verb will target after
chunk 4 wires them up.
Proposal action keys (a/m/r/Enter) are gated on the proposals
pane being focused — pressing them with the agents pane focused
is a no-op. e/p still use the global discover-and-prompt flow
for one more chunk; chunk 4 swaps them to read the agents-pane
selection.
PRD 0019 chunk 2. The TUI's main render now draws two panes:
proposals on top (existing), active agents on the bottom (new).
Header counts both totals. The agents pane refreshes on the
same 1s tick — agents starting/stopping reflect without
operator action.
Each agent row shows slug, agent name, started-time (HH:MM:SS
of the metadata.json timestamp), and the bracketed list of
sidecars currently up. The `agent` service is filtered out of
the displayed list — it's always present so it'd be noise; the
sidecars are the differentiator. A bottle whose only running
service is `agent` (sidecars still warming up) renders as
`(starting)`.
No selection model yet — that's chunk 3. The cursor stays in
the proposals pane; `j/k`/arrow nav and the proposal action
keys are unchanged.
PRD 0019 chunk 1. New `discover_active_agents()` in dashboard.py
returns one `ActiveAgent(slug, agent_name, started_at, services)`
per currently-running compose project:
- Slugs come from `list_active_slugs()` (chunk-5 shared helper).
- The service set per project comes from ONE label-filtered
`docker ps` call (PRD open question #1: avoids N per-bottle
`compose ps` invocations on each 1s refresh tick).
- agent_name + started_at come from each bottle's
metadata.json; "?" / "" fallbacks when the file is missing
so the row renders rather than vanishes.
Not wired into the TUI yet — chunk 2 renders the agents pane.
The parser (`_parse_services_by_project`) is split out as a pure
function so the conditional-input shape can be unit-tested
without docker.
PRD 0018 chunk 5. The dashboard's operator-edit verbs
(`routes edit`, `pipelock edit`) enumerated running sidecars
via `docker ps --filter name=...` prefix scans. Switch to
`docker compose ls`-based discovery so the dashboard, cleanup
CLI, and launch step all agree on what's running.
Mechanics:
- `claude_bottle/backend/docker/compose.py` grows three shared
helpers: `list_compose_projects` (the JSON parse moved out
of cleanup), `slug_from_compose_project` (inverse of
`compose_project_name`), and `list_active_slugs` (sugar over
the first two for the common "what's running?" question).
- cleanup.py drops its private `_list_compose_projects` +
`_PROJECT_PREFIX` in favor of the shared ones; `list_active`
simplifies (one compose-ls call, not two).
- dashboard.py's `_discover_sidecar_slugs` becomes
`_discover_active_with_service`: cross-references the active
slug list with a label-filtered `docker ps` so only bottles
whose given service container is actually up surface in the
edit menu. Bottles without an egress sidecar (no
bottle.egress.routes) no longer appear for `routes edit`.
3 new unit tests cover the slug ↔ compose-project naming
contract; manual probe with a fake compose project confirms
both `discover_egress_slugs` and `discover_pipelock_slugs`
return the expected slug.
PRD 0018 chunk 4. `claude-bottle cleanup` now derives its work
from `docker compose ls --all --format json`, filtered to projects
whose name starts with `claude-bottle-`. Per project: one `compose
down --volumes` removes the containers + the compose-managed
networks atomically.
The plan also enumerates three fallback buckets:
- Stray containers — `claude-bottle-*` containers with no
`com.docker.compose.project` label (left over from pre-compose
code paths). Cleared via `docker rm -f`.
- Stray networks — `claude-bottle-*` networks with no compose
project label. Cleared via `docker network rm`.
- Orphan state dirs — per-bottle `~/.claude-bottle/state/<id>/`
dirs with no live project AND no `.preserve` marker. The
`.preserve` marker (capability-block or auto-preserve-on-crash)
explicitly opts-out of reaping; manual `rm -rf` is the only
path for preserved state.
cli/cleanup.py collapses to a single y/N prompt — backend.prepare_cleanup
returns everything in one plan, backend.cleanup processes everything,
no more double-prompt for state. The CLI-side state-dir enumeration
+ `_state_summary` flags from PR #25 are gone; the backend's
orphan-detection rules subsume them.
PRD 0018 chunk 4 spike: empirically verified that pipelock's SSRF
guard checks proxied-request destinations (e.g. api.anthropic.com →
public IP) and not source IPs of incoming connections. The
bottle's own internal CIDR was being added to ssrf.ip_allowlist
defensively, but that defense isn't load-bearing — direct pipelock
probe (`curl --proxy http://pipelockhttps://api.anthropic.com/`)
returns 404 from upstream rather than blocking on SSRF.
So:
- Networks become compose-managed (`internal: true` on the
internal network; the egress one is a normal user-defined
bridge). Compose creates + removes them via up/down.
- launch.py drops the `docker network create` + `network_inspect_cidr`
+ pipelock yaml re-render dance.
- The pre-create/external scaffolding from chunk 3 goes with it.
End-to-end `./cli.py start` still works; cleanup leaves no
orphans. If real-world use surfaces an SSRF block we hadn't
predicted, the allowlist can come back via subnet-pinning rather
than pre-create.
PRD 0018 chunk 3. Each instance is now one `docker compose` project:
- launch.py renders the compose spec via chunk-1's
bottle_plan_to_compose, writes it to state/<slug>/docker-compose.yml,
`docker compose up -d`s, and (on teardown) dumps
`docker compose logs --no-color --timestamps` to
state/<slug>/compose.log before `docker compose down`.
- Networks are pre-created (`docker network create --internal` +
user-defined bridge) so pipelock yaml can know the internal CIDR
before compose-up. Compose references them with `external: true`;
the launch step's ExitStack still owns network removal.
- Agent still runs `sleep infinity`; claude reaches it via
`docker exec -it` exactly like before (per the PRD's resolved
TTY question).
- metadata.json grows a `compose_project` field so dashboard /
cleanup tooling can derive compose invocations without
re-deriving the slug.
Security follow-ups from chunk-2 review:
(b) CA private keys: pipelock + egress ca-key.pem land at 0o600
explicitly. The mitmproxy cert+key concat stays 0o644 because
the egress container's uid-1000 user reads it through the
bind mount; parent dir at 0o700 still restricts host-side
reach.
(c) Apply atomicity: egress_apply + pipelock_apply switch from
`docker cp` to host-side write-temp-then-rename on the
bind-mount source. POSIX rename is atomic on the same
filesystem, so a sidecar SIGHUP racing the apply can't see
a half-written routes.yaml / pipelock.yaml.
Per-sidecar Docker{Sidecar}.start/stop methods stay in place — the
integration test suite drives them directly to validate each image
in isolation, which is still useful. launch.py no longer calls
them; a follow-up chunk can prune if the integration tests move to
the compose lifecycle.
git-gate entrypoint's chmod 600 on the keyfile + known_hosts now
tolerates EROFS (`|| true`) — the host SSH key is already 0600
(SSH refuses to load otherwise), so the inside-container chmod
was already a no-op in the docker-cp path and now just needs to
not error on the read-only bind mount.
422 unit tests pass; supervise integration test passes; end-to-end
`./cli.py start implementer` brings up the project, attaches,
captures full merged logs on teardown, and reaps all containers +
networks.
PRD 0018 chunk 2. Each sidecar's prepare-time output (pipelock yaml +
CAs, egress routes.yaml + CAs, git-gate entrypoint + hooks, supervise
current-config, agent env + prompt) now lands in
~/.claude-bottle/state/<slug>/<service>/ instead of an ephemeral
mktemp dir. The state subdirs become the stable bind-mount sources
that chunk 3's docker compose project will reference.
The SDK launch path is unchanged — `docker cp` still copies from the
plan-held paths into containers, just from new locations. start.py's
session-end cleanup is now in `finally`, which also reaps state dirs
left behind by dry-run / preflight-N / prepare-exception paths
(previously only the post-launch path settled state).
PRD 0018 chunk 1. New module `claude_bottle/backend/docker/compose.py`
exposing `bottle_plan_to_compose(plan) -> dict` — a pure function that
translates a fully-resolved DockerBottlePlan into a Compose v2 spec.
Not wired in yet. Tests cover the conditional-service matrix (git
on/off × egress on/off × supervise on/off) plus per-service shape
(images vs builds, network aliases, bind mounts, env vars, depends_on).
The manifest key is `egress:` now; finish the rename so the rest of
the codebase matches. Files (Dockerfile.egress, claude_bottle/egress.py
etc.), classes (Egress, EgressConfig, EgressRoute, EgressPlan,
DockerEgress), constants (EGRESS_HOSTNAME, EGRESS_ROUTES, ...),
container name prefix (claude-bottle-egress-*), docker network alias
(egress), the introspection host (_egress.local), the MCP tool IDs
(egress-block, list-egress-routes), and the preflight label all drop
the `-proxy` suffix.
Now that `bottle.egress` (the old allowlist/dlp_action block) is
gone, the longer `egress_proxy:` disambiguator isn't needed. The
manifest field reads more naturally as just `egress:` with the
same nested `routes: [...]` shape.
Renamed:
- Manifest YAML key: `egress_proxy:` → `egress:`
- Bottle dataclass attr: `bottle.egress_proxy` → `bottle.egress`
- `_BOTTLE_KEYS` entry, schema docstring, and all
user-facing error message labels (`egress.routes[N]`,
`egress has unknown key …`, etc.).
Kept (these refer to the egress-proxy SIDECAR, not the manifest
field):
- File names: `egress_proxy.py`, `egress_proxy_apply.py`,
`egress_proxy_addon.py`, `egress_proxy_addon_core.py`.
- Class names: `EgressProxyConfig`, `EgressProxyRoute`,
`EgressProxyPlan`, `EgressProxy`, `DockerEgressProxy`.
- Helper names: `egress_proxy_manifest_routes`,
`egress_proxy_routes_for_bottle`,
`egress_proxy_token_env_map`, etc.
- Constants: `EGRESS_PROXY_HOSTNAME`, `EGRESS_PROXY_ROLES`,
`EGRESS_PROXY_AUTH_SCHEMES`, `EGRESS_PROXY_FORWARD_PROXY`,
`EGRESS_PROXY_INTROSPECT_URL`, `EGRESS_PROXY_PORT`, etc.
- Container name prefix `claude-bottle-egress-proxy-*`, the
`egress-proxy` docker network alias, the
`egress-proxy-block` + `list-egress-proxy-routes` MCP tool
IDs, the `egress-proxy` audit-log component label.
Local bottle migrated (`~/.claude-bottle/bottles/dev.md` already
updated). The legacy `egress_proxy` key isn't surfaced anywhere
anymore; the generic unknown-key validator catches typos with a
"did you mean: egress, env, git, supervise" hint.
409 unit + integration tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Goal: one allowlist surface (egress_proxy.routes), no second
free-form `egress:` knob. Anything that used to live there now
goes in `egress_proxy.routes` as a bare-pass entry
(`- host: <name>`).
Removed:
- `BottleEgress` dataclass + DLP_ACTIONS constant + bottle.egress
field on `Bottle`.
- `pipelock_bottle_allowlist` helper.
- `pipelock_allowlist_summary` helper (the compact preflight
summary stopped using it after PR #31).
- `allowlist_summary` field on `DockerBottlePlan`.
- `bottle.egress.allowlist` folding in
`egress_proxy_routes_for_bottle` — only DEFAULT_ALLOWLIST
auto-folds now.
- The two-branch logic in `pipelock_effective_allowlist`
(egress-proxy-present vs not) — pipelock now just mirrors
`egress_proxy_routes_for_bottle` unconditionally.
Hard-coded:
- `request_body_scanning.action = "block"` in
`pipelock_build_config` (was driven by
`bottle.egress.dlp_action`). The previous default was already
"block" — the knob to switch to "warn" was a foot-gun in a
sandboxed agent context, so it's gone.
Tests:
- `test_pipelock_allowlist.py` rewritten to assert the
mirrored-from-egress-proxy semantics directly.
- `test_manifest_md_load.py`, `test_pipelock_yaml.py`,
`test_egress_proxy.py` fixtures migrated to put hosts in
`egress_proxy.routes` instead of `egress.allowlist`.
Local bottle migrated too: `~/.claude-bottle/bottles/dev.md`
loses the `egress: { allowlist: [example.com] }` block, picks up
a bare-pass `- host: example.com` route.
409 unit + integration tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Companion to the compact preflight in #31 — the JSON format was
the structured alternative to the verbose text summary. With the
new compact text already on screen, no consumer was using the
JSON shape, and the abstract `BottlePlan.to_dict` was the
biggest piece of API surface no one is implementing against.
Removed:
- `--format` CLI flag from `start` and `resume`.
- `output_format` kwarg from `_launch_bottle`.
- `BottlePlan.to_dict` abstract method.
- `DockerBottlePlan.to_dict` (60-line dict builder).
- The `_PlanView` dataclass — `print` was the only remaining
caller, so the env-name computation is inlined.
- `tests/integration/test_dry_run_plan.py` (JSON-shape
integration test).
- `tests/unit/test_cli_start_format.py` (flag-conflict unit).
Plan-introspection is still possible by reading the
`DockerBottlePlan` dataclass directly — fields like `image`,
`container_name`, `stage_dir`, `use_runsc` are all there. Tooling
that needs a stable wire shape can JSON-serialize the dataclass
themselves.
411 unit + integration tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Trim the y/N preflight to the parts the operator actually scans
before pressing y:
agent
env (one per line)
skills (one per line)
bottle
git gate (one upstream per line)
egress-proxy (one route per line, with [auth:scheme] when set)
Dropped from the display (still on the plan dataclass / json
output for tooling): image, dockerfile, derived-image (cwd) line,
container, stage dir, docker runtime, git remotes list, egress
allowlist summary, tls interception note, supervise note, prompt
metadata, remote-control flag.
`remote_control` kwarg kept on `.print()` for callsite stability
but unused in the compact format.
A `_multi(label, values)` helper does the "first value next to
the label, remainder continuation-indented" pattern that env /
skills / git gate / egress-proxy all share — keeps the columns
aligned to the label width.
Verified against my own dev bottle: output is byte-for-byte the
spec the operator asked for.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The apex-vs-subdomain question, the cert/SNI mismatch when
pipelock-passthrough hosts have wildcard certs, and the
mirror-divergence corner cases stacked up faster than the feature
earned its keep. Going back to exact-host match only.
Addon (`match_route`): single pass, case-insensitive exact match.
`*.foo.com` in a route table is now a literal string that won't
match anything — operators that want subdomains declare them
individually.
Pipelock mirror (`_pipelock_safe_hosts`): silently drops hosts
that don't fit pipelock's `[A-Za-z0-9_.-]+` charset (wildcards,
IPv6 literals, stray chars). Previously normalised wildcards to
their suffix; now just drops them, which matches egress-proxy's
behavior of not matching them either.
8 wildcard test cases removed; 2 lightweight "wildcards are not
supported" assertions retained as documentation. 386 unit pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`*.example.com` now matches `example.com` itself in addition to
every subdomain. RFC 6125 TLS-wildcard semantics excluded the
apex; an allowlist's natural reading of `*.example.com` is "all
of example.com" — and the pipelock mirror already strips
`*.example.com` to `example.com`, so without the apex match the
two layers disagreed (pipelock allowed the apex, egress-proxy
blocked it).
Behavior:
- `*.example.com` matches `example.com` (apex)
- `*.example.com` matches `foo.example.com` (subdomain)
- `*.example.com` matches `a.b.example.com` (nested)
- `*.example.com` does NOT match `barexample.com` (label
boundary required)
Test renamed: `test_wildcard_does_not_match_apex` →
`test_wildcard_matches_apex`. 395 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PRD 0017 v1 deliberately punted wildcards ("Exact match in v1 —
globs / wildcards are a follow-up"). Now that the supervise mirror
strips `*.` to its suffix for pipelock, the addon needs to actually
match wildcard hosts on its side or the route is dead weight.
Addon `match_route` now does two passes:
1. Exact (case-insensitive) literal match on the hostname.
2. Wildcard suffix match: a route whose host starts with `*.`
matches any request host that ends with `.<suffix>`. So
`*.example.com` matches `foo.example.com` and
`a.b.example.com`, but NOT the apex `example.com` and not
`barexample.com` (the leading `.` of the suffix is
required).
Exact wins — operators can layer a specific route (e.g.
`api.github.com` with auth) on top of a broader wildcard (e.g.
`*.github.com` bare-pass).
8 new unit tests: direct subdomain match, nested subdomain match,
apex rejection, overlapping-suffix rejection, case-insensitive,
exact-wins-over-wildcard (both route orders), no-match
fall-through. 395 unit + integration pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previous fix stripped wildcard hosts entirely from the pipelock
mirror; the operator wanted the suffix kept so pipelock pins the
base hostname. Now `*.example.com` becomes `example.com` in the
mirror — egress-proxy keeps the wildcard for its own host match,
pipelock allows the suffix.
Behavior change:
- `*.example.com` → `example.com` (was: dropped)
- `*.foo.bar.com` → `foo.bar.com` (one `*.` strip, not
recursive)
- `*` → dropped (normalises to empty)
- `example.com` → `example.com` (unchanged)
- `[::1]`, etc. → dropped (still off pipelock's
charset after any prefix
strip)
Adds explicit de-dup so `*.example.com` + `example.com` collapse
to one entry. Existing wildcard-strip test reshaped + 3 new
edge-case tests.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pipelock's allowlist parser only accepts `[A-Za-z0-9_.-]+`
literal hostnames. Wildcard routes (`*.example.com`) that
egress-proxy's route table accepts trip pipelock's parser the
moment the mirror tries to render them into the new yaml; the
whole apply fails before pipelock is even touched. Symptom:
operator approves an egress-proxy-block proposal, gets
"pipelock allowlist mirror failed: allowlist line N: '<wildcard>'
has disallowed characters."
Fix: `_mirror_hosts_to_pipelock` filters through
`_pipelock_safe_hosts` before merging — anything outside
pipelock's allowed charset is silently skipped. Wildcard routes
stay live on egress-proxy; pipelock just won't pin a hostname
for the wildcard-matched traffic (caller's call to accept the
hostname-only enforcement gap there).
Adds 4 unit tests covering normal hostnames pass-through,
wildcard stripping, IPv6-literal stripping, and order
preservation.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`_mirror_hosts_to_pipelock` runs BEFORE the egress-proxy write in
`apply_routes_change` — if it raises, egress-proxy is left intact.
The error message claimed the opposite ("egress-proxy routes
updated but pipelock allowlist mirror failed"), pointing the
operator at the wrong half-state.
Reword to make the actual state clear: pipelock failed,
egress-proxy NOT updated, fix pipelock manually with
`pipelock edit <bottle>` then retry.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Instead of asking the agent to compose and submit a full routes
file, the tool now takes ONE proposed route — host + optional
path_allowlist + optional auth — and the supervisor merges it
into the live routes table at approval time. The agent no longer
needs to fetch / reproduce / extend the existing allowlist; it
just describes the host it wants reachable.
Tool input (new):
- `host` (required)
- `path_allowlist` (optional, array of absolute path prefixes)
- `auth` (optional, {scheme, token_ref})
- `justification` (required)
Merge semantics (in `egress_proxy_apply._merge_single_route`):
- Host NOT in current routes → append the proposed route as a
new entry. If `auth` is set, assign the next EGRESS_PROXY_TOKEN_N
slot.
- Host already present → union the proposed `path_allowlist`
with the existing one (proposed entries appended after
existing, deduped). Existing `auth_scheme` / `token_env`
preserved; proposed `auth` ignored (operator-controlled, not
agent-controlled).
- Hostname comparison is case-insensitive.
Dashboard wiring: `approve()` on an egress-proxy-block proposal
now calls `add_route(slug, proposed_route_json)` instead of
`apply_routes_change(slug, full_file)`. add_route fetches the
current routes from the running egress-proxy, merges, and calls
apply_routes_change with the merged content — so the
pipelock-mirror + SIGHUP plumbing from chunk 3 still runs
end-to-end. Audit diff still captures the full-file before/after.
Tool description rewritten to make the new shape obvious and to
stop pointing the agent at the routes file. The
`list-egress-proxy-routes` tool stays available for agents that
want to see what's currently allowed.
Tests: 9 new `_merge_single_route` cases (host absent/present,
path-allowlist union+dedup, auth-slot indexing, case-insensitive
match, existing-auth preservation, missing-host rejection,
malformed-current rejection). 407 unit + integration pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reshape the allowlist topology so the egress-proxy is the bottle's
single allowlist surface, and replace the agent-side
routes/allowlist file mounts with a live MCP tool.
Policy change (move defaults to egress-proxy):
- `egress_proxy_routes_for_bottle(bottle)` now folds in
DEFAULT_ALLOWLIST (the claude-code defaults) and
`bottle.egress.allowlist` (user adds) as bare-pass routes (no
auth, no path filter), on top of the bottle's
`egress_proxy.routes`. Manifest routes win on host collision.
- `pipelock_effective_allowlist(bottle)` mirrors egress-proxy's
effective host set when egress-proxy is in use. Pipelock is
no longer the bottle's primary allowlist authority; it
enforces a downstream copy as defense-in-depth + does DLP body
scanning.
- Split out `egress_proxy_manifest_routes(bottle)` for callers
that want just the manifest entries (tests, internal use).
- DEFAULT_ALLOWLIST moves from `pipelock.py` to `egress_proxy.py`
(pipelock re-imports for the no-egress-proxy fallback path).
- Dropped the `egress-proxy` auto-allow on pipelock's allowlist
— the agent never dials egress-proxy via the proxy mechanism;
pipelock only sees upstream hostnames from egress-proxy's
CONNECTs.
Introspection endpoint (existing mitmproxy feature):
- Egress-proxy addon recognises requests to the magic host
`_egress-proxy.local` and synthesizes responses via
`flow.response = http.Response.make(...)` — no upstream
connection, no allowlist enforcement on the magic host.
- `GET /allowlist` returns the in-memory route table as JSON
(host + path_allowlist + auth_scheme + token_env per route;
no token VALUES).
- Smoke-tested end-to-end against a real egress-proxy container.
MCP tool (existing supervise plumbing):
- New `list-egress-proxy-routes` tool (no inputs, no operator
approval). Handler fetches via egress-proxy's introspection
endpoint using urllib's ProxyHandler against
`EGRESS_PROXY_FORWARD_PROXY`. Returns the JSON payload as the
tool's text content; `isError: true` if the proxy is
unreachable.
- `egress-proxy-block` description now points the agent at
`list-egress-proxy-routes` instead of a staged file path.
- `pipelock-block` description acknowledges the mirror — agents
should prefer `egress-proxy-block` to add hosts; pipelock-block
stays for the rare divergence case.
Drop agent-side file mounts:
- Supervise's `current-config` dir staging no longer writes
routes.yaml / allowlist. Only `Dockerfile` remains
(capability-block still reads it from
`/etc/claude-bottle/current-config/Dockerfile`).
- `prepare.py` stops passing `routes_content` /
`allowlist_content` to `supervise.prepare`.
- `Supervise.prepare` signature simplified to one
`dockerfile_content` kwarg.
Tests: 400 unit + integration pass. Added coverage for
defaults-folding (`TestRoutesForBottleFoldsDefaults`), the new
tool definition + handler, and the updated supervise.prepare
shape.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When the operator approves an egress-proxy-block proposal that
adds a host to egress-proxy's routes, the request would still 403
downstream at pipelock — pipelock's hostname allowlist is set at
bottle launch and doesn't learn about routes added later. The
agent saw "Approved" but the very next retry still failed.
Fix: `apply_routes_change` now mirrors every host in the proposed
routes onto pipelock's allowlist before flipping egress-proxy.
Order matters — pipelock first so a pipelock failure doesn't
leave egress-proxy in a half-state:
1. Validate the new routes content.
2. Extract the hosts.
3. Merge them onto pipelock's current allowlist
(`apply_allowlist_change` — restarts pipelock with the merged
yaml). No-op when every host is already present.
4. docker cp the new routes.yaml into egress-proxy + SIGHUP.
If pipelock's restart fails, egress-proxy is untouched and the
operator gets a clear error pointing at the pipelock half-state.
If egress-proxy's update fails after pipelock succeeded, pipelock
just has the host pre-allowlisted — harmless extra-permissive
until the operator retries.
Adds `_hosts_in_routes` helper using the addon's own parser
(so the mirrored host set matches exactly what the addon will
match on). 4 new unit tests; 368 total pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>