Commit Graph

341 Commits

Author SHA1 Message Date
didericis ae6d11f09d fix(dashboard): use os._exit on quit so bottles survive the dashboard
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m9s
The `bottles` dict held `@contextmanager`-wrapped launch contexts.
On normal Python interpreter shutdown those context managers'
generators got GC'd, which raised GeneratorExit at the yield
point and ran the `finally` block — invoking each bottle's
teardown and tearing down the compose project. Net effect: `q`
WAS implicitly stopping every dashboard-launched bottle even
though the keypress handler just `return`'d.

`os._exit(0)` skips all Python-level cleanup (GC, atexit, etc.),
so the docker compose projects survive the dashboard exit
untouched. Curses gets explicit `endwin()` first because the
brutal exit skips curses.wrapper's normal terminal restoration.

Matches PRD 0020's resolved-question answer (`q` does NOT tear
down bottles; teardown is always explicit via `x` or
`./cli.py cleanup`).
2026-05-26 13:48:16 -04:00
didericis 14d5c78370 fix(attach): use --continue (no picker) instead of --resume
`--resume` alone surfaces claude's session picker even when only
one session exists. `--continue` jumps to the most recent session
non-interactively, which is the actual behavior the dashboard's
Enter re-attach wants for typical bottle-with-one-session cases.
2026-05-26 13:48:16 -04:00
didericis 832e92c7a6 feat(attach): pass --resume on dashboard re-attach
Re-entering a running bottle from the dashboard (Enter on the
agents pane) now invokes claude with `--resume` so the session
picks up the prior conversation history rather than starting a
fresh transcript. The first-attach paths (`./cli.py start` and
the dashboard's new-agent `n` flow) leave it off — the
transcript doesn't exist yet there.

`attach_claude` gains a `resume: bool = False` kwarg;
`_attach_to_bottle` in the dashboard passes `True`.
2026-05-26 13:48:16 -04:00
didericis 3d179f18fc Merge pull request 'feat(dashboard): x stops a dashboard-owned bottle' (#46) from chunk-4-explicit-stop into main
test / unit (push) Successful in 19s
test / integration (push) Successful in 1m8s
2026-05-26 13:48:03 -04:00
didericis 3ed3745982 feat(dashboard): x stops a dashboard-owned bottle (PRD 0020 chunk 4)
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m7s
Final PRD 0020 chunk. `x` on a focused agents-pane row tears
down the selected bottle if the dashboard owns it (started via
the chunk-2 `n` flow): pops `(cm, bottle, identity)` from the
main loop's bottles map, snapshots the transcript best-effort,
calls `cm.__exit__(None, None, None)` to drive the existing
compose-down + network-remove sequence, then `settle_state` to
honor any pre-existing preserve marker.

On a non-owned slug (discovered via `list_active_slugs` but not
in the dashboard's bottles dict — i.e., previous-dashboard or
external `./cli.py start` bottle), `x` is a no-op with a status
hint pointing at `./cli.py cleanup`. Matches the PRD's
cross-dashboard re-attach model: the dashboard can re-attach
either kind, but can only tear down its own.

The PRD's chunk 5 ("quit-cleanup") is satisfied by the existing
no-op behavior of `q` — per the user's resolved-question
answer, quit leaves bottles running unchanged. No code change
needed for that.

Footer surfaces `[x] stop`. 465 unit tests pass (1 new for the
non-owned no-op path; the owned path is integration territory
because it drives a real compose-down).
2026-05-26 03:46:57 -04:00
didericis fc8be2e418 Merge pull request 'feat(dashboard): Enter on agents pane re-attaches to bottle' (#45) from chunk-3-reattach into main
test / unit (push) Successful in 17s
test / integration (push) Successful in 1m8s
2026-05-26 03:40:42 -04:00
didericis 572306ddb6 feat(dashboard): Enter on agents pane re-attaches to bottle
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m11s
PRD 0020 chunk 3. Enter on a focused agents-pane row drops to a
claude session inside the selected bottle. Works for both
dashboard-owned bottles (looks up the stored Bottle handle in
the main loop's `bottles` dict) and externally-discovered ones
(synthesizes a DockerBottle from the slug → `claude-bottle-<slug>`
container name).

For the synthesized path, the `--append-system-prompt-file`
target resolves via metadata.json + the manifest's agent prompt
if both can be read; otherwise the re-attach runs without the
flag (claude defaults to no system prompt, the bottle's other
state is untouched).

Shares the curses.endwin → attach → refresh handoff with the
chunk-2 new-agent flow via a new `_attach_to_bottle` helper.
Footer reshuffled to advertise `[Enter] view/attach`. 464 unit
tests pass (3 new for `_bottle_for_slug`).
2026-05-26 03:39:58 -04:00
didericis 5f2b40e679 Merge pull request 'docs(prd-0020): start + attach to agents from the dashboard' (#44) from dashboard-start-attach-agents into main
test / unit (push) Successful in 17s
test / integration (push) Successful in 1m8s
2026-05-26 03:27:01 -04:00
didericis 309ffaa4ab feat(dashboard): agent picker modal + new-agent (n) flow
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
PRD 0020 chunk 2. Pressing `n` opens a modal that lists every
agent from the manifest with `(N running)` suffixes for ones
that already have bottles up. Type to filter (substring,
case-insensitive); j/k or arrows to navigate; Enter to confirm;
Esc clears the filter on first press, exits the picker on the
second.

On confirmation, the dashboard runs:

  - `prepare_with_preflight` from chunk 1 with curses-modal
    render + prompt callables (the preflight modal centers the
    plan summary + captures [y/N]).
  - `backend.launch(plan).__enter__()` — enters but doesn't bind
    the context to a `with`. The (cm, bottle, identity) tuple
    lands in the main loop's `bottles` dict keyed by slug.
  - `curses.endwin()` → `attach_claude(bottle)` → `stdscr.refresh()`
    handoff. The agent's claude session takes over the terminal;
    on exit the dashboard re-renders with the bottle now visible
    in the agents pane.

Crucially the context manager is held alive in `bottles` — never
`__exit__`'d at quit. Chunk 4 will wire `x` to that exit; for
now bottles started from the dashboard stay running until
explicit cleanup. Matches the PRD's "q does not tear down"
decision.

Footer surfaces `[n] new agent`. 461 unit tests pass (8 new for
`_filter_agents` and `_running_counts`).
2026-05-26 03:22:44 -04:00
didericis a56be6beb5 refactor(start): extract prepare_with_preflight + attach_claude
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
PRD 0020 chunk 1. `cli/start.py`'s `_launch_bottle` did three
things in one function: prepare + preflight, attach claude, and
settle state on teardown. Split them so the dashboard (PRD 0020
chunk 2+) can reuse the prepare + attach pieces piecewise
without going through the CLI's one-shot orchestrator:

  - `prepare_with_preflight(spec, *, stage_dir, render_preflight,
    prompt_yes, dry_run)` — injects render + prompt callables so
    the CLI binds them to stderr/stdin while the dashboard binds
    them to a curses modal. Returns `(plan, identity)`; identity
    is set after `backend.prepare` returns so callers can reap
    the prepare-time state dir on abort via `settle_state` in
    their finally — preserving today's preflight-N cleanup.
  - `attach_claude(bottle, *, remote_control)` — runs claude
    inside the bottle and returns its exit code. The dashboard
    calls this from inside a `curses.endwin` → … →
    `stdscr.refresh()` handoff.
  - `capture_session_state` / `settle_state` lose their leading
    underscore; the dashboard will call them on
    session-end + explicit-stop respectively.

`_launch_bottle` becomes a thin orchestrator over those helpers.
No behavior change; all 453 unit tests pass and `./cli.py start
implementer --dry-run` produces identical preflight output.
2026-05-26 03:12:29 -04:00
didericis 26322bdfd5 docs(prd-0020): record answers to open questions, switch to no-teardown-on-quit 2026-05-26 03:10:26 -04:00
didericis ec20293c0a docs(prd-0020): start + attach to agents from the dashboard
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
Draft a PRD that turns the dashboard into the operator's single
surface — collapses today's two-terminal workflow (one for
`./cli.py start`, one for `./cli.py dashboard`) into a single
dashboard invocation that can spin up new agents, re-attach to
ones it already spun up, and explicitly stop them.

Picks the "handoff" mechanism from `docs/research/claude-code-
pane-in-dashboard.md` (curses.endwin → docker exec -it claude
→ stdscr.refresh) and crucially decouples the bottle's lifetime
from any single claude session: exit claude → back to dashboard
with the bottle still running; quit dashboard → tear down every
bottle the dashboard owns.

Sized into 5 chunks (refactor → picker + new-agent → re-attach
→ explicit stop → quit-cleanup). Seven open questions called
out, the biggest being modal-vs-drop-and-resume for the
preflight Y/N inside curses.
2026-05-26 02:59:42 -04:00
didericis 8cd867f3d2 docs(research): claude-code pane in the dashboard
test / integration (pull_request) Successful in 1m8s
test / unit (pull_request) Successful in 17s
test / unit (push) Successful in 17s
test / integration (push) Successful in 1m2s
Survey the three realistic ways to surface a claude-code session
inside the dashboard TUI:

  1. Handoff — drop curses, foreground claude, restore on exit
     (the existing `e`/`p` pattern, extended). Minimal code,
     side-by-time rather than side-by-side.
  2. Embedded emulator — own a PTY, parse claude-code's ANSI
     stream via `pyte`, paint it into a curses pane. Real
     "pane in the dashboard" but a six-week build with one new
     dep and several integration trap-doors (alt-screen, resize,
     input routing, multi-PTY state).
  3. External multiplexer — delegate pane creation to tmux /
     iTerm / wezterm when detected. Tiny code, but splits the
     operator's mental model and gives up layout control.

Recommendation: ship Option 1 first; defer Option 2 to "only if
Option 1 is observably insufficient"; treat Option 3 as a
niche augmentation for power users.

Calls out four followups worth verifying before committing
(PTY behavior at small sizes, attach-to-existing-exec, SIGWINCH
handling, `-it` vs `-i` for the embedded path).
2026-05-26 02:51:08 -04:00
didericis 942d3a387a Merge pull request 'refactor(egress): write routes.yaml as actual YAML, not JSON-in-yml' (#42) from egress-routes-yaml into main
test / unit (push) Successful in 18s
test / integration (push) Successful in 1m6s
2026-05-26 02:38:44 -04:00
didericis 3c2585cb98 fix(apply): write routes/pipelock yaml in place, not via rename
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m6s
PRD 0018 chunk 3's atomicity fix used write-temp-then-rename to
update bind-mounted config files. POSIX rename atomically swaps
the inode at the host path — but Docker single-file bind mounts
on Linux pin the source inode at mount time, so post-rename the
container's mount points at the now-orphaned old inode and never
sees the new content. The egress sidecar's SIGHUP-driven reload
re-reads the same stale file → "egress route updates aren't
updatable via the supervisor anymore".

Switch egress_apply + pipelock_apply to write in place (same
inode, truncated + rewritten). Lose file-level POSIX atomicity,
but:

  - egress: SIGHUP fires only AFTER the write returns; the
    addon's `load_routes` raises `ValueError` on a partial read
    and keeps the previous in-memory routes, so the in-process
    race window (already narrow) is non-disruptive.
  - pipelock: applies via `docker restart` rather than SIGHUP;
    restart serializes after the host write completes, so the
    container reads the fully-written file on next boot.

macOS Docker Desktop's file-sharing layer (virtiofs / osxfs)
silently re-resolves the path on rename, which is why this bug
didn't surface in dev tests on macOS. Linux native Docker is
the strict reading; the fix works on both.
2026-05-26 02:31:46 -04:00
didericis c9825cf701 refactor(egress): write routes.yaml as actual YAML, not JSON-in-yml
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
`egress_render_routes` now emits hand-rolled YAML in the same style
as `pipelock_render_yaml`. The egress addon parses it via
`yaml_subset.parse_yaml_subset` — the same parser the manifest
loader + pipelock_apply use.

Why bother: routes.yaml is bind-mounted into the egress sidecar
AND surfaced to operators through `routes edit` (PRD 0019). JSON-
in-yml renders ugly in $EDITOR and signals "this is data" rather
than "this is config you can read at a glance". Real YAML reads
cleanly.

Mechanics:

  - `yaml_subset.py` drops its `claude_bottle.log` dependency.
    Errors now raise `YamlSubsetError` (a `ValueError`); the
    manifest loader + pipelock_apply catch it at the boundary
    and forward to `die` / `PipelockApplyError` so callers see
    the same behavior they did before.
  - `Dockerfile.egress` adds one COPY line for `yaml_subset.py`
    so it sits flat in `/app/` next to the addon. The addon
    uses an absolute-import-with-fallback shim so the same file
    works inside the container AND from the host's unit tests.
  - `egress_apply._merge_single_route` round-trips current
    routes.yaml through `parse_yaml_subset` + a new
    `_render_routes_payload` helper instead of `json.loads` +
    `json.dumps`.

End-to-end: rebuilt the egress image, ran `./cli.py start` to a
full bring-up, confirmed the addon's boot log shows `egress:
loaded 9 route(s)` — i.e., the YAML parses inside the container.
453 unit + 3 integration tests pass.
2026-05-26 02:17:42 -04:00
didericis 11d5bf1489 Merge pull request 'feat(dashboard): agent-scoped e/p, drop discover-and-prompt path' (#41) from chunk-4-agent-scoped-edits into main
test / unit (push) Successful in 18s
test / integration (push) Successful in 1m7s
2026-05-26 01:52:40 -04:00
didericis 7b29c81f27 feat(dashboard): agent-scoped e/p, drop discover-and-prompt path
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m6s
PRD 0019 chunk 4 (final). The `e` (routes edit) and `p` (pipelock
edit) keys now require an agent selection in the agents pane.
Pressing them with the proposals pane focused, with no active
agents, or with an out-of-range selection is a no-op with a
status hint ("no agent selected; Tab into the agents pane first").

The discover-and-prompt scaffolding inside
`_operator_edit_routes_flow` / `_operator_edit_allowlist_flow` /
`_operator_edit_flow` is gone. The flows now take an `ActiveAgent`
+ required-service name; they refuse with a clear message when
the bottle lacks the requested sidecar (e.g., `routes edit`
against a bottle with no `bottle.egress.routes` declared). The
`discover_egress_slugs` + `discover_pipelock_slugs` +
`_discover_active_with_service` helpers come out — they had no
remaining callers.

Footer now reads `[e/p] edit selected agent`.
2026-05-26 01:50:28 -04:00
didericis 39e69f0bda Merge pull request 'feat(dashboard): Tab toggle + per-pane selection state' (#40) from chunk-3-pane-selection into main
test / unit (push) Successful in 17s
test / integration (push) Successful in 1m5s
2026-05-26 01:44:24 -04:00
didericis 0abffc4d90 feat(dashboard): Tab toggle + per-pane selection state
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m4s
PRD 0019 chunk 3. The TUI now has two focusable panes — proposals
and agents — and `Tab` toggles which one the `j/k`/arrow keys
move through.

Each pane keeps its own selection index. Switching panes doesn't
lose the position in the other; the cursor (`>` + reverse-video
row) appears only in the focused pane. The label line on each
pane shows "(focused)" when active.

Footer reshuffled: `[Tab] switch pane  [j/k] move  [Enter] view
[a/m/r] proposal  [e/p] edit  [q] quit`. When the agents pane is
focused and there's no status message to display, the idle
status line surfaces the currently-selected agent (or "[no
active agents]" / "[no agent selected]" fallbacks) so the
operator knows what an agent-scoped edit verb will target after
chunk 4 wires them up.

Proposal action keys (a/m/r/Enter) are gated on the proposals
pane being focused — pressing them with the agents pane focused
is a no-op. e/p still use the global discover-and-prompt flow
for one more chunk; chunk 4 swaps them to read the agents-pane
selection.
2026-05-26 01:37:23 -04:00
didericis 897172fcc2 Merge pull request 'feat(dashboard): render active agents pane below proposals' (#39) from chunk-2-render-agents-pane into main
test / unit (push) Successful in 18s
test / integration (push) Successful in 1m3s
2026-05-26 01:34:28 -04:00
didericis cfd8f269ba feat(dashboard): render active agents pane below proposals
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m4s
PRD 0019 chunk 2. The TUI's main render now draws two panes:
proposals on top (existing), active agents on the bottom (new).
Header counts both totals. The agents pane refreshes on the
same 1s tick — agents starting/stopping reflect without
operator action.

Each agent row shows slug, agent name, started-time (HH:MM:SS
of the metadata.json timestamp), and the bracketed list of
sidecars currently up. The `agent` service is filtered out of
the displayed list — it's always present so it'd be noise; the
sidecars are the differentiator. A bottle whose only running
service is `agent` (sidecars still warming up) renders as
`(starting)`.

No selection model yet — that's chunk 3. The cursor stays in
the proposals pane; `j/k`/arrow nav and the proposal action
keys are unchanged.
2026-05-26 01:23:59 -04:00
didericis 8636982e80 Merge pull request 'docs(prd-0019): active agents in dashboard + agent-scoped edit verbs' (#38) from dashboard-active-agents into main
test / unit (push) Successful in 17s
test / integration (push) Successful in 1m8s
2026-05-26 01:14:15 -04:00
didericis 6e4a9f606f feat(dashboard): discover_active_agents helper + ActiveAgent dataclass
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m6s
PRD 0019 chunk 1. New `discover_active_agents()` in dashboard.py
returns one `ActiveAgent(slug, agent_name, started_at, services)`
per currently-running compose project:

  - Slugs come from `list_active_slugs()` (chunk-5 shared helper).
  - The service set per project comes from ONE label-filtered
    `docker ps` call (PRD open question #1: avoids N per-bottle
    `compose ps` invocations on each 1s refresh tick).
  - agent_name + started_at come from each bottle's
    metadata.json; "?" / "" fallbacks when the file is missing
    so the row renders rather than vanishes.

Not wired into the TUI yet — chunk 2 renders the agents pane.
The parser (`_parse_services_by_project`) is split out as a pure
function so the conditional-input shape can be unit-tested
without docker.
2026-05-26 01:11:54 -04:00
didericis 9c9c32a941 docs(prd-0019): drop e/p fallback — selection-only, no-op otherwise
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m6s
When no agent is selected, `e` / `p` do nothing (status line
shows "no agent selected") rather than falling back to today's
global discover-and-prompt. The discover-and-prompt scaffolding
in `_operator_edit_routes_flow` / `_operator_edit_allowlist_flow`
comes out entirely — selection in the agents pane is now the
only way to scope an edit. Old open-question #4 (single-bottle
shortcut behavior in proposals-pane mode) is moot and removed.
2026-05-26 01:03:23 -04:00
didericis 9539982d3f docs(prd-0019): active agents in dashboard + agent-scoped edit verbs
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m3s
Draft a PRD that adds an "active agents" pane to the dashboard
TUI (below the existing proposals pane) and reshapes the operator
`routes edit` (e) / `pipelock edit` (p) verbs to be agent-scoped
when the cursor is in the agents pane — no more global discover
+ disambiguation prompt on every press. Tab toggles which pane
nav keys move through.

Sized into 4 chunks (discovery helper → render pane → selection
state → agent-scoped verbs). Six open questions called out, the
biggest being whether per-bottle `compose ps` on every 1s tick
scales for hosts with many bottles (answer leans toward one
label-filtered `docker ps`).
2026-05-26 00:58:34 -04:00
didericis 6babfcc656 Merge pull request 'refactor(dashboard): discover via docker compose ls' (#37) from chunk-5-dashboard into main
test / unit (push) Successful in 18s
test / integration (push) Successful in 1m5s
2026-05-26 00:24:43 -04:00
didericis 1fa3745832 refactor(dashboard): discover via docker compose ls
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m8s
PRD 0018 chunk 5. The dashboard's operator-edit verbs
(`routes edit`, `pipelock edit`) enumerated running sidecars
via `docker ps --filter name=...` prefix scans. Switch to
`docker compose ls`-based discovery so the dashboard, cleanup
CLI, and launch step all agree on what's running.

Mechanics:

  - `claude_bottle/backend/docker/compose.py` grows three shared
    helpers: `list_compose_projects` (the JSON parse moved out
    of cleanup), `slug_from_compose_project` (inverse of
    `compose_project_name`), and `list_active_slugs` (sugar over
    the first two for the common "what's running?" question).
  - cleanup.py drops its private `_list_compose_projects` +
    `_PROJECT_PREFIX` in favor of the shared ones; `list_active`
    simplifies (one compose-ls call, not two).
  - dashboard.py's `_discover_sidecar_slugs` becomes
    `_discover_active_with_service`: cross-references the active
    slug list with a label-filtered `docker ps` so only bottles
    whose given service container is actually up surface in the
    edit menu. Bottles without an egress sidecar (no
    bottle.egress.routes) no longer appear for `routes edit`.

3 new unit tests cover the slug ↔ compose-project naming
contract; manual probe with a fake compose project confirms
both `discover_egress_slugs` and `discover_pipelock_slugs`
return the expected slug.
2026-05-26 00:14:16 -04:00
didericis 0ae544d2a6 Merge pull request 'refactor(cleanup): compose-ls driven + drop pipelock CIDR allowlist' (#36) from chunk-4-cleanup-cli into main
test / unit (push) Successful in 19s
test / integration (push) Successful in 1m3s
2026-05-26 00:04:29 -04:00
didericis aee249f119 refactor(cleanup): compose-ls driven, plus orphan state-dir reaping
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m9s
PRD 0018 chunk 4. `claude-bottle cleanup` now derives its work
from `docker compose ls --all --format json`, filtered to projects
whose name starts with `claude-bottle-`. Per project: one `compose
down --volumes` removes the containers + the compose-managed
networks atomically.

The plan also enumerates three fallback buckets:

  - Stray containers — `claude-bottle-*` containers with no
    `com.docker.compose.project` label (left over from pre-compose
    code paths). Cleared via `docker rm -f`.
  - Stray networks — `claude-bottle-*` networks with no compose
    project label. Cleared via `docker network rm`.
  - Orphan state dirs — per-bottle `~/.claude-bottle/state/<id>/`
    dirs with no live project AND no `.preserve` marker. The
    `.preserve` marker (capability-block or auto-preserve-on-crash)
    explicitly opts-out of reaping; manual `rm -rf` is the only
    path for preserved state.

cli/cleanup.py collapses to a single y/N prompt — backend.prepare_cleanup
returns everything in one plan, backend.cleanup processes everything,
no more double-prompt for state. The CLI-side state-dir enumeration
+ `_state_summary` flags from PR #25 are gone; the backend's
orphan-detection rules subsume them.
2026-05-25 23:48:02 -04:00
didericis f1c5816d1f refactor(compose): drop pre-create networks + pipelock CIDR allowlist
PRD 0018 chunk 4 spike: empirically verified that pipelock's SSRF
guard checks proxied-request destinations (e.g. api.anthropic.com →
public IP) and not source IPs of incoming connections. The
bottle's own internal CIDR was being added to ssrf.ip_allowlist
defensively, but that defense isn't load-bearing — direct pipelock
probe (`curl --proxy http://pipelock https://api.anthropic.com/`)
returns 404 from upstream rather than blocking on SSRF.

So:

  - Networks become compose-managed (`internal: true` on the
    internal network; the egress one is a normal user-defined
    bridge). Compose creates + removes them via up/down.
  - launch.py drops the `docker network create` + `network_inspect_cidr`
    + pipelock yaml re-render dance.
  - The pre-create/external scaffolding from chunk 3 goes with it.

End-to-end `./cli.py start` still works; cleanup leaves no
orphans. If real-world use surfaces an SSRF block we hadn't
predicted, the allowlist can come back via subnet-pinning rather
than pre-create.
2026-05-25 23:48:02 -04:00
didericis 6927a7ba4b Merge pull request 'feat(launch): switch start to docker compose project per bottle' (#35) from chunk-3-compose-lifecycle into main
test / unit (push) Successful in 18s
test / integration (push) Successful in 1m10s
2026-05-25 23:47:47 -04:00
didericis cefdc8c6e9 feat(launch): switch start to docker compose project per bottle
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m5s
PRD 0018 chunk 3. Each instance is now one `docker compose` project:

  - launch.py renders the compose spec via chunk-1's
    bottle_plan_to_compose, writes it to state/<slug>/docker-compose.yml,
    `docker compose up -d`s, and (on teardown) dumps
    `docker compose logs --no-color --timestamps` to
    state/<slug>/compose.log before `docker compose down`.
  - Networks are pre-created (`docker network create --internal` +
    user-defined bridge) so pipelock yaml can know the internal CIDR
    before compose-up. Compose references them with `external: true`;
    the launch step's ExitStack still owns network removal.
  - Agent still runs `sleep infinity`; claude reaches it via
    `docker exec -it` exactly like before (per the PRD's resolved
    TTY question).
  - metadata.json grows a `compose_project` field so dashboard /
    cleanup tooling can derive compose invocations without
    re-deriving the slug.

Security follow-ups from chunk-2 review:

  (b) CA private keys: pipelock + egress ca-key.pem land at 0o600
      explicitly. The mitmproxy cert+key concat stays 0o644 because
      the egress container's uid-1000 user reads it through the
      bind mount; parent dir at 0o700 still restricts host-side
      reach.
  (c) Apply atomicity: egress_apply + pipelock_apply switch from
      `docker cp` to host-side write-temp-then-rename on the
      bind-mount source. POSIX rename is atomic on the same
      filesystem, so a sidecar SIGHUP racing the apply can't see
      a half-written routes.yaml / pipelock.yaml.

Per-sidecar Docker{Sidecar}.start/stop methods stay in place — the
integration test suite drives them directly to validate each image
in isolation, which is still useful. launch.py no longer calls
them; a follow-up chunk can prune if the integration tests move to
the compose lifecycle.

git-gate entrypoint's chmod 600 on the keyfile + known_hosts now
tolerates EROFS (`|| true`) — the host SSH key is already 0600
(SSH refuses to load otherwise), so the inside-container chmod
was already a no-op in the docker-cp path and now just needs to
not error on the read-only bind mount.

422 unit tests pass; supervise integration test passes; end-to-end
`./cli.py start implementer` brings up the project, attaches,
captures full merged logs on teardown, and reaps all containers +
networks.
2026-05-25 23:16:40 -04:00
didericis b9f6889d09 Merge pull request 'refactor(state): write prepare-time scratch files under state/<slug>/' (#34) from chunk-2-state-bind-mount into main
test / unit (push) Successful in 17s
test / integration (push) Successful in 1m5s
2026-05-25 23:01:19 -04:00
didericis cd82a48399 refactor(state): write prepare-time scratch files under state/<slug>/
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m5s
PRD 0018 chunk 2. Each sidecar's prepare-time output (pipelock yaml +
CAs, egress routes.yaml + CAs, git-gate entrypoint + hooks, supervise
current-config, agent env + prompt) now lands in
~/.claude-bottle/state/<slug>/<service>/ instead of an ephemeral
mktemp dir. The state subdirs become the stable bind-mount sources
that chunk 3's docker compose project will reference.

The SDK launch path is unchanged — `docker cp` still copies from the
plan-held paths into containers, just from new locations. start.py's
session-end cleanup is now in `finally`, which also reaps state dirs
left behind by dry-run / preflight-N / prepare-exception paths
(previously only the post-launch path settled state).
2026-05-25 22:53:47 -04:00
didericis c8c302e50e Merge pull request 'docs(prd-0018): one compose project per bottle instance' (#33) from compose-per-instance into main
test / unit (push) Successful in 18s
test / integration (push) Successful in 1m3s
2026-05-25 22:42:35 -04:00
didericis 3386cabe62 docs(prd-0018): resolve TTY open question — keep exec -it
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m3s
2026-05-25 22:34:26 -04:00
didericis 4760a09263 feat(compose): pure renderer for bottle plan -> compose dict
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m5s
PRD 0018 chunk 1. New module `claude_bottle/backend/docker/compose.py`
exposing `bottle_plan_to_compose(plan) -> dict` — a pure function that
translates a fully-resolved DockerBottlePlan into a Compose v2 spec.

Not wired in yet. Tests cover the conditional-service matrix (git
on/off × egress on/off × supervise on/off) plus per-service shape
(images vs builds, network aliases, bind mounts, env vars, depends_on).
2026-05-25 22:28:50 -04:00
didericis 3251ee1394 docs(prd-0018): one compose project per bottle instance
test / unit (pull_request) Successful in 16s
test / integration (pull_request) Successful in 1m3s
Draft a PRD that replaces the chain of per-sidecar docker SDK calls
in `claude-bottle start` with a single `docker compose` project per
instance. Each `state/<slug>/` dir gets a self-describing set of
artifacts: metadata.json, docker-compose.yml, compose.log, and the
existing transcript/ + live-config/.
2026-05-25 22:15:32 -04:00
didericis a77545dc91 Merge pull request 'refactor(manifest): drop bottle.egress field' (#32) from drop-egress-field into main
test / unit (push) Successful in 17s
test / integration (push) Successful in 1m3s
2026-05-25 22:02:15 -04:00
didericis 1e5b0dcfca refactor: rename egress-proxy → egress everywhere
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m10s
The manifest key is `egress:` now; finish the rename so the rest of
the codebase matches. Files (Dockerfile.egress, claude_bottle/egress.py
etc.), classes (Egress, EgressConfig, EgressRoute, EgressPlan,
DockerEgress), constants (EGRESS_HOSTNAME, EGRESS_ROUTES, ...),
container name prefix (claude-bottle-egress-*), docker network alias
(egress), the introspection host (_egress.local), the MCP tool IDs
(egress-block, list-egress-routes), and the preflight label all drop
the `-proxy` suffix.
2026-05-25 21:59:47 -04:00
didericis 14c8a51c16 refactor(manifest): rename egress_proxy key to egress
test / unit (pull_request) Successful in 16s
test / integration (pull_request) Successful in 1m4s
Now that `bottle.egress` (the old allowlist/dlp_action block) is
gone, the longer `egress_proxy:` disambiguator isn't needed. The
manifest field reads more naturally as just `egress:` with the
same nested `routes: [...]` shape.

Renamed:
  - Manifest YAML key:    `egress_proxy:` → `egress:`
  - Bottle dataclass attr: `bottle.egress_proxy` → `bottle.egress`
  - `_BOTTLE_KEYS` entry, schema docstring, and all
    user-facing error message labels (`egress.routes[N]`,
    `egress has unknown key …`, etc.).

Kept (these refer to the egress-proxy SIDECAR, not the manifest
field):
  - File names: `egress_proxy.py`, `egress_proxy_apply.py`,
    `egress_proxy_addon.py`, `egress_proxy_addon_core.py`.
  - Class names: `EgressProxyConfig`, `EgressProxyRoute`,
    `EgressProxyPlan`, `EgressProxy`, `DockerEgressProxy`.
  - Helper names: `egress_proxy_manifest_routes`,
    `egress_proxy_routes_for_bottle`,
    `egress_proxy_token_env_map`, etc.
  - Constants: `EGRESS_PROXY_HOSTNAME`, `EGRESS_PROXY_ROLES`,
    `EGRESS_PROXY_AUTH_SCHEMES`, `EGRESS_PROXY_FORWARD_PROXY`,
    `EGRESS_PROXY_INTROSPECT_URL`, `EGRESS_PROXY_PORT`, etc.
  - Container name prefix `claude-bottle-egress-proxy-*`, the
    `egress-proxy` docker network alias, the
    `egress-proxy-block` + `list-egress-proxy-routes` MCP tool
    IDs, the `egress-proxy` audit-log component label.

Local bottle migrated (`~/.claude-bottle/bottles/dev.md` already
updated). The legacy `egress_proxy` key isn't surfaced anywhere
anymore; the generic unknown-key validator catches typos with a
"did you mean: egress, env, git, supervise" hint.

409 unit + integration tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 21:25:51 -04:00
didericis 6456904763 refactor(manifest): drop bottle.egress field, egress_proxy is the only allowlist
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m4s
Goal: one allowlist surface (egress_proxy.routes), no second
free-form `egress:` knob. Anything that used to live there now
goes in `egress_proxy.routes` as a bare-pass entry
(`- host: <name>`).

Removed:
  - `BottleEgress` dataclass + DLP_ACTIONS constant + bottle.egress
    field on `Bottle`.
  - `pipelock_bottle_allowlist` helper.
  - `pipelock_allowlist_summary` helper (the compact preflight
    summary stopped using it after PR #31).
  - `allowlist_summary` field on `DockerBottlePlan`.
  - `bottle.egress.allowlist` folding in
    `egress_proxy_routes_for_bottle` — only DEFAULT_ALLOWLIST
    auto-folds now.
  - The two-branch logic in `pipelock_effective_allowlist`
    (egress-proxy-present vs not) — pipelock now just mirrors
    `egress_proxy_routes_for_bottle` unconditionally.

Hard-coded:
  - `request_body_scanning.action = "block"` in
    `pipelock_build_config` (was driven by
    `bottle.egress.dlp_action`). The previous default was already
    "block" — the knob to switch to "warn" was a foot-gun in a
    sandboxed agent context, so it's gone.

Tests:
  - `test_pipelock_allowlist.py` rewritten to assert the
    mirrored-from-egress-proxy semantics directly.
  - `test_manifest_md_load.py`, `test_pipelock_yaml.py`,
    `test_egress_proxy.py` fixtures migrated to put hosts in
    `egress_proxy.routes` instead of `egress.allowlist`.

Local bottle migrated too: `~/.claude-bottle/bottles/dev.md`
loses the `egress: { allowlist: [example.com] }` block, picks up
a bare-pass `- host: example.com` route.

409 unit + integration tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 21:12:56 -04:00
didericis d79a976999 Merge pull request 'refactor(preflight): compact y/N summary' (#31) from compact-preflight-summary into main
test / unit (push) Successful in 18s
test / integration (push) Successful in 1m5s
2026-05-25 20:59:47 -04:00
didericis 572106d98f refactor(cli): drop --format=json end-to-end
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m2s
Companion to the compact preflight in #31 — the JSON format was
the structured alternative to the verbose text summary. With the
new compact text already on screen, no consumer was using the
JSON shape, and the abstract `BottlePlan.to_dict` was the
biggest piece of API surface no one is implementing against.

Removed:
  - `--format` CLI flag from `start` and `resume`.
  - `output_format` kwarg from `_launch_bottle`.
  - `BottlePlan.to_dict` abstract method.
  - `DockerBottlePlan.to_dict` (60-line dict builder).
  - The `_PlanView` dataclass — `print` was the only remaining
    caller, so the env-name computation is inlined.
  - `tests/integration/test_dry_run_plan.py` (JSON-shape
    integration test).
  - `tests/unit/test_cli_start_format.py` (flag-conflict unit).

Plan-introspection is still possible by reading the
`DockerBottlePlan` dataclass directly — fields like `image`,
`container_name`, `stage_dir`, `use_runsc` are all there. Tooling
that needs a stable wire shape can JSON-serialize the dataclass
themselves.

411 unit + integration tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 20:54:51 -04:00
didericis 5d5f118fb4 refactor(preflight): compact summary — agent / env / skills / bottle / gates
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m8s
Trim the y/N preflight to the parts the operator actually scans
before pressing y:

  agent
  env (one per line)
  skills (one per line)
  bottle
    git gate (one upstream per line)
    egress-proxy (one route per line, with [auth:scheme] when set)

Dropped from the display (still on the plan dataclass / json
output for tooling): image, dockerfile, derived-image (cwd) line,
container, stage dir, docker runtime, git remotes list, egress
allowlist summary, tls interception note, supervise note, prompt
metadata, remote-control flag.

`remote_control` kwarg kept on `.print()` for callsite stability
but unused in the compact format.

A `_multi(label, values)` helper does the "first value next to
the label, remainder continuation-indented" pattern that env /
skills / git gate / egress-proxy all share — keeps the columns
aligned to the label width.

Verified against my own dev bottle: output is byte-for-byte the
spec the operator asked for.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 20:44:27 -04:00
didericis 5edff68d52 Merge pull request 'feat(egress-proxy): retarget remediation flow (PRD 0017 chunk 3)' (#30) from egress-proxy-block-remediation into main
test / unit (push) Successful in 18s
test / integration (push) Successful in 1m4s
2026-05-25 20:34:23 -04:00
didericis 6c886200d9 revert(egress-proxy): drop wildcard host support entirely
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m3s
The apex-vs-subdomain question, the cert/SNI mismatch when
pipelock-passthrough hosts have wildcard certs, and the
mirror-divergence corner cases stacked up faster than the feature
earned its keep. Going back to exact-host match only.

Addon (`match_route`): single pass, case-insensitive exact match.
`*.foo.com` in a route table is now a literal string that won't
match anything — operators that want subdomains declare them
individually.

Pipelock mirror (`_pipelock_safe_hosts`): silently drops hosts
that don't fit pipelock's `[A-Za-z0-9_.-]+` charset (wildcards,
IPv6 literals, stray chars). Previously normalised wildcards to
their suffix; now just drops them, which matches egress-proxy's
behavior of not matching them either.

8 wildcard test cases removed; 2 lightweight "wildcards are not
supported" assertions retained as documentation. 386 unit pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 19:48:35 -04:00
didericis 6177c0518e fix(egress-proxy-addon): wildcard hosts also match the apex
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m6s
`*.example.com` now matches `example.com` itself in addition to
every subdomain. RFC 6125 TLS-wildcard semantics excluded the
apex; an allowlist's natural reading of `*.example.com` is "all
of example.com" — and the pipelock mirror already strips
`*.example.com` to `example.com`, so without the apex match the
two layers disagreed (pipelock allowed the apex, egress-proxy
blocked it).

Behavior:
  - `*.example.com` matches `example.com`     (apex)
  - `*.example.com` matches `foo.example.com` (subdomain)
  - `*.example.com` matches `a.b.example.com` (nested)
  - `*.example.com` does NOT match `barexample.com` (label
    boundary required)

Test renamed: `test_wildcard_does_not_match_apex` →
`test_wildcard_matches_apex`. 395 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 19:16:33 -04:00
didericis 811a6fbfe9 feat(egress-proxy-addon): wildcard host matching with exact-match precedence
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m3s
PRD 0017 v1 deliberately punted wildcards ("Exact match in v1 —
globs / wildcards are a follow-up"). Now that the supervise mirror
strips `*.` to its suffix for pipelock, the addon needs to actually
match wildcard hosts on its side or the route is dead weight.

Addon `match_route` now does two passes:
  1. Exact (case-insensitive) literal match on the hostname.
  2. Wildcard suffix match: a route whose host starts with `*.`
     matches any request host that ends with `.<suffix>`. So
     `*.example.com` matches `foo.example.com` and
     `a.b.example.com`, but NOT the apex `example.com` and not
     `barexample.com` (the leading `.` of the suffix is
     required).

Exact wins — operators can layer a specific route (e.g.
`api.github.com` with auth) on top of a broader wildcard (e.g.
`*.github.com` bare-pass).

8 new unit tests: direct subdomain match, nested subdomain match,
apex rejection, overlapping-suffix rejection, case-insensitive,
exact-wins-over-wildcard (both route orders), no-match
fall-through. 395 unit + integration pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 19:10:22 -04:00