Commit Graph

220 Commits

Author SHA1 Message Date
didericis-claude 9c333bc130 feat(smolmachines): smolvm subprocess wrapper (PRD 0023 chunk 2b)
test / unit (pull_request) Successful in 21s
test / integration (pull_request) Successful in 41s
claude_bottle/backend/smolmachines/smolvm.py — one thin Python
function per smolvm CLI subcommand the launch flow needs:

  - pack_create(image, output)            → smolvm pack create
  - machine_create(name, from_path,
                   smolfile)               → smolvm machine create
  - machine_start(name)                   → smolvm machine start
  - machine_stop(name)                    → smolvm machine stop
  - machine_delete(name)                  → smolvm machine delete -f
  - machine_exec(name, argv, env,
                 workdir, timeout)         → smolvm machine exec
  - machine_cp(src, dst)                  → smolvm machine cp
  - is_available()                        → shutil.which check

The wrapper hides the CLI's inconsistent name-flag style
(positional NAME on create/delete, --name on start/stop/exec/
status) behind a uniform `name=` kwarg.

Two return shapes:
  - SmolvmRunResult (returncode + stdout + stderr) from
    machine_exec, because callers care about the in-VM
    command's exit code.
  - Raises SmolvmError on non-zero for all other commands;
    failure to create/start/stop a VM is fatal to the launch
    flow, not branched on.

Tests:
  - 15 unit cases mocking subprocess.run, covering argv shape
    per subcommand (the --name vs positional inconsistency
    locked down), SmolvmError on non-zero for non-exec paths,
    SmolvmRunResult passthrough on exec, empty-path cp no-op.
  - 2 integration cases against the real smolvm binary
    (gated on Darwin + smolvm on PATH + not GITEA_ACTIONS):
    smolvm --help responds, machine ls --json parses as a
    list (the contract chunk 4's list_active will consume).

531 unit tests passing. Real-smolvm smoke green locally.

Bundle bringup + launch wiring + the localhost-reach /
egress-port-bypass probes land in chunks 2c + 2d.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 04:11:36 -04:00
didericis-claude c73d717f71 feat(smolmachines): rewrite Smolfile to smolvm 0.8.0 schema + drop gvproxy (PRD 0023 chunk 2a)
test / unit (pull_request) Successful in 21s
test / integration (pull_request) Successful in 39s
First sub-PR of chunk 2: rewrite the renderer chunk 1 shipped to
match smolvm 0.8.0's actual Smolfile shape, delete the dead
gvproxy renderer + its tests, simplify the prepare flow now that
there's no gvproxy socket + no loopback-port allocation.

Smolfile renderer:
- Old shape (under the abandoned gvproxy design): name = ...,
  command = [...], [[net]] attachment = "unixgram",
  socket = "...".
- New shape (smolvm 0.8.0): env = [...] (sorted K=V pairs),
  [network] allow_cidrs = ["<bundle-ip>/32"]. Nothing else.
  image / entrypoint / cmd come from the .smolmachine artifact
  built in chunk 2b; cpus / memory left at smolvm defaults.
- Tests assert no leakage of TSI's --outbound-localhost-only or
  the old gvproxy/unixgram keys.

util.py:
- smolmachines_gvproxy_subnet → smolmachines_bundle_subnet,
  returning (subnet, gateway, bundle_ip). bundle_ip is always
  at .2 (gateway .1); subnet is /24, third octet derived from
  the slug hash, skipping the docker-default 17 to avoid the
  common 192.168.17.x collision.
- allocate_loopback_port: deleted. The bundle gets a pinned
  docker IP now; the agent dials that IP directly through TSI.
- smolmachines_preflight: dropped the gvproxy check; only
  smolvm is required.

prepare.py:
- Drops the gvproxy.yaml render + the loopback port allocation
  + the gvproxy_socket field on the plan.
- Derives subnet / gateway / bundle_ip from the slug and
  populates the new SmolmachinesBottlePlan fields.
- Agent env now uses IP-literal URLs (http://<bundle-ip>:8888
  etc) since the guest will have no DNS resolver inside TSI's
  allowlist.

bottle_plan.py:
- Old fields: gvproxy_config_path, gvproxy_socket,
  gvproxy_subnet, gvproxy_gateway, host_port_map.
- New fields: bundle_subnet, bundle_gateway, bundle_ip,
  smolfile_path. (smolmachine artifact path lands in chunk 2b.)

Net: -410 lines. Full unit suite: 516 passing.

The VM lifecycle + bundle bringup + launch wiring + smoke tests
land in chunk 2b.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 04:01:07 -04:00
didericis 2aca9e609a refactor(backend): extract shared print_multi for plan preflights
test / unit (pull_request) Successful in 21s
test / integration (pull_request) Successful in 42s
Addresses PR #62 review comments on
claude_bottle/backend/smolmachines/bottle_plan.py:

- Lift the multi-value label printer (was a nested helper inside
  DockerBottlePlan.print) into a new module
  claude_bottle/backend/print_util.py:print_multi. Both backends
  use it for env / skills / git / egress lines.

- Strip the three smolmachines-preflight lines the review flagged:
  the gvproxy subnet line, the smolfile path line, and the
  gvproxy-config path line. Internal detail — operators see the
  agent / env / skills / bottle / git / egress that already
  matter on the docker side, and nothing else.

- Add `git → upstream` to the smolmachines git output to match
  what's useful at preflight time (the docker version shows
  upstream_host:port; this is similar shape).

Leaves the slug=spec.identity-or-mint pattern alone pending a
reply on PR comment #432 — the docker backend uses the same
pattern to preserve identity across `resume`, so dropping it
would silently break the resume path once smolmachines launch
lands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 02:36:03 -04:00
didericis 20f411b22e feat(smolmachines): backend skeleton + Smolfile/gvproxy renderers (PRD 0023 chunk 1)
test / unit (pull_request) Successful in 22s
test / integration (pull_request) Successful in 43s
Ships the smolmachines backend's prepare side: subpackage layout,
`_BACKENDS` registration under "smolmachines", preflight check
for `smolvm` + `gvproxy` on PATH, and the two config-file
renderers (Smolfile TOML + gvproxy YAML). Launch raises
NotImplementedError until chunk 2.

New module layout (mirrors backend/docker/):
  claude_bottle/backend/smolmachines/
    __init__.py            re-exports SmolmachinesBottleBackend
    backend.py             SmolmachinesBottleBackend façade
    bottle.py              SmolmachinesBottle stub (NotImpl until ch2)
    bottle_plan.py         SmolmachinesBottlePlan + .print()
    bottle_cleanup_plan.py SmolmachinesBottleCleanupPlan stub
    prepare.py             resolve_plan: writes both config files
    smolfile.py            TOML renderer (stdlib, no tomli_w dep)
    gvproxy_config.py      YAML renderer (same shape as pipelock_yaml)
    util.py                preflight + per-slug subnet + loopback port

The renderers are pure functions. `resolve_plan` runs the
preflight, allocates one host-side loopback port per active
sidecar (pipelock always; git-gate / supervise conditional),
derives a per-slug gvproxy subnet (hash-mod-254, skipping the
docker-default 17), and writes:

  - <stage>/gvproxy.yaml: subnet + DNS rule resolving only
    `proxy.internal` + port_forwards (one per active sidecar).
  - <stage>/smolfile.toml: guest command/env + virtio-net device
    backed by gvproxy's unixgram socket. No TSI flags — see
    PRD 0023 "Why gvproxy, not TSI".

The agent's HTTPS_PROXY etc. point at `proxy.internal:<gateway-
port>` so the guest dials through gvproxy. gvproxy resolves only
`proxy.internal` → the gateway IP, and forwards exactly the
listed ports to the host-side sidecar bundle (PRD 0024); every
other destination — host LAN, host loopback, public internet
directly — is unreachable by construction.

29 new unit tests covering renderer correctness, subnet
derivation stability + collision-avoidance, loopback port
allocation, and preflight error paths. Full unit suite: 532
passing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 02:22:08 -04:00
didericis 5b9ceaaaee fix(sidecars): per-daemon pipelock restart keeps supervise socket alive
test / unit (pull_request) Successful in 21s
test / integration (pull_request) Successful in 43s
`apply_allowlist_change` used `docker restart <bundle>` to make
pipelock reload, which bounced ALL four daemons — including
supervise, whose MCP socket the agent's claude-code client had
open. That dropped the connection. A second apply works because
supervise has come back up by then.

Fix: per-daemon restart via SIGUSR1.

- New `_Supervisor.restart_daemon(name)` terminates one named
  child and spawns a replacement in place. Other daemons keep
  running.
- main() wires SIGUSR1 → `restart_daemon("pipelock")`. Pipelock
  has no in-process reload, so this is its analog of egress's
  SIGHUP-reload-addon path. Pipelock is the only daemon that
  currently needs hot-config reload via restart; if others
  acquire the need, add a new signal.
- `apply_allowlist_change` now `docker kill --signal USR1
  <bundle>` instead of `docker restart`. Supervise / egress /
  git-gate keep running across the apply.

Tests:
- New `_Supervisor.restart_daemon` cases: replaces in place
  (different pid post-restart, sibling daemon unchanged),
  unknown name is a no-op, restart-during-shutdown is a no-op.
- `test_pipelock_apply` rewritten to bring up the bundle image
  with `CLAUDE_BOTTLE_SIDECAR_DAEMONS=pipelock` so the
  supervisor is PID 1 and handles SIGUSR1. The previous
  standalone-pipelock setup wouldn't survive SIGUSR1 (pipelock
  default disposition is terminate). Test builds the bundle
  image in setUpClass (cached layers make repeat runs fast).

531 tests passing locally (unit + integration).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 02:12:37 -04:00
didericis 0848344438 fix(sidecars): apply_routes_change targets the bundle + SIGHUP forwarding
test / unit (pull_request) Successful in 20s
test / integration (pull_request) Successful in 42s
Two bugs surfaced when applying an egress route change:

1. egress_apply.py still targeted claude-bottle-egress-<slug> —
   the legacy per-sidecar container that no longer exists (it's
   a docker-network alias on the bundle now). Switched it to
   sidecar_bundle_container_name(slug), matching the chunk-5
   fix already made to pipelock_apply.py.

2. `docker kill --signal HUP <bundle>` lands SIGHUP on the
   supervisor (PID 1 in the bundle), which previously had no
   SIGHUP handler — the signal was ignored. Added
   `_Supervisor.forward_signal(sig, daemon_name)` and a SIGHUP
   handler in main() that forwards to the egress daemon so
   mitmdump's addon reload still works under the bundle.

Tests:
- New _Supervisor.forward_signal cases: forwards to the named
  child (Python subprocess as the SIGHUP target — bash trap +
  stdout=PIPE deferral interferes with the production-style
  test); unknown-daemon name is a no-op.

Stale-reference cleanup (separate issue surfaced while looking
at this):
- claude_bottle/{egress,git_gate,egress_addon,
  egress_addon_core,supervise_server}.py: Dockerfile.egress /
  Dockerfile.git-gate / Dockerfile.supervise references updated
  to Dockerfile.sidecars (the old per-sidecar Dockerfiles were
  deleted in PRD 0024 chunk 5).
- tests/README.md: dropped the entry for
  test_pipelock_sidecar_smoke (deleted in chunk 3) and added
  the new bundle integration tests.
- git_gate.py: stale `DockerGitGate.start via docker cp`
  reference (the method was deleted in chunk 3) rewritten to
  the bind-mount path the renderer uses now.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 01:56:38 -04:00
didericis 62f6f8db34 refactor(sidecars): bundle is the only shape (PRD 0024 chunk 5)
test / unit (pull_request) Successful in 21s
test / integration (pull_request) Successful in 43s
The CLAUDE_BOTTLE_SIDECAR_BUNDLE feature flag is gone. Every
bottle ships with the agent + bundle pair — no opt-in, no legacy
four-sidecar fallback.

Changes:

- Renderer (compose.py): bottle_plan_to_compose unconditionally
  emits {agent, sidecars}. Deleted _pipelock_service,
  _git_gate_service, _egress_service, _supervise_service helpers.
  _agent_service.depends_on collapses to ["sidecars"].

- sidecar_bundle.py: deleted sidecar_bundle_enabled (the flag
  parser). SIDECAR_BUNDLE_IMAGE + container-name helper stay.

- pipelock_apply.py: docker cp + docker restart now target
  sidecar_bundle_container_name(slug). Bundle restart bounces
  all four daemons together (per-daemon reload is the eventual
  feature, not v1).

- Per-sidecar modules trimmed:
  - egress.py: dropped EGRESS_IMAGE, EGRESS_DOCKERFILE,
    build_egress_image, egress_url. Kept EGRESS_PORT, CA paths,
    egress_container_name (still used by the renderer's network
    aliases).
  - git_gate.py: dropped GIT_GATE_IMAGE, GIT_GATE_DOCKERFILE,
    build_git_gate_image. Kept git_gate_host + GIT_GATE_PORT.
  - supervise.py: dropped SUPERVISE_IMAGE, SUPERVISE_DOCKERFILE,
    build_supervise_image, supervise_url.

- Deleted Dockerfile.{egress,git-gate,supervise}. The bundle's
  Dockerfile.sidecars is the only sidecar image now.

- test_compose.py: deleted TestPipelockAlwaysPresent,
  TestConditionalGitGate, TestConditionalEgress,
  TestConditionalSupervise, TestFullMatrix (legacy-shape only),
  TestSidecarBundleFlag (flag is gone). TestSidecarBundleShape
  drops its patch.dict wrapper. TestAgentAlwaysPresent's
  depends_on cases collapse to one.

- test_pipelock_apply.py: bringup container name uses
  sidecar_bundle_container_name(slug) to match the production
  target.

- README.md Architecture section rewritten to describe the
  agent + bundle pair.

Net: -626 lines.

Test status: 498 unit + 27 integration + 1 skipped (chunk-4
pending — superseded by this chunk's rewrite). Locally verified
end-to-end bottle launch produces exactly 2 containers
(claude-bottle-<slug> + claude-bottle-sidecars-<slug>).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 01:37:21 -04:00
didericis 2287b0dd08 test(sidecars): integration sweep for the bundle path (PRD 0024 chunk 4)
test / unit (pull_request) Successful in 20s
test / integration (pull_request) Successful in 40s
Three deliverables:

1. Rewrite test_pipelock_apply bringup with a direct `docker run`.
   Replaces the .start-based bringup deleted in chunk 3. Stages
   the yaml + CAs to the real pipelock_state_dir so the bind-
   mount target matches what apply_allowlist_change writes to —
   the legacy .start path did this implicitly because it lived
   inside the production flow; the new bringup needs to be
   explicit about the path. All 4 cases pass.

2. New tests/integration/test_sidecar_bundle_compose.py: end-
   to-end smoke with CLAUDE_BOTTLE_SIDECAR_BUNDLE=1. Brings up
   a real bottle via the compose path and verifies the agent
   can reach pipelock + supervise through the bundle's legacy
   aliases (no agent-side config changes between flag positions).
   Skipped under act_runner — multi-stage build + bind mounts.

3. Two bundle-path bugs surfaced and fixed while running PRD
   0022 with the flag on:

   - egress_entrypoint.sh: add `--set confdir=/home/mitmproxy/
     .mitmproxy` so mitmdump finds the bind-mounted CA. The
     legacy Dockerfile.egress runs as user mitmproxy (~mitmproxy
     resolves correctly); the bundle runs as root and otherwise
     would look in /root/.mitmproxy/ and mint a NEW CA the agent
     doesn't trust. Symptom: PRD 0022 attack-3 curl failed with
     "unable to get local issuer certificate".

   - sidecar_init.py: add `--listen 0.0.0.0:8888` to pipelock's
     argv. Without it pipelock defaults to 127.0.0.1, so the
     in-bundle egress's upstream connect to the
     `claude-bottle-pipelock-<slug>` alias arrives over the
     docker network and gets refused. The legacy renderer
     passed this flag verbatim; the bundle dropped it. Symptom:
     egress returned HTTP 502 with "Connect call failed
     ('172.x.x.x', 8888)".

   PRD 0022's 5-attack sandbox-escape suite now passes with the
   bundle flag on AND off.

Test status:
- Unit: 533 passing.
- Integration: 9 passing locally with flag off, 5 passing with
  flag on. Bundle compose smoke + PRD 0022 sandbox-escape both
  green under CLAUDE_BOTTLE_SIDECAR_BUNDLE=1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 01:15:14 -04:00
didericis 539234f29e refactor(sidecars): drop vestigial start/stop methods (PRD 0024 chunk 3)
test / unit (pull_request) Successful in 21s
test / integration (pull_request) Successful in 41s
Compose-up has owned per-container lifecycle since PRD 0018 ch3;
the .start() / .stop() methods on DockerPipelockProxy /
DockerEgress / DockerGitGate / DockerSupervise (and their
abstractmethod declarations in the four base ABCs) were already
documented as vestigial. With the bundle path in flight
(PRD 0024 ch2), they are truly dead — collapse to nothing.

Changes:
- Removed start/stop methods from the four DockerSidecar
  classes. Plan dataclasses, image/path constants,
  container-name helpers, and the .prepare() methods all stay
  (the renderer + apply path still need them).
- Removed the matching @abstractmethod declarations in the
  base ABCs so concrete subclasses don't have to stub them.
- launch.launch() and prepare.resolve_plan() no longer take
  proxy/git_gate/egress/supervise instance parameters. backend.py
  loses the four instance attributes it threaded through.
  prepare.resolve_plan() instantiates the four classes itself
  to call their .prepare() methods.
- Deleted four integration tests that only exercised the
  removed lifecycle: test_pipelock_sidecar_smoke,
  test_supervise_sidecar, test_git_gate_sidecar,
  test_git_gate_mirror.
- Dropped the .stop-idempotency case in test_orphan_cleanup;
  the network-cleanup cases stay (those test real production
  code).
- Marked test_pipelock_apply @skip pending chunk 4 — its
  bringup helper used .start; chunk 4 rewrites it with direct
  `docker run`.

Dockerfile deletion deferred to chunk 5 (when the bundle flag
default flips) — the legacy compose path still needs
Dockerfile.{egress,git-gate,supervise} until then.

Net: 708 lines removed, 80 added.

533 unit tests + 27 integration tests passing (5 skipped: the
chunk-4-pending case + existing GITEA_ACTIONS guards).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 01:01:10 -04:00
didericis a1180adec1 feat(compose): emit bundle shape behind feature flag (PRD 0024 chunk 2)
test / unit (pull_request) Successful in 21s
test / integration (pull_request) Successful in 1m12s
The docker backend's compose renderer now emits a single
`sidecars` service in place of the four per-sidecar services
when CLAUDE_BOTTLE_SIDECAR_BUNDLE is truthy. Default (unset/0/
false) keeps the legacy five-service shape so existing operators
don't have to migrate atomically; chunks 4-5 flip the default
and delete the flag.

New module claude_bottle/backend/docker/sidecar_bundle.py owns
the bundle image constant (CLAUDE_BOTTLE_SIDECAR_IMAGE env var
override + claude-bottle-sidecars:latest default), the
Dockerfile reference, the container-name helper, and the
flag-parser.

The bundle service:
- joins both internal + egress networks with aliases for every
  legacy shortname + per-slug long form so the agent's
  HTTPS_PROXY URL (which dials `egress` or
  `claude-bottle-pipelock-<slug>`) keeps resolving with no
  agent-side change
- carries CLAUDE_BOTTLE_SIDECAR_DAEMONS=<csv> for the init
  supervisor to narrow which daemons to start
- carries the union of the four prior services' daemon-private
  env vars (EGRESS_UPSTREAM_PROXY, SUPERVISE_*, token env names)
- does NOT carry HTTPS_PROXY/HTTP_PROXY/NO_PROXY — those would
  route git-gate's git fetches through pipelock by mistake
- union'd bind-mounts at the same in-container paths as before

HTTPS_PROXY scoping moved into egress_entrypoint.sh so only
mitmdump's subprocess sees it. In the legacy four-sidecar shape
the env vars also lived in the egress service's compose env;
the shell script's export is additionally defensive.

Tests:
- All 44 existing TestCompose cases pass unchanged (flag off →
  legacy shape).
- 20 new TestSidecarBundleShape cases assert on the bundle's
  services / aliases / env / volumes / depends_on under the
  flag.
- 8 new TestSidecarBundleFlag cases lock down the env-var
  parser (unset / 0 / false / no / off → disabled; everything
  else → enabled).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 00:43:08 -04:00
didericis 62109a1caf fix(sidecars): child death no longer tears down the bundle
test / unit (pull_request) Successful in 20s
test / integration (pull_request) Successful in 1m8s
Reverses chunk 1's "any unexpected child death tears down the
rest" policy. New behavior: a daemon dying is logged but does
NOT initiate shutdown — the surviving daemons keep running and
whatever the dead one served starts failing visibly on the
agent side. The supervisor exits only when (a) it receives
SIGTERM/SIGINT, or (b) every child has died on its own.

Eventual design is restart-the-dead-daemon plus a notification
to the supervise sidecar so the operator sees the event
explicitly; this commit ships only the "log and leave alone"
half. PRD 0024 open question 1 updated to reflect the new
intent.

Tests updated: replaced "crash propagates exit code via
auto-teardown" with three cases that exercise the new policy
(crash without shutdown leaves survivors up, crash-then-signal
surfaces the nonzero code, all-children-die-unattended still
converges the loop).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 00:19:50 -04:00
didericis 61f63684ac feat(sidecars): bundle image + Python init supervisor (PRD 0024 chunk 1)
test / unit (pull_request) Successful in 22s
test / integration (pull_request) Successful in 1m12s
New Dockerfile.sidecars multi-stage build: pulls the pinned
pipelock and gitleaks binaries into a mitmproxy-base final image,
installs git + openssh-client, and ships the project's egress
addon + supervise server alongside a stdlib-Python init at
/app/sidecar_init.py.

The init supervisor (claude_bottle/sidecar_init.py) is PID 1 in
the bundle. It spawns the daemons named in
CLAUDE_BOTTLE_SIDECAR_DAEMONS (or all four by default),
propagates SIGTERM/SIGINT to children with an 8s grace before
SIGKILL, and exits with the first-unexpected-child exit code so
a daemon crash tears down the bundle (per PRD 0024 open
question 1's default).

claude_bottle/egress_entrypoint.sh extracted verbatim from
Dockerfile.egress's prior inline sh -c so the supervisor can
call it as a normal child.

Tests:
- unit: _selected_daemons env-var subset behavior (7 cases),
  _Supervisor signal/exit-code semantics including SIGKILL
  escalation, and end-to-end main() via subprocess.
- integration: builds the image and probes that pipelock,
  gitleaks, mitmdump, and the supervise Python module are
  present + executable, plus a no-daemons-selected smoke test
  of the entrypoint wiring. Skipped under act_runner (200+MB
  base pulls + multi-stage build).

Renderer collapse and the deletion of Dockerfile.{egress,git-gate,
supervise} land in chunk 2 + 3.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 00:05:06 -04:00
didericis 23f50f7720 fix(pipelock): scan all request headers + fix attack-3 destination
test / unit (pull_request) Successful in 19s
test / integration (pull_request) Failing after 49s
Two related changes the PRD 0022 sandbox-escape test surfaced:

  1. `pipelock_build_config` now emits
     `request_body_scanning.scan_headers: true` and
     `header_mode: all`. Pipelock's default `header_mode:
     sensitive` only checks Authorization / Cookie / X-Api-Key
     / X-Token / Proxy-Authorization / X-Goog-Api-Key — an
     agent attempting exfil could trivially pick a
     non-sensitive header (`X-Custom: $SECRET`) and slip
     through. `all` closes the gap; pipelock caps it by the
     same max_body_bytes the body scan uses.

  2. Test 3 (HTTP exfil shapes) now targets
     raw.githubusercontent.com instead of api.anthropic.com.
     api.anthropic.com is in `DEFAULT_TLS_PASSTHROUGH` —
     pipelock can't MITM it because real LLM conversation
     bodies false-positive on DLP scanners (BIP-39 etc.). The
     trade-off is documented in `pipelock.DEFAULT_TLS_PASSTHROUGH`;
     the test now exercises a host where the sandbox is
     actually supposed to block.

All 5 sandbox-escape attacks now produce HTTP 403 with the
expected sandbox marker (`egress:`, `pipelock`, or `blocked:`):

  - Attack 1 (non-allowlisted host)        ✓ egress
  - Attack 2 (non-allowlisted IP + spoof)  ✓ egress
  - Attack 3a (URL path)                   ✓ pipelock DLP
  - Attack 3b (URL query)                  ✓ pipelock DLP
  - Attack 3c (request body)               ✓ pipelock DLP
  - Attack 3d (request header)             ✓ pipelock DLP (scan_headers)
  - Attack 4a (crafted subdomain)          ✓ egress
  - Attack 4b (direct dig @8.8.8.8)        ✓ network isolation
  - Attack 5 (README push, 3 secret shapes) ✓ gitleaks (pre-upstream)

489 unit tests pass (1 updated for the new request_body_scanning
shape). Full integration suite passes in ~6s.
2026-05-26 22:38:38 -04:00
didericis 3a7b7d054b feat(dashboard): auto-focus dashboard pane + proposals on new arrival
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m10s
When a fresh proposal arrives, the dashboard now also:

  - Runs `tmux select-pane -t \$TMUX_PANE` (the dashboard's own
    pane id, captured at startup) so tmux focus jumps to the
    dashboard from wherever the operator was (typically claude
    in the right pane).
  - Flips internal focus to PANE_PROPOSALS so j/k navigates the
    queued items immediately.
  - Lands the selected cursor on the first new proposal —
    proposals are sorted by arrival ascending, so the earliest
    new arrival in the batch gets the cursor.

Stacks with the bell + label highlight from the previous
commit. The operator gets:
  1. Audible bell (or tmux activity marker)
  2. Tmux focus on the dashboard pane
  3. Dashboard's internal focus on the proposals list
  4. Cursor on the actual new proposal
  5. Pane label flashing `(new!)` in bold green

— all without leaving the keyboard.
2026-05-26 16:04:23 -04:00
didericis 9ac05c1a63 feat(dashboard): highlight proposals pane + bell on new proposal
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m8s
When a fresh proposal lands in the supervise queue, the
dashboard:

  1. Rings the terminal bell via `curses.beep()` so tmux's
     `monitor-bell` (or the terminal's own bell-on-activity)
     surfaces a notice in the dashboard pane even when the
     operator is focused on claude in the right pane.
  2. Bolds + green-attrs the `proposals:` pane label and
     suffixes it with `(new!)` so a glance at the dashboard
     screen catches the alert at a glance.

The highlight tracks the existing per-row green-highlight
window (`_NEW_PROPOSAL_HIGHLIGHT_SEC`). The bell only fires for
NEWLY arrived proposals after the first tick — pre-existing
queue entries on dashboard startup don't ring.
2026-05-26 15:55:47 -04:00
didericis ac914b6cb9 feat(dashboard): focus right pane after new-agent bringup completes
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m10s
The new-agent (`n`) flow's tmux branch was leaving keyboard
focus in the dashboard pane after compose-up + provision
finished and claude landed in the right pane — same situation
as Enter re-attach before its `focus_right_pane` fix. The
operator just spun an agent up; they want to type at it.

Pass `focus_right_pane=True` to `_attach_in_tmux` from the
new-agent flow. `tmux select-pane` runs after the respawn.
2026-05-26 15:37:07 -04:00
didericis 1a1ba6abd5 fix(dashboard): fall back to fresh claude when --continue has no session
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
`--continue` exits non-zero when an agent has been spun up but
never typed at — there's no transcript to resume. Re-attaching
to such an agent via Enter (tmux mode) was crashing the pane.

Wrap the resume invocation in `sh -c '<cmd> --continue || <cmd>'`
so a failed `--continue` cleanly falls through to a fresh
claude. The shell adds microseconds and the fallback only
kicks in when --continue would have failed anyway.

New `_build_resume_argv_with_fallback(bottle)` builds the
shell-wrapped docker exec argv with proper shlex quoting (so
paths-with-spaces in `--append-system-prompt-file` survive).
Only the tmux re-attach path uses it; first-attach + foreground
handoff are unchanged.

489 unit tests pass (4 new for the fallback builder).
2026-05-26 15:34:21 -04:00
didericis 7e20d75f00 feat(dashboard): focus right pane on Enter re-attach (in tmux)
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m8s
The Enter key on a focused agents-pane row is the operator's
explicit "I want to interact with this agent" signal — after
respawning the right pane with claude, move tmux's keyboard
focus to that pane so the operator can start typing
immediately. Without this, every Enter required a manual tmux
nav (C-b →) to actually use the session.

Mechanics:

  - `_attach_in_tmux` gains `focus_right_pane: bool = False`.
  - When True, runs `tmux select-pane -t <pane_id>` after the
    respawn.
  - `_attach_to_bottle` (the Enter handler's helper) passes
    True.
  - Other callers (new-agent flow, stop's auto-attach) leave
    it False so the operator stays in the dashboard for
    follow-up navigation.

`_tmux_select_pane` is a small subprocess wrapper, best-effort
on failure.
2026-05-26 15:25:22 -04:00
didericis 8d6e382af5 feat(dashboard): auto-focus next agent on stop, or close pane
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m5s
After `x` stops a dashboard-owned bottle, slide focus to the
next agent in the agents pane (the one filling the stopped
row, or the new last row if the stopped was last) and respawn
the right pane with that agent's claude session via `--continue`.
If no agents remain, close the right pane via `tmux kill-pane`.

Two new helpers:

  - `_tmux_close_right_pane(tmux_state)` — kills the tracked
    pane (if it exists) and clears pane_id / slug.
  - `_pick_next_after_stop(agents_before, selected_index,
    stopped_slug)` — pure chooser returning (new_index, agent)
    or None. Tested directly.

Outside tmux, only the selected_agent index slides; no
auto-attach (foreground handoff would take over the terminal,
disruptive). 485 unit tests pass (6 new for the pick helper).
2026-05-26 15:21:20 -04:00
didericis 9622bdc619 feat(dashboard): default focus to agents pane
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m9s
The dashboard is primarily an agent-management surface
(PRD 0020 + 0021); landing on the proposals pane was a holdover
from when proposals were the only thing the dashboard showed.
Default focus is now `PANE_AGENTS`, so j/k navigates the agents
list immediately on launch — the operator Tabs to proposals
when something queues. Focus choice still persists across
operations.
2026-05-26 15:16:06 -04:00
didericis 9646bc1c4c refactor(dashboard): extract _route_op_to_right_pane helper
test / unit (pull_request) Successful in 19s
test / integration (pull_request) Successful in 1m7s
Both `_new_agent_flow` (bringup) and `_stop_bottle_flow`
(teardown) were doing the same five-step dance: open the log
path, mkdir parents, empty the file, ensure the right pane is
tailing it, redirect fd 2 to the same file. Extract into a
context manager:

    with _route_op_to_right_pane(tmux_state, slug, log_name) as routed:
        if routed:
            <run op>

Yields True when routing succeeded (fd 2 redirected, pane
tailing), False on fallback conditions (not in tmux, no
tmux_state, or tmux failed to spawn a pane). The fallback
paths still differ between callers — bringup follows up
with `_attach_in_tmux`, teardown does the curses-endwin
compose-down — so the helper stops at "is stderr routed
or not" and lets callers branch from there. Net diff:
~60 lines deleted, the routing-to-right-pane concept now
lives in one place.
2026-05-26 15:13:20 -04:00
didericis 933d8cf6c3 feat(dashboard): route stop output into right tmux pane
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
PRD 0021 follow-up. Mirrors the bringup-into-right-pane fix
on the explicit-stop path: when `\$TMUX` is set, the stop
flow respawns the right pane with `tail -F
state/<slug>/teardown.log` (via `_ensure_right_pane` —
reuses the existing right pane if it's the agent's claude
session) and redirects fd 2 to that log for the duration of
`capture_session_state` + `cm.__exit__`. compose-down +
network-remove messages stream into the right pane.

After `settle_state` removes the state dir, the tail keeps
its buffered output visible (tail -F handles file removal
gracefully); the next attach respawns the pane with claude.

Falls back to the existing curses-endwin path on tmux
failure, or when the dashboard isn't in tmux at all.
2026-05-26 15:08:49 -04:00
didericis e90d7dba76 fix(dashboard): repaint stdscr immediately after modal closes
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
After the operator pressed `y` on the preflight modal (or
picked an agent in the picker), the modal's curses sub-window
stayed on screen until the dashboard's main loop ticked again
— which during a 5-10s launch made it look like the
confirmation never registered.

Add `_erase_modal` (touchwin + refresh on stdscr) and call it
at every exit from `_preflight_modal` and `_picker_modal`.
The pre-modal frame buffered on stdscr immediately overwrites
the sub-window's area; the launch proceeds with a clean
dashboard underneath.
2026-05-26 15:01:56 -04:00
didericis 0936c40428 fix(dashboard): reuse existing right pane on new-agent start
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m13s
PRD 0021 follow-up. The new-agent flow was calling a dedicated
`_tmux_split_pane_tail` that ALWAYS created a new pane —
so every `n` start spawned a fresh right pane next to any
existing one, accumulating panes instead of reusing them.

Replace with a generic `_ensure_right_pane(tmux_state, argv)`
that respawns the dashboard's tracked right pane if one is
alive, splits a new one only when none is tracked or the
tracked pane was closed. Both the new-agent tail-during-
bringup path AND the existing claude-attach path now route
through this helper.

Net effect: starting a second agent reuses the same right
pane — bringup tail replaces the prior claude session,
then claude (for the new agent) replaces the tail. Closing
the right pane manually via `C-b x` still triggers a fresh
split on the next attach.
2026-05-26 14:50:56 -04:00
didericis 83ec9669c9 feat(dashboard): route launch output into right tmux pane
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m8s
PRD 0021 follow-up. When starting a new agent via `n` while
in tmux, the dashboard now:

  1. Pre-creates the right pane with `tail -F
     state/<slug>/bringup.log`.
  2. Redirects fd 2 (stderr) to that log file via dup2 — affects
     both Python `info()` calls AND subprocess inheritors'
     stderr (docker compose up, network creates, provision).
  3. Runs `backend.launch().__enter__()` with the redirect in
     place; everything streams into the right pane via tail.
  4. Restores stderr.
  5. Respawns the right pane (tail → claude session).

Net effect: dashboard pane stays uncluttered during bringup,
and the operator watches the compose-up + provision output in
the same pane that's about to hold the claude session — no
visual handoff between "starting" and "started."

Curses never needs to come down on the tmux path (the pane is
already created in the dashboard's neighbor pane, and stderr
is redirected away from the terminal entirely).

If `_tmux_split_pane_tail` fails (tmux missing, server died),
falls through to the existing curses-endwin handoff so the
operator still gets a session.
2026-05-26 14:41:53 -04:00
didericis 2ba84c5ba0 feat(dashboard): stop hook clears tmux state + right-pane row marker
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m6s
PRD 0021 chunk 4 (final). Two adjustments to close the
split-pane loop:

1. `_stop_bottle_flow` clears `tmux_state['slug']` when the
   stopped bottle was the right-pane occupant. The pane itself
   stays in place (claude exits with "container not found");
   the operator presses Enter on a different agent to
   repurpose it via respawn-pane.
2. `_render` accepts `right_pane_slug` and marks the matching
   agents-pane row with a `*` prefix + A_BOLD (when it's not
   also the focused row — focused selection still wins for
   visibility). Gives the operator a clear visual link
   between which agent the dashboard says is "active right
   now" and which one is visible to their right.

Wired through `_main_loop`: passes `tmux_state` to
`_stop_bottle_flow` on `x`, and `tmux_state.get('slug')` to
`_render` on every tick.

479 unit tests pass (1 new for the tmux_state-preservation
on non-owned stop). PRD 0021 implementation complete pending
merge.
2026-05-26 14:29:59 -04:00
didericis 4991d5b3ee feat(dashboard): new-agent flow spawns into right tmux pane
PRD 0021 chunk 3. The `n` flow (PRD 0020 chunk 2) now routes
the first claude session of a freshly-started bottle into the
right tmux pane when `\$TMUX` is set — same `_attach_in_tmux`
state machine the Enter re-attach uses, just with
`resume=False` so claude starts fresh.

Outside tmux the existing foreground handoff is unchanged.

The compose-up phase (`backend.launch.__enter__`) still drops
curses for its stderr output; we restore curses BEFORE
spawning into the right pane so the dashboard re-renders
alongside the new claude session instead of waiting for
attach to return.
2026-05-26 14:27:37 -04:00
didericis 9944878277 feat(dashboard): tmux split-pane helpers + Enter dispatch
PRD 0021 chunk 2. New tmux integration: when `\$TMUX` is set
and the operator presses Enter on a focused agent row, the
dashboard spawns / respawns the right pane with that bottle's
claude session instead of taking over the terminal via
curses.endwin.

Mechanics:

  - `_in_tmux()` — true when `\$TMUX` is set.
  - `_tmux_split_pane_create` — first attach: `tmux split-window
    -h -P -F '#{pane_id}'` opens a right pane and prints its id
    for tracking.
  - `_tmux_respawn_pane` — subsequent attaches: `tmux
    respawn-pane -k -t <id>` swaps the content without
    re-splitting.
  - `_tmux_pane_exists` — `tmux list-panes` check before
    respawn so a manually-closed pane gracefully falls back to
    a fresh split.
  - `_attach_in_tmux` — owns the create-or-respawn state
    machine, mutates `tmux_state` ({pane_id, slug}) so the
    main loop tracks the right-pane occupant.
  - `_attach_via_handoff` — the previous curses-endwin path,
    extracted as the fallback when tmux is missing or fails.
  - `_attach_to_bottle` dispatches: in tmux + state available →
    `_attach_in_tmux`; otherwise → handoff.

Main loop gets `tmux_state: dict = {"pane_id": None, "slug":
None}`. Chunks 3 + 4 wire it through the new-agent flow and
the stop hook.

`FileNotFoundError`-safe `subprocess.run` calls around every
tmux invocation — a missing tmux binary cleanly falls back to
the handoff for that keypress. 478 unit tests pass (10 new
for the pure argv builders + `_claude_runtime_args`).
2026-05-26 14:26:40 -04:00
didericis 2303cbc0be refactor(bottle): extract claude_docker_argv from exec_claude
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m10s
PRD 0021 chunk 1. The tmux split-pane helpers (chunk 2+) need
the same docker-exec argv that `exec_claude` builds — including
the `--append-system-prompt-file <path>` flag the bottle's
provisioner copies into place. Extract the argv construction
into a pure `claude_docker_argv(argv, *, tty)` method so both
foreground (`subprocess.run`) and tmux paths
(`tmux respawn-pane …`) build from the same source.

`exec_claude` becomes a one-liner that runs subprocess.run on
the argv. No behavior change; 472 unit tests pass (7 new for
the pure builder).
2026-05-26 14:21:04 -04:00
didericis ae6d11f09d fix(dashboard): use os._exit on quit so bottles survive the dashboard
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m9s
The `bottles` dict held `@contextmanager`-wrapped launch contexts.
On normal Python interpreter shutdown those context managers'
generators got GC'd, which raised GeneratorExit at the yield
point and ran the `finally` block — invoking each bottle's
teardown and tearing down the compose project. Net effect: `q`
WAS implicitly stopping every dashboard-launched bottle even
though the keypress handler just `return`'d.

`os._exit(0)` skips all Python-level cleanup (GC, atexit, etc.),
so the docker compose projects survive the dashboard exit
untouched. Curses gets explicit `endwin()` first because the
brutal exit skips curses.wrapper's normal terminal restoration.

Matches PRD 0020's resolved-question answer (`q` does NOT tear
down bottles; teardown is always explicit via `x` or
`./cli.py cleanup`).
2026-05-26 13:48:16 -04:00
didericis 14d5c78370 fix(attach): use --continue (no picker) instead of --resume
`--resume` alone surfaces claude's session picker even when only
one session exists. `--continue` jumps to the most recent session
non-interactively, which is the actual behavior the dashboard's
Enter re-attach wants for typical bottle-with-one-session cases.
2026-05-26 13:48:16 -04:00
didericis 832e92c7a6 feat(attach): pass --resume on dashboard re-attach
Re-entering a running bottle from the dashboard (Enter on the
agents pane) now invokes claude with `--resume` so the session
picks up the prior conversation history rather than starting a
fresh transcript. The first-attach paths (`./cli.py start` and
the dashboard's new-agent `n` flow) leave it off — the
transcript doesn't exist yet there.

`attach_claude` gains a `resume: bool = False` kwarg;
`_attach_to_bottle` in the dashboard passes `True`.
2026-05-26 13:48:16 -04:00
didericis 3ed3745982 feat(dashboard): x stops a dashboard-owned bottle (PRD 0020 chunk 4)
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m7s
Final PRD 0020 chunk. `x` on a focused agents-pane row tears
down the selected bottle if the dashboard owns it (started via
the chunk-2 `n` flow): pops `(cm, bottle, identity)` from the
main loop's bottles map, snapshots the transcript best-effort,
calls `cm.__exit__(None, None, None)` to drive the existing
compose-down + network-remove sequence, then `settle_state` to
honor any pre-existing preserve marker.

On a non-owned slug (discovered via `list_active_slugs` but not
in the dashboard's bottles dict — i.e., previous-dashboard or
external `./cli.py start` bottle), `x` is a no-op with a status
hint pointing at `./cli.py cleanup`. Matches the PRD's
cross-dashboard re-attach model: the dashboard can re-attach
either kind, but can only tear down its own.

The PRD's chunk 5 ("quit-cleanup") is satisfied by the existing
no-op behavior of `q` — per the user's resolved-question
answer, quit leaves bottles running unchanged. No code change
needed for that.

Footer surfaces `[x] stop`. 465 unit tests pass (1 new for the
non-owned no-op path; the owned path is integration territory
because it drives a real compose-down).
2026-05-26 03:46:57 -04:00
didericis 572306ddb6 feat(dashboard): Enter on agents pane re-attaches to bottle
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m11s
PRD 0020 chunk 3. Enter on a focused agents-pane row drops to a
claude session inside the selected bottle. Works for both
dashboard-owned bottles (looks up the stored Bottle handle in
the main loop's `bottles` dict) and externally-discovered ones
(synthesizes a DockerBottle from the slug → `claude-bottle-<slug>`
container name).

For the synthesized path, the `--append-system-prompt-file`
target resolves via metadata.json + the manifest's agent prompt
if both can be read; otherwise the re-attach runs without the
flag (claude defaults to no system prompt, the bottle's other
state is untouched).

Shares the curses.endwin → attach → refresh handoff with the
chunk-2 new-agent flow via a new `_attach_to_bottle` helper.
Footer reshuffled to advertise `[Enter] view/attach`. 464 unit
tests pass (3 new for `_bottle_for_slug`).
2026-05-26 03:39:58 -04:00
didericis 309ffaa4ab feat(dashboard): agent picker modal + new-agent (n) flow
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
PRD 0020 chunk 2. Pressing `n` opens a modal that lists every
agent from the manifest with `(N running)` suffixes for ones
that already have bottles up. Type to filter (substring,
case-insensitive); j/k or arrows to navigate; Enter to confirm;
Esc clears the filter on first press, exits the picker on the
second.

On confirmation, the dashboard runs:

  - `prepare_with_preflight` from chunk 1 with curses-modal
    render + prompt callables (the preflight modal centers the
    plan summary + captures [y/N]).
  - `backend.launch(plan).__enter__()` — enters but doesn't bind
    the context to a `with`. The (cm, bottle, identity) tuple
    lands in the main loop's `bottles` dict keyed by slug.
  - `curses.endwin()` → `attach_claude(bottle)` → `stdscr.refresh()`
    handoff. The agent's claude session takes over the terminal;
    on exit the dashboard re-renders with the bottle now visible
    in the agents pane.

Crucially the context manager is held alive in `bottles` — never
`__exit__`'d at quit. Chunk 4 will wire `x` to that exit; for
now bottles started from the dashboard stay running until
explicit cleanup. Matches the PRD's "q does not tear down"
decision.

Footer surfaces `[n] new agent`. 461 unit tests pass (8 new for
`_filter_agents` and `_running_counts`).
2026-05-26 03:22:44 -04:00
didericis a56be6beb5 refactor(start): extract prepare_with_preflight + attach_claude
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
PRD 0020 chunk 1. `cli/start.py`'s `_launch_bottle` did three
things in one function: prepare + preflight, attach claude, and
settle state on teardown. Split them so the dashboard (PRD 0020
chunk 2+) can reuse the prepare + attach pieces piecewise
without going through the CLI's one-shot orchestrator:

  - `prepare_with_preflight(spec, *, stage_dir, render_preflight,
    prompt_yes, dry_run)` — injects render + prompt callables so
    the CLI binds them to stderr/stdin while the dashboard binds
    them to a curses modal. Returns `(plan, identity)`; identity
    is set after `backend.prepare` returns so callers can reap
    the prepare-time state dir on abort via `settle_state` in
    their finally — preserving today's preflight-N cleanup.
  - `attach_claude(bottle, *, remote_control)` — runs claude
    inside the bottle and returns its exit code. The dashboard
    calls this from inside a `curses.endwin` → … →
    `stdscr.refresh()` handoff.
  - `capture_session_state` / `settle_state` lose their leading
    underscore; the dashboard will call them on
    session-end + explicit-stop respectively.

`_launch_bottle` becomes a thin orchestrator over those helpers.
No behavior change; all 453 unit tests pass and `./cli.py start
implementer --dry-run` produces identical preflight output.
2026-05-26 03:12:29 -04:00
didericis 3c2585cb98 fix(apply): write routes/pipelock yaml in place, not via rename
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m6s
PRD 0018 chunk 3's atomicity fix used write-temp-then-rename to
update bind-mounted config files. POSIX rename atomically swaps
the inode at the host path — but Docker single-file bind mounts
on Linux pin the source inode at mount time, so post-rename the
container's mount points at the now-orphaned old inode and never
sees the new content. The egress sidecar's SIGHUP-driven reload
re-reads the same stale file → "egress route updates aren't
updatable via the supervisor anymore".

Switch egress_apply + pipelock_apply to write in place (same
inode, truncated + rewritten). Lose file-level POSIX atomicity,
but:

  - egress: SIGHUP fires only AFTER the write returns; the
    addon's `load_routes` raises `ValueError` on a partial read
    and keeps the previous in-memory routes, so the in-process
    race window (already narrow) is non-disruptive.
  - pipelock: applies via `docker restart` rather than SIGHUP;
    restart serializes after the host write completes, so the
    container reads the fully-written file on next boot.

macOS Docker Desktop's file-sharing layer (virtiofs / osxfs)
silently re-resolves the path on rename, which is why this bug
didn't surface in dev tests on macOS. Linux native Docker is
the strict reading; the fix works on both.
2026-05-26 02:31:46 -04:00
didericis c9825cf701 refactor(egress): write routes.yaml as actual YAML, not JSON-in-yml
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
`egress_render_routes` now emits hand-rolled YAML in the same style
as `pipelock_render_yaml`. The egress addon parses it via
`yaml_subset.parse_yaml_subset` — the same parser the manifest
loader + pipelock_apply use.

Why bother: routes.yaml is bind-mounted into the egress sidecar
AND surfaced to operators through `routes edit` (PRD 0019). JSON-
in-yml renders ugly in $EDITOR and signals "this is data" rather
than "this is config you can read at a glance". Real YAML reads
cleanly.

Mechanics:

  - `yaml_subset.py` drops its `claude_bottle.log` dependency.
    Errors now raise `YamlSubsetError` (a `ValueError`); the
    manifest loader + pipelock_apply catch it at the boundary
    and forward to `die` / `PipelockApplyError` so callers see
    the same behavior they did before.
  - `Dockerfile.egress` adds one COPY line for `yaml_subset.py`
    so it sits flat in `/app/` next to the addon. The addon
    uses an absolute-import-with-fallback shim so the same file
    works inside the container AND from the host's unit tests.
  - `egress_apply._merge_single_route` round-trips current
    routes.yaml through `parse_yaml_subset` + a new
    `_render_routes_payload` helper instead of `json.loads` +
    `json.dumps`.

End-to-end: rebuilt the egress image, ran `./cli.py start` to a
full bring-up, confirmed the addon's boot log shows `egress:
loaded 9 route(s)` — i.e., the YAML parses inside the container.
453 unit + 3 integration tests pass.
2026-05-26 02:17:42 -04:00
didericis 7b29c81f27 feat(dashboard): agent-scoped e/p, drop discover-and-prompt path
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m6s
PRD 0019 chunk 4 (final). The `e` (routes edit) and `p` (pipelock
edit) keys now require an agent selection in the agents pane.
Pressing them with the proposals pane focused, with no active
agents, or with an out-of-range selection is a no-op with a
status hint ("no agent selected; Tab into the agents pane first").

The discover-and-prompt scaffolding inside
`_operator_edit_routes_flow` / `_operator_edit_allowlist_flow` /
`_operator_edit_flow` is gone. The flows now take an `ActiveAgent`
+ required-service name; they refuse with a clear message when
the bottle lacks the requested sidecar (e.g., `routes edit`
against a bottle with no `bottle.egress.routes` declared). The
`discover_egress_slugs` + `discover_pipelock_slugs` +
`_discover_active_with_service` helpers come out — they had no
remaining callers.

Footer now reads `[e/p] edit selected agent`.
2026-05-26 01:50:28 -04:00
didericis 0abffc4d90 feat(dashboard): Tab toggle + per-pane selection state
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m4s
PRD 0019 chunk 3. The TUI now has two focusable panes — proposals
and agents — and `Tab` toggles which one the `j/k`/arrow keys
move through.

Each pane keeps its own selection index. Switching panes doesn't
lose the position in the other; the cursor (`>` + reverse-video
row) appears only in the focused pane. The label line on each
pane shows "(focused)" when active.

Footer reshuffled: `[Tab] switch pane  [j/k] move  [Enter] view
[a/m/r] proposal  [e/p] edit  [q] quit`. When the agents pane is
focused and there's no status message to display, the idle
status line surfaces the currently-selected agent (or "[no
active agents]" / "[no agent selected]" fallbacks) so the
operator knows what an agent-scoped edit verb will target after
chunk 4 wires them up.

Proposal action keys (a/m/r/Enter) are gated on the proposals
pane being focused — pressing them with the agents pane focused
is a no-op. e/p still use the global discover-and-prompt flow
for one more chunk; chunk 4 swaps them to read the agents-pane
selection.
2026-05-26 01:37:23 -04:00
didericis cfd8f269ba feat(dashboard): render active agents pane below proposals
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m4s
PRD 0019 chunk 2. The TUI's main render now draws two panes:
proposals on top (existing), active agents on the bottom (new).
Header counts both totals. The agents pane refreshes on the
same 1s tick — agents starting/stopping reflect without
operator action.

Each agent row shows slug, agent name, started-time (HH:MM:SS
of the metadata.json timestamp), and the bracketed list of
sidecars currently up. The `agent` service is filtered out of
the displayed list — it's always present so it'd be noise; the
sidecars are the differentiator. A bottle whose only running
service is `agent` (sidecars still warming up) renders as
`(starting)`.

No selection model yet — that's chunk 3. The cursor stays in
the proposals pane; `j/k`/arrow nav and the proposal action
keys are unchanged.
2026-05-26 01:23:59 -04:00
didericis 6e4a9f606f feat(dashboard): discover_active_agents helper + ActiveAgent dataclass
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m6s
PRD 0019 chunk 1. New `discover_active_agents()` in dashboard.py
returns one `ActiveAgent(slug, agent_name, started_at, services)`
per currently-running compose project:

  - Slugs come from `list_active_slugs()` (chunk-5 shared helper).
  - The service set per project comes from ONE label-filtered
    `docker ps` call (PRD open question #1: avoids N per-bottle
    `compose ps` invocations on each 1s refresh tick).
  - agent_name + started_at come from each bottle's
    metadata.json; "?" / "" fallbacks when the file is missing
    so the row renders rather than vanishes.

Not wired into the TUI yet — chunk 2 renders the agents pane.
The parser (`_parse_services_by_project`) is split out as a pure
function so the conditional-input shape can be unit-tested
without docker.
2026-05-26 01:11:54 -04:00
didericis 1fa3745832 refactor(dashboard): discover via docker compose ls
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m8s
PRD 0018 chunk 5. The dashboard's operator-edit verbs
(`routes edit`, `pipelock edit`) enumerated running sidecars
via `docker ps --filter name=...` prefix scans. Switch to
`docker compose ls`-based discovery so the dashboard, cleanup
CLI, and launch step all agree on what's running.

Mechanics:

  - `claude_bottle/backend/docker/compose.py` grows three shared
    helpers: `list_compose_projects` (the JSON parse moved out
    of cleanup), `slug_from_compose_project` (inverse of
    `compose_project_name`), and `list_active_slugs` (sugar over
    the first two for the common "what's running?" question).
  - cleanup.py drops its private `_list_compose_projects` +
    `_PROJECT_PREFIX` in favor of the shared ones; `list_active`
    simplifies (one compose-ls call, not two).
  - dashboard.py's `_discover_sidecar_slugs` becomes
    `_discover_active_with_service`: cross-references the active
    slug list with a label-filtered `docker ps` so only bottles
    whose given service container is actually up surface in the
    edit menu. Bottles without an egress sidecar (no
    bottle.egress.routes) no longer appear for `routes edit`.

3 new unit tests cover the slug ↔ compose-project naming
contract; manual probe with a fake compose project confirms
both `discover_egress_slugs` and `discover_pipelock_slugs`
return the expected slug.
2026-05-26 00:14:16 -04:00
didericis aee249f119 refactor(cleanup): compose-ls driven, plus orphan state-dir reaping
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m9s
PRD 0018 chunk 4. `claude-bottle cleanup` now derives its work
from `docker compose ls --all --format json`, filtered to projects
whose name starts with `claude-bottle-`. Per project: one `compose
down --volumes` removes the containers + the compose-managed
networks atomically.

The plan also enumerates three fallback buckets:

  - Stray containers — `claude-bottle-*` containers with no
    `com.docker.compose.project` label (left over from pre-compose
    code paths). Cleared via `docker rm -f`.
  - Stray networks — `claude-bottle-*` networks with no compose
    project label. Cleared via `docker network rm`.
  - Orphan state dirs — per-bottle `~/.claude-bottle/state/<id>/`
    dirs with no live project AND no `.preserve` marker. The
    `.preserve` marker (capability-block or auto-preserve-on-crash)
    explicitly opts-out of reaping; manual `rm -rf` is the only
    path for preserved state.

cli/cleanup.py collapses to a single y/N prompt — backend.prepare_cleanup
returns everything in one plan, backend.cleanup processes everything,
no more double-prompt for state. The CLI-side state-dir enumeration
+ `_state_summary` flags from PR #25 are gone; the backend's
orphan-detection rules subsume them.
2026-05-25 23:48:02 -04:00
didericis f1c5816d1f refactor(compose): drop pre-create networks + pipelock CIDR allowlist
PRD 0018 chunk 4 spike: empirically verified that pipelock's SSRF
guard checks proxied-request destinations (e.g. api.anthropic.com →
public IP) and not source IPs of incoming connections. The
bottle's own internal CIDR was being added to ssrf.ip_allowlist
defensively, but that defense isn't load-bearing — direct pipelock
probe (`curl --proxy http://pipelock https://api.anthropic.com/`)
returns 404 from upstream rather than blocking on SSRF.

So:

  - Networks become compose-managed (`internal: true` on the
    internal network; the egress one is a normal user-defined
    bridge). Compose creates + removes them via up/down.
  - launch.py drops the `docker network create` + `network_inspect_cidr`
    + pipelock yaml re-render dance.
  - The pre-create/external scaffolding from chunk 3 goes with it.

End-to-end `./cli.py start` still works; cleanup leaves no
orphans. If real-world use surfaces an SSRF block we hadn't
predicted, the allowlist can come back via subnet-pinning rather
than pre-create.
2026-05-25 23:48:02 -04:00
didericis cefdc8c6e9 feat(launch): switch start to docker compose project per bottle
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m5s
PRD 0018 chunk 3. Each instance is now one `docker compose` project:

  - launch.py renders the compose spec via chunk-1's
    bottle_plan_to_compose, writes it to state/<slug>/docker-compose.yml,
    `docker compose up -d`s, and (on teardown) dumps
    `docker compose logs --no-color --timestamps` to
    state/<slug>/compose.log before `docker compose down`.
  - Networks are pre-created (`docker network create --internal` +
    user-defined bridge) so pipelock yaml can know the internal CIDR
    before compose-up. Compose references them with `external: true`;
    the launch step's ExitStack still owns network removal.
  - Agent still runs `sleep infinity`; claude reaches it via
    `docker exec -it` exactly like before (per the PRD's resolved
    TTY question).
  - metadata.json grows a `compose_project` field so dashboard /
    cleanup tooling can derive compose invocations without
    re-deriving the slug.

Security follow-ups from chunk-2 review:

  (b) CA private keys: pipelock + egress ca-key.pem land at 0o600
      explicitly. The mitmproxy cert+key concat stays 0o644 because
      the egress container's uid-1000 user reads it through the
      bind mount; parent dir at 0o700 still restricts host-side
      reach.
  (c) Apply atomicity: egress_apply + pipelock_apply switch from
      `docker cp` to host-side write-temp-then-rename on the
      bind-mount source. POSIX rename is atomic on the same
      filesystem, so a sidecar SIGHUP racing the apply can't see
      a half-written routes.yaml / pipelock.yaml.

Per-sidecar Docker{Sidecar}.start/stop methods stay in place — the
integration test suite drives them directly to validate each image
in isolation, which is still useful. launch.py no longer calls
them; a follow-up chunk can prune if the integration tests move to
the compose lifecycle.

git-gate entrypoint's chmod 600 on the keyfile + known_hosts now
tolerates EROFS (`|| true`) — the host SSH key is already 0600
(SSH refuses to load otherwise), so the inside-container chmod
was already a no-op in the docker-cp path and now just needs to
not error on the read-only bind mount.

422 unit tests pass; supervise integration test passes; end-to-end
`./cli.py start implementer` brings up the project, attaches,
captures full merged logs on teardown, and reaps all containers +
networks.
2026-05-25 23:16:40 -04:00
didericis cd82a48399 refactor(state): write prepare-time scratch files under state/<slug>/
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m5s
PRD 0018 chunk 2. Each sidecar's prepare-time output (pipelock yaml +
CAs, egress routes.yaml + CAs, git-gate entrypoint + hooks, supervise
current-config, agent env + prompt) now lands in
~/.claude-bottle/state/<slug>/<service>/ instead of an ephemeral
mktemp dir. The state subdirs become the stable bind-mount sources
that chunk 3's docker compose project will reference.

The SDK launch path is unchanged — `docker cp` still copies from the
plan-held paths into containers, just from new locations. start.py's
session-end cleanup is now in `finally`, which also reaps state dirs
left behind by dry-run / preflight-N / prepare-exception paths
(previously only the post-launch path settled state).
2026-05-25 22:53:47 -04:00
didericis 4760a09263 feat(compose): pure renderer for bottle plan -> compose dict
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m5s
PRD 0018 chunk 1. New module `claude_bottle/backend/docker/compose.py`
exposing `bottle_plan_to_compose(plan) -> dict` — a pure function that
translates a fully-resolved DockerBottlePlan into a Compose v2 spec.

Not wired in yet. Tests cover the conditional-service matrix (git
on/off × egress on/off × supervise on/off) plus per-service shape
(images vs builds, network aliases, bind mounts, env vars, depends_on).
2026-05-25 22:28:50 -04:00
didericis 1e5b0dcfca refactor: rename egress-proxy → egress everywhere
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m10s
The manifest key is `egress:` now; finish the rename so the rest of
the codebase matches. Files (Dockerfile.egress, claude_bottle/egress.py
etc.), classes (Egress, EgressConfig, EgressRoute, EgressPlan,
DockerEgress), constants (EGRESS_HOSTNAME, EGRESS_ROUTES, ...),
container name prefix (claude-bottle-egress-*), docker network alias
(egress), the introspection host (_egress.local), the MCP tool IDs
(egress-block, list-egress-routes), and the preflight label all drop
the `-proxy` suffix.
2026-05-25 21:59:47 -04:00
didericis 14c8a51c16 refactor(manifest): rename egress_proxy key to egress
test / unit (pull_request) Successful in 16s
test / integration (pull_request) Successful in 1m4s
Now that `bottle.egress` (the old allowlist/dlp_action block) is
gone, the longer `egress_proxy:` disambiguator isn't needed. The
manifest field reads more naturally as just `egress:` with the
same nested `routes: [...]` shape.

Renamed:
  - Manifest YAML key:    `egress_proxy:` → `egress:`
  - Bottle dataclass attr: `bottle.egress_proxy` → `bottle.egress`
  - `_BOTTLE_KEYS` entry, schema docstring, and all
    user-facing error message labels (`egress.routes[N]`,
    `egress has unknown key …`, etc.).

Kept (these refer to the egress-proxy SIDECAR, not the manifest
field):
  - File names: `egress_proxy.py`, `egress_proxy_apply.py`,
    `egress_proxy_addon.py`, `egress_proxy_addon_core.py`.
  - Class names: `EgressProxyConfig`, `EgressProxyRoute`,
    `EgressProxyPlan`, `EgressProxy`, `DockerEgressProxy`.
  - Helper names: `egress_proxy_manifest_routes`,
    `egress_proxy_routes_for_bottle`,
    `egress_proxy_token_env_map`, etc.
  - Constants: `EGRESS_PROXY_HOSTNAME`, `EGRESS_PROXY_ROLES`,
    `EGRESS_PROXY_AUTH_SCHEMES`, `EGRESS_PROXY_FORWARD_PROXY`,
    `EGRESS_PROXY_INTROSPECT_URL`, `EGRESS_PROXY_PORT`, etc.
  - Container name prefix `claude-bottle-egress-proxy-*`, the
    `egress-proxy` docker network alias, the
    `egress-proxy-block` + `list-egress-proxy-routes` MCP tool
    IDs, the `egress-proxy` audit-log component label.

Local bottle migrated (`~/.claude-bottle/bottles/dev.md` already
updated). The legacy `egress_proxy` key isn't surfaced anywhere
anymore; the generic unknown-key validator catches typos with a
"did you mean: egress, env, git, supervise" hint.

409 unit + integration tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 21:25:51 -04:00