feat(launch): switch start to docker compose project per bottle #35

Merged
didericis merged 1 commit from chunk-3-compose-lifecycle into main 2026-05-25 23:47:48 -04:00

Summary

PRD 0018 chunk 3. Each claude-bottle start invocation now brings up its own docker compose project, one per slug.

Flow in launch.py:

  1. Build the agent's base + derived image.
  2. Pre-create the two per-bottle networks (so pipelock yaml can embed the internal CIDR before compose-up).
  3. Mint per-bottle CAs into state/<slug>/{pipelock,egress}/.
  4. Re-render pipelock yaml with the now-known internal CIDR.
  5. Populate launch-time fields on every inner plan.
  6. Render the compose spec via chunk 1's bottle_plan_to_compose, write to state/<slug>/docker-compose.yml.
  7. docker compose up -d — token + OAuth values flow through subprocess env so environment: [NAME] bare-name entries inherit without rendering values into the file.
  8. Provision (CA install, prompt, skills, git, supervise config) — unchanged, uses docker exec.
  9. Yield a DockerBottle. The agent still runs sleep infinity, and exec_claude still reaches it with docker exec -it (per the PRD's resolved TTY question).
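
Step 7 can be sketched roughly as follows. This is a minimal illustration, not the code in launch.py: the helper names (compose_up_command, compose_env, compose_up) and the secret variable names are hypothetical. It relies on the documented Compose behavior that a bare-name entry under environment: takes its value from the environment of the process that runs docker compose.

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, List, Optional


def compose_up_command(project: str, compose_file: Path) -> List[str]:
    """Build the `docker compose up -d` argv for one bottle's project (hypothetical helper)."""
    return ["docker", "compose", "-p", project, "-f", str(compose_file), "up", "-d"]


def compose_env(secrets: Dict[str, str], base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Overlay secret values on a copy of the parent environment.

    The values live only in the subprocess environment; the compose file
    lists just the variable names.
    """
    env = dict(os.environ if base is None else base)
    env.update(secrets)
    return env


def compose_up(project: str, compose_file: Path, secrets: Dict[str, str]) -> None:
    """Run compose-up with the secrets passed through the environment."""
    subprocess.run(
        compose_up_command(project, compose_file),
        env=compose_env(secrets),
        check=True,
    )
```

Because the secrets are merged into a copy of the environment, neither the rendered docker-compose.yml nor the parent process's own os.environ is modified.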

Teardown:

  • Dump docker compose logs --no-color --timestamps to state/<slug>/compose.log (best-effort).
  • docker compose down for the project.
  • Remove the two pre-created networks.

metadata.json grows a compose_project field so dashboard / cleanup / resume tooling can derive docker compose -p <project> ... invocations without re-deriving the slug.
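
As an illustration of how tooling could use the new field, the sketch below reads compose_project from metadata.json and builds a compose command line. Only the compose_project key comes from this PR; the helper name and the rest of the file's layout are assumptions.

```python
import json
from pathlib import Path
from typing import List


def compose_argv(metadata_path: Path, *args: str) -> List[str]:
    """Build a `docker compose -p <project> ...` argv from a bottle's metadata.json."""
    meta = json.loads(metadata_path.read_text())
    # Only the `compose_project` key is taken from the PR description.
    return ["docker", "compose", "-p", meta["compose_project"], *args]
```

For example, compose_argv(path, "ps") would yield the argv for listing the project's containers without touching the slug.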

Security follow-ups from chunk-2 review

(b) CA private keys at 0o600. pipelock + egress ca-key.pem land at 0o600 explicitly. The mitmproxy cert+key concat (mitmproxy-ca.pem) stays 0o644 because the egress container's uid-1000 user reads it through the bind mount; parent dir at 0o700 still restricts host-side reach.
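
One way to land a key at 0o600 explicitly, regardless of umask or a pre-existing file, is sketched below. The function name is hypothetical and this is not necessarily how the PR writes ca-key.pem.

```python
import os
from pathlib import Path


def write_private_key(path: Path, pem: bytes) -> None:
    """Write key material so the file is 0o600 from creation onward."""
    # The parent directory is requested at 0o700 (subject to umask on creation).
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    # Passing the mode to os.open avoids a window where the file is world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        # The creation mode is ignored if the file already existed, so set it explicitly.
        os.fchmod(f.fileno(), 0o600)
        f.write(pem)
```

Creating the file with the restrictive mode, rather than writing it and calling chmod afterwards, means the key is never readable by other users even briefly.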

(c) Apply atomicity. egress_apply + pipelock_apply switch from docker cp to host-side write-temp-then-rename on the bind-mount source. POSIX rename is atomic on the same filesystem, so a sidecar SIGHUP racing the apply can't see a half-written routes.yaml / pipelock.yaml.
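
The write-temp-then-rename pattern can be sketched as below. The function name and default mode are assumptions, not the PR's egress_apply / pipelock_apply code. The sketch also assumes the container bind-mounts the containing directory: a rename swaps in a new inode, so a bind mount of the single file would keep pointing at the old one.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(target: Path, data: bytes, mode: int = 0o644) -> None:
    """Replace `target` so that readers see either the old or the new content."""
    # The temp file must live in the same directory (hence the same filesystem)
    # for the rename to be atomic.
    fd, tmp = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the swap
        os.chmod(tmp, mode)  # mkstemp creates the file at 0o600
        os.replace(tmp, target)  # atomic rename over the existing file
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

A reader that opens routes.yaml or pipelock.yaml during the apply gets either the complete previous file or the complete new one, never a partial write.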

Per-sidecar start/stop

The PRD planned to delete these methods in this chunk. In practice the integration test suite drives them directly to validate each sidecar image in isolation, which is still useful coverage. launch.py no longer calls them, so they are test utilities now. A follow-up chunk can prune them if the integration tests move to the compose lifecycle.

git-gate chmod fix

The entrypoint's chmod 600 on the keyfile + known_hosts now tolerates EROFS via || true. SSH refuses to load private keys whose permissions are looser than 0600, so the host key is already 0600 and the inside-container chmod was already a no-op in the docker-cp path; it just needs to not error on the read-only bind mount.

Status

  • 422 unit tests pass
  • Supervise integration test passes
  • End-to-end ./cli.py start implementer brings up the project, attaches, captures full merged logs on teardown to state/<slug>/compose.log, and reaps all containers + networks
didericis added 1 commit 2026-05-25 23:17:04 -04:00
feat(launch): switch start to docker compose project per bottle
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m5s
cefdc8c6e9
PRD 0018 chunk 3. Each instance is now one `docker compose` project:

  - launch.py renders the compose spec via chunk-1's
    bottle_plan_to_compose, writes it to state/<slug>/docker-compose.yml,
    `docker compose up -d`s, and (on teardown) dumps
    `docker compose logs --no-color --timestamps` to
    state/<slug>/compose.log before `docker compose down`.
  - Networks are pre-created (`docker network create --internal` +
    user-defined bridge) so pipelock yaml can know the internal CIDR
    before compose-up. Compose references them with `external: true`;
    the launch step's ExitStack still owns network removal.
  - Agent still runs `sleep infinity`; claude reaches it via
    `docker exec -it` exactly like before (per the PRD's resolved
    TTY question).
  - metadata.json grows a `compose_project` field so dashboard /
    cleanup tooling can derive compose invocations without
    re-deriving the slug.

Security follow-ups from chunk-2 review:

  (b) CA private keys: pipelock + egress ca-key.pem land at 0o600
      explicitly. The mitmproxy cert+key concat stays 0o644 because
      the egress container's uid-1000 user reads it through the
      bind mount; parent dir at 0o700 still restricts host-side
      reach.
  (c) Apply atomicity: egress_apply + pipelock_apply switch from
      `docker cp` to host-side write-temp-then-rename on the
      bind-mount source. POSIX rename is atomic on the same
      filesystem, so a sidecar SIGHUP racing the apply can't see
      a half-written routes.yaml / pipelock.yaml.

Per-sidecar Docker{Sidecar}.start/stop methods stay in place — the
integration test suite drives them directly to validate each image
in isolation, which is still useful. launch.py no longer calls
them; a follow-up chunk can prune if the integration tests move to
the compose lifecycle.

git-gate entrypoint's chmod 600 on the keyfile + known_hosts now
tolerates EROFS (`|| true`) — the host SSH key is already 0600
(SSH refuses to load otherwise), so the inside-container chmod
was already a no-op in the docker-cp path and now just needs to
not error on the read-only bind mount.

422 unit tests pass; supervise integration test passes; end-to-end
`./cli.py start implementer` brings up the project, attaches,
captures full merged logs on teardown, and reaps all containers +
networks.
didericis merged commit 6927a7ba4b into main 2026-05-25 23:47:48 -04:00