feat(launch): switch start to docker compose project per bottle

PRD 0018 chunk 3. Each instance is now one `docker compose` project:

- launch.py renders the compose spec via chunk-1's bottle_plan_to_compose,
  writes it to state/<slug>/docker-compose.yml, runs `docker compose up -d`,
  and on teardown dumps `docker compose logs --no-color --timestamps` to
  state/<slug>/compose.log before `docker compose down`.
- Networks are pre-created (`docker network create --internal` plus a
  user-defined bridge) so the pipelock yaml can know the internal CIDR
  before compose-up. Compose references them with `external: true`; the
  launch step's ExitStack still owns network removal.
- The agent still runs `sleep infinity`; claude reaches it via
  `docker exec -it` exactly as before (per the PRD's resolved TTY
  question).
- metadata.json grows a `compose_project` field so dashboard / cleanup
  tooling can derive compose invocations without re-deriving the slug.

Security follow-ups from chunk-2 review:

(b) CA private keys: the pipelock and egress ca-key.pem files land at
    0o600 explicitly. The mitmproxy cert+key concat stays 0o644 because
    the egress container's uid-1000 user reads it through the bind mount;
    the 0o700 parent dir still restricts host-side reach.

(c) Apply atomicity: egress_apply and pipelock_apply switch from
    `docker cp` to a host-side write-temp-then-rename on the bind-mount
    source. POSIX rename is atomic on the same filesystem, so a sidecar
    SIGHUP racing the apply can't see a half-written routes.yaml /
    pipelock.yaml.

Per-sidecar Docker{Sidecar}.start/stop methods stay in place: the
integration test suite drives them directly to validate each image in
isolation, which is still useful. launch.py no longer calls them; a
follow-up chunk can prune them if the integration tests move to the
compose lifecycle.
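A minimal Python sketch of the write-temp-then-rename pattern behind the apply-atomicity change. `atomic_write_text` and its `mode` parameter are hypothetical, not the actual egress_apply / pipelock_apply code; the 0o644 default assumes the sidecar reads the file as a different uid, the way the egress container reads the mitmproxy PEM.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(dest: Path, text: str, mode: int = 0o644) -> None:
    # Create the temp file in dest's own directory: rename is only
    # atomic when source and target are on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=dest.parent, prefix=f".{dest.name}.", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())  # contents on disk before the swap
        # mkstemp creates files at 0o600; widen to what the reader needs.
        os.chmod(tmp_name, mode)
        # Readers see either the old file or the complete new one.
        os.replace(tmp_name, dest)
    except BaseException:
        # Don't leave a stray temp file behind if any step failed.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

One caveat with this pattern: the renamed file is only visible inside the container when the bind mount covers the containing directory. A single-file bind mount pins the original inode, so after the rename the container would keep reading the old contents.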

git-gate entrypoint's chmod 600 on the keyfile + known_hosts now
tolerates failure (`|| true`) so it survives EROFS on the read-only bind
mount. The host SSH key is already 0600 (SSH refuses to load it
otherwise), so the in-container chmod was already a no-op in the
docker-cp path and now only needs to not error.

422 unit tests pass; the supervise integration test passes; end-to-end
`./cli.py start implementer` brings up the project, attaches, captures
full merged logs on teardown, and reaps all containers + networks.
2026-05-25 23:16:40 -04:00
parent b9f6889d09
commit cefdc8c6e9
11 changed files with 362 additions and 302 deletions
@@ -114,6 +114,10 @@ def egress_tls_init(stage_dir: Path) -> tuple[Path, Path]:
    )
    if keygen.returncode != 0:
        die(f"egress ca keygen failed: {keygen.stderr.strip()}")
+    # Standalone private key — never docker-cp'd, never bind-mounted
+    # (mitmproxy reads the cert+key concat below). Lock to owner-
+    # only so it doesn't sit at the default umask on disk.
+    key_path.chmod(0o600)

    # `subjectKeyIdentifier=hash` makes openssl compute the SKI as
    # SHA-1(pubkey), matching how mitmproxy computes the AKI on the
@@ -149,6 +153,12 @@ def egress_tls_init(stage_dir: Path) -> tuple[Path, Path]:

    cert_path.chmod(0o644)
    # mitmproxy reads cert + key from a single concatenated PEM file.
+    # This file IS bind-mounted into the egress container (chunk 3+),
+    # where mitmproxy runs as uid 1000, so the host file has to be
+    # world-readable for the container's user to read it through the
+    # mount. On the host, traversal is what restricts reach: the file
+    # lives under state/<slug>/ in ~/.claude-bottle, inside ~ (0o700),
+    # and other users can't search through an owner-only ancestor dir.
    mitm = work / "mitmproxy-ca.pem"
    mitm.write_bytes(cert_path.read_bytes() + key_path.read_bytes())
    mitm.chmod(0o644)