diff --git a/docs/prds/0024-consolidate-sidecar-bundle.md b/docs/prds/0024-consolidate-sidecar-bundle.md new file mode 100644 index 0000000..8f140bb --- /dev/null +++ b/docs/prds/0024-consolidate-sidecar-bundle.md @@ -0,0 +1,455 @@ +# PRD 0024: Consolidate per-bottle sidecars into a single bundle + +- **Status:** Draft +- **Author:** didericis +- **Created:** 2026-05-26 + +## Summary + +Replace the four per-bottle sidecar containers in the Docker +backend (pipelock, egress, git-gate, supervise) with a single +container image — `claude-bottle-sidecars` — that runs all four +daemons under a small stdlib-Python init supervisor. Same +per-bottle lifetime, same scope, fewer containers per bottle, +one Dockerfile to maintain instead of three. Outcome: the +docker backend's compose file goes from five services +(`agent`, `pipelock`, `egress`, `git-gate`, `supervise`) to +two (`agent`, `sidecars`); the smolmachines backend defined in +PRD 0023 reuses the same image as its sole sidecar container. + +## Problem + +The four sidecars are tightly coupled in lifetime and scope: + +- All four start when a bottle starts and stop when it stops. + There is no scenario where one runs without the others. +- `egress` is `pipelock`'s upstream over the internal network — + nothing on the agent side ever addresses egress directly. Its + separateness today is a docker-compose-ism: one Dockerfile per + service was the easiest way to ship the chunk-by-chunk + rollouts of PRDs 0001, 0008, 0013, and 0017. +- `git-gate` and `supervise` run their own daemons but with the + same "started + stopped with the bottle" lifecycle. + +Three concrete costs of keeping them split: + +1. **Compose-file surface area.** Five `services:` entries per + bottle. The renderer in `backend/docker/compose.py` has to + know each one's image, env, healthcheck, port-mapping, + dependency wiring (`depends_on`), and CA / config bind mounts. + That's a lot of moving parts for what is really one logical + sidecar. +2. 
**Cold start latency.** Docker creates and starts four + containers in dependency order even for a trivial agent run. + Each container costs ~50-100ms of compose orchestration even + when the image is cached. +3. **Cross-backend duplication.** PRD 0023's smolmachines + backend would otherwise need its own four-process supervisor + on the host side. A shared bundle image collapses both + backends onto the same sidecar primitive. + +This PRD is also the prerequisite for chunk 3 of PRD 0023. + +## Goals / Success Criteria + +The feature works when all of the following are observable: + +- `cli.py start ` on the Docker backend produces a + compose project with exactly two services (`agent`, + `sidecars`) and three published agent-facing ports + (HTTPS_PROXY, git-gate, supervise) on the `sidecars` + container. +- All existing integration tests pass with no behavior change + visible to the agent. The four daemons inside the bundle + speak the same protocols on the same well-known in-container + ports as before; only the container hostname changes. +- The sandbox-escape suite from PRD 0022 stays green. +- `docker logs claude-bottle-sidecars-` shows interleaved + output from all four daemons, prefixed by the supervisor with + the daemon name. Each daemon's exit propagates through the + supervisor to the container's exit code. +- Sending SIGTERM to the bundle container (the docker stop path) + shuts down all four daemons cleanly within the existing + compose stop-grace timeout (10s). + +The feature is **done** when all of the following ship: + +- A new `Dockerfile.sidecars` (multi-stage) that: + - Copies the `pipelock` binary from the upstream pipelock + image (currently `ghcr.io/luckypipewrench/pipelock` pinned + by digest in `claude_bottle/backend/docker/pipelock.py`). + - Copies the `gitleaks` binary from `zricethezav/gitleaks` + (currently pinned by digest in `Dockerfile.git-gate`). + - Installs `mitmdump` (via `pip install mitmproxy==`). 
+ - Installs the system deps `git-daemon` + `openssh-client` + that git-gate needs. + - Copies the existing addon + server Python from + `claude_bottle/egress_addon.py`, `egress_addon_core.py`, + `yaml_subset.py`, `supervise.py`, `supervise_server.py`. + - Drops in a new `claude_bottle/sidecar_init.py` (stdlib + Python) as the container's `ENTRYPOINT`. +- A new `claude_bottle/sidecar_init.py`, a small Python init + supervisor that: + - Reads which daemons to run from env (defaults: all four). + - Spawns each as a `subprocess.Popen` with prefixed + line-buffered output. + - Catches `SIGTERM` / `SIGINT`, propagates to each child, + `waitpid()`s with a per-child grace deadline, escalates to + `SIGKILL` past the deadline. + - Exits with code 0 only if every child exited 0; otherwise + exits 1. (Alternatively, when any child dies, tear down the + rest and exit with that child's code; see open questions 1 + and 2.) +- `claude_bottle/backend/docker/compose.py` renderer updated to + emit one `sidecars` service in place of the four. The four + in-container ports (8888 / 9099 / 9418 / 9100, today) all + land on the same container; the agent-facing ports + (HTTPS_PROXY, git-gate-SSH, supervise-MCP) are published as + before, just from one container instead of three. +- `claude_bottle/backend/docker/{pipelock,egress,git_gate,supervise}.py` + collapsed: the platform-neutral pieces stay + (`PipelockProxy`, `Egress`, `GitGate`, `Supervise` ABCs and + their plans); the docker-specific subclasses lose their + per-container start/stop / image-build / healthcheck logic + and gain shared bundle-aware helpers. Container name helpers + (`pipelock_container_name(slug)` etc.) become a single + `sidecar_bundle_container_name(slug)`. +- `Dockerfile.egress`, `Dockerfile.git-gate`, and + `Dockerfile.supervise` deleted. The bundle is the only image. +- Tests: + - Unit: the compose renderer emits exactly two services, and + the `sidecars` service has all three published ports. 
+ - Unit: the sidecar-init supervisor propagates SIGTERM and + returns nonzero when a child crashes. + - Integration: existing PRD 0001 / 0008 / 0013 / 0017 + integration tests run against the bundle and pass. + - Integration: PRD 0022 sandbox-escape suite stays green. +- `CLAUDE.md` updated to describe the bundle and the + daemons-inside layout. + +## Non-goals + +- **No protocol changes between sidecars.** pipelock still + speaks the same HTTPS-proxy protocol on the same port; egress + is still pipelock's upstream; git-gate still listens on + git-daemon's port; supervise still serves the same MCP HTTP + endpoint. Only the container they run in changes. +- **No config-schema changes.** `pipelock.yaml`, + `routes.yaml`, the git-gate access-hook, and the supervise + queue path all stay where they are; the bundle just bind-mounts + them at the same in-container paths as before. +- **No host-bind-mount surgery.** Each daemon's existing bind + mounts (per-bottle CA paths, the supervise queue dir, the + git-gate creds dir) remain. The bundle aggregates them onto + one container. +- **No supervisord / s6 / runit.** A 50-line stdlib Python init + is the supervisor. Adding a new init system for this is more + weight than the problem deserves and conflicts with the + project's stdlib-first ethos. +- **No selective daemon disable surfaced to the manifest.** The + init understands "skip git-gate / supervise when the bottle + doesn't use them" via env vars set by the compose renderer, + but operators don't get a manifest knob — the existing + `bottle.git` / `bottle.supervise` flags continue to drive it. +- **No agent-image changes.** The agent container (PRD 0023's + microVM in the smolmachines case) is unaffected; this PRD is + strictly about consolidating the sidecar chain. + +## Scope + +### In scope + +- New `Dockerfile.sidecars` (multi-stage) bringing pipelock, + mitmproxy, gitleaks, git-daemon, openssh-client, and the + project's addon + server Python into one image. 
+- New `claude_bottle/sidecar_init.py` supervising the four + daemons. +- `backend/docker/compose.py` renderer collapse (five services + → two). +- `backend/docker/{pipelock,egress,git_gate,supervise}.py` + reshape: keep the abstract `Plan` / proxy classes; remove + per-container lifecycle code that compose-up no longer needs. +- Image name and tag pinning (env var override + default; see + open question 3). +- Test updates: unit and integration tests that probe the + four-container shape get rewritten against the one-container + shape. +- README + CLAUDE.md doc updates. + +### Out of scope + +- The smolmachines backend itself (PRD 0023). This PRD just + produces the image; PRD 0023 consumes it. +- Per-daemon resource limits (CPU / memory caps) on the bundle. + Today nothing in the project sets them; consolidation + doesn't change that. +- Healthcheck redesign. The agent's `depends_on: + service_healthy` against the bundle covers all four daemons; + defining a single bundle-level healthcheck that aggregates + the per-daemon readiness is open question 4. +- Multi-arch image builds (arm64 + amd64). The current + per-sidecar images are amd64-only or whatever their bases + ship; we keep that posture. + +## Proposed Design + +### Bundle image + +`Dockerfile.sidecars` is a three-stage multi-stage build: one +source stage per copied binary (pipelock, gitleaks), plus a +final mitmproxy-based stage that assembles them: + +```dockerfile +# Stage 1: pull pipelock binary +FROM ghcr.io/luckypipewrench/pipelock@sha256: AS pipelock-src +# pipelock binary is at /usr/local/bin/pipelock in this image. + +# Stage 2: pull gitleaks binary +FROM zricethezav/gitleaks@sha256: AS gitleaks-src +# gitleaks binary is at /usr/bin/gitleaks in this image. 
+ +# Stage 3: mitmproxy base (already has Python + mitmdump installed) +FROM mitmproxy/mitmproxy:11.1.3 AS final +USER root + +# System deps for the git-gate daemon side +RUN apt-get update \ + && apt-get install -y --no-install-recommends \ + git git-daemon-run openssh-client ca-certificates \ + && rm -rf /var/lib/apt/lists/* + +# Drop in the project's Python addon + server code +COPY claude_bottle/egress_addon_core.py /app/egress_addon_core.py +COPY claude_bottle/egress_addon.py /app/egress_addon.py +COPY claude_bottle/yaml_subset.py /app/yaml_subset.py +COPY claude_bottle/supervise.py /app/supervise.py +COPY claude_bottle/supervise_server.py /app/supervise_server.py +COPY claude_bottle/sidecar_init.py /app/sidecar_init.py + +# Pull the standalone binaries into the final stage +COPY --from=pipelock-src /usr/local/bin/pipelock /usr/local/bin/pipelock +COPY --from=gitleaks-src /usr/bin/gitleaks /usr/bin/gitleaks + +# Layout the bundle uses at runtime — preserved verbatim from the +# four previous Dockerfiles so existing docker-cp paths still work. +RUN mkdir -p \ + /etc/pipelock \ + /etc/egress \ + /etc/git-gate \ + /git-gate/creds \ + /git \ + /run/supervise/queue \ + /home/mitmproxy/.mitmproxy + +EXPOSE 8888 9099 9418 9100 + +ENTRYPOINT ["python3", "/app/sidecar_init.py"] +``` + +The final stage starts from the mitmproxy image because +mitmproxy has the heaviest install footprint (Python + mitmdump ++ deps); copying the other two binaries in is cheaper than the +reverse. Pinning each base by digest is unchanged from the +existing Dockerfiles. + +### Init supervisor + +`claude_bottle/sidecar_init.py` (sketch — actual code lands as +part of implementation): + +```python +"""Per-bottle sidecar supervisor. 
+ +Spawns the configured daemons, forwards SIGTERM/SIGINT, exits +with the first non-zero child code (or 0 if every child exited +cleanly during normal shutdown).""" + +DAEMONS = [ + ("egress", ["sh", "-c", EGRESS_ENTRYPOINT_SH]), + ("pipelock", ["/usr/local/bin/pipelock", "run", + "--config", "/etc/pipelock/pipelock.yaml"]), + ("git-gate", ["/git-gate-entrypoint.sh"]), + ("supervise", ["python3", "/app/supervise_server.py"]), +] + +# Order matters only for first-launch race-window reasons: +# egress starts first so pipelock's upstream connect succeeds +# during pipelock startup. git-gate and supervise are +# independent. +``` + +The env-driven daemon subset is the same handshake as today's +compose renderer: bottles without `git` skip git-gate, bottles +with `supervise: false` skip supervise. + +### Compose renderer collapse + +`bottle_plan_to_compose` emits one `sidecars` service in place +of the four. The service inherits the union of the four's +existing bind mounts; environment variables get prefixed by +daemon name where they clash (none clash today, but the renderer +becomes the central place to enforce that). Container hostname +becomes `sidecars` (or `claude-bottle-sidecars-` for the +externally-visible name). The agent service's HTTPS_PROXY and +git-gate URL move from per-sidecar hostnames to the single +`sidecars` hostname: + +```yaml +# Before (sketch — five services) +services: + agent: + environment: + HTTPS_PROXY: "http://pipelock:8888" + GIT_GATE_URL: "git://git-gate:9418/repo" + MCP_SUPERVISE_URL: "http://supervise:9100" + pipelock: { image: ghcr.io/luckypipewrench/pipelock:... 
} + egress: { image: claude-bottle-egress:latest } + git-gate: { image: claude-bottle-git-gate:latest } + supervise: { image: claude-bottle-supervise:latest } + +# After (two services) +services: + agent: + environment: + HTTPS_PROXY: "http://sidecars:8888" + GIT_GATE_URL: "git://sidecars:9418/repo" + MCP_SUPERVISE_URL: "http://sidecars:9100" + sidecars: + image: claude-bottle-sidecars: + # union of the four prior services' volumes / env / ports +``` + +`depends_on` collapses: the agent depends on `sidecars` only. + +### Backend Python collapse + +The four `claude_bottle/backend/docker/.py` files keep +their platform-neutral abstractions (proxy/plan classes) but +shed the docker-container-lifecycle code that compose-up +already owns. Container-name helpers consolidate: + +```python +# was: +def pipelock_container_name(slug): ... +def egress_container_name(slug): ... +def git_gate_container_name(slug): ... +def supervise_container_name(slug): ... + +# becomes: +def sidecar_bundle_container_name(slug: str) -> str: + return f"claude-bottle-sidecars-{slug}" +``` + +Per-daemon "is the container up?" helpers used by orphan +cleanup converge on a single check against the bundle name. + +### External dependencies + +None new. The bundle build pulls the same upstream images we +already pull; the consolidation is a packaging change. + +### Migration + +This PRD's change is large but mechanical. It can be staged on +the branch in three steps before merge: + +1. Land the bundle image build (`Dockerfile.sidecars` + + `sidecar_init.py`) without changing the renderer. + Confirm `docker build -f Dockerfile.sidecars .` succeeds + and the resulting container runs all four daemons. +2. Switch the renderer to emit the two-service shape behind an + env-var feature flag (e.g. + `CLAUDE_BOTTLE_SIDECAR_BUNDLE=1`). +3. Update integration tests in place; flip the default once + green; delete the flag and the old Dockerfiles in a + follow-up commit on the same branch. 
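
Step 1 only counts as done once the supervisor behaves as specified: it spawns every daemon, prefixes each daemon's output, tears down the rest when one dies, and escalates from `SIGTERM` to `SIGKILL` after a grace period. The following is a minimal stdlib sketch of that behaviour, not the actual `sidecar_init.py`. The names `supervise`, `install_signal_handlers`, and `_pump`, the `[name]` prefix format, and the 8-second default grace are all assumptions made for illustration. It also simplifies in two places: it uses one shared deadline instead of a per-child deadline, and it follows the first-to-die default from the open questions, mapping a signal death to `128 + signum` as shells do.

```python
import signal
import subprocess
import sys
import threading
import time


def _pump(name, stream):
    """Copy one child's merged stdout/stderr to our stdout, prefixed per line."""
    for line in stream:
        sys.stdout.write(f"[{name}] {line}")
        sys.stdout.flush()


def install_signal_handlers(stop):
    """Route SIGTERM/SIGINT to the stop event. Must run in the main thread."""
    for signum in (signal.SIGTERM, signal.SIGINT):
        signal.signal(signum, lambda _signum, _frame: stop.set())


def supervise(daemons, stop, grace=8.0):
    """Run each (name, argv) pair until `stop` is set or any child exits.

    Returns the first-to-die child's exit code. After a signal-driven
    shutdown it returns 0 only if every child exited 0, otherwise 1.
    """
    children = []
    for name, argv in daemons:
        proc = subprocess.Popen(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1,  # line-buffered on our side of the pipe
        )
        threading.Thread(
            target=_pump, args=(name, proc.stdout), daemon=True
        ).start()
        children.append((name, proc))

    # Poll until a shutdown is requested or one child dies on its own.
    first_code = None
    while not stop.is_set() and first_code is None:
        for _name, proc in children:
            code = proc.poll()
            if code is not None:
                first_code = code
                break
        else:
            stop.wait(0.1)

    # Ask the survivors to stop, then escalate once the deadline passes.
    for _name, proc in children:
        if proc.poll() is None:
            proc.terminate()
    deadline = time.monotonic() + grace
    for _name, proc in children:
        try:
            proc.wait(timeout=max(0.0, deadline - time.monotonic()))
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()

    if first_code is not None:
        # Popen reports death-by-signal as a negative return code.
        return first_code if first_code >= 0 else 128 - first_code
    return 0 if all(proc.returncode == 0 for _name, proc in children) else 1


# Hypothetical entrypoint wiring (not executed here):
#   stop = threading.Event()
#   install_signal_handlers(stop)
#   sys.exit(supervise(DAEMONS, stop))
```

Keeping signal installation separate from `supervise` lets a unit test drive the crash path directly with short-lived child processes, without sending real signals to the test runner.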
+ +The compose-per-instance work in PRD 0018 already separated +sidecar lifecycle from agent lifecycle, so this PRD is +materially a renderer + image-build change — not a backend +rewrite. + +## Sizing — into chunks + +1. **Bundle image + init supervisor.** Write `Dockerfile.sidecars` + and `sidecar_init.py`, ship them, add a unit test that + builds the image in CI and asserts the four daemons start. + No renderer change yet. +2. **Compose renderer collapse.** Update + `bottle_plan_to_compose` to emit two services. Feature flag + it via env var. Update unit tests to assert on both shapes + (flag on vs off) during the migration window. +3. **Backend Python collapse.** Trim the four docker + sidecar modules, consolidate container-name helpers, update + orphan-cleanup logic to look for the bundle by name. Delete + old Dockerfiles. +4. **Integration test sweep.** Bring every integration test + that probes a four-container shape (`pipelock_container_name`, + `egress_container_name`, etc.) onto the bundle. Confirm + PRD 0022 stays green. +5. **Docs + flag removal.** Flip the default, remove the + feature flag, update README + CLAUDE.md. + +## Open questions + +1. **Init failure semantics.** When one daemon crashes mid-run, + should the bundle exit (killing the bottle) or restart just + that daemon? Today, with four separate containers, docker + restarts the crashed one and the bottle stays up. Default + for this PRD: bundle exits on any child death; the bottle + tears down. Restart logic can land later if operators hit + it in practice. +2. **Exit-code propagation.** If multiple daemons die in quick + succession (likely under SIGTERM), which exit code wins? + First-to-die is simplest. Worst-case (highest nonzero + exit code) gives clearest signal in logs. Default to + first-to-die unless an operator scenario disagrees. +3. 
**Image pin policy.** Pin `claude-bottle-sidecars` by tag + (`:latest` rebuilt per-release) or by digest written into a + `CLAUDE_BOTTLE_SIDECAR_IMAGE` env var like the existing + `CLAUDE_BOTTLE_PIPELOCK_IMAGE`? Default to env-var override + + a documented tag; digest pinning is an operator opt-in. +4. **Healthcheck aggregation.** Today each sidecar service has + its own compose healthcheck and `agent.depends_on: + service_healthy: { pipelock: true, ... }`. With one + container, the bundle needs one healthcheck that returns + ready iff all daemons are listening. Cheapest: TCP probe on + pipelock's port + git-gate's port + supervise's port from + inside the container, scripted into a small `/app/healthcheck.sh`. + Resolve in chunk 1. +5. **Log interleaving + debuggability.** All four daemons' + stdout/stderr merge into one container log. The init + prefixes each line with the daemon name, but operators may + want per-daemon log files for easier triage. Default: no + per-daemon files in v1; revisit if debug-time pain shows up. +6. **Backwards compat for an installed-base test fixture.** + Some integration tests synthesize compose files by hand and + assert on per-sidecar container names. They'll need + touching in chunk 4. List them up front in the chunk-4 + commit so the diff isn't a surprise. + +## References + +- `Dockerfile.egress`, `Dockerfile.git-gate`, + `Dockerfile.supervise` — the three Dockerfiles this PRD + collapses into `Dockerfile.sidecars`. +- `claude_bottle/backend/docker/compose.py` — the renderer this + PRD slims down. +- `claude_bottle/backend/docker/pipelock.py` — current home of + `PIPELOCK_IMAGE` and the pinned digest the bundle's first + stage reuses. +- PRD 0017 + (`docs/prds/0017-egress-proxy-via-mitmproxy.md`) — defines + egress's role as pipelock's upstream; this PRD relies on + that being implementable over localhost just as easily as + over the internal docker network. 
+- PRD 0018 + (`docs/prds/0018-compose-per-instance.md`) — the + compose-per-instance refactor this PRD builds on. PRD 0018 + separated sidecar lifecycle from agent lifecycle, which is + what makes a single-bundle compose service a renderer-only + change instead of a backend rewrite. +- PRD 0022 + (`docs/prds/0022-sandbox-escape-integration-test.md`) — must + remain green through the migration. +- PRD 0023 + (`docs/prds/0023-smolmachines-backend.md`) — the second + consumer of this bundle; depends on this PRD's image being + available before its chunk 3.