docs(prd-0024): consolidate per-bottle sidecars into a single bundle #54

Merged
didericis merged 1 commits from prd-0024-consolidate-sidecar-bundle into main 2026-05-26 23:57:33 -04:00
@@ -0,0 +1,455 @@
# PRD 0024: Consolidate per-bottle sidecars into a single bundle
- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-26
## Summary
Replace the four per-bottle sidecar containers in the Docker
backend (pipelock, egress, git-gate, supervise) with a single
container image — `claude-bottle-sidecars` — that runs all four
daemons under a small stdlib-Python init supervisor. Same
per-bottle lifetime, same scope, fewer containers per bottle,
one Dockerfile to maintain instead of three. Outcome: the
docker backend's compose file goes from five services
(`agent`, `pipelock`, `egress`, `git-gate`, `supervise`) to
two (`agent`, `sidecars`); the smolmachines backend defined in
PRD 0023 reuses the same image as its sole sidecar container.
## Problem
The four sidecars are tightly coupled in lifetime and scope:
- All four start when a bottle starts and stop when it stops.
There is no scenario where one runs without the others.
- `egress` is `pipelock`'s upstream over the internal network —
nothing on the agent side ever addresses egress directly. Its
separateness today is a docker-compose-ism: one Dockerfile per
service was the easiest way to ship the chunk-by-chunk
rollouts of PRDs 0001, 0008, 0013, and 0017.
- `git-gate` and `supervise` run their own daemons but with the
same "started + stopped with the bottle" lifecycle.
Three concrete costs of keeping them split:
1. **Compose-file surface area.** Five `services:` entries per
bottle. The renderer in `backend/docker/compose.py` has to
know each one's image, env, healthcheck, port-mapping,
dependency wiring (`depends_on`), and CA / config bind mounts.
That's a lot of moving parts for what is really one logical
sidecar.
2. **Cold start latency.** Docker creates and starts four
sidecar containers in dependency order even for a trivial
agent run. Each container costs ~50-100ms of compose
orchestration even when the image is cached, so the chain can
add roughly 200-400ms when the starts are serialized.
3. **Cross-backend duplication.** PRD 0023's smolmachines
backend would otherwise need its own four-process supervisor
on the host side. A shared bundle image collapses both
backends onto the same sidecar primitive.
This PRD is also the prerequisite for chunk 3 of PRD 0023.
## Goals / Success Criteria
The feature works when all of the following are observable:
- `cli.py start <agent>` on the Docker backend produces a
compose project with exactly two services (`agent`,
`sidecars`) and three published agent-facing ports
(HTTPS_PROXY, git-gate, supervise) on the `sidecars`
container.
- All existing integration tests pass with no behavior change
visible to the agent. The four daemons inside the bundle
speak the same protocols on the same well-known in-container
ports as before; only the container hostname changes.
- The sandbox-escape suite from PRD 0022 stays green.
- `docker logs claude-bottle-sidecars-<slug>` shows interleaved
output from all four daemons, prefixed by the supervisor with
the daemon name. Each daemon's exit propagates through the
supervisor to the container's exit code.
- Sending SIGTERM to the bundle container (the docker stop path)
shuts down all four daemons cleanly within the existing
compose stop-grace timeout (10s).
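The log-prefixing criterion above could be met with one reader thread per daemon. The following is a minimal sketch under stated assumptions, not the shipped `sidecar_init.py`: the helper names `pump` and `spawn_prefixed` and the `[name] ` prefix format are hypothetical.

```python
import subprocess
import sys
import threading


def pump(name, stream, out):
    """Copy each line from a child's output to `out`, prefixed with the daemon name."""
    for line in stream:
        out.write(f"[{name}] {line}")
        out.flush()


def spawn_prefixed(name, argv, out=sys.stdout):
    """Start `argv` and forward its merged stdout/stderr with a name prefix.

    Hypothetical helper: the prefix format is an assumption, not taken
    from the PRD.
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same prefixed stream
        text=True,
        bufsize=1,  # line-buffered on the reading side
    )
    thread = threading.Thread(
        target=pump, args=(name, proc.stdout, out), daemon=True
    )
    thread.start()
    return proc, thread
```

Each daemon gets its own thread, so a quiet daemon never blocks output from a chatty one. The reader ends on its own when the child closes its pipe.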
The feature is **done** when all of the following ship:
- A new `Dockerfile.sidecars` (multi-stage) that:
  - Copies the `pipelock` binary from the upstream pipelock
    image (currently `ghcr.io/luckypipewrench/pipelock` pinned
    by digest in `claude_bottle/backend/docker/pipelock.py`).
  - Copies the `gitleaks` binary from `zricethezav/gitleaks`
    (currently pinned by digest in `Dockerfile.git-gate`).
  - Installs `mitmdump` (via `pip install mitmproxy==<pinned>`).
  - Installs the system deps `git-daemon` + `openssh-client`
    that git-gate needs.
  - Copies the existing addon + server Python from
    `claude_bottle/egress_addon.py`, `egress_addon_core.py`,
    `yaml_subset.py`, `supervise.py`, `supervise_server.py`.
  - Drops in a new `claude_bottle/sidecar_init.py` (stdlib
    Python) as the container's `ENTRYPOINT`.
- A new `claude_bottle/sidecar_init.py`, a small Python init
  supervisor that:
  - Reads which daemons to run from env (defaults: all four).
  - Spawns each as a `subprocess.Popen` with prefixed
    line-buffered output.
  - Catches `SIGTERM` / `SIGINT`, propagates to each child,
    `waitpid()`s with a per-child grace deadline, escalates to
    `SIGKILL` past the deadline.
  - Exits with code 0 only if every child exited 0; otherwise
    exits 1. (Or: any-child-died → tear down the rest and exit
    that child's code; see open questions 1 and 2.)
- `claude_bottle/backend/docker/compose.py` renderer updated to
emit one `sidecars` service in place of the four. The four
in-container ports (8888 / 9099 / 9418 / 9100, today) all
land on the same container; the agent-facing ports
(HTTPS_PROXY, git-gate-SSH, supervise-MCP) are published as
before, just from one container instead of three.
- `claude_bottle/backend/docker/{pipelock,egress,git_gate,supervise}.py`
collapsed: the platform-neutral pieces stay
(`PipelockProxy`, `Egress`, `GitGate`, `Supervise` ABCs and
their plans), the docker-specific subclasses lose their
per-container start/stop / image-build / healthcheck logic
and gain shared bundle-aware helpers. Container name helpers
(`pipelock_container_name(slug)` etc.) become a single
`sidecar_bundle_container_name(slug)`.
- `Dockerfile.egress`, `Dockerfile.git-gate`, and
`Dockerfile.supervise` deleted. The bundle is the only image.
- Tests:
  - Unit: the compose renderer emits exactly two services, and
    the `sidecars` service has all three published ports.
  - Unit: the sidecar-init supervisor propagates SIGTERM and
    returns nonzero when a child crashes.
  - Integration: existing PRD 0001 / 0008 / 0013 / 0017
    integration tests run against the bundle and pass.
  - Integration: PRD 0022 sandbox-escape suite stays green.
- `CLAUDE.md` updated to describe the bundle and the
daemons-inside layout.
## Non-goals
- **No protocol changes between sidecars.** pipelock still
speaks the same HTTPS-proxy protocol on the same port; egress
is still pipelock's upstream; git-gate still listens on
git-daemon's port; supervise still serves the same MCP HTTP
endpoint. Only the container they run in changes.
- **No config-schema changes.** `pipelock.yaml`,
`routes.yaml`, the git-gate access-hook, and the supervise
queue path all stay where they are; the bundle just bind-mounts
them at the same in-container paths as before.
- **No host-bind-mount surgery.** Each daemon's existing bind
mounts (per-bottle CA paths, the supervise queue dir, the
git-gate creds dir) remain. The bundle aggregates them onto
one container.
- **No supervisord / s6 / runit.** A 50-line stdlib Python init
is the supervisor. Adding a new init system for this is more
weight than the problem deserves and conflicts with the
project's stdlib-first ethos.
- **No selective daemon disable surfaced to the manifest.** The
init understands "skip git-gate / supervise when the bottle
doesn't use them" via env vars set by the compose renderer,
but operators don't get a manifest knob — the existing
`bottle.git` / `bottle.supervise` flags continue to drive it.
- **No agent-image changes.** The agent container (PRD 0023's
microVM in the smolmachines case) is unaffected; this PRD is
strictly about consolidating the sidecar chain.
## Scope
### In scope
- New `Dockerfile.sidecars` (multi-stage) bringing pipelock,
mitmproxy, gitleaks, git-daemon, openssh-client, and the
project's addon + server Python into one image.
- New `claude_bottle/sidecar_init.py` supervising the four
daemons.
- `backend/docker/compose.py` renderer collapse (five services
→ two).
- `backend/docker/{pipelock,egress,git_gate,supervise}.py`
reshape: keep the abstract `Plan` / proxy classes; remove
per-container lifecycle code that compose-up no longer needs.
- Image name and tag pinning (env var override + default; see
open question 3).
- Test updates: unit and integration tests that probe the
four-container shape get rewritten against the one-container
shape.
- README + CLAUDE.md doc updates.
### Out of scope
- The smolmachines backend itself (PRD 0023). This PRD just
produces the image; PRD 0023 consumes it.
- Per-daemon resource limits (CPU / memory caps) on the bundle.
Today nothing in the project sets them; consolidation
doesn't change that.
- Healthcheck redesign. The agent's `depends_on:
service_healthy` against the bundle covers all four daemons;
defining a single bundle-level healthcheck that aggregates
the per-daemon readiness is open question 4.
- Multi-arch image builds (arm64 + amd64). The current
per-sidecar images are amd64-only or whatever their bases
ship; we keep that posture.
## Proposed Design
### Bundle image
`Dockerfile.sidecars` is a three-stage multi-stage build: one
stage per standalone source binary (pipelock, gitleaks), plus a
final mitmproxy-based stage that assembles them:
```dockerfile
# Stage 1: pull pipelock binary
FROM ghcr.io/luckypipewrench/pipelock@sha256:<pinned> AS pipelock-src
# pipelock binary is at /usr/local/bin/pipelock in this image.
# Stage 2: pull gitleaks binary
FROM zricethezav/gitleaks@sha256:<pinned> AS gitleaks-src
# gitleaks binary is at /usr/bin/gitleaks in this image.
# Stage 3: mitmproxy base (already has Python + mitmdump installed)
FROM mitmproxy/mitmproxy:11.1.3 AS final
USER root
# System deps for the git-gate daemon side
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
git git-daemon-run openssh-client ca-certificates \
&& rm -rf /var/lib/apt/lists/*
# Drop in the project's Python addon + server code
COPY claude_bottle/egress_addon_core.py /app/egress_addon_core.py
COPY claude_bottle/egress_addon.py /app/egress_addon.py
COPY claude_bottle/yaml_subset.py /app/yaml_subset.py
COPY claude_bottle/supervise.py /app/supervise.py
COPY claude_bottle/supervise_server.py /app/supervise_server.py
COPY claude_bottle/sidecar_init.py /app/sidecar_init.py
# Pull the standalone binaries into the final stage
COPY --from=pipelock-src /usr/local/bin/pipelock /usr/local/bin/pipelock
COPY --from=gitleaks-src /usr/bin/gitleaks /usr/bin/gitleaks
# Layout the bundle uses at runtime — preserved verbatim from the
# four previous Dockerfiles so existing docker-cp paths still work.
RUN mkdir -p \
/etc/pipelock \
/etc/egress \
/etc/git-gate \
/git-gate/creds \
/git \
/run/supervise/queue \
/home/mitmproxy/.mitmproxy
EXPOSE 8888 9099 9418 9100
ENTRYPOINT ["python3", "/app/sidecar_init.py"]
```
The final stage starts from the mitmproxy image because
mitmproxy has the heaviest install footprint (Python, mitmdump,
and their dependencies); copying the other two binaries in is cheaper than the
reverse. Digest pinning for the pipelock and gitleaks bases is
unchanged from the existing Dockerfiles; the sketch above pins
the mitmproxy base by tag.
### Init supervisor
`claude_bottle/sidecar_init.py` (sketch — actual code lands as
part of implementation):
```python
"""Per-bottle sidecar supervisor.

Spawns the configured daemons, forwards SIGTERM/SIGINT, exits
with the first non-zero child code (or 0 if every child exited
cleanly during normal shutdown).
"""

# EGRESS_ENTRYPOINT_SH (the egress launch script) is elided from this sketch.
DAEMONS = [
    ("egress", ["sh", "-c", EGRESS_ENTRYPOINT_SH]),
    ("pipelock", ["/usr/local/bin/pipelock", "run",
                  "--config", "/etc/pipelock/pipelock.yaml"]),
    ("git-gate", ["/git-gate-entrypoint.sh"]),
    ("supervise", ["python3", "/app/supervise_server.py"]),
]
# Order matters only for first-launch race-window reasons:
# egress starts first so pipelock's upstream connect succeeds
# during pipelock startup. git-gate and supervise are
# independent.
```
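The shutdown path is not shown in the sketch above. One possible shape, assuming the defaults proposed in open questions 1 and 2 (any child death tears the bundle down; first-to-die sets the exit code), is outlined below. The function names, the 8-second grace value, and the use of a `threading.Event` as the stop flag are illustrative assumptions, and a requested stop is treated as clean (exit 0) for simplicity.

```python
import signal
import subprocess
import threading
import time


def shutdown(procs, grace):
    """SIGTERM every live child, then SIGKILL any that outlive `grace` seconds."""
    for proc in procs:
        if proc.poll() is None:
            proc.terminate()
    deadline = time.monotonic() + grace
    for proc in procs:
        try:
            proc.wait(timeout=max(0.0, deadline - time.monotonic()))
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


def supervise(daemons, stop, grace=8.0, poll=0.05):
    """Run (name, argv) pairs until `stop` is set or any child exits.

    Returns the exit code of the first dead child found (mapped to
    128+signal when it was killed by a signal), or 0 on a requested stop.
    The 8s grace is an assumed value chosen to fit inside a 10s stop timeout.
    """
    procs = [subprocess.Popen(argv) for _name, argv in daemons]
    exit_code = 0
    while not stop.is_set():
        dead = [p.returncode for p in procs if p.poll() is not None]
        if dead:
            # Children that died in the same poll window are taken in list order.
            first = dead[0]
            exit_code = first if first >= 0 else 128 - first
            break
        time.sleep(poll)
    shutdown(procs, grace)
    return exit_code


def install_signal_handlers(stop):
    """Route SIGTERM/SIGINT to the stop flag; call from the main thread only."""
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, lambda signum, frame: stop.set())
```

An entrypoint would create the event, call `install_signal_handlers(stop)`, and pass the result of `supervise(DAEMONS, stop)` to `sys.exit`. Keeping signal installation separate from the loop lets a unit test drive `supervise` without touching process-wide signal state.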
The env-driven daemon subset is the same handshake as today's
compose renderer: bottles without `git` skip git-gate, bottles
with `supervise: false` skip supervise.
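As an illustration of the env-driven subset, the selection step might look like the sketch below. The variable name `SIDECAR_DAEMONS` and its comma-separated format are assumptions made for this example; the PRD only states that the subset comes from env and defaults to all four.

```python
import os

# Start order from the DAEMONS list: egress before pipelock.
ALL_DAEMONS = ("egress", "pipelock", "git-gate", "supervise")


def select_daemons(env=None):
    """Return the daemon names to run, preserving start order.

    Reads the hypothetical SIDECAR_DAEMONS variable (comma-separated).
    An unset or empty value means "run all four".
    """
    env = os.environ if env is None else env
    raw = env.get("SIDECAR_DAEMONS", "")
    if not raw.strip():
        return list(ALL_DAEMONS)
    wanted = {name.strip() for name in raw.split(",") if name.strip()}
    unknown = wanted - set(ALL_DAEMONS)
    if unknown:
        raise ValueError(f"unknown daemon(s): {sorted(unknown)}")
    # Iterate ALL_DAEMONS so the caller's ordering cannot reorder startup.
    return [name for name in ALL_DAEMONS if name in wanted]
```

A bottle without `git` would then be rendered with something like `SIDECAR_DAEMONS=egress,pipelock,supervise`. Rejecting unknown names keeps a typo in the renderer from silently skipping a daemon.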
### Compose renderer collapse
`bottle_plan_to_compose` emits one `sidecars` service in place
of the four. The service inherits the union of the four's
existing bind mounts; environment variables get prefixed by
daemon name where they clash (none clash today, but the renderer
becomes the central place to enforce that). Container hostname
becomes `sidecars` (or `claude-bottle-sidecars-<slug>` for the
externally-visible name). The agent service's HTTPS_PROXY,
git-gate URL, and supervise MCP URL move from per-sidecar
hostnames to the single `sidecars` hostname:
```yaml
# Before (sketch, five services)
services:
  agent:
    environment:
      HTTPS_PROXY: "http://pipelock:8888"
      GIT_GATE_URL: "git://git-gate:9418/repo"
      MCP_SUPERVISE_URL: "http://supervise:9100"
  pipelock: { image: ghcr.io/luckypipewrench/pipelock:... }
  egress: { image: claude-bottle-egress:latest }
  git-gate: { image: claude-bottle-git-gate:latest }
  supervise: { image: claude-bottle-supervise:latest }
# After (two services)
services:
  agent:
    environment:
      HTTPS_PROXY: "http://sidecars:8888"
      GIT_GATE_URL: "git://sidecars:9418/repo"
      MCP_SUPERVISE_URL: "http://sidecars:9100"
  sidecars:
    image: claude-bottle-sidecars:<pinned>
    # union of the four prior services' volumes / env / ports
```
`depends_on` collapses: the agent depends on `sidecars` only.
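To make the target shape concrete, here is a rough sketch of a renderer that returns the two-service document as a plain dict. It is not the real `bottle_plan_to_compose`: the function name, its parameters, and the `host_ports` mapping are hypothetical. It also assumes the three agent-facing ports are published with compose's short `HOST:CONTAINER` syntax and that the bundle defines a healthcheck (open question 4).

```python
# In-container ports from the PRD; 9099 (egress) stays internal to the bundle.
AGENT_FACING_PORTS = {"https_proxy": 8888, "git_gate": 9418, "supervise": 9100}


def render_two_service_compose(slug, agent_image, sidecar_image, host_ports):
    """Return a compose document (as a dict) with `agent` and `sidecars` only.

    `host_ports` maps the keys of AGENT_FACING_PORTS to host port numbers.
    Volumes and per-daemon env are omitted from this sketch.
    """
    published = [
        f"{host_ports[name]}:{container_port}"
        for name, container_port in AGENT_FACING_PORTS.items()
    ]
    return {
        "services": {
            "agent": {
                "image": agent_image,
                "environment": {
                    "HTTPS_PROXY": "http://sidecars:8888",
                    "GIT_GATE_URL": "git://sidecars:9418/repo",
                    "MCP_SUPERVISE_URL": "http://sidecars:9100",
                },
                # Single dependency edge replaces the four per-sidecar ones.
                "depends_on": {"sidecars": {"condition": "service_healthy"}},
            },
            "sidecars": {
                "image": sidecar_image,
                "container_name": f"claude-bottle-sidecars-{slug}",
                "ports": published,
            },
        }
    }
```

A unit test for the "exactly two services, three published ports" criterion can assert directly on this dict before it is serialized to YAML.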
### Backend Python collapse
The four `claude_bottle/backend/docker/<sidecar>.py` files keep
their platform-neutral abstractions (proxy/plan classes) but
shed the docker-container-lifecycle code that compose-up
already owns. Container-name helpers consolidate:
```python
# was:
def pipelock_container_name(slug): ...
def egress_container_name(slug): ...
def git_gate_container_name(slug): ...
def supervise_container_name(slug): ...
# becomes:
def sidecar_bundle_container_name(slug: str) -> str:
    return f"claude-bottle-sidecars-{slug}"
```
Per-daemon "is the container up?" helpers used by orphan
cleanup converge on a single check against the bundle name.
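The single check could be as small as the sketch below. The helpers `bundle_is_up` and `orphaned_bundles` are hypothetical names, and the list of running container names is assumed to be gathered elsewhere (for example from `docker ps --format '{{.Names}}'`).

```python
BUNDLE_PREFIX = "claude-bottle-sidecars-"


def sidecar_bundle_container_name(slug: str) -> str:
    return f"{BUNDLE_PREFIX}{slug}"


def bundle_is_up(slug, running_names):
    """One check replaces the four per-daemon 'is the container up?' helpers."""
    return sidecar_bundle_container_name(slug) in set(running_names)


def orphaned_bundles(running_names, live_slugs):
    """Return bundle containers whose slug has no live bottle, sorted by name."""
    live = {sidecar_bundle_container_name(slug) for slug in live_slugs}
    return sorted(
        name
        for name in running_names
        if name.startswith(BUNDLE_PREFIX) and name not in live
    )
```

Both functions are pure, so orphan-cleanup logic can be unit-tested without a Docker daemon.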
### External dependencies
None new. The bundle build pulls the same upstream images we
already pull; the consolidation is a packaging change.
### Migration
This PRD's change is large but mechanical. The pre-merge dry run goes in three steps:
1. Land the bundle image build (`Dockerfile.sidecars` +
`sidecar_init.py`) without changing the renderer.
Confirm `docker build -f Dockerfile.sidecars .` succeeds
and the resulting container runs all four daemons.
2. Switch the renderer to emit the two-service shape behind an
env-var feature flag (e.g.
`CLAUDE_BOTTLE_SIDECAR_BUNDLE=1`).
3. Update integration tests in-place; flip the default once
green; delete the flag and the old Dockerfiles in a
follow-up commit on the same branch.
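The flag in step 2 can be sketched as a small switch between the two service shapes. This sketch assumes that only the literal value `1` enables the bundle; the helper names are hypothetical.

```python
import os

LEGACY_SERVICES = ("agent", "pipelock", "egress", "git-gate", "supervise")
BUNDLED_SERVICES = ("agent", "sidecars")


def bundle_enabled(env=None):
    """True when CLAUDE_BOTTLE_SIDECAR_BUNDLE is set to "1" (assumed convention)."""
    env = os.environ if env is None else env
    return env.get("CLAUDE_BOTTLE_SIDECAR_BUNDLE", "") == "1"


def service_names(env=None):
    """Service names the renderer should emit for the current flag state."""
    return BUNDLED_SERVICES if bundle_enabled(env) else LEGACY_SERVICES
```

Passing `env` explicitly lets the migration-window tests assert on both shapes without mutating the process environment.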
The compose-per-instance work in PRD 0018 already separated
sidecar lifecycle from agent lifecycle, so this PRD is
materially a renderer + image-build change — not a backend
rewrite.
## Sizing — into chunks
1. **Bundle image + init supervisor.** Write `Dockerfile.sidecars`
and `sidecar_init.py`, ship them, add a unit test that
builds the image in CI and asserts the four daemons start.
No renderer change yet.
2. **Compose renderer collapse.** Update
`bottle_plan_to_compose` to emit two services. Feature flag
it via env var. Update unit tests to assert on both shapes
(flag on vs off) during the migration window.
3. **Backend Python collapse.** Trim the four docker
sidecar modules, consolidate container-name helpers, update
orphan-cleanup logic to look for the bundle by name. Delete
old Dockerfiles.
4. **Integration test sweep.** Bring every integration test
that probes a four-container shape (`pipelock_container_name`,
`egress_container_name`, etc.) onto the bundle. Confirm
PRD 0022 stays green.
5. **Docs + flag removal.** Flip the default, remove the
feature flag, update README + CLAUDE.md.
## Open questions
1. **Init failure semantics.** When one daemon crashes mid-run,
should the bundle exit (killing the bottle) or restart just
that daemon? Today, with four separate containers, docker
restarts the crashed one and the bottle stays up. Default
for this PRD: bundle exits on any child death; the bottle
tears down. Restart logic can land later if operators hit
it in practice.
2. **Exit-code propagation.** If multiple daemons die in quick
succession (likely under SIGTERM), which exit code wins?
First-to-die is simplest. Worst-case (highest nonzero
exit code) gives clearest signal in logs. Default to
first-to-die unless an operator scenario disagrees.
3. **Image pin policy.** Pin `claude-bottle-sidecars` by tag
(`:latest` rebuilt per-release) or by digest written into a
`CLAUDE_BOTTLE_SIDECAR_IMAGE` env var like the existing
`CLAUDE_BOTTLE_PIPELOCK_IMAGE`? Default to env-var override
+ a documented tag; digest pinning is an operator opt-in.
4. **Healthcheck aggregation.** Today each sidecar service has
its own compose healthcheck, and the agent's `depends_on`
lists each one with `condition: service_healthy`. With one
container, the bundle needs one healthcheck that returns
ready iff all daemons are listening. Cheapest: TCP probe on
pipelock's port + git-gate's port + supervise's port from
inside the container, scripted into a small `/app/healthcheck.sh`.
Resolve in chunk 1.
5. **Log interleaving + debuggability.** All four daemons'
stdout/stderr merge into one container log. The init
prefixes each line with the daemon name, but operators may
want per-daemon log files for easier triage. Default: no
per-daemon files in v1; revisit if debug-time pain shows up.
6. **Backwards compat for an installed-base test fixture.**
Some integration tests synthesize compose files by hand and
assert on per-sidecar container names. They'll need
touching in chunk 4. List them up front in the chunk-4
commit so the diff isn't a surprise.
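For open question 4, the TCP probe could also be written in stdlib Python instead of a shell script, since the bundle image already ships Python. The sketch below is one possible shape under that assumption. The function names are hypothetical, and the port list is taken from the agent-facing ports in the compose example (8888, 9418, 9100).

```python
import socket

# pipelock, git-gate, supervise: the three ports the agent talks to.
AGENT_FACING_PORTS = (8888, 9418, 9100)


def port_listening(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connect to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def bundle_healthy(ports=AGENT_FACING_PORTS, host="127.0.0.1"):
    """Ready only if every probed port accepts a connection."""
    return all(port_listening(port, host) for port in ports)
```

A healthcheck command would call `bundle_healthy()` and exit 0 or 1 accordingly. A connect-only probe confirms that a daemon is listening, not that it is serving correctly, which matches the "cheapest" option described in the question.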
## References
- `Dockerfile.egress`, `Dockerfile.git-gate`,
`Dockerfile.supervise` — the three Dockerfiles this PRD
collapses into `Dockerfile.sidecars`.
- `claude_bottle/backend/docker/compose.py` — the renderer this
PRD slims down.
- `claude_bottle/backend/docker/pipelock.py` — current home of
`PIPELOCK_IMAGE` and the pinned digest the bundle's first
stage reuses.
- PRD 0017
(`docs/prds/0017-egress-proxy-via-mitmproxy.md`) — defines
egress's role as pipelock's upstream; this PRD relies on
that being implementable over localhost just as easily as
over the internal docker network.
- PRD 0018
(`docs/prds/0018-compose-per-instance.md`) — the
compose-per-instance refactor this PRD builds on. PRD 0018
separated sidecar lifecycle from agent lifecycle, which is
what makes a single-bundle compose service a renderer-only
change instead of a backend rewrite.
- PRD 0022
(`docs/prds/0022-sandbox-escape-integration-test.md`) — must
remain green through the migration.
- PRD 0023
(`docs/prds/0023-smolmachines-backend.md`) — the second
consumer of this bundle; depends on this PRD's image being
available before its chunk 3.