docs(prd-0024): consolidate per-bottle sidecars into a single bundle

Replace pipelock + egress + git-gate + supervise as four separate
containers with one bundle image (claude-bottle-sidecars) running all
four daemons under a small stdlib Python init supervisor. Compose file
collapses from five services to two; same daemons, same ports, same
protocols, one container.

Sized: bundle image + init → renderer collapse (feature-flagged) →
backend Python trim → integration sweep → flag removal.

Prerequisite for PRD 0023 chunk 3 (smolmachines backend reuses the
same bundle as its sole host-side sidecar container).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -0,0 +1,455 @@
# PRD 0024: Consolidate per-bottle sidecars into a single bundle

- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-26

## Summary

Replace the four per-bottle sidecar containers in the Docker
backend (pipelock, egress, git-gate, supervise) with a single
container image, `claude-bottle-sidecars`, that runs all four
daemons under a small stdlib-Python init supervisor. Same
per-bottle lifetime, same scope, fewer containers per bottle,
one Dockerfile to maintain instead of three. Outcome: the
Docker backend's compose file goes from five services
(`agent`, `pipelock`, `egress`, `git-gate`, `supervise`) to
two (`agent`, `sidecars`); the smolmachines backend defined in
PRD 0023 reuses the same image as its sole sidecar container.

## Problem

The four sidecars are tightly coupled in lifetime and scope:

- All four start when a bottle starts and stop when it stops.
  There is no scenario where one runs without the others.
- `egress` is `pipelock`'s upstream over the internal network;
  nothing on the agent side ever addresses egress directly. Its
  separateness today is a docker-compose-ism: one Dockerfile per
  service was the easiest way to ship the chunk-by-chunk
  rollouts of PRDs 0001, 0008, 0013, and 0017.
- `git-gate` and `supervise` run their own daemons but with the
  same "started + stopped with the bottle" lifecycle.

Three concrete costs of keeping them split:

1. **Compose-file surface area.** Five `services:` entries per
   bottle. The renderer in `backend/docker/compose.py` has to
   know each one's image, env, healthcheck, port-mapping,
   dependency wiring (`depends_on`), and CA / config bind mounts.
   That's a lot of moving parts for what is really one logical
   sidecar.
2. **Cold start latency.** Docker creates and starts four
   sidecar containers in dependency order even for a trivial
   agent run. Each container costs ~50-100ms of compose
   orchestration even when the image is cached.
3. **Cross-backend duplication.** PRD 0023's smolmachines
   backend would otherwise need its own four-process supervisor
   on the host side. A shared bundle image collapses both
   backends onto the same sidecar primitive.

This PRD is also the prerequisite for chunk 3 of PRD 0023.

## Goals / Success Criteria

The feature works when all of the following are observable:

- `cli.py start <agent>` on the Docker backend produces a
  compose project with exactly two services (`agent`,
  `sidecars`) and three published agent-facing ports
  (HTTPS_PROXY, git-gate, supervise) on the `sidecars`
  container.
- All existing integration tests pass with no behavior change
  visible to the agent. The four daemons inside the bundle
  speak the same protocols on the same well-known in-container
  ports as before; only the container hostname changes.
- The sandbox-escape suite from PRD 0022 stays green.
- `docker logs claude-bottle-sidecars-<slug>` shows interleaved
  output from all four daemons, prefixed by the supervisor with
  the daemon name. Each daemon's exit propagates through the
  supervisor to the container's exit code.
- Sending SIGTERM to the bundle container (the docker stop path)
  shuts down all four daemons cleanly within the existing
  compose stop-grace timeout (10s).
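The log-prefixing criterion above can be illustrated with a minimal sketch. It assumes a `[name] ` prefix format and the helper names `pump` and `run_prefixed`, none of which are specified by this PRD; it only shows that `subprocess.Popen` plus one reader thread per child is enough to tag each line with the daemon name.

```python
import subprocess
import sys
import threading


def pump(name, stream, out):
    """Copy lines from a child's pipe to `out`, tagging each with the daemon name."""
    for line in stream:
        out.write(f"[{name}] {line}")  # prefix format is an assumption
        out.flush()


def run_prefixed(name, argv, out=sys.stdout):
    """Run one child to completion and return its exit code."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same prefixed stream
        text=True,
        bufsize=1,  # line-buffered on the supervisor's side of the pipe
    )
    reader = threading.Thread(target=pump, args=(name, proc.stdout, out), daemon=True)
    reader.start()
    code = proc.wait()
    reader.join()  # drain remaining output before reporting the exit code
    return code
```

A real supervisor would start one such reader per daemon and let all of them write to the container's stdout, which is what produces the interleaved `docker logs` view.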

The feature is **done** when all of the following ship:

- A new `Dockerfile.sidecars` (multi-stage) that:
  - Copies the `pipelock` binary from the upstream pipelock
    image (currently `ghcr.io/luckypipewrench/pipelock` pinned
    by digest in `claude_bottle/backend/docker/pipelock.py`).
  - Copies the `gitleaks` binary from `zricethezav/gitleaks`
    (currently pinned by digest in `Dockerfile.git-gate`).
  - Provides `mitmdump`, either via `pip install mitmproxy==<pinned>`
    or by building on the `mitmproxy/mitmproxy` base image as the
    Proposed Design sketch does.
  - Installs the system deps `git-daemon` + `openssh-client`
    that git-gate needs.
  - Copies the existing addon + server Python from
    `claude_bottle/egress_addon.py`, `egress_addon_core.py`,
    `yaml_subset.py`, `supervise.py`, `supervise_server.py`.
  - Drops in a new `claude_bottle/sidecar_init.py` (stdlib
    Python) as the container's `ENTRYPOINT`.
- A new `claude_bottle/sidecar_init.py`, a small Python init
  supervisor that:
  - Reads which daemons to run from env (defaults: all four).
  - Spawns each as a `subprocess.Popen` with prefixed
    line-buffered output.
  - Catches `SIGTERM` / `SIGINT`, propagates to each child,
    `waitpid()`s with a per-child grace deadline, escalates to
    `SIGKILL` past the deadline.
  - Exits with code 0 only if every child exited 0; otherwise
    exits 1. (Or: any-child-died → tear down the rest and exit
    that child's code; see open questions 1 and 2.)
- `claude_bottle/backend/docker/compose.py` renderer updated to
  emit one `sidecars` service in place of the four. The four
  in-container ports (8888 / 9099 / 9418 / 9100, today) all
  land on the same container; the agent-facing ports
  (HTTPS_PROXY, git-gate-SSH, supervise-MCP) are published as
  before, just from one container instead of three.
- `claude_bottle/backend/docker/{pipelock,egress,git_gate,supervise}.py`
  collapsed: the platform-neutral pieces stay
  (`PipelockProxy`, `Egress`, `GitGate`, `Supervise` ABCs and
  their plans), the docker-specific subclasses lose their
  per-container start/stop / image-build / healthcheck logic
  and gain shared bundle-aware helpers. Container name helpers
  (`pipelock_container_name(slug)` etc.) become a single
  `sidecar_bundle_container_name(slug)`.
- `Dockerfile.egress`, `Dockerfile.git-gate`, and
  `Dockerfile.supervise` deleted. The bundle is the only image.
- Tests:
  - Unit: the compose renderer emits exactly two services, and
    the `sidecars` service has all three published ports.
  - Unit: the sidecar-init supervisor propagates SIGTERM and
    returns nonzero when a child crashes.
  - Integration: existing PRD 0001 / 0008 / 0013 / 0017
    integration tests run against the bundle and pass.
  - Integration: PRD 0022 sandbox-escape suite stays green.
- `CLAUDE.md` updated to describe the bundle and the
  daemons-inside layout.
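The exit-code alternatives named in the `sidecar_init.py` bullet (and revisited in the open questions) can be compared side by side with a few pure functions. This is an illustrative sketch only: the function names are hypothetical, and the input is assumed to be a list of `(daemon_name, exit_code)` pairs in the order the children were reaped, with non-negative shell-style codes.

```python
def any_nonzero(exits):
    """Exit 0 only if every child exited 0; otherwise exit 1."""
    return 1 if any(code != 0 for _name, code in exits) else 0


def first_to_die(exits):
    """Exit with the first nonzero code in reap order (0 if all were clean)."""
    for _name, code in exits:
        if code != 0:
            return code
    return 0


def worst_case(exits):
    """Exit with the highest code seen (0 for an empty or all-clean list)."""
    return max((code for _name, code in exits), default=0)


# Example reap order: supervise exits cleanly, pipelock fails, egress is killed.
EXAMPLE = [("supervise", 0), ("pipelock", 2), ("egress", 137)]
```

On `EXAMPLE` the three policies return 1, 2, and 137 respectively, which makes the trade-off concrete: first-to-die points at the daemon that failed first, while worst-case surfaces the most severe code.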

## Non-goals

- **No protocol changes between sidecars.** pipelock still
  speaks the same HTTPS-proxy protocol on the same port; egress
  is still pipelock's upstream; git-gate still listens on
  git-daemon's port; supervise still serves the same MCP HTTP
  endpoint. Only the container they run in changes.
- **No config-schema changes.** `pipelock.yaml`,
  `routes.yaml`, the git-gate access-hook, and the supervise
  queue path all stay where they are; the bundle just bind-mounts
  them at the same in-container paths as before.
- **No host-bind-mount surgery.** Each daemon's existing bind
  mounts (per-bottle CA paths, the supervise queue dir, the
  git-gate creds dir) remain. The bundle aggregates them onto
  one container.
- **No supervisord / s6 / runit.** A 50-line stdlib Python init
  is the supervisor. Adding a new init system for this is more
  weight than the problem deserves and conflicts with the
  project's stdlib-first ethos.
- **No selective daemon disable surfaced to the manifest.** The
  init understands "skip git-gate / supervise when the bottle
  doesn't use them" via env vars set by the compose renderer,
  but operators don't get a manifest knob; the existing
  `bottle.git` / `bottle.supervise` flags continue to drive it.
- **No agent-image changes.** The agent container (PRD 0023's
  microVM in the smolmachines case) is unaffected; this PRD is
  strictly about consolidating the sidecar chain.

## Scope

### In scope

- New `Dockerfile.sidecars` (multi-stage) bringing pipelock,
  mitmproxy, gitleaks, git-daemon, openssh-client, and the
  project's addon + server Python into one image.
- New `claude_bottle/sidecar_init.py` supervising the four
  daemons.
- `backend/docker/compose.py` renderer collapse (five services
  → two).
- `backend/docker/{pipelock,egress,git_gate,supervise}.py`
  reshape: keep the abstract `Plan` / proxy classes; remove
  per-container lifecycle code that compose-up no longer needs.
- Image name and tag pinning (env var override + default; see
  open question 3).
- Test updates: unit and integration tests that probe the
  four-container shape get rewritten against the one-container
  shape.
- README + CLAUDE.md doc updates.

### Out of scope

- The smolmachines backend itself (PRD 0023). This PRD just
  produces the image; PRD 0023 consumes it.
- Per-daemon resource limits (CPU / memory caps) on the bundle.
  Today nothing in the project sets them; consolidation
  doesn't change that.
- Healthcheck redesign. The agent's `depends_on:
  service_healthy` against the bundle covers all four daemons;
  defining a single bundle-level healthcheck that aggregates
  the per-daemon readiness is open question 4.
- Multi-arch image builds (arm64 + amd64). The current
  per-sidecar images are amd64-only or whatever their bases
  ship; we keep that posture.

## Proposed Design

### Bundle image

`Dockerfile.sidecars` is a three-stage multi-stage build: one
stage per copied source binary (pipelock, gitleaks), plus a
final stage that assembles them:

```dockerfile
# Stage 1: pull pipelock binary
FROM ghcr.io/luckypipewrench/pipelock@sha256:<pinned> AS pipelock-src
# pipelock binary is at /usr/local/bin/pipelock in this image.

# Stage 2: pull gitleaks binary
FROM zricethezav/gitleaks@sha256:<pinned> AS gitleaks-src
# gitleaks binary is at /usr/bin/gitleaks in this image.

# Stage 3: mitmproxy base (already has Python + mitmdump installed)
FROM mitmproxy/mitmproxy:11.1.3 AS final
USER root

# System deps for the git-gate daemon side
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        git git-daemon-run openssh-client ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Drop in the project's Python addon + server code
COPY claude_bottle/egress_addon_core.py  /app/egress_addon_core.py
COPY claude_bottle/egress_addon.py       /app/egress_addon.py
COPY claude_bottle/yaml_subset.py        /app/yaml_subset.py
COPY claude_bottle/supervise.py          /app/supervise.py
COPY claude_bottle/supervise_server.py   /app/supervise_server.py
COPY claude_bottle/sidecar_init.py       /app/sidecar_init.py

# Pull the standalone binaries into the final stage
COPY --from=pipelock-src /usr/local/bin/pipelock /usr/local/bin/pipelock
COPY --from=gitleaks-src /usr/bin/gitleaks       /usr/bin/gitleaks

# Layout the bundle uses at runtime, preserved verbatim from the
# four previous Dockerfiles so existing docker-cp paths still work.
RUN mkdir -p \
    /etc/pipelock \
    /etc/egress \
    /etc/git-gate \
    /git-gate/creds \
    /git \
    /run/supervise/queue \
    /home/mitmproxy/.mitmproxy

EXPOSE 8888 9099 9418 9100

ENTRYPOINT ["python3", "/app/sidecar_init.py"]
```

The final stage starts from the mitmproxy image because
mitmproxy has the heaviest install footprint (Python + mitmdump
+ deps); copying the other two binaries in is cheaper than the
reverse. Pinning the pipelock and gitleaks bases by digest is
unchanged from the existing Dockerfiles; the mitmproxy base in
this sketch is pinned by tag (`11.1.3`).

### Init supervisor

`claude_bottle/sidecar_init.py` (sketch; actual code lands as
part of implementation):

```python
"""Per-bottle sidecar supervisor.

Spawns the configured daemons, forwards SIGTERM/SIGINT, exits
with the first non-zero child code (or 0 if every child exited
cleanly during normal shutdown)."""

# Order matters only for first-launch race-window reasons:
# egress starts first so pipelock's upstream connect succeeds
# during pipelock startup. git-gate and supervise are
# independent.
# EGRESS_ENTRYPOINT_SH (the egress launch script) is elided in
# this sketch.
DAEMONS = [
    ("egress",    ["sh", "-c", EGRESS_ENTRYPOINT_SH]),
    ("pipelock",  ["/usr/local/bin/pipelock", "run",
                   "--config", "/etc/pipelock/pipelock.yaml"]),
    ("git-gate",  ["/git-gate-entrypoint.sh"]),
    ("supervise", ["python3", "/app/supervise_server.py"]),
]
```
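The sketch above stops at the daemon table. As a rough illustration of the spawn, forward, and escalate loop, here is a minimal stdlib-only version. Several details are assumptions rather than decisions made by this PRD: the function names, the 8-second default grace value, the use of one shared deadline (instead of a per-child one) so total shutdown stays under the 10s compose timeout, and the shell-style `128 + N` mapping for children killed by signal `N`.

```python
import signal
import subprocess
import sys
import time


def to_exit_code(returncode):
    """Map Popen.returncode to a shell-style status (128 + N for signal N)."""
    return 128 - returncode if returncode < 0 else returncode


def supervise(daemons, should_stop=lambda: False, grace=8.0, poll=0.05):
    """Run all (name, argv) daemons and return the container's exit code.

    Policy: any child death tears down the rest, and the first child to
    die decides the exit code (the default proposed in the open questions).
    """
    procs = [(name, subprocess.Popen(argv)) for name, argv in daemons]
    first_dead = None
    while first_dead is None and not should_stop():
        for _name, proc in procs:
            if proc.poll() is not None:
                first_dead = proc
                break
        else:
            time.sleep(poll)

    # Teardown: SIGTERM every survivor, then SIGKILL whatever outlives
    # the shared deadline.
    for _name, proc in procs:
        if proc.poll() is None:
            proc.terminate()
    killed = False
    deadline = time.monotonic() + grace
    for _name, proc in procs:
        try:
            proc.wait(timeout=max(0.0, deadline - time.monotonic()))
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
            killed = True

    if first_dead is not None:
        return to_exit_code(first_dead.returncode)
    # Stop was requested: children ended by our own SIGTERM count as clean.
    bad = next((p.returncode for _n, p in procs if p.returncode > 0), 0)
    return bad or (1 if killed else 0)


def main(daemons):
    """Entry point sketch: install signal handlers, then supervise."""
    stop = {"requested": False}

    def on_signal(signum, frame):
        stop["requested"] = True

    signal.signal(signal.SIGTERM, on_signal)
    signal.signal(signal.SIGINT, on_signal)
    sys.exit(supervise(daemons, should_stop=lambda: stop["requested"]))
```

Keeping the signal handlers in `main` and passing a `should_stop` callable into `supervise` is a design choice made here for testability: the loop can be unit-tested without delivering real signals to the test process.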

The env-driven daemon subset is the same handshake as today's
compose renderer: bottles without `git` skip git-gate, bottles
with `supervise: false` skip supervise.
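One way the init could read that subset is sketched below. The env var name `CLAUDE_BOTTLE_SIDECAR_DAEMONS` and its comma-separated format are hypothetical; the PRD only says the selection comes from env and defaults to all four daemons.

```python
# Launch order is preserved from the daemon table: egress before pipelock.
ALL_DAEMONS = ("egress", "pipelock", "git-gate", "supervise")

# Hypothetical variable name; the PRD does not fix one.
ENV_VAR = "CLAUDE_BOTTLE_SIDECAR_DAEMONS"


def select_daemons(env):
    """Return the daemon names to launch, in launch order."""
    raw = env.get(ENV_VAR, "").strip()
    if not raw:
        return list(ALL_DAEMONS)  # default: run all four
    wanted = {name.strip() for name in raw.split(",") if name.strip()}
    unknown = wanted - set(ALL_DAEMONS)
    if unknown:
        raise ValueError(f"unknown daemon(s): {', '.join(sorted(unknown))}")
    # Ignore the ordering in the env value; the table order wins.
    return [name for name in ALL_DAEMONS if name in wanted]
```

For a bottle without `git`, the renderer would set the variable to something like `egress,pipelock,supervise`, and the init would skip git-gate without any manifest-level knob.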

### Compose renderer collapse

`bottle_plan_to_compose` emits one `sidecars` service in place
of the four. The service inherits the union of the four's
existing bind mounts; environment variables get prefixed by
daemon name where they clash (none clash today, but the renderer
becomes the central place to enforce that). Container hostname
becomes `sidecars` (or `claude-bottle-sidecars-<slug>` for the
externally-visible name). The agent service's HTTPS_PROXY and
git-gate URL move from per-sidecar hostnames to the single
`sidecars` hostname:

```yaml
# Before (sketch: five services)
services:
  agent:
    environment:
      HTTPS_PROXY: "http://pipelock:8888"
      GIT_GATE_URL: "git://git-gate:9418/repo"
      MCP_SUPERVISE_URL: "http://supervise:9100"
  pipelock:  { image: ghcr.io/luckypipewrench/pipelock:... }
  egress:    { image: claude-bottle-egress:latest }
  git-gate:  { image: claude-bottle-git-gate:latest }
  supervise: { image: claude-bottle-supervise:latest }
---
# After (two services)
services:
  agent:
    environment:
      HTTPS_PROXY: "http://sidecars:8888"
      GIT_GATE_URL: "git://sidecars:9418/repo"
      MCP_SUPERVISE_URL: "http://sidecars:9100"
  sidecars:
    image: claude-bottle-sidecars:<pinned>
    # union of the four prior services' volumes / env / ports
```

`depends_on` collapses: the agent depends on `sidecars` only.
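The collapsed shape can also be expressed as the kind of dictionary a renderer might build before serializing to YAML. This is a hypothetical sketch, not the real `bottle_plan_to_compose`: the function name, its parameters, and the choice of `expose` (rather than `ports`) for the three agent-facing ports are all assumptions.

```python
def render_services(slug, sidecar_image, agent_image):
    """Return the `services:` mapping for one bottle in the two-service shape."""
    return {
        "agent": {
            "image": agent_image,
            "environment": {
                "HTTPS_PROXY": "http://sidecars:8888",
                "GIT_GATE_URL": "git://sidecars:9418/repo",
                "MCP_SUPERVISE_URL": "http://sidecars:9100",
            },
            # The only remaining dependency edge after the collapse.
            "depends_on": {"sidecars": {"condition": "service_healthy"}},
        },
        "sidecars": {
            "image": sidecar_image,
            "container_name": f"claude-bottle-sidecars-{slug}",
            "hostname": "sidecars",
            # Assumption: agent-facing ports only; 9099 (egress) stays internal.
            "expose": ["8888", "9418", "9100"],
        },
    }
```

A unit test for the real renderer could assert the same two properties this sketch makes easy to check: exactly two services, and all three agent-facing ports on `sidecars`.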

### Backend Python collapse

The four `claude_bottle/backend/docker/<sidecar>.py` files keep
their platform-neutral abstractions (proxy/plan classes) but
shed the docker-container-lifecycle code that compose-up
already owns. Container-name helpers consolidate:

```python
# was:
def pipelock_container_name(slug): ...
def egress_container_name(slug): ...
def git_gate_container_name(slug): ...
def supervise_container_name(slug): ...

# becomes:
def sidecar_bundle_container_name(slug: str) -> str:
    return f"claude-bottle-sidecars-{slug}"
```

Per-daemon "is the container up?" helpers used by orphan
cleanup converge on a single check against the bundle name.
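A sketch of what that single check might look like follows. Only `sidecar_bundle_container_name` comes from this PRD; `bundle_is_running` and `orphaned_bundles` are hypothetical helpers, and the input is assumed to be a list of container names such as the lines printed by `docker ps --format '{{.Names}}'`.

```python
BUNDLE_PREFIX = "claude-bottle-sidecars-"


def sidecar_bundle_container_name(slug: str) -> str:
    return f"{BUNDLE_PREFIX}{slug}"


def bundle_is_running(slug, running_names):
    """True if this bottle's bundle container appears in the running set."""
    return sidecar_bundle_container_name(slug) in set(running_names)


def orphaned_bundles(running_names, live_slugs):
    """Bundle containers that are running but belong to no live bottle."""
    live = {sidecar_bundle_container_name(slug) for slug in live_slugs}
    return sorted(
        name
        for name in running_names
        if name.startswith(BUNDLE_PREFIX) and name not in live
    )
```

Compared with four per-daemon checks, orphan cleanup only has to match one name pattern per bottle.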

### External dependencies

None new. The bundle build pulls the same upstream images we
already pull; the consolidation is a packaging change.

### Migration

This PRD's change is large but mechanical. A pre-merge dry-run:

1. Land the bundle image build (`Dockerfile.sidecars` +
   `sidecar_init.py`) without changing the renderer.
   Confirm `docker build -f Dockerfile.sidecars .` succeeds
   and the resulting container runs all four daemons.
2. Switch the renderer to emit the two-service shape behind an
   env-var feature flag (e.g.
   `CLAUDE_BOTTLE_SIDECAR_BUNDLE=1`).
3. Update integration tests in-place; flip the default once
   green; delete the flag and the old Dockerfiles in a
   follow-up commit on the same branch.
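The feature flag in step 2 could gate the renderer along these lines. The flag name is the example given above; treating only the literal value `1` as "on" and the helper names `use_sidecar_bundle` and `service_names` are assumptions made for this sketch.

```python
FLAG = "CLAUDE_BOTTLE_SIDECAR_BUNDLE"

LEGACY_SERVICES = ["agent", "pipelock", "egress", "git-gate", "supervise"]
BUNDLE_SERVICES = ["agent", "sidecars"]


def use_sidecar_bundle(env):
    """Flag is on only when set to exactly "1" (assumed convention)."""
    return env.get(FLAG, "") == "1"


def service_names(env):
    """Which compose services the renderer would emit for this environment."""
    return list(BUNDLE_SERVICES if use_sidecar_bundle(env) else LEGACY_SERVICES)
```

During the migration window, unit tests can call the renderer once with the flag set and once without, asserting on both shapes; flipping the default later only changes `use_sidecar_bundle`.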

The compose-per-instance work in PRD 0018 already separated
sidecar lifecycle from agent lifecycle, so this PRD is
materially a renderer + image-build change, not a backend
rewrite.

## Sizing: into chunks

1. **Bundle image + init supervisor.** Write `Dockerfile.sidecars`
   and `sidecar_init.py`, ship them, add a unit test that
   builds the image in CI and asserts the four daemons start.
   No renderer change yet.
2. **Compose renderer collapse.** Update
   `bottle_plan_to_compose` to emit two services. Feature flag
   it via env var. Update unit tests to assert on both shapes
   (flag on vs off) during the migration window.
3. **Backend Python collapse.** Trim the four docker
   sidecar modules, consolidate container-name helpers, update
   orphan-cleanup logic to look for the bundle by name. Delete
   old Dockerfiles.
4. **Integration test sweep.** Bring every integration test
   that probes a four-container shape (`pipelock_container_name`,
   `egress_container_name`, etc.) onto the bundle. Confirm
   PRD 0022 stays green.
5. **Docs + flag removal.** Flip the default, remove the
   feature flag, update README + CLAUDE.md.

## Open questions

1. **Init failure semantics.** When one daemon crashes mid-run,
   should the bundle exit (killing the bottle) or restart just
   that daemon? Today, with four separate containers, docker
   restarts the crashed one and the bottle stays up. Default
   for this PRD: bundle exits on any child death; the bottle
   tears down. Restart logic can land later if operators hit
   it in practice.
2. **Exit-code propagation.** If multiple daemons die in quick
   succession (likely under SIGTERM), which exit code wins?
   First-to-die is simplest. Worst-case (highest nonzero
   exit code) gives clearest signal in logs. Default to
   first-to-die unless an operator scenario disagrees.
3. **Image pin policy.** Pin `claude-bottle-sidecars` by tag
   (`:latest` rebuilt per-release) or by digest written into a
   `CLAUDE_BOTTLE_SIDECAR_IMAGE` env var like the existing
   `CLAUDE_BOTTLE_PIPELOCK_IMAGE`? Default to env-var override
   + a documented tag; digest pinning is an operator opt-in.
4. **Healthcheck aggregation.** Today each sidecar service has
   its own compose healthcheck, and the agent waits on each via
   `depends_on: { pipelock: { condition: service_healthy }, ... }`.
   With one container, the bundle needs one healthcheck that
   returns ready iff all daemons are listening. Cheapest: TCP
   probe on pipelock's port + git-gate's port + supervise's port
   from inside the container, scripted into a small
   `/app/healthcheck.sh`. Resolve in chunk 1.
5. **Log interleaving + debuggability.** All four daemons'
   stdout/stderr merge into one container log. The init
   prefixes each line with the daemon name, but operators may
   want per-daemon log files for easier triage. Default: no
   per-daemon files in v1; revisit if debug-time pain shows up.
6. **Backwards compat for an installed-base test fixture.**
   Some integration tests synthesize compose files by hand and
   assert on per-sidecar container names. They'll need
   touching in chunk 4. List them up front in the chunk-4
   commit so the diff isn't a surprise.
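For open question 4, the TCP-probe idea can be sketched in stdlib Python instead of shell; that substitution is a choice made here so the example matches the rest of the project's tooling, not something the PRD prescribes. The name-to-port mapping is read off the compose sketch earlier in this document (pipelock 8888, git-gate 9418, supervise 9100), and the helper names are hypothetical.

```python
import socket
import sys

# Agent-facing ports taken from the compose sketch; egress (9099) is not probed,
# matching the three ports listed in open question 4.
PROBE_PORTS = {"pipelock": 8888, "git-gate": 9418, "supervise": 9100}


def is_listening(port, host="127.0.0.1", timeout=0.5):
    """True if a TCP connect to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def not_ready(ports):
    """Names of daemons whose port does not accept connections yet."""
    return sorted(name for name, port in ports.items() if not is_listening(port))


def main():
    """Healthcheck entry point: exit 0 only when every probed port is open."""
    missing = not_ready(PROBE_PORTS)
    if missing:
        print("not ready: " + ", ".join(missing), file=sys.stderr)
    sys.exit(1 if missing else 0)
```

A bottle that skips git-gate or supervise would need the probe set trimmed to the daemons actually launched; otherwise the healthcheck would never report ready.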

## References

- `Dockerfile.egress`, `Dockerfile.git-gate`,
  `Dockerfile.supervise`: the three Dockerfiles this PRD
  collapses into `Dockerfile.sidecars`.
- `claude_bottle/backend/docker/compose.py`: the renderer this
  PRD slims down.
- `claude_bottle/backend/docker/pipelock.py`: current home of
  `PIPELOCK_IMAGE` and the pinned digest the bundle's first
  stage reuses.
- PRD 0017
  (`docs/prds/0017-egress-proxy-via-mitmproxy.md`): defines
  egress's role as pipelock's upstream; this PRD relies on
  that being implementable over localhost just as easily as
  over the internal docker network.
- PRD 0018
  (`docs/prds/0018-compose-per-instance.md`): the
  compose-per-instance refactor this PRD builds on. PRD 0018
  separated sidecar lifecycle from agent lifecycle, which is
  what makes a single-bundle compose service a renderer-only
  change instead of a backend rewrite.
- PRD 0022
  (`docs/prds/0022-sandbox-escape-integration-test.md`): must
  remain green through the migration.
- PRD 0023
  (`docs/prds/0023-smolmachines-backend.md`): the second
  consumer of this bundle; depends on this PRD's image being
  available before its chunk 3.