# PRD 0018: One Compose project per bottle instance

- **Status:** Active
- **Author:** didericis
- **Created:** 2026-05-25

## Summary

Replace the current pattern of orchestrating each sidecar with its own
`docker` SDK calls with **one `docker compose` project per bottle
instance**. The compose project is generated at `start` time, written
to disk under the instance's state dir, and brought up with
`docker compose up`. Tearing the instance down is `docker compose
down`. Logs come from `docker compose logs` and land in a single file
per instance, so reading what happened in a session is one `less`
away.

State for each instance (`~/.bot-bottle/state/<slug>/`) becomes a
self-describing folder:

```
metadata.json           # agent_name, cwd, started_at, compose project name, ...
docker-compose.yml      # the exact compose spec used to start this instance
compose.log             # full dump of `docker compose logs --no-color`
transcript/             # snapshotted agent conversation (existing)
live-config/            # routes.yaml, allowlist — bind-mounted into sidecars (existing)
```

Anything that needs to answer "what did instance X actually run?" can
read those five artifacts. The compose file plus the metadata
together fully describe the container topology.

## Problem

Today `start` builds each sidecar (`pipelock`, `egress`, `git-gate`,
`supervise`) and the agent container with a chain of individual SDK
calls in `bot_bottle/backend/docker/launch.py`:

- A per-sidecar `Docker{Sidecar}.start()` method does
  `docker create` → `docker cp` (stage files) → `docker network
  connect` → `docker start`.
- Two networks are created up front (`network_create` calls).
- The agent container starts last via its own `docker run`.

This is fine, but it has three rough edges:

1. **No single artifact describes the topology.** To understand what
   ran for instance `<slug>`, you have to read the Python that built
   the SDK calls. Nothing is on disk you can `cat`.

2. **Logs are scattered.** Each container's logs sit in Docker's
   per-container journal. To debug a session post-mortem you have to
   remember to run `docker logs bot-bottle-pipelock-<slug>` etc.
   before the containers age out, and there's no merged view.

3. **Teardown is bespoke.** Each sidecar's `stop()` is its own
   method, ordered carefully in `start.py`'s `ExitStack`. A leftover
   container or network from a crash takes the `cleanup` CLI to find.

Compose is purpose-built for this shape: declarative spec, one
project name per environment, merged logs, single-command up/down.

## Goals / Success Criteria

1. `bot-bottle start <agent>` writes
   `~/.bot-bottle/state/<slug>/docker-compose.yml` and brings the
   project up with `docker compose -p <project> up`.
2. The compose file is the source of truth for the container
   topology — every sidecar that runs is declared as a `services:`
   entry, every network is a `networks:` entry, every bind mount is
   a `volumes:` entry.
3. `~/.bot-bottle/state/<slug>/compose.log` contains the full
   merged stdout/stderr of every service for the session, in
   `docker compose logs --no-color` format.
4. `metadata.json` records the compose project name alongside the
   existing fields (`agent_name`, `cwd`, `started_at`), so other
   tools can derive `docker compose -p <project> ...` invocations
   without re-deriving the slug.
5. Session teardown is `docker compose -p <project> down`. The
   existing per-sidecar `stop()` lifecycle methods come out.
6. The `cleanup` CLI uses `docker compose ls` (filtered to
   `bot-bottle-*` projects) instead of name-prefix scans across
   `docker ps -a` and `docker network ls`.
7. The existing remediation flows (`pipelock-block`,
   `egress-block`, `capability-block`) keep working without
   protocol changes — they write to host paths under
   `state/<slug>/live-config/`, sidecars `SIGHUP`-reload from the
   bind mount, no compose-side restart needed.

## Non-goals

- **Multi-host compose.** No swarm, no remote contexts. Each instance
  is one local Docker daemon.
- **Replacing the manifest format.** Manifests stay; compose is an
  implementation detail of the Docker backend.
- **Replacing the backend abstraction (PRD 0003).** `Backend` stays
  abstract; only the Docker implementation changes.
- **A long-lived "bot-bottle daemon."** Each `start` invocation
  still owns a single compose project for the lifetime of the
  session. No persistent service.
- **Image pre-building.** Compose's `build:` directive triggers
  builds on first `up`, same as today; no separate build step.
- **Backwards compatibility with running instances at upgrade.** If
  an instance was started by the pre-compose code, the user kills
  it and starts a new one. There's no migration path for live
  containers.

## Scope

### In scope

- New module `bot_bottle/backend/docker/compose.py` that renders a
  compose dict from a `BottlePlan` and writes it to
  `state/<slug>/docker-compose.yml`.
- `DockerBackend.start` rewritten to:
  1. Build the plan (existing `prepare`).
  2. Stage bind-mount inputs (CAs, routes.yaml, env file, hooks)
     into host paths under `state/<slug>/`.
  3. Render + write the compose file.
  4. Exec `docker compose -p <project> up -d`.
  5. `docker exec -it bot-bottle-<slug> claude ...` for the agent's
     TTY (the model decided in open question 2).
  6. On exit: `docker compose -p <project> logs --no-color`
     → `state/<slug>/compose.log`, then `docker compose -p
     <project> down --volumes`.
- Sidecar stage files move from `docker cp`-into-container to
  bind-mounts from `state/<slug>/`. This deletes a lot of code
  in `pipelock.py`, `git_gate.py`, `egress.py`, `supervise.py`.
- `metadata.json` gains a `compose_project` field.
- `cleanup` CLI rewritten to use `docker compose ls` for discovery.
- The per-sidecar `Docker{Sidecar}.start/stop` lifecycle methods
  collapse into `Docker{Sidecar}.compose_service()` returning a
  service-dict fragment. Their apply / introspection helpers
  (`egress_apply.py`, `supervise.py`'s handlers) are unchanged.
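
The rewritten `DockerBackend.start` sequence above boils down to four commands per session. The sketch below only builds the argument lists and runs nothing. `compose_argv` and `session_commands` are hypothetical names, and passing `-f` to point at the per-instance spec is an assumption about how the file under `state/<slug>/` would be located. The interactive step uses `docker exec -it`, following the decision recorded under open question 2.

```python
from pathlib import Path


def compose_argv(project: str, compose_file: Path, *args: str) -> list[str]:
    """Build a `docker compose` argv pinned to one project and one spec file."""
    return ["docker", "compose", "-p", project, "-f", str(compose_file), *args]


def session_commands(slug: str, state_dir: Path) -> dict[str, list[str]]:
    """Return the command for each phase of a session without running it."""
    project = f"bot-bottle-{slug}"
    compose_file = state_dir / "docker-compose.yml"
    return {
        "up": compose_argv(project, compose_file, "up", "-d"),
        # Interactive step: exec into the already-running agent container.
        "exec": ["docker", "exec", "-it", f"bot-bottle-{slug}", "claude"],
        "logs": compose_argv(project, compose_file, "logs", "--no-color"),
        "down": compose_argv(project, compose_file, "down", "--volumes"),
    }
```

Keeping command construction separate from execution makes the sequence easy to unit-test without a Docker daemon.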

### Out of scope

- Changing the manifest layer (`bot_bottle/manifest.py`,
  `egress.py`'s plan dataclasses, `pipelock.py`'s plan dataclasses).
- Changing the agent's runtime contract (proxy env vars, CA bundle
  paths, current-config mount path).
- Changing audit-log shape or location
  (`~/.bot-bottle/audit/<component>-<slug>.log` stays).
- Changing the MCP server's tool list or wire format.
- Dropping the `--rm` semantics for the agent: the agent container
  is still ephemeral; `docker compose down` removes it, and
  `--volumes` removes any named or anonymous volumes with it.

## Proposed design

### Project name

`compose_project = f"bot-bottle-{slug}"`. The slug stays the
existing `slugify(agent_name)-<5-char-random-base36>` from
`bottle_state.py`. Compose adds its own prefix to networks
(`<project>_<network>`) and to default container names — which is
why each service gets an explicit `container_name:` (below).
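
Compose restricts project names to lowercase letters, digits, dashes, and underscores, starting with a lowercase letter or digit. A minimal sketch of deriving and validating the name, assuming a hypothetical helper `compose_project_name` (the real slug generation stays in `bottle_state.py`):

```python
import re

# Compose project names: lowercase letters, digits, dashes, underscores,
# and the first character must be a lowercase letter or digit.
_PROJECT_NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]*")


def compose_project_name(slug: str) -> str:
    """Derive the project name from a slug, rejecting slugs Compose would refuse."""
    name = f"bot-bottle-{slug}"
    if not _PROJECT_NAME_RE.fullmatch(name):
        raise ValueError(f"slug {slug!r} does not yield a valid Compose project name")
    return name
```

Failing early here surfaces a bad slug before anything is written to the state dir.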

### Service / container naming

Service names inside the compose file are short (`agent`,
`pipelock`, `egress`, `git-gate`, `supervise`). Each service sets
an explicit `container_name:` matching today's pattern:

```yaml
services:
  pipelock:
    container_name: bot-bottle-pipelock-<slug>
  egress:
    container_name: bot-bottle-egress-<slug>
  # ...
```

This keeps the dashboard's container-discovery output stable for
operators who've memorized the names. The compose project name
(`bot-bottle-<slug>`) is the only new identifier.

### Networks

The two existing networks (`bot-bottle-net-<slug>` internal +
`bot-bottle-egress-<slug>` upstream-bridge) become compose
networks:

```yaml
networks:
  internal:
    name: bot-bottle-net-<slug>
    internal: true
  egress:
    name: bot-bottle-egress-<slug>
```

Each service's `networks:` list mirrors today's wiring.

### Bind mounts replace `docker cp`

The current pattern of `docker create` → `docker cp file
container:/path` → `docker start` (used by every sidecar to land
routes.yaml, CAs, hooks) becomes host bind-mounts. The host paths
live under `state/<slug>/`:

```
state/<slug>/
  live-config/
    routes.yaml
    allowlist
  pipelock-ca/
    ca.pem
    ca-key.pem
  egress-ca/
    ca.pem
    ca-key.pem
  git-gate/
    entrypoint.sh
    hooks/
    ...
  env/
    agent.env
```

Each sidecar service mounts the relevant sub-tree read-only at the
in-container path it expects. Permissions on the host paths are
locked at write time to 0600 for files and 0700 for directories
(the existing `mode=0o600` discipline in `prepare.py` extends
naturally).
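
As a rough illustration of the staging step, the sketch below writes one file under the state dir with locked-down permissions and produces a short-syntax read-only volume entry. `stage_file` and `read_only_bind` are hypothetical helpers, not existing functions in `prepare.py`, and any in-container path passed to `read_only_bind` is an assumption.

```python
import os
from pathlib import Path


def stage_file(state_dir: Path, relative: str, data: bytes) -> Path:
    """Write `data` under state_dir with 0700 directories and a 0600 file."""
    current = state_dir
    current.mkdir(parents=True, exist_ok=True)
    os.chmod(current, 0o700)
    # Create and lock each directory between state_dir and the file.
    for part in Path(relative).parent.parts:
        current = current / part
        current.mkdir(exist_ok=True)
        os.chmod(current, 0o700)
    target = state_dir / relative
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    os.chmod(target, 0o600)  # enforce the mode even if the file already existed
    return target


def read_only_bind(host_path: Path, container_path: str) -> str:
    """Short-syntax Compose volume entry for a read-only bind mount."""
    return f"{host_path}:{container_path}:ro"
```

The explicit `os.chmod` calls matter because the mode passed to `os.open` and `mkdir` is filtered by the process umask and is ignored for paths that already exist.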

### Conditional services

The compose renderer takes the same `BottlePlan` the SDK calls
read today and only emits services for sidecars that apply:

- `pipelock` — always.
- `egress` — only if `bottle.egress.routes` is non-empty.
- `git-gate` — only if `bottle.git` is non-empty.
- `supervise` — only if `bottle.supervise` is true.
- `agent` — always.

Conditional `depends_on:` edges keep the agent waiting on
sidecars that exist.
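
The selection rules above can be sketched as a pure function. `PlanSketch` is a hypothetical stand-in for the three `BottlePlan` fields that gate sidecars; the real dataclass shape is not shown in this PRD.

```python
from dataclasses import dataclass, field


@dataclass
class PlanSketch:
    """Hypothetical stand-in for the BottlePlan fields that gate sidecars."""
    egress_routes: list = field(default_factory=list)
    git: list = field(default_factory=list)
    supervise: bool = False


def select_services(plan: PlanSketch) -> list[str]:
    """Return the service names to emit, in a stable order."""
    services = ["pipelock"]  # always present
    if plan.egress_routes:
        services.append("egress")
    if plan.git:
        services.append("git-gate")
    if plan.supervise:
        services.append("supervise")
    services.append("agent")  # always present
    return services


def agent_depends_on(services: list[str]) -> list[str]:
    """The agent waits only on sidecars that were actually emitted."""
    return [name for name in services if name != "agent"]
```

Because `depends_on` is derived from the emitted list, the agent can never reference a service that is absent from the file, which Compose would reject.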

### Logging

`docker compose up -d` starts everything detached. The user reaches
the agent's TTY via `docker exec -it bot-bottle-<slug> claude ...`
(see open question 2). Sidecars stream into Docker's per-container
journals during the session, exactly as today, and
`docker compose logs -f` gives a merged tail if the user wants it
(the dashboard can shell out to this).

At session end (success or crash), `start.py`'s ExitStack runs:

1. `snapshot_transcript(slug)` (unchanged).
2. `docker compose -p <project> logs --no-color --timestamps` →
   `state/<slug>/compose.log`.
3. `docker compose -p <project> down --volumes`.
4. `cleanup_state(slug)` (unchanged — still removes the state dir
   unless `.preserve` was written).

The log dump is best-effort; a failure there shouldn't block
teardown.
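
One way to get that ordering and the best-effort behavior with `contextlib.ExitStack` is sketched below. The four step callables are placeholders for the real functions, and `best_effort` and `register_teardown` are hypothetical names.

```python
from contextlib import ExitStack
from typing import Callable


def best_effort(step: Callable[[], None]) -> Callable[[], None]:
    """Wrap a step so a failure is reported and swallowed instead of propagating."""
    def runner() -> None:
        try:
            step()
        except Exception as exc:  # deliberately broad: this is cleanup code
            print(f"teardown step failed, continuing: {exc}")
    return runner


def register_teardown(
    stack: ExitStack,
    snapshot: Callable[[], None],
    dump_logs: Callable[[], None],
    compose_down: Callable[[], None],
    cleanup_state: Callable[[], None],
) -> None:
    # ExitStack unwinds callbacks last-in, first-out, so register them in
    # the reverse of the order they should run.
    stack.callback(cleanup_state)           # runs 4th
    stack.callback(compose_down)            # runs 3rd
    stack.callback(best_effort(dump_logs))  # runs 2nd
    stack.callback(snapshot)                # runs 1st
```

Only the log dump is wrapped here; whether the other steps should also swallow errors is a separate decision.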

### metadata.json shape

Add one field; everything else is unchanged.

```json
{
  "agent_name": "implementer",
  "cwd": "/Users/.../some-project",
  "started_at": "2026-05-25T20:13:04Z",
  "compose_project": "bot-bottle-implementer-a7k3f"
}
```

### Per-sidecar class shape

Today's `DockerPipelock`, `DockerGitGate`, `DockerEgress`,
`DockerSupervise` each carry `start()` + `stop()` lifecycle plus
helper logic (image building, route validation, apply handlers).

After this PRD:

- The `start()`/`stop()` methods come out.
- A new method per class, `compose_service(plan) -> dict`, returns
  the service-stanza fragment (image / build / container_name /
  networks / volumes / env / depends_on).
- The image-build flow becomes `build:` in the compose file, so
  the per-sidecar `docker build` calls go away too.
- The apply/introspection helpers (`egress_apply.add_route`,
  `supervise.py`'s capability handlers, etc.) are untouched — they
  read/write host paths under `state/<slug>/live-config/` and the
  bind-mounted sidecars `SIGHUP`-reload.
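
To make the target shape concrete, here is a hedged sketch of one sidecar class reduced to a fragment producer. `PlanSketch`, `PipelockSketch`, the build context path, the network membership, and the in-container mount path are all illustrative assumptions; only the container name pattern and the image name come from this PRD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanSketch:
    """Hypothetical stand-in for the parts of BottlePlan this fragment needs."""
    slug: str
    state_dir: str


class PipelockSketch:
    """Illustrative shape only: no start()/stop(), just a service fragment."""

    def compose_service(self, plan: PlanSketch) -> dict:
        return {
            # With both keys set, Compose tags the built image with `image`.
            "image": "bot-bottle-pipelock",
            "build": {"context": "images/pipelock"},  # hypothetical path
            "container_name": f"bot-bottle-pipelock-{plan.slug}",
            "networks": ["internal"],  # placeholder; real wiring mirrors today's
            "volumes": [
                # Read-only bind mount; the in-container path is an assumption.
                f"{plan.state_dir}/pipelock-ca:/etc/pipelock/ca:ro",
            ],
        }
```

A method like this has no side effects, so the renderer can collect fragments from every applicable sidecar and serialize them in one place.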

### Cleanup CLI

`./cli.py cleanup` switches from "list every container with prefix
`bot-bottle-` and every network with prefix `bot-bottle-net-`
or `bot-bottle-egress-`" to:

1. `docker compose ls --all --format json` → filter to projects
   whose name starts with `bot-bottle-`.
2. For each: `docker compose -p <project> down --volumes`.
3. Reap any state dirs under `~/.bot-bottle/state/` whose
   `compose_project` no longer appears in `compose ls`.

Strays from pre-compose code-paths can be mopped up by keeping the
existing prefix scan as a fallback for one release.
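
Steps 1 and 2 could look roughly like the sketch below. It assumes the JSON output is a list of objects with a `Name` key, which should be verified against the installed Compose version; `bot_bottle_projects` and `down_argv` are hypothetical helpers that build data and commands without running anything.

```python
import json


def bot_bottle_projects(compose_ls_json: str) -> list[str]:
    """Project names from `docker compose ls --all --format json` output."""
    entries = json.loads(compose_ls_json)
    return sorted(
        entry["Name"] for entry in entries
        if entry["Name"].startswith("bot-bottle-")
    )


def down_argv(project: str) -> list[str]:
    """Teardown command for one project, addressed by name only."""
    return ["docker", "compose", "-p", project, "down", "--volumes"]
```

Parsing the JSON form rather than the table form avoids depending on column layout.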

## Open questions

1. **`docker compose` vs `docker-compose` v1.** Compose v2 ships
   with Docker Desktop as `docker compose` (subcommand) and is what
   most users will already have. Assume v2; if v1 is
   detected, die with a pointer to upgrade.

2. **How does `claude` reach the agent's TTY?** Decided: keep
   today's `docker exec -it` model. Agent runs `sleep infinity`
   under compose; `DockerBottle.exec_agent` runs
   `docker exec -it bot-bottle-<slug> claude ...` exactly like
   today. Compose owns the lifecycle (so `compose logs` includes
   the agent's stdout, `compose down` tears it down), but the
   user-facing exec model is unchanged. Rejected `docker attach`
   because its default Ctrl-P Ctrl-Q detach sequence intercepts and
   buffers keypresses that Claude Code uses; rejected "agent outside
   compose" because it gives up the unified `compose logs` view that
   motivated the PRD.

3. ~~TTY allocation under compose.~~ Resolved by #2: no `tty:` /
   `stdin_open:` on the agent service — interactivity is per-exec.

4. **`docker compose logs` ordering.** The dumped log file
   interleaves services by timestamp. Confirm `--timestamps` is
   enough to keep it readable; otherwise consider per-service
   subfiles (`compose.log.pipelock`, etc.).

5. **Image build caching.** `build:` in compose rebuilds on first
   `up` unless the image is already tagged. The per-sidecar images
   (`bot-bottle-pipelock`, `bot-bottle-egress`,
   `bot-bottle-git-gate`, `bot-bottle-supervise`) should
   stay tagged on the daemon between runs so we don't rebuild on
   every start. Verify compose's behavior matches.

6. **`docker compose down --volumes` and bind-mount data.** `down
   --volumes` removes named volumes but leaves bind-mount source
   paths alone (they're host paths under our state dir, which we
   manage explicitly). Confirm — and if there's a footgun, drop
   `--volumes` and rely on the state-dir cleanup step.

7. **Dashboard discovery.** `cli/dashboard.py` enumerates instances
   by scanning containers. Should it switch to `docker compose ls`
   too, or read `metadata.json` files under `state/`? Reading state
   dirs is faster and survives docker daemon restarts; compose ls
   is the truth about what's actually running. Probably both: list
   from state dirs, mark "running" by cross-referencing compose
   ls.
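
For question 7, the "probably both" option might be sketched as follows: enumerate instances from `metadata.json` files, then flag the ones whose project Compose still reports. `list_instances` is a hypothetical function, and `running_projects` is assumed to be computed elsewhere from `docker compose ls`.

```python
import json
from pathlib import Path


def list_instances(state_root: Path, running_projects: set[str]) -> list[dict]:
    """List instances from state dirs and flag those Compose still reports."""
    instances = []
    for meta_path in sorted(state_root.glob("*/metadata.json")):
        meta = json.loads(meta_path.read_text())
        instances.append({
            "slug": meta_path.parent.name,
            "agent_name": meta.get("agent_name"),
            # State dirs written before this PRD have no compose_project.
            "running": meta.get("compose_project") in running_projects,
        })
    return instances
```

Instances started by pre-compose code show up as not running under this scheme, which matches the no-migration non-goal.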

## Implementation chunks

Sized for one PR each, in order.

1. **Compose renderer.** Pure function:
   `bottle_plan_to_compose(plan) -> dict`. No I/O. Full unit-test
   coverage for the conditional-service matrix (every combination
   of git on/off, egress on/off, supervise on/off). No `start.py`
   changes yet.
2. **Stage-file move to host paths.** Refactor each sidecar's
   stage-file production (today: write to host stage dir → `docker
   cp` after create) to write directly into `state/<slug>/`
   sub-trees with bind-mount-ready perms. SDK path still does
   `docker cp`; this is a no-op rearrangement that sets up chunk 3.
3. **Switch `start.py` to compose.** Wire up the renderer +
   `docker compose up -d` + `docker exec` + teardown. Per-sidecar
   `start()`/`stop()` lifecycle methods deleted in the same chunk.
   Compose-log dump on teardown added.
4. **Cleanup CLI on compose.** Switch `./cli.py cleanup` to
   `docker compose ls`-based discovery; keep prefix-scan as
   fallback for one release.
5. **Dashboard.** Decide on the discovery question (open question
   #7), implement.

## References

- PRD 0003 — bottle backend abstraction (what stays / what
  changes underneath it)
- PRD 0010 / 0017 — cred-proxy → egress; the sidecar lifecycle
  this PRD collapses into compose
- PRD 0014 / 0015 / 0016 — apply flows that the bind-mount +
  `SIGHUP` reload has to keep working without protocol change