docs(prd): update PRD 0003 to reflect the shipped design

Renames the file and rewrites the body around what actually shipped:
class-based BottleBackend ABC (not a free create_docker_bottle
function), the two-phase prepare/launch split, the backend/docker/
subpackage layout, env.py reshaped into a backend-neutral ResolvedEnv,
and PipelockProxy split between top-level and backend/docker/.
2026-05-11 14:47:17 -04:00
parent 656dc88d76
commit f0b67a3e94
2 changed files with 300 additions and 279 deletions
@@ -0,0 +1,300 @@
# PRD 0003: Bottle Backend abstraction
- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-10
## Summary
Introduce a per-backend abstraction that owns the end-to-end lifecycle
of a "bottle" (a running, isolated environment with claude inside).
The first and only implementation lands as `DockerBottleBackend`. No
second backend ships in this PRD.
## Problem
Today, "how to launch a bottle" is spread across roughly six modules
(`claude_bottle/cli/start.py`, `pipelock.py`, `network.py`, `ssh.py`,
`skills.py`, `docker.py`), each shelling out to `docker` directly via
`subprocess.run(["docker", ...])`. That coupling means:
- Adding a second backend (Apple's `container`, fly.io, a remote SSH
host, etc.) requires editing every one of those call sites. The
research note `docs/research/apple-container-backend.md` already
flags this as a prerequisite for that work.
- The pipelock sidecar topology — two networks, multi-attach, sidecar
lifecycle — is a Docker implementation detail that has leaked into
the top-level CLI orchestration. It reads as a core concept of the
project, but a fly.io bottle would not need any of it.
- The manifest carries a Docker-specific `runtime: "runsc"` field
(`bottles[].runtime`). Anyone setting it has to know about gVisor,
whether Docker has it registered, and what to do on macOS where it
isn't available natively. The field has one valid non-default value
and exists only because the current code can't decide on its own.
The shape that fits the project's actual goals (isolated agent runs
across multiple backends) is "one backend per platform," not "one
container-runtime SDK with N drivers." A previous draft of this PRD
considered a low-level runtime-primitive protocol (`run`, `exec`,
`cp`, `network_connect`, ...) and rejected it as the wrong layer —
it would have forced fly.io to pretend it's Docker.
## Goals / Success Criteria
The feature works when all of the following are observable:
- `cli.py start` works identically for an existing manifest with no
user-visible changes other than (a) a startup log line naming the
Docker runtime in use, and (b) `bottles[].runtime` no longer being a
valid manifest field.
- On a Linux host with gVisor registered, the agent container runs
under `runsc` without anything in the manifest requesting it.
- On a host without gVisor (including macOS), the agent container runs
under the default `runc` runtime; nothing fails, no warning is
printed beyond the runtime-name log line.
- The existing test suite passes with no behavior changes other than
the manifest-schema removal of `runtime`.
The feature is **done** when all of the following ship:
- A new `claude_bottle/backend/` package exists with abstract base
classes (`BottleBackend`, `BottlePlan`, `BottleCleanupPlan`,
`Bottle`) plus a `claude_bottle/backend/docker/` subpackage
containing the `DockerBottleBackend` implementation.
- `DockerBottleBackend.launch(plan)` returns a context manager
yielding a `Bottle` handle exposing `exec_claude(argv, *, tty=True)`,
`cp_in(host, ctr)`, and teardown on context exit.
- Every existing `subprocess.run(["docker", ...])` call in
`cli/start.py`, `pipelock.py`, `network.py`, `ssh.py`, and
`skills.py` either moves into `claude_bottle/backend/docker/` or is
called from it. No top-level CLI code references `docker` directly.
- `bottles[].runtime` is removed from the manifest schema, the
dataclass in `manifest.py`, the example manifest, and any README /
docs references. `require_runsc()` in the old top-level
`claude_bottle/docker.py` is deleted.
- A single env var, `CLAUDE_BOTTLE_BACKEND` (default `"docker"`),
selects the backend. Unknown values die at startup with a list of
known backends.
- The y/N preflight in `cli.py` includes the resolved Docker runtime
alongside the allowlist summary.
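
A minimal sketch of the env-var selection described above, assuming the registry is a plain module-level dict. The `_BACKENDS` name, the stand-in `DockerBottleBackend` class, the `die()` stub, and the optional `env` parameter (added only so the lookup can be exercised without touching `os.environ`) are all illustrative assumptions, not the shipped code:

```python
import os
import sys
from typing import Callable, Dict, Mapping, Optional


class DockerBottleBackend:
    """Stand-in for the real backend class; only the name matters here."""

    name = "docker"


# Hypothetical registry: backend name -> zero-argument constructor.
_BACKENDS: Dict[str, Callable[[], object]] = {"docker": DockerBottleBackend}


def die(msg: str) -> None:
    """Stand-in for the project's die() helper (assumed to exit non-zero)."""
    print(f"error: {msg}", file=sys.stderr)
    raise SystemExit(1)


def get_bottle_backend(env: Optional[Mapping[str, str]] = None):
    """Pick a backend from CLAUDE_BOTTLE_BACKEND, defaulting to "docker"."""
    env = os.environ if env is None else env
    name = env.get("CLAUDE_BOTTLE_BACKEND", "docker")
    if name not in _BACKENDS:
        # Unknown values fail at startup and list what is available.
        known = ", ".join(sorted(_BACKENDS))
        die(f"unknown backend {name!r}; known backends: {known}")
    return _BACKENDS[name]()
```

With one entry in the registry, any value other than `docker` takes the `die()` path, which matches the "unknown values die at startup" criterion.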
## Non-goals
- No second backend implementation. There is no
`AppleContainerBottleBackend` / `FlyioBottleBackend` in this PRD.
The registry in `backend/__init__.py` ships with one entry.
- No retries, async, or streaming exec. The current code is
synchronous `subprocess.run`; the `Bottle` handle matches.
- No behavior change beyond the runsc auto-detect. Pipelock topology,
network naming, container naming, image build flow, and SSH
provisioning all stay byte-identical.
- No `--require-runsc` CLI escape hatch. If a user later wants "fail
rather than silently downgrade," that's a follow-up.
- No `bottles[].backend` manifest field. Backend is a property of
the host environment, not the bottle definition (at least for now).
## Scope
### In scope
- New `claude_bottle/backend/` package containing the abstract types
and the registry, plus a `claude_bottle/backend/docker/` subpackage
containing the Docker implementation.
- The `Bottle`, `BottleBackend`, `BottlePlan`, and `BottleCleanupPlan`
abstract base classes; `BottleSpec` data carrier; and
`DockerBottleBackend` implementation.
- Moving Docker-specific subprocess calls into the Docker subpackage.
- Removing `bottles[].runtime` from the manifest schema and the
dataclass.
- Auto-detection of `runsc` registration via `docker info`.
- Preflight integration: the existing y/N output names the resolved
Docker runtime.
- Reshaping `env.py` (formerly `env_resolve.py`) to return a
backend-neutral `ResolvedEnv` (`forwarded` names + `literals` map)
rather than writing docker-shaped files directly. The Docker
backend now owns the `--env-file` / `-e NAME` serialization and the
newline-rejection check.
- Splitting `pipelock.py` into a backend-neutral `PipelockProxy` ABC
(yaml + allowlist resolution) and a `DockerPipelockProxy` subclass
(sidecar start/stop) under the Docker subpackage.
- Test updates: any manifest fixtures referencing `runtime` are
updated; tests that assert on `--runtime=runsc` instead seed the
detection by mocking `docker info`.
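
To illustrate the `ResolvedEnv` reshaping listed above, here is a hedged sketch of how a Docker backend might serialize it. The field names come from this PRD; the `to_docker_env` helper, the choice to put literals in the env file and forwarded names on `-e NAME`, and the example variable names are assumptions made for illustration:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ResolvedEnv:
    """Backend-neutral result: names to inherit plus literal values."""

    forwarded: List[str] = field(default_factory=list)
    literals: Dict[str, str] = field(default_factory=dict)


def to_docker_env(env: ResolvedEnv, env_file_path: str) -> Tuple[str, List[str]]:
    """Return (env-file content, docker argv fragments) for a ResolvedEnv.

    Assumed mapping: literals become NAME=value lines in the env file, and
    forwarded names become `-e NAME` so docker inherits their values from
    the calling process environment.
    """
    lines = []
    for name, value in env.literals.items():
        # An env file is line-oriented, so a value containing a newline
        # cannot be represented; reject it rather than write a corrupt file.
        if "\n" in value or "\r" in value:
            raise ValueError(f"env var {name!r} contains a newline")
        lines.append(f"{name}={value}")

    argv: List[str] = []
    if lines:
        argv += ["--env-file", env_file_path]
    for name in env.forwarded:
        argv += ["-e", name]
    return "".join(line + "\n" for line in lines), argv


# Hypothetical variable names and path, for illustration only.
env = ResolvedEnv(forwarded=["API_TOKEN"], literals={"MODE": "ci"})
content, argv = to_docker_env(env, "/tmp/stage/agent.env")
```

Keeping the serialization in the backend means `env.py` never needs to know that an env file exists, which is the point of the reshaping.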
### Out of scope
- Apple `container` and fly.io backends (separate PRDs, deferred
until the Docker backend has shipped on its own).
- Generalizing the pipelock sidecar to other backends. Pipelock
topology is, after this PRD, an implementation detail private to
the Docker backend.
- Rewriting `pipelock.py`'s YAML generation. The allowlist→YAML
translation stays where it is and is called by the Docker backend.
- CLI flags for runtime selection / override.
## Proposed Design
### New services / components
A new package, `claude_bottle/backend/`, with an abstract base layer
and a Docker subpackage:
- **`claude_bottle/backend/__init__.py`** — Defines the abstract base
classes and the backend registry. `BottleSpec` carries the
CLI-supplied intent; the abstract `BottlePlan` and
`BottleCleanupPlan` are the prepared-but-not-launched outputs of
the two `prepare*` phases; `Bottle` is the running-instance handle;
`BottleBackend` is the dispatcher with five abstract methods:
```python
class BottleBackend(ABC):
    name: str

    def prepare(self, spec: BottleSpec, *, stage_dir: Path) -> BottlePlan: ...
    def launch(self, plan: BottlePlan) -> ContextManager[Bottle]: ...
    def prepare_cleanup(self) -> BottleCleanupPlan: ...
    def cleanup(self, plan: BottleCleanupPlan) -> None: ...
    def list_active(self) -> None: ...
```
The `prepare` / `launch` split lets the CLI render the y/N preflight
off the `BottlePlan` *before* any container or network is created.
The same split applies to `cleanup`. `BottleBackend.provision(plan,
target)` orchestrates copying skills / SSH / prompt / `.git` into a
running instance via four abstract sub-methods
(`provision_prompt`, `provision_skills`, `provision_ssh`,
`provision_git`); subclasses implement those four rather than
overriding `provision` itself.
Selection reads `CLAUDE_BOTTLE_BACKEND` (default `"docker"`).
Unknown values call `die()` with the list of known backends:
```python
def get_bottle_backend() -> BottleBackend: ...
```
- **`claude_bottle/backend/docker/`** — Subpackage with the Docker
implementation, split into:
- `backend.py` — `DockerBottleBackend`, owning all five abstract
methods (`prepare`, `launch`, `prepare_cleanup`, `cleanup`,
`list_active`) plus the four `provision_*` sub-methods. Probes
for `runsc` availability (`docker info --format
'{{json .Runtimes}}'`), builds the base image and per-cwd derived
image, creates the per-agent internal and egress networks, brings
up the pipelock sidecar, runs the agent container with
`--runtime=runsc` iff available, copies skills / SSH keys /
prompt / `.git` into the running container, and tears everything
down on context exit.
- `bottle.py` — `DockerBottle`, the running-instance handle yielded
by `launch`.
- `bottle_plan.py` — `DockerBottlePlan`, the prepared-but-not-launched
output of `prepare`. Carries resolved container/network/image
names, scratch paths, and `use_runsc`. Implements `print` for the
y/N preflight.
- `bottle_cleanup_plan.py` — `DockerBottleCleanupPlan`, the analog
for orphan cleanup.
- `network.py` — Docker network helpers (create/destroy, naming).
- `pipelock.py` — `DockerPipelockProxy` (the sidecar start/stop
lifecycle) and Docker-specific naming helpers. The backend-neutral
yaml + allowlist resolution stays in the top-level
`claude_bottle/pipelock.py`.
- `util.py` — Docker-specific helpers (slugify, image/container
existence checks, `runsc_available`).
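
The two-phase split can be sketched with a toy in-memory backend. Everything here (`FakeBackend`, `FakePlan`, `FakeBottle`, the simplified `prepare(agent)` signature, the event strings) is hypothetical and only demonstrates the ordering guarantee: `prepare` has no side effects, `launch` creates resources and always tears them down on context exit.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class FakePlan:
    """Toy plan: names are resolved, but nothing has been created yet."""

    container_name: str

    def print(self) -> None:
        # What a y/N preflight could render before any resource exists.
        print(f"would create container: {self.container_name}")


class FakeBottle:
    """Toy running-instance handle that records what was executed."""

    def __init__(self, name: str, events: List[str]) -> None:
        self.name = name
        self._events = events

    def exec_claude(self, argv: List[str], *, tty: bool = True) -> int:
        self._events.append("exec " + " ".join(argv))
        return 0


class FakeBackend:
    name = "fake"

    def __init__(self) -> None:
        self.events: List[str] = []

    def prepare(self, agent: str) -> FakePlan:
        # Pure computation: no containers or networks are touched here.
        return FakePlan(container_name=f"bottle-{agent}")

    @contextmanager
    def launch(self, plan: FakePlan) -> Iterator[FakeBottle]:
        self.events.append(f"create {plan.container_name}")
        try:
            yield FakeBottle(plan.container_name, self.events)
        finally:
            # Teardown runs even if the body of the `with` block raises.
            self.events.append(f"remove {plan.container_name}")


backend = FakeBackend()
plan = backend.prepare("demo")
plan.print()
events_before_launch = list(backend.events)
with backend.launch(plan) as bottle:
    exit_code = bottle.exec_claude(["claude"])
```

Because the plan is a plain value, the caller can show it to the user and abort without having anything to clean up.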
### Existing code touched
- **`claude_bottle/cli/start.py`** — replace the inline docker
orchestration with `backend = get_bottle_backend(); plan =
backend.prepare(spec, stage_dir=...); with backend.launch(plan) as
bottle: bottle.exec_claude(...)`. The y/N preflight is rendered by
`plan.print(...)`.
- **`claude_bottle/manifest.py`** — drop the `runtime` field from the
Bottle dataclass and its validation. Existing manifests with
`runtime: "runsc"` produce a clear "no longer supported; gVisor is
now auto-detected by the backend; remove the 'runtime' field" error.
- **`claude_bottle/docker.py`** — module deleted. `require_runsc()`,
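`slugify()`, `image_exists()`, `container_exists()`, the
`build_image` / `build_image_with_cwd` helpers, and `require_docker`
all leave this module: `require_runsc()` is dropped in favor of
auto-detection (`runsc_available`), and the rest migrate into
`claude_bottle/backend/docker/util.py` (or `backend.py`).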
- **`claude_bottle/pipelock.py`** — keeps the allowlist resolution and
YAML generation. Becomes a thin abstract class (`PipelockProxy`)
exposing `prepare` (writes the yaml) plus abstract `start` / `stop`
methods. The Docker-specific subclass `DockerPipelockProxy` lives
under `backend/docker/pipelock.py`.
- **`claude_bottle/network.py`** — folds entirely into
`backend/docker/network.py`. No top-level network module remains.
- **`claude_bottle/ssh.py`** and **`claude_bottle/skills.py`** —
absorbed into `DockerBottleBackend` as `provision_ssh` and
`provision_skills`. The host-side file-tree generation stays as
private helpers on the backend class.
- **`claude_bottle/env.py`** (renamed from `env_resolve.py`) —
`resolve_env(manifest, agent) -> ResolvedEnv` returns
`forwarded: list[str]` (names whose values were exported into
`os.environ` for inheritance) and `literals: dict[str, str]` (name
→ verbatim value). The Docker backend translates the result into
`--env-file` content + `-e NAME` argv fragments.
- **`claude_bottle/util.py`** — top-level cross-backend helpers
(`expand_tilde`, `is_ipv4_literal`). Backend-specific helpers live
in their backend's `util.py`.
- **`claude-bottle.example.json`** — remove the `runtime` field from
any example bottle.
- **`README.md`** — note `CLAUDE_BOTTLE_BACKEND` and the runsc
auto-detect; remove any mention of `runtime: "runsc"` as a manifest
field.
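
As a rough sketch of the `PipelockProxy` split described above: the base class owns config generation, and subclasses own the lifecycle. The constructor signature, the `pipelock.yaml` file name, the placeholder config layout, and `RecordingPipelockProxy` (a test double standing in for `DockerPipelockProxy`) are all assumptions; the real pipelock schema is not reproduced here.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List


class PipelockProxy(ABC):
    """Backend-neutral half: allowlist handling and config generation."""

    def __init__(self, allowlist: List[str]) -> None:
        self.allowlist = list(allowlist)

    def prepare(self, stage_dir: Path) -> Path:
        """Write the proxy config; touches only the host filesystem."""
        path = stage_dir / "pipelock.yaml"
        # Placeholder layout, not the real pipelock configuration format.
        lines = ["allowlist:"] + [f"  - {host}" for host in self.allowlist]
        path.write_text("\n".join(lines) + "\n")
        return path

    @abstractmethod
    def start(self) -> None:
        """Bring the proxy up (backend-specific)."""

    @abstractmethod
    def stop(self) -> None:
        """Tear the proxy down (backend-specific)."""


class RecordingPipelockProxy(PipelockProxy):
    """Hypothetical subclass that records calls instead of running docker."""

    def __init__(self, allowlist: List[str]) -> None:
        super().__init__(allowlist)
        self.calls: List[str] = []

    def start(self) -> None:
        self.calls.append("start")

    def stop(self) -> None:
        self.calls.append("stop")


proxy = RecordingPipelockProxy(["example.com"])
with tempfile.TemporaryDirectory() as tmp:
    config_text = proxy.prepare(Path(tmp)).read_text()
proxy.start()
proxy.stop()
```

A backend that does not need a sidecar can skip the subclass entirely, which is what keeps the topology private to the Docker backend.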
### Data model changes
The bottle schema loses one field:
```diff
 {
   "bottles": {
     "default": {
-      "runtime": "runsc",
       "env": { "...": "..." },
       "ssh": [],
       "egress": { "allowlist": [...] }
     }
   }
 }
```
Any manifest carrying `runtime` produces a validation error on load
(`"bottle '<name>' has a 'runtime' field, which is no longer
supported. gVisor (runsc) is now auto-detected by the backend;
remove the 'runtime' field from the bottle definition."`).
The agent schema is unchanged.
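
One way the load-time check could look, as a sketch only. The error text is the one quoted above; the `ManifestError` type and the `check_no_runtime_field` helper name are hypothetical, and the real validation in `manifest.py` may be structured differently:

```python
class ManifestError(ValueError):
    """Hypothetical error type; the project's real error path may differ."""


def check_no_runtime_field(bottle_name: str, raw_bottle: dict) -> None:
    """Reject a bottle definition that still carries the removed field."""
    if "runtime" in raw_bottle:
        raise ManifestError(
            f"bottle '{bottle_name}' has a 'runtime' field, which is no longer "
            "supported. gVisor (runsc) is now auto-detected by the backend; "
            "remove the 'runtime' field from the bottle definition."
        )
```

Checking for the key explicitly, rather than relying on a generic unknown-field error, is what lets the message tell the user exactly what to delete.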
### External dependencies
None new. This PRD reorganizes existing code; it does not pull in any
new images, binaries, or libraries.
### Behavior the runsc auto-detect introduces
`DockerBottleBackend.prepare` runs `docker info --format
'{{json .Runtimes}}'` exactly once per call. If `runsc` is in the
output, `use_runsc` is set on the `DockerBottlePlan` and the
subsequent `docker run` adds `--runtime=runsc`. Otherwise it runs
without that flag. The choice is logged via the existing `info()`
helper as part of the preflight:
```
docker runtime: runsc (gVisor) # or: runc (default)
```
The y/N preflight (rendered by `DockerBottlePlan.print`) shows the
same line, so users can confirm what they're about to run under
before approving.
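
The detection logic can be sketched as a pure function over the `docker info` output, which is also how a test could seed it without a Docker daemon. This assumes the `{{json .Runtimes}}` output is a JSON object keyed by runtime name; the function signatures below are illustrative (the shipped `runsc_available` may take no arguments and call `docker info` itself), and treating unparseable output as "not available" is an assumption chosen to match the "nothing fails" goal:

```python
import json
from typing import List


def runsc_available(runtimes_json: str) -> bool:
    """Return True if `runsc` appears among the registered runtimes."""
    try:
        runtimes = json.loads(runtimes_json)
    except json.JSONDecodeError:
        # Assumption: unreadable output falls back to the default runtime.
        return False
    return isinstance(runtimes, dict) and "runsc" in runtimes


def runtime_args(use_runsc: bool) -> List[str]:
    """argv fragment added to `docker run` for the chosen runtime."""
    return ["--runtime=runsc"] if use_runsc else []


def runtime_log_line(use_runsc: bool) -> str:
    """The line shown in the startup log and the y/N preflight."""
    label = "runsc (gVisor)" if use_runsc else "runc (default)"
    return f"docker runtime: {label}"


# Sample payloads shaped like `docker info --format '{{json .Runtimes}}'`.
with_gvisor = '{"runc": {"path": "runc"}, "runsc": {"path": "/usr/local/bin/runsc"}}'
without_gvisor = '{"runc": {"path": "runc"}}'
```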
## References
- `docs/research/apple-container-backend.md` — original motivation;
prior draft considered a low-level `Backend` protocol and rejected
it as the wrong layer.
- `docs/research/bash-vs-python-vs-go.md` §Recommendation — argues
that the backend abstraction matters independent of language choice.
- PRD 0001 (`docs/prds/0001-per-agent-egress-proxy-via-pipelock.md`)
— defines the pipelock topology that becomes a private
implementation detail of the Docker backend after this PRD ships.
@@ -1,279 +0,0 @@
# PRD 0003: Bottle factory abstraction
- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-10
## Summary
Introduce a per-backend factory function that owns the end-to-end
lifecycle of a "bottle" (a running, isolated environment with claude
inside). The first and only implementation lands as
`create_docker_bottle`. No second backend ships in this PRD.
## Problem
Today, "how to launch a bottle" is spread across roughly six modules
(`claude_bottle/cli/start.py`, `pipelock.py`, `network.py`, `ssh.py`,
`skills.py`, `docker.py`), each shelling out to `docker` directly via
`subprocess.run(["docker", ...])`. That coupling means:
- Adding a second backend (Apple's `container`, fly.io, a remote SSH
host, etc.) requires editing every one of those call sites. The
research note `docs/research/apple-container-backend.md` already
flags this as a prerequisite for that work.
- The pipelock sidecar topology — two networks, multi-attach, sidecar
lifecycle — is a Docker implementation detail that has leaked into
the top-level CLI orchestration. It reads as a core concept of the
project, but a fly.io bottle would not need any of it.
- The manifest carries a Docker-specific `runtime: "runsc"` field
(`bottles[].runtime`). Anyone setting it has to know about gVisor,
whether Docker has it registered, and what to do on macOS where it
isn't available natively. The field has one valid non-default value
and exists only because the current code can't decide on its own.
The shape that fits the project's actual goals (isolated agent runs
across multiple backends) is "one factory per backend," not "one
container-runtime SDK with N drivers." A previous draft of this PRD
considered a low-level runtime-primitive protocol (`run`, `exec`,
`cp`, `network_connect`, ...) and rejected it as the wrong layer —
it would have forced fly.io to pretend it's Docker.
## Goals / Success Criteria
The feature works when all of the following are observable:
- `cli.py start` works identically for an existing manifest with no
user-visible changes other than (a) a startup log line naming the
Docker runtime in use, and (b) `bottles[].runtime` no longer being a
valid manifest field.
- On a Linux host with gVisor registered, the agent container runs
under `runsc` without anything in the manifest requesting it.
- On a host without gVisor (including macOS), the agent container runs
under the default `runc` runtime; nothing fails, no warning is
printed beyond the runtime-name log line.
- The existing test suite passes with no behavior changes other than
the manifest-schema removal of `runtime`.
The feature is **done** when all of the following ship:
- A new `claude_bottle/backend/` package exists with
`__init__.py` (factory selection) and `docker.py`
(`create_docker_bottle`).
- `create_docker_bottle` returns a context manager yielding a `Bottle`
handle exposing `exec_claude(argv, *, tty=True)`, `cp_in(host, ctr)`,
and teardown on context exit.
- Every existing `subprocess.run(["docker", ...])` call in
`cli/start.py`, `pipelock.py`, `network.py`, `ssh.py`, and
`skills.py` either moves into `backend/docker.py` or is called from
it. No top-level CLI code references `docker` directly.
- `bottles[].runtime` is removed from the manifest schema, the
dataclass in `manifest.py`, the example manifest, and any README /
docs references. `require_runsc()` in `claude_bottle/docker.py` is
deleted.
- A single env var, `CLAUDE_BOTTLE_BACKEND` (default `"docker"`),
selects the factory. Unknown values die at startup with a list of
known backends.
- The y/N preflight in `cli.py` includes the resolved Docker runtime
alongside the allowlist summary.
## Non-goals
- No second backend implementation. `create_container_bottle` and
`create_flyio_bottle` are not in this PRD. The factory dict in
`backend/__init__.py` ships with one entry.
- No retries, async, or streaming exec. The current code is
synchronous `subprocess.run`; the `Bottle` handle matches.
- No behavior change beyond the runsc auto-detect. Pipelock topology,
network naming, container naming, image build flow, and SSH
provisioning all stay byte-identical.
- No `--require-runsc` CLI escape hatch. If a user later wants "fail
rather than silently downgrade," that's a follow-up.
- No `bottles[].backend` manifest field. Backend is a property of
the host environment, not the bottle definition (at least for now).
## Scope
### In scope
- New `claude_bottle/backend/` package containing `__init__.py` and
`docker.py`.
- The `Bottle` Protocol definition and `create_docker_bottle` factory.
- Moving Docker-specific subprocess calls into the factory.
- Removing `bottles[].runtime` from the manifest schema and the
dataclass.
- Auto-detection of `runsc` registration via `docker info`.
- Preflight integration: the existing y/N output names the resolved
Docker runtime.
- Test updates: any manifest fixtures referencing `runtime` are
updated; tests that assert on `--runtime=runsc` instead seed the
detection by mocking `docker info`.
### Out of scope
- Apple `container` and fly.io factories (separate PRDs, deferred
until the Docker factory is the only thing shipping).
- Generalizing the pipelock sidecar to other backends. Pipelock
topology is, after this PRD, an implementation detail private to
`backend/docker.py`.
- Rewriting `pipelock.py`'s YAML generation. The allowlist→YAML
translation stays where it is and is called by the Docker factory.
- Changes to `env_resolve.py`, `manifest.py` (beyond the `runtime`
removal), or the agent schema.
- CLI flags for runtime selection / override.
## Proposed Design
### New services / components
A new package, `claude_bottle/backend/`:
- **`claude_bottle/backend/__init__.py`** — Defines the `Bottle`
Protocol and `get_bottle_factory()`. The factory registry is a
module-level dict mapping backend name → factory function.
Selection reads `CLAUDE_BOTTLE_BACKEND` (default `"docker"`).
Unknown values call `die()` with the list of known backends.
```python
class Bottle(Protocol):
    name: str

    def exec_claude(self, argv: list[str], *, tty: bool = True) -> int: ...
    def cp_in(self, host_path: str, ctr_path: str) -> None: ...
    def close(self) -> None: ...


def get_bottle_factory() -> Callable[..., AbstractContextManager[Bottle]]:
    ...
```
- **`claude_bottle/backend/docker.py`** — `create_docker_bottle(...)`,
the only factory implementation in this PRD. Owns:
- probing for `runsc` availability (`docker info --format
'{{json .Runtimes}}'`),
- building the base image and the per-cwd derived image,
- creating the per-agent internal and egress networks,
- launching the pipelock sidecar (calls `pipelock.py` for YAML
generation, but the sidecar's `docker create / cp / network
connect / start` sequence moves into this module),
- running the agent container with `--runtime=runsc` iff available,
- copying skills / SSH keys / prompt / `.git` into the running
container,
- tearing everything down (container, sidecar, two networks) on
context exit.
### Existing code touched
- **`claude_bottle/cli/start.py`** — replace the inline docker
orchestration with `with get_bottle_factory()(manifest, ...) as
bottle:` and call `bottle.exec_claude(...)`. The preflight stays
here but is extended to render the resolved Docker runtime alongside
the allowlist summary.
- **`claude_bottle/manifest.py`** — drop the `runtime` field from the
Bottle dataclass and its validation. Existing manifests with
`runtime: "runsc"` should produce a clear "unknown field" error so
users know to remove it.
- **`claude_bottle/docker.py`** — `require_runsc()` deleted.
`require_docker()`, `slugify()`, `image_exists()`,
`container_exists()`, and the `build_image` / `build_image_with_cwd`
helpers stay; they're host-side utilities that the Docker factory
consumes.
- **`claude_bottle/pipelock.py`** — keep all the allowlist resolution
and YAML generation. Remove `pipelock_start` / `pipelock_stop` (or
inline them into `backend/docker.py` — decide during
implementation). Pipelock-the-sidecar becomes a Docker-factory
internal concept.
- **`claude_bottle/network.py`** — same call-sites moved into
`backend/docker.py`. The module either becomes a thin set of pure
name-derivation helpers (`network_name_for_slug`, etc.) or folds
entirely into `backend/docker.py`. Decide during implementation.
- **`claude_bottle/ssh.py`** and **`claude_bottle/skills.py`** — the
`docker cp` and `docker exec` calls move into / are called from
`backend/docker.py`. The host-side file-tree generation stays put.
- **`claude-bottle.example.json`** — remove the `runtime` field from
any example bottle.
- **`README.md`** — note `CLAUDE_BOTTLE_BACKEND` and the runsc
auto-detect; remove any mention of `runtime: "runsc"` as a manifest
field.
### Data model changes
The bottle schema loses one field:
```diff
 {
   "bottles": {
     "default": {
-      "runtime": "runsc",
       "env": { "...": "..." },
       "ssh": [],
       "egress": { "allowlist": [...] }
     }
   }
 }
```
Any manifest carrying `runtime` produces a validation error on load
("unknown bottle field 'runtime' — gVisor is now auto-detected;
remove this field").
The agent schema is unchanged.
### External dependencies
None new. This PRD reorganizes existing code; it does not pull in any
new images, binaries, or libraries.
### Behavior the runsc auto-detect introduces
The Docker factory runs `docker info --format '{{json .Runtimes}}'`
exactly once per `create_docker_bottle` call. If `runsc` is in the
output, the subsequent `docker run` adds `--runtime=runsc`. Otherwise
it runs without that flag. The choice is logged via the existing
`info()` helper as part of the preflight:
```
docker runtime: runsc (gVisor) # or: runc (default)
```
The y/N preflight shows the same line, so users can confirm what
they're about to run under before approving.
## Open questions
- **Where the pipelock sidecar lifecycle lives.** Two reasonable
splits: (a) `pipelock.py` keeps `pipelock_start` / `pipelock_stop`
and `backend/docker.py` calls them; (b) the sidecar
`docker create/cp/network connect/start` sequence moves entirely
into `backend/docker.py` and `pipelock.py` shrinks to the YAML +
allowlist helpers. (a) keeps git blame intact and is the smaller
diff; (b) makes pipelock-as-an-implementation-detail more obvious.
Decide during implementation.
- **Whether `bottles/__init__.py` re-exports `create_docker_bottle`.**
Importing `from claude_bottle.bottles import create_docker_bottle`
vs. `from claude_bottle.bottles.docker import create_docker_bottle`.
Doesn't matter for v1 (only the registry consumes it), but worth
picking a convention before a second factory lands.
- **Manifest-error wording when `runtime` is seen.** "Unknown field"
is technically correct but unhelpful. A targeted error message
("runtime: was removed; gVisor is now auto-detected when the Docker
daemon has it registered") is more useful and worth the extra few
lines.
- **Test fixtures.** Some tests mock `docker info` or seed
`--runtime=runsc` expectations. Audit and update as part of the
implementation; not expected to be a large change.
- **Future `--require-runsc` flag.** Not in this PRD; flagged here so
it's findable when the question comes up.
## References
- `docs/research/apple-container-backend.md` — original motivation;
prior draft considered a low-level `Backend` protocol and rejected
it as the wrong layer.
- `docs/research/bash-vs-python-vs-go.md` §Recommendation — argues
that the factory abstraction matters independent of language choice.
- PRD 0001 (`docs/prds/0001-per-agent-egress-proxy-via-pipelock.md`)
— defines the pipelock topology that becomes a private
implementation detail of the Docker factory after this PRD ships.