ci(prd): assign sequential numbers to new PRDs
@@ -0,0 +1,159 @@
# PRD 0060: Commit bottle state to an image

- **Status:** Active
- **Author:** Claude
- **Created:** 2026-06-20
- **Issue:** #194

## Summary

Add a `commit` CLI command that freezes a running bottle's state to a
resumable local artifact. Docker bottles are stored as Docker images;
smolmachines bottles are stored as `.smolmachine` artifacts. Operators
can then resume the bottle from that exact filesystem snapshot, or
export the artifact to migrate work to a different host.

## Problem

When a long-running agent session is interrupted (by a host reboot, a
network failure, or a planned infrastructure migration), the in-progress
container state is lost. `cli.py resume` rebuilds the agent image from
the Dockerfile and reprovisions the bottle, but that returns the guest
to its initial state, not to wherever the agent was mid-task.

There is no mechanism today to capture "what's installed / configured
inside the running container right now" and make it reproducible. The
`capability-block` flow writes a new Dockerfile and marks the bottle for
resume, but that only applies when the agent itself has requested a
capability change; it doesn't help the operator who wants to take a
snapshot before a planned host reboot or hardware migration.

## Goals / Success Criteria

- `./cli.py commit [<slug>]` takes a snapshot of the running agent and
  stores it as a local artifact.
- Without a slug argument the command shows the same interactive picker
  as `start` (the list of active slugs).
- The committed artifact reference is stored in per-bottle state so
  that the next `./cli.py resume <slug>` automatically uses the
  snapshot instead of rebuilding from the Dockerfile.
- `mark_preserved` is called so the state dir survives the normal
  session-end cleanup.
- A backend-specific export hint is printed so operators know how to
  migrate the snapshot.
- The command errors clearly on unsupported backends.

## Non-goals

- macOS-container backend support.
- Automatic commit on agent exit.
- Image push to a remote registry.
- Storing the image tag in the manifest or sharing it between operators.

## Design

### Docker image tag

The committed image is tagged `bot-bottle-committed-<slug>:latest`. The
tag is namespaced under `bot-bottle-` to match existing image naming
conventions; `committed` distinguishes it from the build-time image
(`bot-bottle-claude:latest`) and the capability-block rebuild image
(`bot-bottle-rebuilt-<identity>:latest`).

### State storage

A new plain-text file `committed-image` is added to the per-bottle state
directory:

```
~/.bot-bottle/state/<identity>/
  metadata.json
  Dockerfile          (capability-block override; optional)
  committed-image     (committed artifact reference; optional)
  transcript/
```

`bottle_state.committed_image_path(identity)` returns the path.
`write_committed_image` / `read_committed_image` are the read/write
helpers, matching the existing `per_bottle_dockerfile` pattern. Docker
stores a Docker tag in this file; smolmachines stores the absolute path
to the committed `.smolmachine` artifact.
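
The helpers could look roughly like the sketch below. Only the three function names come from this PRD; the bodies, the one-line file format, and the `state_root` parameter (added here so the example can run against a temporary directory) are assumptions, not the actual `bottle_state` implementation.

```python
from pathlib import Path
from typing import Optional

# Assumed state root, taken from the directory tree above.
DEFAULT_STATE_ROOT = Path.home() / ".bot-bottle" / "state"


def committed_image_path(identity: str, state_root: Path = DEFAULT_STATE_ROOT) -> Path:
    """Return the path of the per-bottle `committed-image` file."""
    return state_root / identity / "committed-image"


def write_committed_image(
    identity: str, reference: str, state_root: Path = DEFAULT_STATE_ROOT
) -> Path:
    """Store a Docker tag or an absolute .smolmachine path as one line of text."""
    path = committed_image_path(identity, state_root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(reference.strip() + "\n", encoding="utf-8")
    return path


def read_committed_image(
    identity: str, state_root: Path = DEFAULT_STATE_ROOT
) -> Optional[str]:
    """Return the stored reference, or None when the file is missing or empty."""
    path = committed_image_path(identity, state_root)
    if not path.is_file():
        return None
    return path.read_text(encoding="utf-8").strip() or None
```

Returning `None` for a missing or empty file lets callers treat "never committed" and "reference cleared" the same way with a single truthiness check.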

### `commit` command

```
./cli.py commit [<slug>]
```

1. Resolve slug (arg or interactive picker from `enumerate_active_agents`).
2. Check metadata and branch by backend.
3. For Docker, derive container name `bot-bottle-<slug>` and run
   `docker commit <container> bot-bottle-committed-<slug>:latest`.
4. For smolmachines, derive machine name `bot-bottle-<slug>` and run
   `smolvm pack create --from-vm <machine> -o ~/.bot-bottle/state/<slug>/committed-smolmachine`.
5. Write the Docker image tag or smolmachine artifact path to
   `~/.bot-bottle/state/<slug>/committed-image`.
6. Call `mark_preserved(<slug>)` so the state dir survives session-end.
7. Print the resume hint and a backend-specific export example.
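
Steps 2 through 5 can be sketched as a pure function that maps a slug and backend to the command line to run and the reference to record. `plan_commit` is a hypothetical name, and the backend strings `"docker"` and `"smolmachines"` are assumed from the backend package names; the argv shapes follow the commands quoted in the list above.

```python
from pathlib import Path
from typing import List, Tuple


def plan_commit(slug: str, backend: str, state_root: Path) -> Tuple[List[str], str]:
    """Return (argv to run, reference to write into `committed-image`)."""
    # Container name (Docker) and machine name (smolmachines) share one scheme.
    name = f"bot-bottle-{slug}"
    if backend == "docker":
        tag = f"bot-bottle-committed-{slug}:latest"
        return ["docker", "commit", name, tag], tag
    if backend == "smolmachines":
        artifact = state_root / slug / "committed-smolmachine"
        argv = ["smolvm", "pack", "create", "--from-vm", name, "-o", str(artifact)]
        return argv, str(artifact)
    # Any other backend (for example macOS containers) is a non-goal here.
    raise ValueError(f"commit is not supported on backend {backend!r}")
```

Keeping the branch free of side effects means the unsupported-backend error path and both argv shapes can be asserted without a Docker daemon or a running VM.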

### Resume from committed image

`bot_bottle/backend/docker/launch.py` already rebuilds the agent image
at the top of the `launch` context manager. The change is a check
immediately before that step:

```python
committed = read_committed_image(plan.slug)
if committed and docker_mod.image_exists(committed):
    info(f"using committed image {committed!r}")
    plan = dataclasses.replace(
        plan,
        agent_provision=dataclasses.replace(plan.agent_provision, image=committed),
    )
else:
    docker_mod.build_image(plan.image, _REPO_DIR, dockerfile=plan.dockerfile_path)
```

Replacing `agent_provision.image` propagates to `plan.image` (a
property) and from there to the Compose spec renderer's `_agent_service`
→ `image:` field, so the container boots from the committed snapshot.
The build step is skipped entirely when a committed image is found and
exists locally.

If the committed image has been deleted from the local daemon (e.g.
after `docker rmi` or a `docker system prune`), the launch falls back
to a normal Dockerfile build, matching the behavior before any snapshot
was committed.
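
The propagation through the property can be illustrated with stand-in types. `AgentProvision` and `LaunchPlan` below are hypothetical, reduced models (the real classes are not shown in this PRD), and marking them frozen is an assumption; `with_committed_image` is likewise an illustrative name.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class AgentProvision:  # hypothetical stand-in for the real provision type
    image: str


@dataclasses.dataclass(frozen=True)
class LaunchPlan:  # hypothetical stand-in for the real plan type
    slug: str
    agent_provision: AgentProvision

    @property
    def image(self) -> str:
        # Derived rather than stored, so it always reflects agent_provision.image.
        return self.agent_provision.image


def with_committed_image(plan: LaunchPlan, committed: str) -> LaunchPlan:
    """Return a copy of the plan whose agent image is the committed tag."""
    return dataclasses.replace(
        plan,
        agent_provision=dataclasses.replace(plan.agent_provision, image=committed),
    )
```

Because `image` is a property and not a dataclass field, `dataclasses.replace` never has to set it directly; swapping the nested `agent_provision` is enough, and the original plan object is left unchanged.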

### Resume from committed smolmachine

`bot_bottle/backend/smolmachines/launch.py` checks the committed
reference before the normal Docker build → pack cache path:

```python
committed = read_committed_image(plan.slug)
if committed and Path(committed).is_file():
    return Path(committed)
return _ensure_smolmachine(plan.agent_image, dockerfile=plan.agent_dockerfile_path)
```

The returned path is passed to `smolvm machine create --from`, so the
resumed VM boots from the committed snapshot. If the artifact has been
deleted, launch falls back to the normal build and pack flow.
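
The fallback rule can be isolated into a small, testable function. `resolve_smolmachine` is a hypothetical name, and the `build` callback stands in for the existing `_ensure_smolmachine` call; this is a sketch of the selection logic only, not the launch code itself.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_smolmachine(committed: Optional[str], build: Callable[[], Path]) -> Path:
    """Prefer the committed artifact; otherwise fall back to the build callback."""
    # A missing reference and a reference to a deleted file are treated alike.
    if committed and Path(committed).is_file():
        return Path(committed)
    return build()
```

Passing the build step as a callback means it only runs when the committed artifact is absent, which mirrors the early `return` in the snippet above.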

## Testing strategy

- Unit tests for `write_committed_image` / `read_committed_image` in
  `tests/unit/test_bottle_state.py`, using the existing `_FakeHomeMixin`
  pattern.
- Unit tests for `commit_container` in `tests/unit/test_docker_util_image.py`,
  mocking `subprocess.run` and asserting on the `docker commit` argv.
- Unit tests for `cmd_commit` argument parsing, Docker commit,
  smolmachines pack, and the unsupported backend error path, mocking
  `enumerate_active_agents`, `commit_container`, and
  `pack_create_from_vm`.
- Unit tests for the launch-step committed-image branch: patch
  `read_committed_image` to return a tag, patch `image_exists` to return
  True, and assert that `build_image` is not called and `plan.image` is
  overridden.
- Unit tests for the smolmachines launch-step committed-artifact branch:
  patch `read_committed_image` to return an existing path and assert the
  normal `_ensure_smolmachine` path is skipped.
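
As an illustration of the second bullet, a test that mocks `subprocess.run` and asserts on the argv might look like the following. The `commit_container` body here is a hypothetical stand-in (its real signature and options are not given in this PRD), defined inline only so the example is self-contained.

```python
import subprocess
import unittest
from unittest import mock


def commit_container(container: str, tag: str) -> None:
    """Hypothetical stand-in for the helper under test."""
    subprocess.run(["docker", "commit", container, tag], check=True)


class CommitContainerTest(unittest.TestCase):
    def test_builds_docker_commit_argv(self) -> None:
        # Patch the module attribute so no real docker process is started.
        with mock.patch("subprocess.run") as run:
            commit_container("bot-bottle-demo", "bot-bottle-committed-demo:latest")
        run.assert_called_once_with(
            ["docker", "commit", "bot-bottle-demo", "bot-bottle-committed-demo:latest"],
            check=True,
        )
```

Asserting on the exact argv list keeps the test independent of a Docker installation while still pinning down the container and tag naming scheme.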