# PRD prd-new: Commit bottle state to an image

- **Status:** Draft
- **Author:** Claude
- **Created:** 2026-06-20
- **Issue:** #194

## Summary

Add a `commit` CLI command that freezes a running Docker bottle's
container state to a named Docker image. Operators can then resume the
bottle from that exact filesystem snapshot, or export the image with
`docker save` to migrate work to a different host.

## Problem

When a long-running agent session is interrupted (by a host reboot, a
network failure, or a planned infrastructure migration), the in-progress
container state is lost. `./cli.py resume` rebuilds the agent image from
the Dockerfile and reprovisions the bottle, but that returns the guest
to its initial state, not to wherever the agent was mid-task.

There is no mechanism today to capture "what's installed / configured
inside the running container right now" and make it reproducible. The
`capability-block` flow writes a new Dockerfile and marks the bottle for
resume, but that only applies when the agent itself has requested a
capability change; it doesn't help the operator who wants to take a
snapshot before a planned host reboot or hardware migration.

## Goals / Success Criteria

- `./cli.py commit [<slug>]` takes a snapshot of the running Docker
  agent container and stores it as a local Docker image.
- Without a slug argument, the command shows the same interactive picker
  as `start` (the list of active slugs).
- The committed image tag is stored in per-bottle state so that the next
  `./cli.py resume <slug>` automatically uses the committed image instead
  of rebuilding from the Dockerfile.
- `mark_preserved` is called so the state dir survives the normal
  session-end cleanup.
- A `docker save` hint is printed so operators know how to export the
  image for migration.
- The command errors clearly on non-Docker backends (smolmachines does
  not expose a container-level commit API in its current CLI surface).

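The CLI shape in the first two goals can be sketched with `argparse`. This is an assumption: the PRD does not say how `cli.py` parses its arguments, and `build_parser` is a hypothetical name. The point is that `nargs="?"` makes the slug optional. An omitted slug therefore arrives as `None`, which can trigger the interactive picker.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the `commit` subcommand (argparse is an assumption)."""
    parser = argparse.ArgumentParser(prog="cli.py")
    sub = parser.add_subparsers(dest="command", required=True)
    commit = sub.add_parser("commit", help="snapshot a running Docker bottle to an image")
    # nargs="?" makes the slug optional; None means "show the interactive picker".
    commit.add_argument("slug", nargs="?", default=None)
    return parser
```

For example, `build_parser().parse_args(["commit"]).slug` is `None`, while `parse_args(["commit", "my-bottle"])` yields `"my-bottle"`.
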
## Non-goals

- Smolmachines or macOS-container backend support.
- Automatic commit on agent exit.
- Image push to a remote registry.
- Storing the image tag in the manifest or sharing it between operators.

## Design

### Image tag

The tag is `bot-bottle-committed-<slug>:latest`. It is namespaced under
`bot-bottle-` to match existing image naming conventions; `committed`
distinguishes it from the build-time image (`bot-bottle-claude:latest`)
and the capability-block rebuild image (`bot-bottle-rebuilt-<identity>:latest`).

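Here is a minimal sketch of deriving the tag from a slug. `committed_image_tag` is a hypothetical helper, and the validation is an added assumption rather than something this PRD specifies. Docker rejects repository names that contain uppercase letters or spaces, so failing early gives a clearer error than a failed `docker commit`.

```python
import re

# Conservative subset of Docker's repository-name rules (assumption):
# lowercase alphanumeric runs joined by single ".", "_" or "-" separators.
_SLUG_RE = re.compile(r"[a-z0-9]+(?:[._-][a-z0-9]+)*")


def committed_image_tag(slug: str) -> str:
    """Return the committed-image tag for a bottle slug (hypothetical helper)."""
    if not _SLUG_RE.fullmatch(slug):
        raise ValueError(f"slug {slug!r} is not a valid image name component")
    return f"bot-bottle-committed-{slug}:latest"
```
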
### State storage

A new plain-text file, `committed-image`, is added to the per-bottle state
directory:

```
~/.bot-bottle/state/<identity>/
  metadata.json
  Dockerfile          (capability-block override; optional)
  committed-image     (committed image tag; optional)
  transcript/
```

`bottle_state.committed_image_path(identity)` returns the path.
`write_committed_image` / `read_committed_image` are the write and read
helpers, matching the existing `per_bottle_dockerfile` pattern.

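The read/write pair can be sketched as follows, assuming the tag is stored as a single line of UTF-8 text. The `home` parameter and the `state_dir` helper are hypothetical conveniences for this sketch; the PRD's tests use the existing `_FakeHomeMixin` pattern instead. A missing or empty file reads back as `None`, so callers can treat "no committed image" uniformly.

```python
from pathlib import Path
from typing import Optional


def state_dir(identity: str, home: Optional[Path] = None) -> Path:
    """~/.bot-bottle/state/<identity>/, with `home` overridable for tests (assumption)."""
    base = home if home is not None else Path.home()
    return base / ".bot-bottle" / "state" / identity


def committed_image_path(identity: str, home: Optional[Path] = None) -> Path:
    return state_dir(identity, home) / "committed-image"


def write_committed_image(identity: str, tag: str, home: Optional[Path] = None) -> None:
    path = committed_image_path(identity, home)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(tag + "\n", encoding="utf-8")


def read_committed_image(identity: str, home: Optional[Path] = None) -> Optional[str]:
    path = committed_image_path(identity, home)
    try:
        tag = path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None
    # Treat an empty file the same as a missing one.
    return tag or None
```
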
### `commit` command

```
./cli.py commit [<slug>]
```

1. Resolve the slug (argument, or interactive picker from `enumerate_active_agents`).
2. Check metadata: if `backend` is set and is not `docker`, die with a
   clear "not supported" error.
3. Derive the container name: `bot-bottle-<slug>` (matches the agent
   provision plan's `instance_name` convention).
4. Run `docker commit <container> bot-bottle-committed-<slug>:latest`.
5. Write the image tag to `~/.bot-bottle/state/<slug>/committed-image`.
6. Call `mark_preserved(<slug>)` so the state dir survives session-end cleanup.
7. Print the resume hint and a `docker save` export example.

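The seven steps above can be sketched as one function whose collaborators are passed in, so the flow is testable without Docker. Everything here is hypothetical: the keyword parameters, `CommitError`, and the exact hint wording. In particular, the real command dies with an error message rather than raising. Only the `docker commit CONTAINER IMAGE` and `docker save -o FILE IMAGE` command lines are standard Docker CLI.

```python
import subprocess
from typing import Callable, Optional


class CommitError(RuntimeError):
    """Raised instead of exiting so the sketch is easy to test (assumption)."""


def commit_container(container: str, tag: str, run: Callable = subprocess.run) -> None:
    # `docker commit CONTAINER REPOSITORY[:TAG]` snapshots the container's filesystem.
    run(["docker", "commit", container, tag], check=True)


def cmd_commit(
    slug: Optional[str],
    *,
    pick_slug: Callable[[], str],
    read_metadata: Callable[[str], dict],
    write_committed_image: Callable[[str, str], None],
    mark_preserved: Callable[[str], None],
    run: Callable = subprocess.run,
    echo: Callable[[str], None] = print,
) -> str:
    # 1. Resolve the slug: explicit argument, else the interactive picker.
    slug = slug or pick_slug()
    # 2. Refuse non-Docker backends; a missing `backend` key is treated as Docker.
    backend = read_metadata(slug).get("backend")
    if backend is not None and backend != "docker":
        raise CommitError(f"commit is not supported on the {backend!r} backend")
    # 3. Container name follows the provision plan's instance_name convention.
    container = f"bot-bottle-{slug}"
    tag = f"bot-bottle-committed-{slug}:latest"
    # 4. Snapshot the running container.
    commit_container(container, tag, run=run)
    # 5-6. Record the tag and keep the state dir past session-end cleanup.
    write_committed_image(slug, tag)
    mark_preserved(slug)
    # 7. Tell the operator how to resume and how to export for migration.
    echo(f"committed {container} -> {tag}")
    echo(f"resume with: ./cli.py resume {slug}")
    echo(f"export with: docker save -o {slug}.tar {tag}")
    return tag
```

On the target host, `docker load -i <slug>.tar` loads the exported image back into the local daemon.
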
### Resume from committed image

`bot_bottle/backend/docker/launch.py` already rebuilds the agent image
at the top of the `launch` context manager. The change is a check
immediately before that step:

```python
# Prefer a previously committed snapshot over rebuilding from the Dockerfile.
committed = read_committed_image(plan.slug)
if committed and docker_mod.image_exists(committed):
    info(f"using committed image {committed!r}")
    # plan.image is derived from agent_provision.image, so swapping the
    # nested field is enough to point the agent service at the snapshot.
    plan = dataclasses.replace(
        plan,
        agent_provision=dataclasses.replace(plan.agent_provision, image=committed),
    )
else:
    # No recorded tag, or the image is gone from the daemon: build as before.
    docker_mod.build_image(plan.image, _REPO_DIR, dockerfile=plan.dockerfile_path)
```

Replacing `agent_provision.image` propagates to `plan.image` (a property
derived from it) and from there to the `image:` field emitted by the
Compose spec renderer's `_agent_service`, so the container boots from
the committed snapshot. The build step is skipped entirely when a
committed image is recorded and exists in the local daemon.

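Here is a self-contained illustration of why one nested `dataclasses.replace` is enough. `AgentProvision`, `LaunchPlan`, and `agent_service` are hypothetical, stripped-down stand-ins for the real plan types and the Compose renderer. The sketch assumes they are frozen dataclasses and that `image` is a property, as described above.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class AgentProvision:
    # Hypothetical subset of the real provision plan.
    instance_name: str
    image: str


@dataclasses.dataclass(frozen=True)
class LaunchPlan:
    slug: str
    agent_provision: AgentProvision

    @property
    def image(self) -> str:
        # Derived, not stored: always reflects agent_provision.image.
        return self.agent_provision.image


def agent_service(plan: LaunchPlan) -> dict:
    # Stand-in for the Compose renderer: the service image comes from plan.image.
    return {"image": plan.image}


def with_committed_image(plan: LaunchPlan, committed: str) -> LaunchPlan:
    # Frozen dataclasses are rebuilt rather than mutated; the outer plan is untouched.
    return dataclasses.replace(
        plan,
        agent_provision=dataclasses.replace(plan.agent_provision, image=committed),
    )
```
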
If the committed image has been deleted from the local daemon (e.g.
after `docker rmi` or a `docker system prune -a`), the launch falls back
to a normal Dockerfile build, matching the pre-commit behavior.

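This PRD relies on `docker_mod.image_exists` but does not show it. One plausible implementation asks the daemon with `docker image inspect`, which exits non-zero when the image is absent; the sketch below assumes that approach. `choose_image` is a hypothetical restatement of the launch branch as a pure function, which makes the fallback easy to exercise.

```python
import subprocess
from typing import Callable, Optional


def image_exists(tag: str, run: Callable = subprocess.run) -> bool:
    # `docker image inspect` exits non-zero when the image is not in the local daemon.
    result = run(
        ["docker", "image", "inspect", tag],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def choose_image(
    committed: Optional[str],
    build: Callable[[], str],
    exists: Callable[[str], bool],
) -> str:
    # Use the snapshot only if a tag was recorded AND the daemon still has it.
    if committed and exists(committed):
        return committed
    # Otherwise fall back to the normal Dockerfile build.
    return build()
```
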
## Testing strategy

- Unit tests for `write_committed_image` / `read_committed_image` in
  `tests/unit/test_bottle_state.py`, using the existing `_FakeHomeMixin`
  pattern.
- Unit tests for `commit_container` in `tests/unit/test_docker_util_image.py`,
  mocking `subprocess.run` and asserting on the `docker commit` argv.
- Unit tests for `cmd_commit` argument parsing and the "unsupported
  backend" error path, mocking `enumerate_active_agents` and
  `commit_container`.
- Unit tests for the launch-step committed-image branch: patch
  `read_committed_image` to return a tag, patch `image_exists` to return
  `True`, and assert that `build_image` is not called and `plan.image` is
  overridden.
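
As an illustration of the second bullet, here is a `unittest` case that patches `subprocess.run` and asserts on the exact argv. The `commit_container` defined here is a stand-in whose signature is assumed. The real wrapper in `docker/util` may differ, and these test names are hypothetical.

```python
import subprocess
import unittest
from unittest import mock


def commit_container(container: str, tag: str) -> None:
    # Stand-in for the real wrapper in docker/util; its exact signature is assumed.
    subprocess.run(["docker", "commit", container, tag], check=True)


class CommitContainerTest(unittest.TestCase):
    def test_runs_docker_commit_with_expected_argv(self) -> None:
        with mock.patch("subprocess.run") as fake_run:
            commit_container("bot-bottle-demo", "bot-bottle-committed-demo:latest")
        fake_run.assert_called_once_with(
            ["docker", "commit", "bot-bottle-demo", "bot-bottle-committed-demo:latest"],
            check=True,
        )

    def test_propagates_docker_failure(self) -> None:
        # check=True means a failed `docker commit` surfaces as CalledProcessError.
        error = subprocess.CalledProcessError(1, ["docker", "commit"])
        with mock.patch("subprocess.run", side_effect=error):
            with self.assertRaises(subprocess.CalledProcessError):
                commit_container("bot-bottle-missing", "bot-bottle-committed-missing:latest")
```
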