PRD prd-new: Commit bottle state to an image
- Status: Active
- Author: Claude
- Created: 2026-06-20
- Issue: #194
Summary
Add a commit CLI command that freezes a running Docker bottle's
container state to a named Docker image. Operators can then resume the
bottle from that exact filesystem snapshot, or export the image with
docker save to migrate work to a different host.
Problem
When a long-running agent session is interrupted — by a host reboot, a
network failure, or a planned infrastructure migration — the in-progress
container state is lost. cli.py resume rebuilds the agent image from
the Dockerfile and reprovisions the bottle, but that returns the guest
to its initial state, not to wherever the agent was mid-task.
There is no mechanism today to capture "what's installed / configured
inside the running container right now" and make it reproducible. The
capability-block flow writes a new Dockerfile and marks the bottle for
resume, but that only applies when the agent itself has requested a
capability change; it doesn't help the operator who wants to take a
snapshot before a planned host reboot or hardware migration.
Goals / Success Criteria
- ./cli.py commit [<slug>] takes a snapshot of the running Docker agent container and stores it as a local Docker image.
- Without a slug argument the command shows the same interactive picker as start (the list of active slugs).
- The committed image tag is stored in per-bottle state so that the next ./cli.py resume <slug> automatically uses the committed image instead of rebuilding from the Dockerfile.
- mark_preserved is called so the state dir survives the normal session-end cleanup.
- A docker save hint is printed so operators know how to export the image for migration.
- The command errors clearly on non-Docker backends (smolmachines does not expose a container-level commit API in its current CLI surface).
Non-goals
- Smolmachines or macOS-container backend support.
- Automatic commit on agent exit.
- Image push to a remote registry.
- Storing the image tag in the manifest or sharing it between operators.
Design
Image tag
bot-bottle-committed-<slug>:latest — namespaced under bot-bottle-
to match existing image naming conventions; committed distinguishes it
from the build-time image (bot-bottle-claude:latest) and the
capability-block rebuild image (bot-bottle-rebuilt-<identity>:latest).
State storage
A new plain-text file committed-image is added to the per-bottle state
directory:
    ~/.bot-bottle/state/<identity>/
      metadata.json
      Dockerfile        (capability-block override; optional)
      committed-image   (committed image tag; optional)
      transcript/
bottle_state.committed_image_path(identity) returns the path.
write_committed_image / read_committed_image are the read/write
helpers, matching the existing per_bottle_dockerfile pattern.
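A minimal sketch of what these helpers might look like, assuming the state directory lives under the current user's home as shown in the layout above. The state_dir helper and the exact signatures are assumptions for illustration; the real bottle_state module may differ.

```python
from pathlib import Path
from typing import Optional


def state_dir(identity: str) -> Path:
    # Hypothetical helper: per-bottle state directory under the user's home.
    return Path.home() / ".bot-bottle" / "state" / identity


def committed_image_path(identity: str) -> Path:
    # Plain-text file that holds the committed image tag.
    return state_dir(identity) / "committed-image"


def write_committed_image(identity: str, tag: str) -> None:
    # Create the state directory if needed, then store the tag on one line.
    path = committed_image_path(identity)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(tag.strip() + "\n", encoding="utf-8")


def read_committed_image(identity: str) -> Optional[str]:
    # Return the stored tag, or None when the file is missing or empty.
    try:
        text = committed_image_path(identity).read_text(encoding="utf-8")
    except FileNotFoundError:
        return None
    return text.strip() or None
```

Returning None for a missing file keeps the optional nature of committed-image explicit: callers can treat "no committed image" and "never committed" the same way.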
commit command
./cli.py commit [<slug>]
- Resolve slug (arg or interactive picker from enumerate_active_agents).
- Check metadata: if backend is set and is not docker, die with a clear "not supported" error.
- Derive container name: bot-bottle-<slug> (matches the agent provision plan's instance_name convention).
- Run docker commit <container> bot-bottle-committed-<slug>:latest.
- Write the image tag to ~/.bot-bottle/state/<slug>/committed-image.
- Call mark_preserved(<slug>) so the state dir survives session-end.
- Print the resume hint and a docker save export example.
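The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the planned implementation: commit_bottle, its parameters, and the wording of the messages are hypothetical, the slug is assumed to be already resolved, and the mark_preserved call is left as a comment because its signature is not shown here.

```python
import subprocess
from pathlib import Path


def commit_container(container: str, image_tag: str) -> None:
    # Runs `docker commit <container> <image_tag>`; raises if docker fails.
    subprocess.run(["docker", "commit", container, image_tag], check=True)


def commit_bottle(slug: str, metadata: dict, state_root: Path) -> str:
    # Hypothetical driver for the commit flow; returns the committed tag.
    backend = metadata.get("backend")
    if backend and backend != "docker":
        raise SystemExit(f"commit is not supported on the {backend!r} backend")

    container = f"bot-bottle-{slug}"
    image_tag = f"bot-bottle-committed-{slug}:latest"
    commit_container(container, image_tag)

    # Record the tag in the per-bottle state directory.
    bottle_dir = state_root / slug
    bottle_dir.mkdir(parents=True, exist_ok=True)
    (bottle_dir / "committed-image").write_text(image_tag + "\n", encoding="utf-8")

    # The real command would also call mark_preserved(slug) at this point.
    print(f"resume with: ./cli.py resume {slug}")
    print(f"export with: docker save -o {slug}.tar {image_tag}")
    return image_tag
```

Checking the backend before touching Docker means an unsupported bottle fails fast and leaves no partial state behind.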
Resume from committed image
bot_bottle/backend/docker/launch.py already rebuilds the agent image
at the top of the launch context manager. The change is a check
immediately before that step:
    committed = read_committed_image(plan.slug)
    if committed and docker_mod.image_exists(committed):
        info(f"using committed image {committed!r}")
        plan = dataclasses.replace(
            plan,
            agent_provision=dataclasses.replace(plan.agent_provision, image=committed),
        )
    else:
        docker_mod.build_image(plan.image, _REPO_DIR, dockerfile=plan.dockerfile_path)
Replacing agent_provision.image propagates to plan.image (a
property) and from there to the Compose spec renderer's _agent_service
→ image: field, so the container boots from the committed snapshot.
The build step is skipped entirely when a committed image is found and
exists locally.
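The propagation through the property can be illustrated with a small stand-alone example. The two dataclasses below are simplified stand-ins with only the fields needed here; the real plan types are assumed to carry more fields and may not be frozen.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class AgentProvision:
    # Stand-in for the agent provision plan; only the image field is modeled.
    image: str


@dataclasses.dataclass(frozen=True)
class Plan:
    # Stand-in for the launch plan.
    slug: str
    agent_provision: AgentProvision

    @property
    def image(self) -> str:
        # Derived from the nested provision, so it is never stored separately.
        return self.agent_provision.image


plan = Plan(slug="demo", agent_provision=AgentProvision(image="bot-bottle-claude:latest"))
committed = "bot-bottle-committed-demo:latest"

# Nested replace: build a new provision, then a new plan that points at it.
new_plan = dataclasses.replace(
    plan,
    agent_provision=dataclasses.replace(plan.agent_provision, image=committed),
)

print(new_plan.image)  # bot-bottle-committed-demo:latest
print(plan.image)      # bot-bottle-claude:latest (original plan is unchanged)
```

Because image is computed on access, anything that reads plan.image after the replacement sees the committed tag without further changes.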
If the committed image has been deleted from the local daemon (e.g.
after docker rmi or a docker system prune), the launch falls back
to a normal Dockerfile build, matching the pre-commit behavior.
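One way the existence check and fallback could be expressed is sketched below. It assumes image_exists shells out to docker image inspect, which exits non-zero when the tag is not present locally; the actual docker_mod.image_exists may be implemented differently, and choose_image is a hypothetical helper used only to make the decision explicit.

```python
import subprocess
from typing import Callable, Optional, Tuple


def image_exists(tag: str) -> bool:
    # `docker image inspect` exits non-zero when the image is not present locally.
    result = subprocess.run(
        ["docker", "image", "inspect", tag],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def choose_image(
    committed: Optional[str],
    default: str,
    exists: Callable[[str], bool] = image_exists,
) -> Tuple[str, bool]:
    # Returns (image, needs_build). A committed image that is still present
    # is used as-is; otherwise fall back to building the default image.
    if committed and exists(committed):
        return committed, False
    return default, True
```

Passing the existence check as a parameter keeps the decision testable without a Docker daemon.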
Testing strategy
- Unit tests for write_committed_image / read_committed_image in tests/unit/test_bottle_state.py, using the existing _FakeHomeMixin pattern.
- Unit tests for commit_container in tests/unit/test_docker_util_image.py, mocking subprocess.run and asserting on the docker commit argv.
- Unit tests for cmd_commit argument parsing and the "unsupported backend" error path, mocking enumerate_active_agents and commit_container.
- Unit tests for the launch-step committed-image branch: patch read_committed_image to return a tag, patch image_exists to return True, and assert that build_image is not called and plan.image is overridden.
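As an illustration of the argv-assertion style described for commit_container, a test might look roughly like this. The commit_container defined here is a stand-in so the example runs on its own; the real test would import the helper from the project and patch subprocess.run where that module looks it up.

```python
import subprocess
import unittest
from unittest import mock


def commit_container(container: str, image_tag: str) -> None:
    # Stand-in for the helper under test (assumed to wrap `docker commit`).
    subprocess.run(["docker", "commit", container, image_tag], check=True)


class CommitContainerTest(unittest.TestCase):
    def test_runs_docker_commit_with_expected_argv(self):
        # Patch subprocess.run so no Docker daemon is needed.
        with mock.patch("subprocess.run") as run:
            commit_container("bot-bottle-demo", "bot-bottle-committed-demo:latest")
        run.assert_called_once_with(
            ["docker", "commit", "bot-bottle-demo", "bot-bottle-committed-demo:latest"],
            check=True,
        )
```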