bot-bottle/docs/prds/prd-new-commit-bottle-state.md
didericis-claude 6aed1bc589 feat(cli): add commit command to snapshot running bottle state
Adds `./cli.py commit [<slug>]` which runs `docker commit` on the
active agent container and stores the resulting image tag in per-bottle
state. The next `./cli.py resume <slug>` automatically boots from the
committed snapshot instead of rebuilding from the Dockerfile, preserving
all in-container state across restarts and migrations.

- bottle_state: add write_committed_image / read_committed_image helpers
- docker/util: add commit_container wrapper around `docker commit`
- docker/launch: check for a committed image before the Dockerfile build
  step; fall back to normal build if the image is absent from the daemon
- cli/commit: new command with interactive slug picker; errors clearly on
  non-Docker backends
- 50 new unit tests covering all paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-23 16:43:01 -04:00


PRD prd-new: Commit bottle state to an image

  • Status: Draft
  • Author: Claude
  • Created: 2026-06-20
  • Issue: #194

Summary

Add a commit CLI command that freezes a running Docker bottle's container state to a named Docker image. Operators can then resume the bottle from that exact filesystem snapshot, or export the image with docker save to migrate work to a different host.

Problem

When a long-running agent session is interrupted (by a host reboot, a network failure, or a planned infrastructure migration), the in-progress container state is lost. cli.py resume rebuilds the agent image from the Dockerfile and reprovisions the bottle, but that returns the guest to its initial state, not to wherever the agent was mid-task.

There is no mechanism today to capture "what's installed / configured inside the running container right now" and make it reproducible. The capability-block flow writes a new Dockerfile and marks the bottle for resume, but that only applies when the agent itself has requested a capability change; it doesn't help the operator who wants to take a snapshot before a planned host reboot or hardware migration.

Goals / Success Criteria

  • ./cli.py commit [<slug>] takes a snapshot of the running Docker agent container and stores it as a local Docker image.
  • Without a slug argument the command shows the same interactive picker as start (the list of active slugs).
  • The committed image tag is stored in per-bottle state so that the next ./cli.py resume <slug> automatically uses the committed image instead of rebuilding from the Dockerfile.
  • mark_preserved is called so the state dir survives the normal session-end cleanup.
  • A docker save hint is printed so operators know how to export the image for migration.
  • The command errors clearly on non-Docker backends (smolmachines does not expose a container-level commit API in its current CLI surface).

Non-goals

  • Smolmachines or macOS-container backend support.
  • Automatic commit on agent exit.
  • Image push to a remote registry.
  • Storing the image tag in the manifest or sharing it between operators.

Design

Image tag

bot-bottle-committed-<slug>:latest — namespaced under bot-bottle- to match existing image naming conventions; committed distinguishes it from the build-time image (bot-bottle-claude:latest) and the capability-block rebuild image (bot-bottle-rebuilt-<identity>:latest).
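The naming rule can be sketched as a small helper. This is only an illustration: the function name committed_image_tag and the empty-slug check are assumptions, not names taken from the codebase.

```python
def committed_image_tag(slug: str) -> str:
    """Build the committed-image tag described above.

    Hypothetical helper: the PRD fixes the tag format, not this function name.
    """
    if not slug:
        raise ValueError("slug must be non-empty")
    # Format from the PRD: bot-bottle-committed-<slug>:latest
    return f"bot-bottle-committed-{slug}:latest"
```

For a bottle whose slug is demo, the helper yields bot-bottle-committed-demo:latest.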

State storage

A new plain-text file committed-image is added to the per-bottle state directory:

~/.bot-bottle/state/<identity>/
    metadata.json
    Dockerfile            (capability-block override; optional)
    committed-image       (committed image tag; optional)
    transcript/

bottle_state.committed_image_path(identity) returns the path. write_committed_image and read_committed_image are the write and read helpers, following the existing per_bottle_dockerfile pattern.
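A minimal sketch of what these helpers might look like, assuming the file holds a single tag on one line. The root parameter, the trailing newline, and returning None for a missing or empty file are assumptions made for illustration; the real bottle_state module may differ.

```python
from pathlib import Path
from typing import Optional

# Layout from the PRD: ~/.bot-bottle/state/<identity>/committed-image
STATE_ROOT = Path.home() / ".bot-bottle" / "state"


def committed_image_path(identity: str, root: Path = STATE_ROOT) -> Path:
    """Return the path of the committed-image file for one bottle."""
    return root / identity / "committed-image"


def write_committed_image(identity: str, tag: str, root: Path = STATE_ROOT) -> Path:
    """Persist the committed image tag, creating the state dir if needed."""
    path = committed_image_path(identity, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(tag + "\n", encoding="utf-8")
    return path


def read_committed_image(identity: str, root: Path = STATE_ROOT) -> Optional[str]:
    """Return the stored tag, or None if the file is missing or empty."""
    path = committed_image_path(identity, root)
    try:
        tag = path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None
    return tag or None
```

Passing root explicitly keeps the helpers testable against a temporary directory without touching the real home directory.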

commit command

./cli.py commit [<slug>]
  1. Resolve slug (arg or interactive picker from enumerate_active_agents).
  2. Check metadata: if backend is set and is not docker, die with a clear "not supported" error.
  3. Derive container name: bot-bottle-<slug> (matches the agent provision plan's instance_name convention).
  4. Run docker commit <container> bot-bottle-committed-<slug>:latest.
  5. Write the image tag to ~/.bot-bottle/state/<slug>/committed-image.
  6. Call mark_preserved(<slug>) so the state dir survives session-end.
  7. Print the resume hint and a docker save export example.
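Steps 2 through 4 above can be sketched as follows. This is a sketch under stated assumptions: commit_bottle, the run parameter, and the error wording are hypothetical, and the commit_container signature is a guess at the docker/util wrapper. Steps 5 to 7 (state write, mark_preserved, hints) are left out.

```python
import subprocess
from typing import Callable, List, Optional

Runner = Callable[[List[str]], None]


def _run(argv: List[str]) -> None:
    # check=True raises CalledProcessError when docker exits non-zero.
    subprocess.run(argv, check=True)


def commit_container(container: str, tag: str, run: Runner = _run) -> None:
    """Assumed shape of the wrapper: docker commit <container> <tag>."""
    run(["docker", "commit", container, tag])


def commit_bottle(slug: str, backend: Optional[str], run: Runner = _run) -> str:
    """Hypothetical driver for steps 2-4; returns the committed image tag."""
    # Step 2: refuse any backend that is set and is not docker.
    if backend and backend != "docker":
        raise SystemExit(f"commit is not supported on the {backend!r} backend")
    # Step 3: container name follows the bot-bottle-<slug> convention.
    container = f"bot-bottle-{slug}"
    # Step 4: snapshot the container into the committed image tag.
    tag = f"bot-bottle-committed-{slug}:latest"
    commit_container(container, tag, run)
    return tag
```

Injecting run lets a test capture the argv instead of invoking Docker, which fits the mocking approach described under the testing strategy.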

Resume from committed image

bot_bottle/backend/docker/launch.py already rebuilds the agent image at the top of the launch context manager. The change is a check immediately before that step:

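# Look up a previously committed snapshot for this bottle (None if absent).
committed = read_committed_image(plan.slug)
if committed and docker_mod.image_exists(committed):
    # Snapshot is still present in the local daemon: boot from it, skip the build.
    info(f"using committed image {committed!r}")
    plan = dataclasses.replace(
        plan,
        agent_provision=dataclasses.replace(plan.agent_provision, image=committed),
    )
else:
    # No tag recorded, or the image was removed: normal Dockerfile build.
    docker_mod.build_image(plan.image, _REPO_DIR, dockerfile=plan.dockerfile_path)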

Replacing agent_provision.image propagates to plan.image (a property) and from there to the Compose spec renderer's _agent_service image: field, so the container boots from the committed snapshot. The build step is skipped entirely when a committed image is found and exists locally.

If the committed image has been deleted from the local daemon (e.g. after docker rmi or a docker system prune), the launch falls back to a normal Dockerfile build, matching the pre-commit behavior.
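The decision and its fallback can be isolated as a pure function, sketched below. select_image is a hypothetical name, and this image_exists is a stand-in for docker_mod.image_exists that assumes a check via docker image inspect, which exits non-zero when the image is not present locally.

```python
import subprocess
from typing import Callable, Optional, Tuple


def image_exists(tag: str) -> bool:
    """Stand-in check: docker image inspect exits non-zero for a missing image."""
    result = subprocess.run(
        ["docker", "image", "inspect", tag],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def select_image(
    committed: Optional[str],
    default_image: str,
    exists: Callable[[str], bool] = image_exists,
) -> Tuple[str, bool]:
    """Return (image_to_boot, needs_build)."""
    # Use the snapshot only if a tag was recorded AND the daemon still has it.
    if committed and exists(committed):
        return committed, False
    # Otherwise fall back to the normal Dockerfile build of the default image.
    return default_image, True
```

A stale committed-image file therefore never blocks a resume: the tag is ignored whenever the existence check fails.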

Testing strategy

  • Unit tests for write_committed_image / read_committed_image in tests/unit/test_bottle_state.py, using the existing _FakeHomeMixin pattern.
  • Unit tests for commit_container in tests/unit/test_docker_util_image.py, mocking subprocess.run and asserting on the docker commit argv.
  • Unit tests for cmd_commit argument parsing and the "unsupported backend" error path, mocking enumerate_active_agents and commit_container.
  • Unit tests for the launch-step committed-image branch: patch read_committed_image to return a tag, patch image_exists to return True, and assert that build_image is not called and plan.image is overridden.
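As an illustration of the commit_container test described above, the following sketch mocks subprocess.run and asserts on the argv. The commit_container defined here is a stand-in with an assumed signature, and the test class name is hypothetical; the real tests in tests/unit/test_docker_util_image.py may be structured differently.

```python
import subprocess
import unittest
from unittest import mock


def commit_container(container: str, tag: str) -> None:
    """Stand-in for the docker/util wrapper; the real signature may differ."""
    subprocess.run(["docker", "commit", container, tag], check=True)


class CommitContainerTest(unittest.TestCase):
    def test_invokes_docker_commit(self) -> None:
        # Patch subprocess.run so no real docker process is started.
        with mock.patch("subprocess.run") as run:
            commit_container("bot-bottle-demo", "bot-bottle-committed-demo:latest")
        # The wrapper should issue exactly one docker commit with this argv.
        run.assert_called_once_with(
            ["docker", "commit", "bot-bottle-demo", "bot-bottle-committed-demo:latest"],
            check=True,
        )
```

The same patching approach extends to the launch-step branch: patch read_committed_image and image_exists, then assert that build_image was not called.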