bot-bottle/docs/prds/0060-commit-bottle-state.md
2026-06-23 21:32:54 +00:00

PRD 0060: Commit bottle state to an image

  • Status: Active
  • Author: Claude
  • Created: 2026-06-20
  • Issue: #194

Summary

Add a commit CLI command that freezes a running bottle's state to a resumable local artifact. Docker bottles are stored as Docker images; smolmachines bottles are stored as .smolmachine artifacts. Operators can then resume the bottle from that exact filesystem snapshot, or export the artifact to migrate work to a different host.

Problem

When a long-running agent session is interrupted by a host reboot, a network failure, or a planned infrastructure migration, the in-progress container state is lost. cli.py resume rebuilds the agent image from the Dockerfile and reprovisions the bottle, but that returns the guest to its initial state, not to wherever the agent was mid-task.

There is no mechanism today to capture "what's installed / configured inside the running container right now" and make it reproducible. The capability-block flow writes a new Dockerfile and marks the bottle for resume, but that only applies when the agent itself has requested a capability change; it doesn't help the operator who wants to take a snapshot before a planned host reboot or hardware migration.

Goals / Success Criteria

  • ./cli.py commit [<slug>] takes a snapshot of the running agent and stores it as a local artifact.
  • Without a slug argument the command shows the same interactive picker as start (the list of active slugs).
  • The committed artifact reference is stored in per-bottle state so that the next ./cli.py resume <slug> automatically uses the snapshot instead of rebuilding from the Dockerfile.
  • mark_preserved is called so the state dir survives the normal session-end cleanup.
  • A backend-specific export hint is printed so operators know how to migrate the snapshot.
  • The command errors clearly on unsupported backends.

Non-goals

  • macOS-container backend support.
  • Automatic commit on agent exit.
  • Image push to a remote registry.
  • Storing the image tag in the manifest or sharing it between operators.

Design

Docker image tag

bot-bottle-committed-<slug>:latest — namespaced under bot-bottle- to match existing image naming conventions; committed distinguishes it from the build-time image (bot-bottle-claude:latest) and the capability-block rebuild image (bot-bottle-rebuilt-<identity>:latest).
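The naming rule can be captured in a one-line helper. This is a minimal sketch: committed_image_tag is a hypothetical name, not one the PRD defines, and it assumes the slug is already a valid Docker repository component (lowercase, no spaces).

```python
def committed_image_tag(slug: str) -> str:
    """Return the Docker tag used for the committed snapshot of `slug`.

    Hypothetical helper; assumes `slug` is already safe to embed in a tag.
    """
    return f"bot-bottle-committed-{slug}:latest"
```

Deriving the tag from the slug alone means a second commit of the same bottle overwrites the previous snapshot tag rather than accumulating images.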

State storage

A new plain-text file committed-image is added to the per-bottle state directory:

~/.bot-bottle/state/<identity>/
    metadata.json
    Dockerfile            (capability-block override; optional)
    committed-image       (committed artifact reference; optional)
    transcript/

bottle_state.committed_image_path(identity) returns the path. write_committed_image / read_committed_image are the read/write helpers, matching the existing per_bottle_dockerfile pattern. Docker stores a Docker tag in this file; smolmachines stores the absolute path to the committed .smolmachine artifact.
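A minimal sketch of how these helpers might look, assuming the state layout shown above. The function names come from this PRD; the bodies, the STATE_ROOT constant, and the optional root parameter (added here only so the sketch can be exercised against a temporary directory) are assumptions, not the actual bottle_state code.

```python
from pathlib import Path
from typing import Optional

# Assumption: matches the ~/.bot-bottle/state/<identity>/ layout described above.
STATE_ROOT = Path.home() / ".bot-bottle" / "state"


def committed_image_path(identity: str, root: Optional[Path] = None) -> Path:
    """Path of the plain-text file holding the committed artifact reference."""
    return (root or STATE_ROOT) / identity / "committed-image"


def write_committed_image(
    identity: str, reference: str, root: Optional[Path] = None
) -> None:
    """Store a Docker tag or an absolute .smolmachine path for this bottle."""
    path = committed_image_path(identity, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(reference.strip() + "\n", encoding="utf-8")


def read_committed_image(identity: str, root: Optional[Path] = None) -> Optional[str]:
    """Return the stored reference, or None when nothing has been committed."""
    try:
        value = committed_image_path(identity, root).read_text(encoding="utf-8")
    except FileNotFoundError:
        return None
    # Treat an empty file the same as a missing one.
    return value.strip() or None
```

Returning None for both a missing and an empty file lets the launch code use a single truthiness check before deciding whether to fall back to a build.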

commit command

./cli.py commit [<slug>]
  1. Resolve slug (arg or interactive picker from enumerate_active_agents).
  2. Check metadata and branch by backend.
  3. For Docker, derive container name bot-bottle-<slug> and run docker commit <container> bot-bottle-committed-<slug>:latest.
  4. For smolmachines, derive machine name bot-bottle-<slug> and run smolvm pack create --from-vm <machine> -o ~/.bot-bottle/state/<slug>/committed-smolmachine.
  5. Write the Docker image tag or smolmachine artifact path to ~/.bot-bottle/state/<slug>/committed-image.
  6. Call mark_preserved(<slug>) so the state dir survives session-end.
  7. Print the resume hint and a backend-specific export example.
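Steps 2 through 5 can be sketched as a pure function that returns the command to run and the reference to record. This is an illustration only: plan_commit is a hypothetical name, the backend strings "docker" and "smolmachines" are assumed values for the metadata field, and the smolvm arguments are copied from step 4 rather than checked against the smolvm CLI. The PRD does not say whether smolvm appends a suffix to the -o path, so the sketch records the path exactly as passed.

```python
from pathlib import Path
from typing import List, Tuple

# Assumption: same state root as the layout under "State storage".
STATE_ROOT = Path.home() / ".bot-bottle" / "state"


def plan_commit(
    slug: str, backend: str, state_root: Path = STATE_ROOT
) -> Tuple[List[str], str]:
    """Return (argv, reference) for committing `slug` on `backend`.

    `argv` is the command to execute; `reference` is what step 5 would
    write to the committed-image file.
    """
    name = f"bot-bottle-{slug}"  # container name (Docker) or machine name (smolmachines)
    if backend == "docker":
        tag = f"bot-bottle-committed-{slug}:latest"
        return ["docker", "commit", name, tag], tag
    if backend == "smolmachines":
        artifact = state_root / slug / "committed-smolmachine"
        argv = ["smolvm", "pack", "create", "--from-vm", name, "-o", str(artifact)]
        return argv, str(artifact)
    # Any other backend is rejected with a clear error.
    raise ValueError(f"commit is not supported on backend {backend!r}")
```

Keeping the argv construction separate from execution makes the unit tests in the testing strategy straightforward: they can assert on the returned list without a Docker daemon or smolvm installed.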

Resume from committed image

bot_bottle/backend/docker/launch.py already rebuilds the agent image at the top of the launch context manager. The change is a check immediately before that step:

# Prefer the committed snapshot when it still exists in the local daemon.
committed = read_committed_image(plan.slug)
if committed and docker_mod.image_exists(committed):
    info(f"using committed image {committed!r}")
    plan = dataclasses.replace(
        plan,
        agent_provision=dataclasses.replace(plan.agent_provision, image=committed),
    )
else:
    # No usable snapshot: fall back to the normal Dockerfile build.
    docker_mod.build_image(plan.image, _REPO_DIR, dockerfile=plan.dockerfile_path)

Replacing agent_provision.image propagates to plan.image (a property) and from there to the image: field that the Compose spec renderer's _agent_service emits, so the container boots from the committed snapshot. The build step is skipped entirely when a committed image is found and exists locally.

If the committed image has been deleted from the local daemon (e.g. after docker rmi or a docker system prune), the launch falls back to a normal Dockerfile build, the same behavior as for a bottle that was never committed.
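The fallback depends on being able to tell whether a tag still exists locally. One plausible way to implement that check is to run docker image inspect and look at the exit status, which is non-zero when the image is missing. The sketch below is an assumption about how docker_mod.image_exists could work, not its actual body; the run parameter is a hypothetical injection point added so the function can be tested without a daemon.

```python
import subprocess
from typing import Any, Callable


def image_exists(tag: str, run: Callable[..., Any] = subprocess.run) -> bool:
    """Return True when the local Docker daemon has an image named `tag`."""
    result = run(
        ["docker", "image", "inspect", tag],
        stdout=subprocess.DEVNULL,  # the JSON description is not needed
        stderr=subprocess.DEVNULL,  # suppress the "No such image" message
        check=False,                # a missing image is an expected outcome
    )
    return result.returncode == 0
```

Because a missing image is reported as False rather than raised as an error, a pruned snapshot degrades to a rebuild instead of failing the resume.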

Resume from committed smolmachine

bot_bottle/backend/smolmachines/launch.py checks the committed reference before the normal Docker build -> pack cache path:

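# Prefer the committed artifact when the file is still on disk.
committed = read_committed_image(plan.slug)
if committed and Path(committed).is_file():
    return Path(committed)
# Otherwise build the image and pack it as usual.
return _ensure_smolmachine(plan.agent_image, dockerfile=plan.agent_dockerfile_path)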

The returned path is passed to smolvm machine create --from, so the resumed VM boots from the committed snapshot. If the artifact has been deleted, launch falls back to the normal build and pack flow.

Testing strategy

  • Unit tests for write_committed_image / read_committed_image in tests/unit/test_bottle_state.py, using the existing _FakeHomeMixin pattern.
  • Unit tests for commit_container in tests/unit/test_docker_util_image.py, mocking subprocess.run and asserting on the docker commit argv.
  • Unit tests for cmd_commit argument parsing, Docker commit, smolmachines pack, and the unsupported backend error path, mocking enumerate_active_agents, commit_container, and pack_create_from_vm.
  • Unit tests for the launch-step committed-image branch: patch read_committed_image to return a tag, patch image_exists to return True, and assert that build_image is not called and plan.image is overridden.
  • Unit tests for the smolmachines launch-step committed-artifact branch: patch read_committed_image to return an existing path and assert the normal _ensure_smolmachine path is skipped.
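As an illustration of the commit_container test described above, the sketch below patches subprocess.run and asserts on the docker commit argv. The commit_container body here is a hypothetical stand-in (the PRD names the function but does not show it), and the test class name is likewise an assumption.

```python
import subprocess
import unittest
from unittest import mock


def commit_container(container: str, tag: str) -> None:
    """Hypothetical stand-in: snapshot a container's filesystem as an image."""
    subprocess.run(["docker", "commit", container, tag], check=True)


class CommitContainerTest(unittest.TestCase):
    def test_builds_docker_commit_argv(self) -> None:
        # Patch subprocess.run so no Docker daemon is needed.
        with mock.patch("subprocess.run") as run:
            commit_container("bot-bottle-demo", "bot-bottle-committed-demo:latest")
        run.assert_called_once_with(
            ["docker", "commit", "bot-bottle-demo", "bot-bottle-committed-demo:latest"],
            check=True,
        )
```

The same pattern extends to the smolmachines path: patch the function that shells out, call the command, and compare the recorded argv against the expected list.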