PRD: Commit bottle state to an image #240

Merged
didericis merged 17 commits from commit-bottle-state into main 2026-06-23 17:29:45 -04:00
Collaborator

Closes #194.

PRD: https://gitea.dideric.is/didericis/bot-bottle/src/commit/d8071753f1a3ab82ff76088f9ee6d3be7d99a910/docs/prds/prd-new-commit-bottle-state.md

Summary

  • Adds ./cli.py commit [<slug>] to snapshot a running Docker bottle's container state via docker commit
  • Stores the committed image tag in per-bottle state (~/.bot-bottle/state/<slug>/committed-image) so the next ./cli.py resume boots from the snapshot instead of rebuilding
  • Falls back to the normal Dockerfile build if the committed image is absent from the local daemon
  • Errors clearly on non-Docker backends (smolmachines / macos-container)
  • 50 new unit tests; 1024 total pass
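Below is a minimal sketch of the per-bottle state round-trip and the resume fallback described above. It assumes the tag is stored as a single line of text. `write_committed_image` and `read_committed_image` are the helper names from the commit message, but their signatures here are hypothetical. So are `resolve_boot_image`, the `state_root` parameter, and the injected `image_exists` / `build_image` callables. They are injected so the logic runs without Docker.

```python
from pathlib import Path
from typing import Callable, Optional

# Mirrors ~/.bot-bottle/state from the PR text; a parameter so tests can use a temp dir.
DEFAULT_STATE_ROOT = Path.home() / ".bot-bottle" / "state"


def write_committed_image(slug: str, tag: str, state_root: Path = DEFAULT_STATE_ROOT) -> Path:
    """Record the committed image tag for one bottle."""
    path = state_root / slug / "committed-image"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(tag + "\n", encoding="utf-8")
    return path


def read_committed_image(slug: str, state_root: Path = DEFAULT_STATE_ROOT) -> Optional[str]:
    """Return the recorded tag, or None if the bottle was never committed."""
    path = state_root / slug / "committed-image"
    if not path.is_file():
        return None
    tag = path.read_text(encoding="utf-8").strip()
    return tag or None


def resolve_boot_image(
    slug: str,
    image_exists: Callable[[str], bool],
    build_image: Callable[[], str],
    state_root: Path = DEFAULT_STATE_ROOT,
) -> str:
    """Prefer the committed snapshot; rebuild if it is missing from the daemon."""
    tag = read_committed_image(slug, state_root)
    if tag is not None and image_exists(tag):
        return tag
    return build_image()
```

Keeping the daemon check separate from the state read means a stale state file (for example, after `docker image prune`) degrades to a normal build rather than an error.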
didericis force-pushed commit-bottle-state from 99b23db16c to 8eea8f83a2 2026-06-22 20:30:19 -04:00 Compare
didericis force-pushed commit-bottle-state from 8eea8f83a2 to c6e976fa7d 2026-06-22 22:29:36 -04:00 Compare
Owner

Would like to implement this for macos containers and smolmachines as well. I know it's possible for smolmachines at least, not sure about macos.

Author
Collaborator

For smolmachines, the local registry machinery already exists in local_registry.py — the implementation path is: docker-commit the running container, push it to the ephemeral registry, and repack via smolvm pack create. That reuses exactly the same docker→smolmachine conversion pipeline launch.py already runs at start time. The main open question is whether smolvm exposes a way to identify which docker container backs a running machine (so we know what to commit), or whether we need to track that mapping ourselves in bottle state.

For macOS containers, it depends on whether Apple's container CLI has a commit subcommand. The util.py pattern is already there (build_image, save, image_exists), so if container commit <name> <tag> works it's a small addition. Worth checking the Apple container release notes — if the subcommand exists, the implementation is straightforward; if not, the export path (container image save → retag → reload) may still work but with a larger snapshot.

Happy to open a follow-up issue tracking both once we know the smolvm container-name mapping answer.

Owner

@didericis-codex implement the smolmachines backend commit flow

Collaborator

Implemented the smolmachines backend commit flow in 02b38b63f5ee9a65f20b5d5f0f7f983918633594.

Summary:

  • ./cli.py commit <slug> now uses smolvm pack create --from-vm for smolmachines bottles and stores the resulting .smolmachine artifact path in per-bottle state.
  • smolmachines resume now prefers that committed artifact when present, falling back to the existing build/pack flow if it is missing.
  • Kept Docker behavior unchanged and updated tests/PRD text.

Verification:

  • .venv/bin/pyright .
  • python3 -m unittest discover -t . -s tests/unit -v
didericis force-pushed commit-bottle-state from 02b38b63f5 to 16020a6a09 2026-06-23 00:35:21 -04:00 Compare
Owner

There's a catch-22 here: when we try to commit a running bottle, it asks for the bottle to be stopped:

> ./cli.py commit
bot-bottle: error: container export 'bot-bottle-test' failed: Error: failed to export container (cause: "invalidState: "container is not stopped"")

But when we stop it, it gets cleaned up.

We should make it so committing does the following:

  1. confirms that we want to stop the running bottle
  2. stops the running bottle
  3. commits it
  4. gives the command to restart/resume it
Author
Collaborator

Fixed in 3cd4a7a.

container export requires the container to be stopped. The commit command now:

  1. Checks if the bottle is running (container list --quiet)
  2. Prompts "commit will stop it. Continue? [y/N]"
  3. Runs container stop <name> on confirmation
  4. Commits and prints ./cli.py resume <slug>

Changes: macos_container/util.py gains container_is_running + stop_container; cli/commit.py wires the stop-confirm flow before the export step. Two new unit tests cover the yes/no paths.
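The stop-confirm flow above could look roughly like the sketch below. It is not the code in `macos_container/util.py`. The runner is injected for testability, and the sketch assumes `container list --quiet` prints one running container per line. The prompt wording mirrors the comment; `stop_if_confirmed` and `run_cli` are hypothetical names.

```python
import subprocess
from typing import Callable, Sequence


class CommitCancelled(Exception):
    """Raised when the user declines to stop a running bottle."""


# A runner executes argv and returns the completed process (stdout as text).
Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def run_cli(argv: Sequence[str]) -> subprocess.CompletedProcess:
    return subprocess.run(list(argv), capture_output=True, text=True, check=True)


def container_is_running(name: str, run: Runner = run_cli) -> bool:
    # Assumption: `container list --quiet` lists one running container per line.
    listed = run(["container", "list", "--quiet"]).stdout.split()
    return name in listed


def stop_if_confirmed(name: str, run: Runner = run_cli,
                      ask: Callable[[str], str] = input) -> None:
    """Stop a running container after a y/N prompt; no-op if it is not running."""
    if not container_is_running(name, run):
        return
    answer = ask(f"{name} is running; commit will stop it. Continue? [y/N] ")
    if answer.strip().lower() not in ("y", "yes"):
        raise CommitCancelled(name)
    run(["container", "stop", name])
```

Defaulting to "no" on any answer other than `y`/`yes` matches the `[y/N]` convention: an accidental Enter never stops a live session.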

Owner

@didericis-claude I'd like you to create a Freezer class, shared between all the backends, that's responsible for taking an active Agent and Bottle class and "freezing" them via a commit method. This should follow the same pattern we follow for backends: a generic Freezer class with the shared logic in the base class, and backend-specific methods in the backend-specific classes.

Author
Collaborator

Done in 8ab2472.

Added a Freezer ABC (backend/freeze.py) following the same pattern as BottleBackend:

  • Freezer.commit(agent: ActiveAgent, bottle: Bottle) — primary entry point for use within a live launch context; calls _freeze, then owns the shared post-commit steps (write_committed_image, mark_preserved, resume hint)
  • Freezer.commit_slug(slug) — convenience entry for cmd_commit when no live Bottle is available; constructs a name-only _NamedBottle stub and delegates to commit
  • get_freezer(backend_name) -> Freezer — factory, analogous to get_bottle_backend
  • CommitCancelled — raised when the user declines a stop prompt

Backend subclasses:

  • backend/docker/freezer.py: DockerFreezer (docker commit)
  • backend/macos_container/freezer.py: MacosContainerFreezer (stop-confirm flow + container export)
  • backend/smolmachines/freezer.py: SmolmachinesFreezer (smolvm pack create --from-vm)

cmd_commit no longer branches on backend names — it reads the backend from metadata, calls get_freezer(backend).commit_slug(slug), and catches CommitCancelled.
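Here is a minimal sketch of the base-class/subclass split and the factory. It assumes a trimmed-down `ActiveAgent` that carries only a `slug`. It uses the single-argument `commit(agent)` form that a later commit on this branch settled on, with the container name derived as `bot-bottle-{slug}`. Unlike the real `get_freezer(backend_name)`, this factory takes injected `run` / `record` callables so it runs without Docker. The tag scheme and the `FREEZERS` registry are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List, Type

Run = Callable[[List[str]], None]   # executes a backend CLI command
Record = Callable[[str, str], None]  # (slug, artifact) post-commit bookkeeping


class CommitCancelled(Exception):
    """Raised by a backend freezer when the user declines a stop prompt."""


@dataclass
class ActiveAgent:
    slug: str  # hypothetical stand-in for the project's ActiveAgent


class Freezer(ABC):
    """Shared commit flow; subclasses supply only the snapshot step."""

    def __init__(self, run: Run, record: Record) -> None:
        self.run = run
        self.record = record  # stands in for write_committed_image + mark_preserved

    def commit(self, agent: ActiveAgent) -> str:
        artifact = self._freeze(agent)     # backend-specific
        self.record(agent.slug, artifact)  # shared post-commit steps
        print(f"Resume with: ./cli.py resume {agent.slug}")
        return artifact

    def commit_slug(self, slug: str) -> str:
        return self.commit(ActiveAgent(slug))

    @staticmethod
    def container_name(agent: ActiveAgent) -> str:
        return f"bot-bottle-{agent.slug}"

    @abstractmethod
    def _freeze(self, agent: ActiveAgent) -> str:
        """Snapshot the bottle; return an image tag or artifact path."""


class DockerFreezer(Freezer):
    def _freeze(self, agent: ActiveAgent) -> str:
        tag = f"bot-bottle-{agent.slug}:committed"  # hypothetical tag scheme
        self.run(["docker", "commit", self.container_name(agent), tag])
        return tag


# Other backends would register their own subclasses here.
FREEZERS: Dict[str, Type[Freezer]] = {"docker": DockerFreezer}


def get_freezer(backend_name: str, run: Run, record: Record) -> Freezer:
    try:
        return FREEZERS[backend_name](run, record)
    except KeyError:
        raise ValueError(f"no freezer for backend {backend_name!r}") from None


def cmd_commit(slug: str, backend_name: str, run: Run, record: Record) -> int:
    try:
        get_freezer(backend_name, run, record).commit_slug(slug)
    except CommitCancelled:
        print("Commit cancelled.")
    return 0
```

With this shape, adding a backend means one subclass plus one registry entry. `cmd_commit` never inspects the backend name beyond the lookup.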

didericis force-pushed commit-bottle-state from 828101412c to e74a5e0219 2026-06-23 16:43:05 -04:00 Compare
didericis added 17 commits 2026-06-23 16:53:46 -04:00
Adds `./cli.py commit [<slug>]` which runs `docker commit` on the
active agent container and stores the resulting image tag in per-bottle
state. The next `./cli.py resume <slug>` automatically boots from the
committed snapshot instead of rebuilding from the Dockerfile, preserving
all in-container state across restarts and migrations.

- bottle_state: add write_committed_image / read_committed_image helpers
- docker/util: add commit_container wrapper around `docker commit`
- docker/launch: check for a committed image before the Dockerfile build
  step; fall back to normal build if the image is absent from the daemon
- cli/commit: new command with interactive slug picker; errors clearly on
  non-Docker backends
- 50 new unit tests covering all paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- test_docker_launch_committed_image: replace Manifest.from_json_obj
  (nonexistent) with ManifestIndex.from_json_obj; pass manifest= arg
  to DockerBottlePlan constructor (required by BottlePlan base class)
- test_macos_container_launch: cast SimpleNamespace stubs to their
  expected types (BottleSpec, GitGatePlan, EgressPlan) in _build_plan;
  add str type annotations to fake_build parameter signatures
- test_macos_container_util: add str type annotations to fake_build_image
  parameter signatures
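For reference, the `cast` fix works because `typing.cast` only informs the type checker. At runtime it returns its argument unchanged, so a `SimpleNamespace` stub still behaves exactly as before. `EgressPlan` below is a hypothetical stand-in with an invented field, not the project's class.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import List, cast


@dataclass
class EgressPlan:  # hypothetical stand-in; the field is invented for illustration
    allowed_hosts: List[str] = field(default_factory=list)


def count_hosts(plan: EgressPlan) -> int:
    return len(plan.allowed_hosts)


# cast() satisfies pyright; at runtime it returns the SimpleNamespace unchanged.
stub = cast(EgressPlan, SimpleNamespace(allowed_hosts=["example.com"]))
```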
`container export` requires the container to be stopped first. When a
running bottle is detected, prompt the user to confirm, stop the
container, then commit. Adds `container_is_running` and
`stop_container` helpers to the macos-container util.

Addresses #240 (comment)
Adds a Freezer ABC (backend/freeze.py) that encapsulates the
stop-commit-mark-preserved flow for all backends, following the same
pattern as BottleBackend. Each backend gets its own Freezer subclass:

  DockerFreezer           — docker commit
  MacosContainerFreezer   — container export + image rebuild; prompts
                            to stop if the container is running
  SmolmachinesFreezer     — smolvm pack create --from-vm

The base class owns write_committed_image, mark_preserved, and the
resume hint. Subclasses implement _freeze() and optionally override
_export_hint() for migration instructions.

Freezer.commit(agent, bottle) is the primary entry point for use
within a live launch context. Freezer.commit_slug(slug) is a
convenience wrapper for cmd_commit, which no longer branches on
backend names itself.

get_freezer(backend_name) is the factory, analogous to
get_bottle_backend(). CommitCancelled is raised by MacosContainerFreezer
when the user declines the stop prompt; cmd_commit catches it and
returns 0.
Freezer._freeze only ever used bottle.name, which is always
f"bot-bottle-{agent.slug}". Remove the Bottle parameter from
commit() and _freeze(), derive the container name from agent.slug
directly in each subclass, and delete the _NamedBottle stub that
existed solely to paper over this.
container stop was removing the container immediately (due to --rm)
before container export could run. The force_remove_container teardown
callback on the ExitStack already handles cleanup on normal exit, so
--rm was redundant. Without it, the stopped container stays available
for container export to snapshot.
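The sketch below shows the teardown arrangement this commit relies on. Cleanup is registered on a `contextlib.ExitStack`, so it runs whether the session ends normally or raises; that is why `--rm` added nothing for cleanup. The callables are hypothetical stand-ins. As the next commit notes, Apple Container removes stopped containers anyway, so this change alone did not make stop-then-export work.

```python
from contextlib import ExitStack
from typing import Callable


def run_bottle(name: str,
               start: Callable[[str], None],
               session: Callable[[], None],
               force_remove_container: Callable[[str], None]) -> None:
    with ExitStack() as stack:
        start(name)  # assumed to launch WITHOUT --rm, so a stop leaves the container behind
        # The registered teardown runs on normal exit and on exceptions alike.
        stack.callback(force_remove_container, name)
        session()
```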
Apple Container removes containers when they stop, making the
stop-then-export flow impossible regardless of the --rm flag.

Replace `container export` (requires stopped container) with
`container exec --user root <name> tar --create ... --file=- --directory=/ .`
streamed to a temp file, then build the committed image from that archive
as before. The bottle stays running after commit, which is better UX.

Drop the stop-confirm prompt from MacosContainerFreezer since we no longer
need to stop the container at all.
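Here is a rough sketch of the exec-tar capture. It assumes stdout can be redirected straight to a file for `container exec`; the smolmachines notes further down show this does not hold for `smolvm machine exec`. The tar options the commit message elides stay elided, and `extra_tar_opts` is a hypothetical hook for them. The error path also shows why the tests needed byte-valued stderr: without `text=True`, captured stderr is `bytes`.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List, Sequence


def exec_tar_argv(name: str, extra_tar_opts: Sequence[str] = ()) -> List[str]:
    # The commit message elides some tar options ("..."); supply them via extra_tar_opts.
    return ["container", "exec", "--user", "root", name,
            "tar", "--create", *extra_tar_opts, "--file=-", "--directory=/", "."]


def snapshot_to_tempfile(argv: Sequence[str]) -> Path:
    """Run argv with stdout redirected to a new temp file; return its path."""
    fd, raw = tempfile.mkstemp(suffix=".tar")
    path = Path(raw)
    with open(fd, "wb") as out:
        # Without text=True, captured stderr arrives as bytes.
        result = subprocess.run(list(argv), stdout=out, stderr=subprocess.PIPE)
    if result.returncode != 0:
        path.unlink(missing_ok=True)
        detail = result.stderr.decode(errors="replace").strip() or "<no stderr>"
        raise RuntimeError(f"snapshot failed: {detail}")
    return path
```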
- Rename export test to reflect new exec-tar mechanism; update argv
  assertions to match the new `container exec ... tar` command shape
- Change mock stderr from str to bytes (subprocess.PIPE without text=True)
- Add type annotation to capture_freeze closure to satisfy pyright
smolvm pack create --from-vm requires the VM to be stopped. Add
machine_is_running() to smolvm.py (via machine ls --json state field),
and add the same confirm-stop flow to SmolmachinesFreezer that was
originally designed for macos-container: if running, prompt the user,
stop the VM, then pack. Already-stopped VMs are packed directly.
smolvm pack create --from-vm requires the VM to be stopped, and stopping
a smolmachines VM terminates any running interactive session.

Instead, mirror the macos-container approach: exec into the running VM as
root and stream the root filesystem via tar (smolvm machine exec -- tar),
build a Docker image from the archive, push to an ephemeral local registry,
and run smolvm pack create --image to produce the .smolmachine artifact.
The VM stays running throughout the commit.

Remove the stop-confirm prompt and machine_is_running check that were
added in the previous commit — neither is needed when we no longer stop.
The terminal-decoration wrapper script is invoked with sh -lc, which
sources login-shell init files (/etc/profile, ~/.profile) rather than
interactive-shell files (~/.zshrc). smolvm is typically installed via
homebrew whose PATH setup lands in ~/.zprofile or ~/.zshrc — not picked
up by sh -l — so pty_resize.py's Popen(["smolvm", ...]) raises
FileNotFoundError, pty_resize exits non-zero, and the trailing reset-
printf makes sh exit 0. The caller sees "session ended (exit 0)"
immediately with no agent output.

Use sh -c instead. The calling process (./cli.py) inherits the user's
interactive shell PATH where smolvm is present, confirmed by the
provision steps (machine_exec) succeeding before exec_agent is reached.
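Below is a small sketch of the `sh -c` wrapper, plus a way to check which `PATH` it sees. `wrapper_argv` and `path_seen_by_wrapper` are hypothetical names. The probe prints `$PATH` from inside the wrapper, which makes the inherited environment easy to verify.

```python
import os
import subprocess
from typing import Dict, List, Optional


def wrapper_argv(script: str) -> List[str]:
    # Plain `sh -c` runs the script with the caller's environment as-is; `sh -lc`
    # would also source login files (/etc/profile, ~/.profile), not ~/.zshrc.
    return ["sh", "-c", script]


def path_seen_by_wrapper(env: Optional[Dict[str, str]] = None) -> str:
    """Return the PATH that the wrapper script actually observes."""
    result = subprocess.run(wrapper_argv('printf %s "$PATH"'),
                            env=env, capture_output=True, text=True, check=True)
    return result.stdout
```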
smolvm machine exec requires stdout to be a pipe, not a regular
file descriptor. Passing stdout=file caused smolvm to return
non-zero with no stderr (the error was silently swallowed or went
to the regular-file fd instead of reaching us).

Switch _snapshot_running_vm to a new _exec_tar_to_file helper that
uses Popen with stdout=PIPE and streams the tar to disk via
shutil.copyfileobj. A background thread drains stderr concurrently
to prevent deadlock when the stderr pipe buffer fills while we are
writing stdout data.
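Here is the Popen-plus-drain-thread pattern described above, as a self-contained sketch. The helper name `exec_to_file` is hypothetical; the commit calls its version `_exec_tar_to_file`. Stderr is read on a separate thread while the main thread copies stdout, so neither pipe can fill up and stall the child.

```python
import shutil
import subprocess
import threading
from pathlib import Path
from typing import List, Sequence


def exec_to_file(argv: Sequence[str], dest: Path) -> None:
    """Stream a command's stdout into dest through a pipe, failing on non-zero exit."""
    proc = subprocess.Popen(list(argv), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_pipe, err_pipe = proc.stdout, proc.stderr
    assert out_pipe is not None and err_pipe is not None
    stderr_chunks: List[bytes] = []
    # Drain stderr on a background thread: if the child fills the stderr pipe
    # while we block on stdout, both sides would otherwise wait forever.
    drainer = threading.Thread(target=lambda: stderr_chunks.append(err_pipe.read()))
    drainer.start()
    with open(dest, "wb") as out:
        shutil.copyfileobj(out_pipe, out)  # the child's stdout stays a pipe
    out_pipe.close()
    returncode = proc.wait()
    drainer.join()
    err_pipe.close()
    if returncode != 0:
        detail = b"".join(stderr_chunks).decode(errors="replace").strip()
        raise RuntimeError(f"{argv[0]} exited {returncode}: {detail or '<no stderr>'}")
```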
Replace the Popen/stdout=PIPE approach with a write-then-copy
strategy that avoids binary-stdout piping through the smolvm exec
channel entirely:

1. Probe connectivity with `machine_exec(machine, ["true"])` first.
   If this fails while an interactive session is running, the error
   now says "concurrent exec not available" instead of the opaque
   "<no stderr>".

2. Run `tar --create --gzip --file=/var/tmp/.bot-bottle-commit.tar.gz`
   inside the VM via machine_exec (same mechanism used during
   provisioning). tar writes to a file in the VM, not stdout, so
   smolvm never has to transmit binary data over the exec channel.

3. Copy the compressed archive to the host with machine_cp.

4. Dockerfile switches to ADD rootfs.tar.gz / — Docker decompresses
   gzip tarballs automatically.
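The sketch below walks through the four steps, with the project's helpers injected as callables. `machine_exec` and `machine_cp` are helper names from this branch, but the signatures below are assumptions: `machine_exec` returns `(returncode, stderr)` and `machine_cp` copies `(machine, remote, local)`. Archiving `/` as `.` and the `FROM scratch` base line in the generated Dockerfile are also assumptions.

```python
from pathlib import Path
from typing import Callable, Sequence, Tuple

REMOTE_ARCHIVE = "/var/tmp/.bot-bottle-commit.tar.gz"

# Assumed helper shapes; the project's real signatures may differ.
MachineExec = Callable[[str, Sequence[str]], Tuple[int, str]]  # -> (returncode, stderr)
MachineCp = Callable[[str, str, Path], None]                   # (machine, remote, local)


def snapshot_vm(machine: str, build_dir: Path,
                machine_exec: MachineExec, machine_cp: MachineCp) -> Path:
    """Archive the VM root inside the VM, copy it out, and write a Dockerfile."""
    # 1. Cheap probe so a busy exec channel produces a readable error.
    rc, err = machine_exec(machine, ["true"])
    if rc != 0:
        raise RuntimeError(f"concurrent exec not available: {err.strip() or '<no stderr>'}")
    # 2. tar writes to a file inside the VM, so no binary data crosses the exec channel.
    rc, err = machine_exec(machine, ["tar", "--create", "--gzip", f"--file={REMOTE_ARCHIVE}",
                                     "--directory=/", "."])
    if rc != 0:
        raise RuntimeError(f"tar failed: {err.strip() or '<no stderr>'}")
    # 3. Copy the compressed archive into the host build context.
    archive = build_dir / "rootfs.tar.gz"
    machine_cp(machine, REMOTE_ARCHIVE, archive)
    # 4. ADD unpacks a local gzip tarball into the image root.
    (build_dir / "Dockerfile").write_text("FROM scratch\nADD rootfs.tar.gz /\n")
    return archive
```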
fix(smolmachines): exclude /tmp+/var/tmp from snapshot; mkdir -p on boot
test / unit (pull_request) Successful in 36s
test / integration (pull_request) Successful in 23s
lint / lint (push) Successful in 1m59s
prd-number / assign-numbers (push) Successful in 1m8s
test / unit (push) Successful in 35s
test / integration (push) Successful in 21s
Update Quality Badges / update-badges (push) Successful in 1m22s
c48c3688b8
On resume from a committed snapshot, smolvm's pack process remaps all
file uids to the host uid (501 on macOS). Files in /tmp that were
created during the session (e.g. /tmp/claude-1000 owned by node=uid
1000) get remapped to 501. Claude Code then refuses to use the temp
directory because it's owned by a different uid.

Two-part fix:
- Exclude ./tmp and ./var/tmp from the tar in _exec_tar_to_file.
  Both directories are ephemeral; a resumed VM should start with clean
  temp directories identical to a fresh VM.
- Add mkdir -p /tmp /var/tmp to _init_vm before chown/chmod, so the
  directories are created if the committed snapshot omitted them.
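To see which member names the `./tmp` and `./var/tmp` exclusions drop, the sketch below expresses the same filtering with Python's `tarfile` instead of GNU `tar --exclude` inside the VM. It illustrates the exclusion semantics and is not the project's code. `EXCLUDED`, `skip_ephemeral`, and `archive_rootfs` are hypothetical names.

```python
import tarfile
from pathlib import Path
from typing import List, Optional

# Member names as they appear when a root directory is archived under the name ".".
EXCLUDED = ("./tmp", "./var/tmp")


def skip_ephemeral(info: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
    """tarfile filter: drop the excluded directories and everything below them."""
    if any(info.name == ex or info.name.startswith(ex + "/") for ex in EXCLUDED):
        return None
    return info


def archive_rootfs(root: Path, dest: Path) -> List[str]:
    """Write a gzip tarball of root (as ".") and return the member names it holds."""
    with tarfile.open(str(dest), "w:gz") as tar:
        # Returning None from the filter for a directory also skips its contents.
        tar.add(str(root), arcname=".", filter=skip_ephemeral)
    with tarfile.open(str(dest), "r:gz") as tar:
        return tar.getnames()
```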
didericis force-pushed commit-bottle-state from e74a5e0219 to c48c3688b8 2026-06-23 16:53:46 -04:00 Compare
didericis merged commit c48c3688b8 into main 2026-06-23 17:29:45 -04:00