feat(state): preserve on crash + always snapshot transcript
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m31s

Extends the preserve-on-capability-block design to also preserve
state on agent crash, and snapshots the transcript on every
teardown so any resume (crash or capability-block) gets a warm
claude session — not a cold start.

- capability_apply: rename _snapshot_transcript → snapshot_transcript
  (public; reused below). No behavior change in the capability path.
- cli/start.py: capture bottle.exec_claude's exit code; while the
  container is still alive (inside the launch context):
    * always snapshot_transcript(identity)
    * if exit_code != 0, mark_preserved(identity)
  Then the existing _settle_state runs after teardown.

Now the preservation matrix is:

  exit 0   (clean)          → snapshot + cleanup state
  exit ≠0  (crash, Ctrl-C)  → snapshot + preserve + show resume hint
  capability-block          → (already snapshotted/preserved by apply
                               before teardown; this path is a no-op
                               because the container is already gone
                               by the time exec_claude returns)

snapshot_transcript is best-effort — capability-block's earlier
snapshot is not clobbered when the container is already torn down,
and a missing /home/node/.claude is a warn + skip.

Tested behavior: clean exit doesn't preserve, non-zero exit
(including SIGINT/130 and SIGKILL/137) preserves; empty identity
no-ops both helpers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-05-25 07:05:23 -04:00
parent fb2b5844c4
commit ef5d2f9a4d
4 changed files with 144 additions and 18 deletions
@@ -115,7 +115,7 @@ def apply_capability_change(slug: str, new_dockerfile: str) -> tuple[str, str]:
raise CapabilityApplyError("proposed Dockerfile is empty")
before = fetch_current_dockerfile(slug)
_snapshot_transcript(slug)
snapshot_transcript(slug)
_push_working_tree(slug)
write_per_bottle_dockerfile(slug, new_dockerfile)
# Set the preserve marker BEFORE teardown so cli.py's session-end
@@ -139,12 +139,19 @@ def _repo_dockerfile_path() -> Path:
return Path(__file__).resolve().parent.parent.parent.parent / "Dockerfile"
def _snapshot_transcript(slug: str) -> None:
def snapshot_transcript(slug: str) -> None:
"""`docker cp` /home/node/.claude out of the agent container into
~/.claude-bottle/state/<slug>/transcript/. Best-effort: missing
container, missing dir, or cp error all log a warning and return.
The transcript is what `claude --resume` reads to pick up where
the agent left off."""
the agent left off.
Called from two places:
- capability-apply, before tearing the bottle down.
- cli.py's session-end path, before the launch context closes,
so a crash or normal exit also leaves a transcript on disk
(deleted along with the state dir on clean exit, kept on
crash or capability-block per the preserve marker)."""
container = _agent_container_name(slug)
dest = transcript_snapshot_dir(slug)
if dest.exists():
@@ -157,11 +164,11 @@ def _snapshot_transcript(slug: str) -> None:
)
if r.returncode != 0:
warn(
f"capability-apply: transcript snapshot skipped "
f"transcript snapshot skipped "
f"({(r.stderr or '').strip() or 'no transcript dir in container?'})"
)
return
info(f"capability-apply: transcript snapshotted to {dest}")
info(f"transcript snapshotted to {dest}")
def _push_working_tree(slug: str) -> None:
@@ -213,4 +220,5 @@ __all__ = [
"CapabilityApplyError",
"apply_capability_change",
"fetch_current_dockerfile",
"snapshot_transcript",
]