feat(state): preserve on crash + always snapshot transcript

Extends the preserve-on-capability-block design to also preserve state on agent crash, and snapshots the transcript on every teardown so any resume (crash or capability-block) gets a warm claude session — not a cold start. - capability_apply: rename _snapshot_transcript → snapshot_transcript (public; reused below). No behavior change in the capability path. - cli/start.py: capture bottle.exec_claude's exit code; while the container is still alive (inside the launch context): * always snapshot_transcript(identity) * if exit_code != 0, mark_preserved(identity) Then the existing _settle_state runs after teardown. Now the preservation matrix is: exit 0 (clean) → snapshot + cleanup state exit ≠0 (crash, Ctrl-C) → snapshot + preserve + show resume hint capability-block → (already snapshotted/preserved by apply before teardown; this path is a no-op because the container is already gone by the time exec_claude returns) snapshot_transcript is best-effort — capability-block's earlier snapshot is not clobbered when the container is already torn down, and a missing /home/node/.claude is a warn + skip. Tested behavior: clean exit doesn't preserve, non-zero exit (including SIGINT/130 and SIGKILL/137) preserves; empty identity no-ops both helpers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 07:05:23 -04:00
parent fb2b5844c4
commit ef5d2f9a4d
4 changed files with 144 additions and 18 deletions
@@ -115,7 +115,7 @@ def apply_capability_change(slug: str, new_dockerfile: str) -> tuple[str, str]:
        raise CapabilityApplyError("proposed Dockerfile is empty")
    before = fetch_current_dockerfile(slug)

-    _snapshot_transcript(slug)
+    snapshot_transcript(slug)
    _push_working_tree(slug)
    write_per_bottle_dockerfile(slug, new_dockerfile)
    # Set the preserve marker BEFORE teardown so cli.py's session-end
@@ -139,12 +139,19 @@ def _repo_dockerfile_path() -> Path:
    return Path(__file__).resolve().parent.parent.parent.parent / "Dockerfile"


-def _snapshot_transcript(slug: str) -> None:
+def snapshot_transcript(slug: str) -> None:
    """`docker cp` /home/node/.claude out of the agent container into
    ~/.claude-bottle/state/<slug>/transcript/. Best-effort: missing
    container, missing dir, or cp error all log a warning and return.
    The transcript is what `claude --resume` reads to pick up where
-    the agent left off."""
+    the agent left off.
+
+    Called from two places:
+      - capability-apply, before tearing the bottle down.
+      - cli.py's session-end path, before the launch context closes,
+        so a crash or normal exit also leaves a transcript on disk
+        (deleted along with the state dir on clean exit, kept on
+        crash or capability-block per the preserve marker)."""
    container = _agent_container_name(slug)
    dest = transcript_snapshot_dir(slug)
    if dest.exists():
@@ -157,11 +164,11 @@ def _snapshot_transcript(slug: str) -> None:
    )
    if r.returncode != 0:
        warn(
-            f"capability-apply: transcript snapshot skipped "
+            f"transcript snapshot skipped "
            f"({(r.stderr or '').strip() or 'no transcript dir in container?'})"
        )
        return
-    info(f"capability-apply: transcript snapshotted to {dest}")
+    info(f"transcript snapshotted to {dest}")


 def _push_working_tree(slug: str) -> None:
@@ -213,4 +220,5 @@ __all__ = [
    "CapabilityApplyError",
    "apply_capability_change",
    "fetch_current_dockerfile",
+    "snapshot_transcript",
 ]