aa5aa1f031
The b9853ae stdin=DEVNULL fix wasn't sufficient. End-to-end testing against a live VM in tmux revealed a second crash path: libkrun spits "load `config.json`: parse error: trailing garbage { "ociVersion": "1.0.2", ... }" and the main exec dies (rc=1 or SIGKILL/rc=137, depending on race scheduling).

Root cause: each `smolvm machine exec` writes a per-invocation OCI config.json to the same smolvm state dir during its bringup. The wrapper's startup sync() fires within 1ms of Popen-ing the main exec, so both invocations write config.json concurrently, libkrun loads one mid-write, and gets garbage. Trivial inner commands (`sh -c "echo hi"`) finished before the overlap mattered, masking the race in earlier tests. claude's slower startup hits the race every time, and only inside tmux because the outside-tmux foreground-handoff path takes a different bringup sequence that happens to dodge the window.

Fix: schedule the initial sync on a 2-second `threading.Timer` instead of calling it synchronously. By 2s the main exec is past its bringup window, so the side-channel's config.json write doesn't collide. The timer runs as a daemon thread so it doesn't block exit when the child finishes quickly.

Trade-off: the in-VM PTY uses smolvm's default size for the first ~2s, then snaps to the host pane size when the timer fires. Verified end-to-end against a live VM in tmux: claude renders at the default size during bringup, then redraws at full pane width once the deferred sync lands. Operator-driven resizes (SIGWINCH) still bridge in real time via the already-installed signal handler.

Also drop the diagnostic log added in 9c83ea6; we have the fix.

Regression test: `TestStartupSyncDeferred.test_main_schedules_timer_does_not_call_sync_synchronously` mocks Popen + Timer + _push_size and asserts `main()` schedules the timer with the documented delay constant and never invokes _push_size synchronously. Catches a "let's just inline the sync() call" regression immediately.

638 unit tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
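A minimal sketch of the deferred-sync pattern described in the commit message, not the wrapper's actual code. The names `STARTUP_SYNC_DELAY_S`, `schedule_startup_sync`, and the `push_size` callback are hypothetical, chosen only for illustration.

```python
import threading

# Hypothetical constant; the commit message only says the delay is 2 seconds.
STARTUP_SYNC_DELAY_S = 2.0


def schedule_startup_sync(push_size, delay=STARTUP_SYNC_DELAY_S):
    """Run push_size() once after `delay` seconds instead of calling it inline."""
    timer = threading.Timer(delay, push_size)
    # Daemon thread: a still-pending timer must not keep the process alive
    # when the child exits before the delay elapses.
    timer.daemon = True
    timer.start()
    return timer
```

Returning the timer lets a caller cancel it with `timer.cancel()` if the child exits early, and gives a test something to assert on without waiting for the delay.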
Tests
Plain-Python test suite using stdlib unittest. No external
dependencies. Unit tests run anywhere Python 3 is present; integration
tests need Docker and skip cleanly otherwise.
Layout
tests/
  fixtures.py                        # JSON manifest builders (shared)
  _docker.py                         # docker-availability skip helper (shared)
  unit/
    test_pipelock_classify.py
    test_pipelock_allowlist.py
    test_pipelock_yaml.py
    test_manifest_runtime.py
  integration/
    test_pipelock_sidecar_smoke.py
    test_dry_run_plan.py
    test_orphan_cleanup.py
  canaries/
    test_pipelock_image.py           # opt-in; see below
Classification falls out of the directory — no hand-maintained list to keep in sync.
Running
python -m unittest discover -t . -s tests/unit -v # unit only
python -m unittest discover -t . -s tests/integration -v # integration only
python -m unittest discover -t . -s tests -v # both (recursive)
python -m unittest tests.unit.test_pipelock_yaml # one file
Discovery is invoked with -t . (top-level dir = repo root) so the
claude_bottle package on sys.path resolves correctly.
What the integration tests cover
- test_dry_run_plan.py: cli.py start --dry-run --format=json emits a structured plan that contains the resolved egress allowlist and the bottle's runtime, and creates zero Docker resources.
- test_orphan_cleanup.py: network_remove is idempotent against missing resources, so the EXIT trap can call it unconditionally.
- test_sidecar_bundle_image.py: builds Dockerfile.sidecars and probes that pipelock / gitleaks / mitmdump / supervise are all reachable inside the bundle.
- test_sidecar_bundle_compose.py: end-to-end compose-up of an agent + bundle pair; verifies the agent reaches the bundle via the legacy network aliases.
Canaries
tests/canaries/ holds upstream-regression checks (e.g. the pinned
pipelock digest's binary still runs). These are gated on
CLAUDE_BOTTLE_RUN_CANARIES=1 and not part of the per-push suite.
They're invoked by the scheduled canaries workflow.
CLAUDE_BOTTLE_RUN_CANARIES=1 python -m unittest discover -t . -s tests/canaries -v
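One way such an environment gate can be expressed with stdlib unittest is sketched below. This is an assumed pattern, not a copy of the repository's canary modules; the class and test names are placeholders.

```python
import os
import unittest

# Assumed gating logic: canaries run only when the variable is exactly "1".
RUN_CANARIES = os.environ.get("CLAUDE_BOTTLE_RUN_CANARIES") == "1"


@unittest.skipUnless(RUN_CANARIES, "set CLAUDE_BOTTLE_RUN_CANARIES=1 to run canaries")
class TestPinnedImageCanary(unittest.TestCase):
    def test_placeholder(self):
        # A real canary would exercise the pinned upstream artifact here.
        self.assertTrue(True)
```

With the variable unset, the class is reported as skipped rather than failed, so a full recursive discovery run stays green.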
What's NOT covered
- claude_bottle/ssh.py end-to-end (would need a fake SSH host inside the container).
- A live SSH-through-pipelock tunnel against a real Tailscale-style IP.
- DLP false-positive measurements.
- TLS handling / cert pinning behavior.
Adding a test
- Pick the directory: tests/unit/ for a pure unit test, tests/integration/ for one that needs Docker.
- Filename: test_<topic>.py.
- Boilerplate:

      import unittest

      from claude_bottle.<module> import <symbol>


      class TestThing(unittest.TestCase):
          def test_x(self):
              ...


      if __name__ == "__main__":
          unittest.main()

- For Docker-dependent tests, decorate the class with @skip_unless_docker() from tests._docker.
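For readers curious what a helper like skip_unless_docker() might look like, here is a hedged sketch. It is an assumption about one reasonable implementation, not the contents of tests/_docker.py; `docker_available` is a hypothetical name.

```python
import shutil
import subprocess
import unittest


def docker_available():
    """Return True if a docker CLI exists and the daemon answers `docker info`."""
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def skip_unless_docker():
    """Class or method decorator: skip (not fail) when Docker is unavailable."""
    return unittest.skipUnless(docker_available(), "Docker is not available")
```

Probing the daemon as well as the CLI means a machine with docker installed but not running still skips cleanly instead of erroring inside the test body.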