didericis/bot-bottle

didericis 47c3ba63f8

docs(prd): mark merged PRDs as Active

Flip Status: Draft -> Active for the 23 PRDs whose work has shipped to
main (including 0027, now that PR #95 has merged). Leaves the
terminal-status PRDs unchanged: 0007 and 0010 (Superseded) and 0014
(Retargeted) were replaced, not shipped as-is.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-28 22:12:03 -04:00

PRD 0006: pipelock native TLS interception

Status: Active
Author: didericis
Created: 2026-05-12

Summary

Turn on pipelock's built-in tls_interception so its DLP / URL / header / MCP scanners fire on the plaintext of HTTPS requests instead of only the outer CONNECT hostname. Pipelock generates a per-bottle ephemeral CA at launch (pipelock tls init); the public cert is installed into the agent container's trust store and the private key dies with the sidecar on teardown. The existing per-agent sidecar topology from PRD 0001 is otherwise unchanged — one container, no addon, no second proxy.

This supersedes the closed PR #8 / branch mitmproxy-tls-interception, which built a mitmproxy + addon chain on the (falsified) premise that pipelock could not MITM. Empirical proof from the impl-time spike: with tls_interception: { enabled: true, ca_cert, ca_key } in the pipelock config, pipelock answered a credential POST over HTTPS with STATUS=403 / body: blocked: request body contains secret: GitHub Token and emitted both scanner:"tls_intercept" and scanner:"body_dlp" events.

Problem

PRD 0001 wired pipelock onto every bottle's egress, but pipelock ran with its default tls_interception.enabled: false. The agent container's only egress route is pipelock, but pipelock only saw CONNECT hostnames and the encrypted bytes inside the tunnel. Pipelock's headline scanners — request body DLP (48 credential patterns), header DLP, URL DLP, subdomain entropy, MCP scanning, response-body scanning — all need plaintext to fire. Against the HTTPS-only hosts in DEFAULT_ALLOWLIST (api.anthropic.com, raw.githubusercontent.com, etc.) they are effectively disabled.

The existing tests/integration/test_pipelock_blocks_secret_post test only fires because it forces the agent to send plain HTTP through pipelock's forward-proxy mode. Real Claude Code traffic uses HTTPS via CONNECT and slips past the scanner.

Goals / Success Criteria

The feature works when all of the following are observable:

A Node / curl request from inside a launched bottle to a CONNECT-bumped HTTPS host (e.g. https://api.anthropic.com/dlp-probe) carrying a pipelock-recognized credential pattern in the body returns 403 from pipelock with the documented blocked: request body contains secret: … body. Pipelock's body_dlp event fires on the decrypted request.
A clean HTTPS GET from inside the bottle to an allowlisted host (e.g. https://raw.githubusercontent.com/...) returns the real upstream response — TLS interception doesn't break legitimate traffic.
The agent's TLS library trusts pipelock's bumped leaf certs (per the bottle's installed CA); no TLS-trust errors.
Claude Code reaches api.anthropic.com end-to-end through the bottle and completes a chat round-trip.
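
As a rough illustration of the first criterion, an integration test could classify pipelock's answer along these lines. The helper name `is_pipelock_body_block` is hypothetical, and the check assumes the 403 body always starts with the `blocked: request body contains secret:` prefix quoted in the Summary; the real test may assert something different.

```python
# Prefix taken from the spike output quoted in the Summary.
BLOCK_PREFIX = "blocked: request body contains secret:"


def is_pipelock_body_block(status: int, body: str) -> bool:
    """Return True when a response looks like pipelock's body-DLP block.

    Hypothetical helper: it only inspects the status code and the body
    prefix, not pipelock's event log.
    """
    return status == 403 and body.strip().startswith(BLOCK_PREFIX)
```

A plain upstream 403 (for example a body of `Forbidden`) is deliberately not counted as a block, so the test cannot pass just because the probe URL rejects the request.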

The feature is done when all of the following ship:

pipelock_build_config / pipelock_render_yaml emit a tls_interception block with enabled: true and the per-bottle CA cert/key paths. The defaults (cert_ttl: 24h, cert_cache_size: 10000, passthrough_domains: []) are kept; only enabled and the cert paths are populated.
The prepare step generates a per-bottle CA via pipelock tls init in a one-shot container and writes ca.pem and ca-key.pem to stage_dir. Paths land on the DockerBottlePlan.
DockerPipelockProxy.start copies the CA cert and key from the stage dir into the sidecar (via docker cp, between docker create and docker start) so the running pipelock can read its CA.
BottleBackend.provision_ca (new) copies the CA public cert into the agent at /usr/local/share/ca-certificates/bot-bottle-mitm.crt, runs update-ca-certificates, and sets the NODE_EXTRA_CA_CERTS / SSL_CERT_FILE / REQUESTS_CA_BUNDLE env trio on the agent container's runtime env. Default no-op on the abstract base so other backends aren't forced to implement.
The launch step prints a one-line stderr log with the SHA-256 fingerprint of the public CA cert (computed via stdlib ssl.PEM_cert_to_DER_cert + hashlib.sha256).
On bottle teardown the sidecar is removed and the CA private key is gone with it.
Two new integration tests under tests/integration/:
- HTTPS variant of the credential-post block test (proves the tls_intercept + body_dlp chain fires end-to-end).
- Clean HTTPS GET test (proves the allow path doesn't break TLS trust and returns real upstream content).
The dry-run preflight (start --dry-run) renders the new TLS layer. Text: one line under the egress summary. JSON: a reserved egress.tls_interception: { enabled: true, ca_fingerprint: null } block — fingerprint is null at dry-run because the CA only exists after launch.
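
The fingerprint item in the list above can be sketched with the two stdlib calls it names. This is a minimal sketch, not the shipped code: the function names are hypothetical, and whether the logged hex digest is truncated is not specified here, so the sketch prints it in full.

```python
import hashlib
import ssl
import sys


def ca_fingerprint(pem_text: str) -> str:
    """SHA-256 over the DER encoding of a single PEM certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_text)
    return "sha256:" + hashlib.sha256(der).hexdigest()


def log_ca_fingerprint(pem_text: str) -> None:
    # One stderr line; only the public cert is ever read here.
    print(f"bot-bottle: mitm ca fingerprint: {ca_fingerprint(pem_text)}", file=sys.stderr)
```

Hashing the DER bytes rather than the PEM text keeps the value independent of line wrapping and trailing whitespace in the file.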

Non-goals

A second proxy in the chain. Pipelock does the bumping natively; the mitmproxy approach was based on a wrong premise (closed PR #8).
Per-bottle override to disable interception. v1 always enables tls_interception. The pipelock-side passthrough_domains list is the right knob if a future allowlisted host turns out to pin certs — exposing it through the manifest is a follow-up.
A long-lived / shared CA across bottles. Each bottle gets a fresh CA generated by pipelock tls init and destroyed with the sidecar.
Tuning cert_ttl, cert_cache_size, max_response_bytes, cross_request_detection, or other pipelock advanced features. Defaults from pipelock generate config --preset strict are fine for v1.
Trust-store paths for non-Debian agent images. node:22-slim is Debian; update-ca-certificates is the right command. A Red-Hat-family base would need update-ca-trust.
HTTP/3 / QUIC. Pipelock's interception is HTTP/HTTPS-over-TLS; UDP/443 still needs an iptables layer (separate PRD).

Scope

In scope

bot_bottle/pipelock.py changes:
- Extend pipelock_build_config to include tls_interception: { enabled: true, ca_cert: <path>, ca_key: <path> }. Paths are populated from the plan; the function's signature grows a cert_path / key_path pair or reads them off Bottle once they're stored.
- Extend pipelock_render_yaml to emit the new block.
bot_bottle/backend/docker/pipelock.py changes:
- New helper pipelock_tls_init(stage_dir) runs the upstream image as a one-shot: docker run --rm -v <stage>:/h -e PIPELOCK_HOME=/h pipelock tls init, leaving ca.pem and ca-key.pem under stage_dir. The host file owner is whatever the upstream image's user is; the files are copied into the sidecar with docker cp rather than bind-mounted, so this is fine.
- DockerPipelockProxy.start copies the CA cert + key into the sidecar with docker cp, at /etc/pipelock/ca.pem and /etc/pipelock/ca-key.pem, between docker create and docker start, mirroring the existing pattern for the YAML config. If pipelock's image runs as non-root, a docker exec -u 0 chown pipelock:pipelock /etc/pipelock/ca*.pem lands between the cp and the start.
bot_bottle/backend/__init__.py: new method provision_ca(plan, target) on the BottleBackend abstract base, with a default no-op body so other backends are not forced to implement it. BottleBackend.provision orchestrates ca → prompt → skills → ssh → git.
bot_bottle/backend/docker/provision/ca.py (new):
- Reads the cert from stage_dir (already written by prepare).
- docker cp into the agent.
- docker exec -u 0 ... chmod 644 ... + update-ca-certificates.
- Computes the SHA-256 fingerprint with stdlib (ssl + hashlib), emits one stderr log line.
bot_bottle/backend/docker/launch.py:
- Three new -e flags on the agent's docker run: NODE_EXTRA_CA_CERTS=/usr/local/share/ca-certificates/bot-bottle-mitm.crt, SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt, REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt.
- HTTPS_PROXY / HTTP_PROXY continue to point at pipelock (unchanged from PRD 0001 — the mitmproxy detour in PR #8 is abandoned).
bot_bottle/backend/docker/bottle_plan.py:
- One new info(...) line in print() noting TLS interception is on.
- to_dict() gains an egress.tls_interception: { enabled: true, ca_fingerprint: null } block. Reserved for future population.
bot_bottle/backend/docker/prepare.py: call pipelock_tls_init(stage_dir) and write the resolved cert/key paths onto the plan (either on the existing proxy_plan field or on the parent DockerBottlePlan).
Tests:
- tests/integration/test_pipelock_blocks_secret_https_post.py (new) — HTTPS variant of the existing block test.
- tests/integration/test_pipelock_allows_normal_https.py (new) — clean HTTPS GET succeeds.
- tests/unit/test_pipelock_yaml.py updated to assert the new tls_interception block in the rendered config.
- tests/integration/test_dry_run_plan.py updated to assert the new egress.tls_interception JSON block.
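
The one-shot CA generation listed under bot_bottle/backend/docker/pipelock.py above might be wired up roughly as follows. Several things here are assumptions: the extra `image` parameter (the pinned image reference is elided in this document, so it is passed in rather than guessed), the helper `pipelock_tls_init_argv`, and the reading that the image's entrypoint is the pipelock binary so that `tls init` follows the image name.

```python
import subprocess
from pathlib import Path


def pipelock_tls_init_argv(stage_dir: Path, image: str) -> list[str]:
    """Build the docker argv for the one-shot `tls init` run."""
    stage = Path(stage_dir).resolve()  # docker -v needs an absolute host path
    return [
        "docker", "run", "--rm",
        "-v", f"{stage}:/h",
        "-e", "PIPELOCK_HOME=/h",
        image,           # assumption: entrypoint is the pipelock binary
        "tls", "init",
    ]


def pipelock_tls_init(stage_dir: Path, image: str) -> tuple[Path, Path]:
    """Run the one-shot container and return the expected cert/key paths."""
    subprocess.run(pipelock_tls_init_argv(stage_dir, image), check=True)
    stage = Path(stage_dir)
    return stage / "ca.pem", stage / "ca-key.pem"
```

Splitting the argv builder from the subprocess call keeps the command shape unit-testable without a Docker daemon.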

Out of scope

Modifying pipelock itself. We're using existing config knobs.
A manifest field to disable / customize interception per bottle. Doable but premature.
Wiring passthrough_domains. The default [] is correct for v1; add the manifest field when a pinning host shows up. The shape is pre-recorded so the follow-up is mechanical: bottle.egress.tls_passthrough_domains: [host, ...], mirroring the existing egress.allowlist.
cross_request_detection, entropy_budget, fragment_reassembly, reverse_proxy, scan_api — features pipelock exposes but we don't need for the body-DLP gap.

Proposed Design

Topology

agent --HTTPS_PROXY--> pipelock --[bumps TLS]--> internet
                       (sees plaintext: URL, headers, body)

Same single-sidecar shape as PRD 0001. The only addition is tls_interception in pipelock's config plus the per-bottle CA generated at prepare time.
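
To make the config addition concrete, here is a minimal sketch of the block the builder would emit and how it renders as YAML. The function names `tls_interception_block` and `render_block` are hypothetical stand-ins for pipelock_build_config / pipelock_render_yaml, and the hand-rolled renderer only handles nested dicts, booleans and plain strings.

```python
def tls_interception_block(ca_cert: str, ca_key: str) -> dict:
    """Only the keys this PRD populates; pipelock defaults cover the rest."""
    return {
        "tls_interception": {
            "enabled": True,
            "ca_cert": ca_cert,
            "ca_key": ca_key,
        }
    }


def render_block(cfg: dict, indent: int = 0) -> str:
    """Tiny YAML renderer for nested dicts of bools and plain strings."""
    pad = "  " * indent
    lines = []
    for key, value in cfg.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(render_block(value, indent + 1))
        elif isinstance(value, bool):
            lines.append(f"{pad}{key}: {str(value).lower()}")
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # In-container paths, as used by the sidecar install step.
    print(render_block(tls_interception_block(
        "/etc/pipelock/ca.pem", "/etc/pipelock/ca-key.pem")))
```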

CA lifecycle

Generation. Host-side, at prepare time, via a one-shot docker run --rm -v <stage>:/h -e PIPELOCK_HOME=/h pipelock tls init. Output: <stage>/ca.pem + <stage>/ca-key.pem, mode 600.
Sidecar install. DockerPipelockProxy.start copies the CA cert + key into the sidecar with docker cp, at /etc/pipelock/ca.pem and /etc/pipelock/ca-key.pem, between docker create and docker start. This is the same pattern the proxy already uses for the YAML config: no bind-mount, and no UID/permission concern from the one-shot generation step. The rendered YAML references the in-container paths.
Bottle install. provision_ca (Docker impl) does docker cp <stage>/ca.pem agent:/usr/local/share/ca-certificates/bot-bottle-mitm.crt, then update-ca-certificates. The CA env trio is set at docker run -e time (Docker propagates run-time env into docker exec).
Per-bottle ephemerality. Enforced by regenerating per launch, not by validity windows. Pipelock's defaults (cert_ttl: 24h for leaves, --validity 87600h for the CA) are fine — the CA lives only as long as the sidecar, which is the bottle's lifetime.
Teardown. Sidecar removed via ExitStack callback, then the launch context manager's outer finally runs shutil.rmtree on stage_dir. The CA dies with both, in that order, so the sidecar is always gone before the staged CA files are deleted.
Fingerprint. Computed via stdlib in provision_ca and logged once to stderr (bot-bottle: mitm ca fingerprint: sha256:<hex>…). The private key never appears in any log.
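
The teardown ordering described above can be sketched with contextlib. This is an illustration under assumptions: `launch_context` and the `remove_sidecar` callable are hypothetical names, and the real launch context manager does more than this.

```python
import contextlib
import shutil
import tempfile
from pathlib import Path


@contextlib.contextmanager
def launch_context(remove_sidecar):
    """Yield a stage dir; on exit remove the sidecar first, then the dir."""
    stage_dir = Path(tempfile.mkdtemp(prefix="bot-bottle-stage-"))
    try:
        with contextlib.ExitStack() as stack:
            # Registered callbacks run when the ExitStack block exits,
            # which is before the outer finally below.
            stack.callback(remove_sidecar, stage_dir)
            yield stage_dir
    finally:
        # Runs last: the CA cert and key on the host are deleted here.
        shutil.rmtree(stage_dir, ignore_errors=True)
```

Because the ExitStack block sits inside the try, the sidecar callback always fires while the stage directory still exists, including when the body raises.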

Data model changes

None to the manifest schema. The dry-run JSON contract grows a reserved egress.tls_interception block; the fingerprint is always null at dry-run because the CA doesn't exist yet.
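
For reference, the reserved dry-run block serializes as below. This sketch shows only the new key; the function name `egress_tls_preflight` is hypothetical, and the real to_dict() output carries other egress fields that are omitted here.

```python
import json


def egress_tls_preflight() -> dict:
    """Reserved dry-run shape: fingerprint stays None until a CA exists."""
    return {
        "egress": {
            "tls_interception": {
                "enabled": True,
                "ca_fingerprint": None,  # serialized as JSON null
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(egress_tls_preflight(), indent=2))
```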

Existing code touched

Surgical, all on the existing pipelock path:

bot_bottle/pipelock.py — config builder + YAML renderer.
bot_bottle/backend/__init__.py — abstract provision_ca.
bot_bottle/backend/docker/pipelock.py: tls init helper, sidecar CA install via docker cp.
bot_bottle/backend/docker/prepare.py — CA paths on plan.
bot_bottle/backend/docker/launch.py — CA env trio on agent.
bot_bottle/backend/docker/backend.py: provision_ca dispatch; self._proxy is still threaded through prepare/launch in its existing shape.
bot_bottle/backend/docker/bottle_plan.py — preflight rendering.
bot_bottle/backend/docker/provision/ca.py (new).

Net diff is meaningfully smaller than PR #8 because pipelock already does the work — no addon, no second sidecar, no second backend module.

External dependencies

Pipelock image — unchanged pin from PRD 0001 (ghcr.io/luckypipewrench/pipelock@sha256:3b1a3941…, matching pipelock v2.3.0). No new image dependency.
No host-side crypto deps. CA generation uses the pipelock image's own tls init command in a one-shot container. Fingerprint uses Python stdlib ssl + hashlib.

References

docs/research/pipelock-assessment.md (now corrected) — pipelock capability assessment including the tls_interception block.
docs/prds/0001-per-agent-egress-proxy-via-pipelock.md — egress-proxy baseline this PRD extends.
docs/prds/0003-bottle-backend-abstraction.md — backend ABC contract this PRD adds a provision_ca method to.
docs/prds/0004-split-out-provisioners.md — per-provisioner module pattern reused for the new CA provisioner.
Pipelock tls CLI (in-image help): pipelock tls init / install-ca / show-ca.
Closed PR #8 — earlier mitmproxy-based design built on the falsified "pipelock can't MITM" premise; archived for context.
