From fe712490053956655351101195114652ff8fa7f1 Mon Sep 17 00:00:00 2001
From: didericis
Date: Tue, 12 May 2026 12:20:24 -0400
Subject: [PATCH] docs(prd): add 0005 mitmproxy TLS interception

Captures the design for putting a mitmproxy sidecar in front of
pipelock on the egress path so pipelock's body / header / MCP
scanners see plaintext for the HTTPS hosts in the default allowlist.
Implements Topology A from docs/research/tls-mitm-for-pipelock.md
with a per-bottle ephemeral CA, no manifest schema change in v1, and
selective-bumping deferred until a pinning host appears.

Co-Authored-By: Claude Opus 4.7
---
 docs/prds/0005-mitmproxy-tls-interception.md | 371 +++++++++++++++++++
 1 file changed, 371 insertions(+)
 create mode 100644 docs/prds/0005-mitmproxy-tls-interception.md

diff --git a/docs/prds/0005-mitmproxy-tls-interception.md b/docs/prds/0005-mitmproxy-tls-interception.md
new file mode 100644
index 0000000..e0f3d95
--- /dev/null
+++ b/docs/prds/0005-mitmproxy-tls-interception.md
@@ -0,0 +1,371 @@
+# PRD 0005: mitmproxy TLS interception for pipelock content scanning
+
+- **Status:** Draft
+- **Author:** didericis
+- **Created:** 2026-05-12
+
+## Summary
+
+Add a per-bottle **mitmproxy** sidecar in front of pipelock on the
+egress path so pipelock's DLP, subdomain-entropy, and MCP scanners
+fire on the plaintext bodies of HTTPS requests instead of only the
+opaque ciphertext that follows a `CONNECT`. mitmproxy terminates the
+agent's TLS, hands plaintext HTTP to pipelock as an upstream
+forward proxy, and re-establishes TLS to the real destination. A
+fresh ephemeral CA is minted per bottle; the CA private key is
+copied only into the sidecar and never reaches the agent container,
+and the public cert is wired into the agent container's trust store
+at launch.
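The chain described in the summary can be sketched as a small, pure helper that derives the agent's proxy environment and the mitmproxy command line. This is a minimal sketch under stated assumptions, not the PRD's implementation: `build_proxy_chain`, `ProxyChain`, the `slug` suffix, and both port parameters are hypothetical names introduced here for illustration, and whether `upstream` mode forwards plaintext in the expected shape still needs checking against the pinned mitmproxy version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyChain:
    """Settings for the agent -> mitmproxy -> pipelock chain."""

    agent_env: dict
    mitmproxy_args: list


def build_proxy_chain(slug: str, mitm_port: int, pipelock_port: int) -> ProxyChain:
    """Derive agent env vars and mitmproxy arguments for one bottle.

    Assumption: sidecar hostnames carry a per-bottle suffix (`slug`) and
    each sidecar listens on a known port. Both are placeholders.
    """
    mitm_url = f"http://claude-bottle-mitm-{slug}:{mitm_port}"
    pipelock_url = f"http://claude-bottle-pipelock-{slug}:{pipelock_port}"

    # The agent only ever talks to mitmproxy; it never sees pipelock directly.
    agent_env = {"HTTPS_PROXY": mitm_url, "HTTP_PROXY": mitm_url}

    # mitmproxy bumps TLS, then forwards to pipelock as its upstream proxy.
    mitmproxy_args = [
        "mitmdump",
        "--mode", f"upstream:{pipelock_url}",
        "--listen-port", str(mitm_port),
    ]
    return ProxyChain(agent_env=agent_env, mitmproxy_args=mitmproxy_args)
```

Keeping the derivation free of Docker calls means it can be unit-tested without a daemon, in the same spirit as the backend-agnostic config helpers the PRD proposes.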
+
+## Problem
+
+PRD 0001 wired pipelock onto every bottle's egress, but the current
+topology only sees `CONNECT` hostnames and opaque TLS bytes:
+
+```
+agent --HTTPS_PROXY--> pipelock --CONNECT host:443--> internet
+                                \____________________________
+                                      opaque TLS bytes
+```
+
+What pipelock cannot scan in this mode is documented in
+`docs/research/tls-mitm-for-pipelock.md` §What pipelock cannot see
+today: request URLs and methods, request and response headers,
+request and response bodies, MCP JSON-RPC payloads, inner-vs-outer
+hostname (the domain-fronting check), and WebSocket frames inside a
+TLS-wrapped upgrade. The 48-pattern DLP layer this project relies on
+in PRD 0001 is therefore inert against every host in the current
+`DEFAULT_ALLOWLIST` — all of which are HTTPS-only.
+
+The integration test added in `tests/integration/test_pipelock_blocks_secret_post.py`
+demonstrates the gap concretely: pipelock's body-scan layer only
+fires when the agent is forced to send plain HTTP. Real Claude Code
+traffic to `api.anthropic.com` goes over CONNECT-tunneled TLS and
+slips past the scanner.
+
+`pipelock-assessment.md` §Scope gaps names this as a known
+limitation of the proxy-without-TLS-inspection shape. Closing it is
+the explicit motivation for `tls-mitm-for-pipelock.md`, whose
+recommendation this PRD implements.
+
+## Goals / Success Criteria
+
+The feature works when all of the following are observable:
+
+- A Node request from inside a launched bottle to a CONNECT-bumped
+  HTTPS host (e.g. `https://api.anthropic.com/dlp-probe`) carrying a
+  pipelock-recognized credential pattern in the body returns 403 from
+  the proxy, not a response from the upstream. The existing
+  `test_pipelock_blocks_secret_post` test path becomes the HTTPS
+  variant of this assertion.
+- Claude Code itself reaches `api.anthropic.com` end-to-end through
+  the bottle and completes a chat round-trip. No TLS-trust errors
+  in the agent process.
+- mitmproxy's TLS-handshake log lines and pipelock's `body_dlp`
+  event lines both appear for the same outbound request, confirming
+  the two-stage path is active.
+
+The feature is **done** when all of the following ship:
+
+- A new `MitmproxyProxy` class with the same `prepare` / `start` /
+  `stop` lifecycle shape as `PipelockProxy`, wired into the Docker
+  backend's launch step.
+- The bottle launch step generates a per-bottle ephemeral CA in
+  `stage_dir`, starts the mitmproxy sidecar with that CA on the
+  per-bottle internal network, copies the CA public cert into the
+  agent container's trust store, and points the agent's
+  `HTTPS_PROXY` / `HTTP_PROXY` at mitmproxy.
+- mitmproxy's upstream is the existing pipelock sidecar; pipelock
+  sees plaintext HTTP from mitmproxy for every previously-HTTPS
+  request.
+- On bottle teardown the mitmproxy sidecar is removed and the
+  ephemeral CA private key is gone with it.
+- An integration test (variant of `test_pipelock_blocks_secret_post`)
+  proves pipelock now blocks a credential POST that goes out over
+  HTTPS rather than plain HTTP.
+- An integration test proves a non-credential HTTPS request to an
+  allowlisted host (e.g. CONNECT-then-GET on `raw.githubusercontent.com`)
+  succeeds end-to-end with mitmproxy in the path (no TLS-trust
+  errors, response body received).
+- The dry-run preflight (`start --dry-run`) shows the mitmproxy
+  sidecar in both the text and `--format=json` output alongside the
+  existing pipelock entry.
+
+## Non-goals
+
+- **Topology C** — extending pipelock itself to terminate TLS. That
+  is the cleanest long-term shape per the research note's
+  recommendation but is substantial Go work and hits the
+  Apache-2.0-vs-ELv2 question. Deferred.
+- **Topology D** — driving mitmproxy with a pipelock `/scan` HTTP
+  endpoint. Requires a pipelock surface that doesn't exist today.
+  Deferred.
+- **Persistent or shared CA across bottles.** Each bottle gets a
+  fresh CA generated at start and destroyed at teardown. No CA
+  storage on the host, no cross-bottle reuse.
+- **Selective bumping ("ignore_hosts") as a v1 manifest field.**
+  v1 bumps every CONNECT. If a future allowlisted host turns out to
+  pin (Mobile / Chromium-style cert pinning), a follow-up PRD adds
+  the per-host opt-out — likely a `bottle.egress.tls_bump_ignore`
+  field. See Open questions.
+- **HTTP/3 / QUIC.** mitmproxy's HTTP/3 support is experimental.
+  v1 relies on the v1-egress iptables layer (separate PRD) blocking
+  UDP/443 to force clients onto HTTP/2 over TCP, which mitmproxy
+  inspects normally.
+- **Raw TCP / non-HTTP TLS interception.** mitmproxy supports it
+  via `--mode reverse:`, not in CONNECT-bump mode. SSH and any
+  future raw-TCP egress route around mitmproxy entirely.
+- **Trust-store rewiring for non-Debian agent base images.** The
+  current `Dockerfile` is `node:22-slim` (Debian). If a future base
+  switches to Red-Hat-family, the `update-ca-certificates` step
+  becomes `update-ca-trust`. Out of scope until the base changes.
+
+## Scope
+
+### In scope
+
+- New `claude_bottle/mitmproxy.py` mirroring `claude_bottle/pipelock.py`:
+  config helpers (no backend-specific Docker calls), the
+  `MitmproxyProxy` abstract class, and the per-bottle CA generation
+  helpers.
+- New `claude_bottle/backend/docker/mitmproxy.py` mirroring
+  `claude_bottle/backend/docker/pipelock.py`: `DockerMitmproxyProxy`
+  with the Docker-specific `start` / `stop` lifecycle, the sidecar
+  container name scheme, and the image pin.
+- New provisioner: `claude_bottle/backend/docker/provision/ca.py`,
+  installing the CA public cert into the agent container at
+  `/usr/local/share/ca-certificates/claude-bottle-mitm.crt`, running
+  `update-ca-certificates`, and exporting `NODE_EXTRA_CA_CERTS` /
+  `SSL_CERT_FILE` / `REQUESTS_CA_BUNDLE` env vars to the agent
+  process.
+  The provisioner runs from `BottleBackend.provision` in
+  the same orchestration as `prompt`, `skills`, `ssh`, `git`.
+- Per-agent network reshuffle in `DockerBottleBackend.launch`:
+  - internal network is unchanged (mitmproxy + pipelock + agent)
+  - agent's `HTTPS_PROXY` / `HTTP_PROXY` change from pointing at the
+    pipelock service name to the mitmproxy service name
+  - mitmproxy's `upstream_proxy` config points at the pipelock
+    service name on the internal network
+- `DockerBottlePlan` grows a `mitmproxy_plan` field analogous to the
+  existing `proxy_plan` (the pipelock one) so prepare-time state
+  rides on the plan.
+- Dry-run preflight (`start --dry-run` text + JSON) renders the
+  mitmproxy line and surfaces the CA fingerprint shown in the
+  bottle's trust store, so the operator can verify what's been
+  installed.
+- Two new integration tests under `tests/integration/`:
+  - `test_mitmproxy_blocks_secret_https_post.py` — the HTTPS
+    variant of the existing `_blocks_secret_post` test.
+  - `test_mitmproxy_allows_normal_https.py` — confirms a plain
+    HTTPS GET to a non-credential-bearing path through mitmproxy +
+    pipelock returns the upstream response, asserting no trust /
+    handshake breakage.
+- Unit tests for the new config builder (mirroring the pipelock
+  YAML unit tests) and for the CA generation helper.
+
+### Out of scope
+
+- The v1 iptables + dnsmasq layer (separate PRD; see
+  `network-egress-guard.md`). mitmproxy covers HTTP/HTTPS only.
+  Raw TCP, UDP, ICMP, and direct DNS still need the IP-level layer.
+- Pipelock config changes. Pipelock continues to load the YAML PRD
+  0001 already generates. mitmproxy is opaque to it; pipelock just
+  sees plain HTTP from a forward-proxy client.
+- A bottle-level toggle to skip mitmproxy entirely. v1 always wires
+  it in. If a use case appears for an unintercepted bottle
+  (e.g. testing pipelock's CONNECT-mode behavior in isolation),
+  that's a follow-up.
+- Pinning-host detection automation.
+  The cost of finding out (per
+  the research note) is a single 5-minute test before adding a
+  host; it stays a manual step.
+
+## Proposed Design
+
+### Topology
+
+```
+agent --HTTPS_PROXY--> mitmproxy --HTTP_PROXY--> pipelock --> internet
+                       (bump TLS)                (scan plain) (real TLS)
+```
+
+All three containers live on the same per-bottle internal Docker
+network. mitmproxy and pipelock are both attached to the per-bottle
+egress bridge so they can reach the host network; the agent has no
+default route, exactly as today.
+
+Concretely:
+
+- `agent` sets `HTTPS_PROXY=http://claude-bottle-mitm-:`.
+  Currently this points at `claude-bottle-pipelock-`. The
+  hostname swap is the only agent-side env change.
+- `mitmproxy` runs with `--mode upstream:http://claude-bottle-pipelock-:`
+  so its decrypted plaintext is forwarded to pipelock as a regular
+  upstream forward-proxy request. (Research open question #1 calls
+  this out: mitmproxy 10+ documentation says `upstream` mode forwards
+  the original request shape; verify against the pinned version at
+  implementation time. If forwarding wraps a new CONNECT, fall back
+  to `regular` mode with a chained proxy declared in mitmproxy's
+  config and route plain HTTP to pipelock by hand.)
+- `pipelock` continues to listen on its existing port and receives
+  plain HTTP from mitmproxy. No pipelock config change.
+
+### New components
+
+Two new modules, matching PRD 0001's split between
+backend-agnostic config and backend-specific lifecycle:
+
+- **`claude_bottle/mitmproxy.py`** — backend-agnostic. The config
+  builder (mitmproxy YAML / TOML — confirm format), the abstract
+  `MitmproxyProxy` class with `prepare(...)` writing the config and
+  the ephemeral CA into `stage_dir`, the CA generation helper
+  (RSA-2048 or ECDSA-P256 — pick at impl time, research suggests
+  ECDSA for cert-gen speed), and constants for the sidecar's
+  internal-network port and image pin.
+- **`claude_bottle/backend/docker/mitmproxy.py`** — Docker
+  implementation. `DockerMitmproxyProxy(MitmproxyProxy)` with
+  `start(plan)` doing `docker create` / `docker cp` / `docker
+  network connect` / `docker start` analogous to
+  `DockerPipelockProxy.start`. `stop(target)` removes the sidecar
+  idempotently.
+
+The provisioner that installs the CA cert into the agent's trust
+store lives at `claude_bottle/backend/docker/provision/ca.py` and
+plugs into the existing `BottleBackend.provision` orchestration. The
+abstract `BottleBackend.provision_ca` method joins
+`provision_prompt` / `provision_skills` / `provision_ssh` /
+`provision_git` on the base class (PRD 0004's pattern), with a
+default no-op implementation so other backends don't break when
+they don't yet implement it.
+
+### CA lifecycle
+
+Per `tls-mitm-for-pipelock.md` §CA lifecycle:
+
+- **Generation.** Host-side in `MitmproxyProxy.prepare`, written to
+  `stage_dir/mitm-ca.key` (mode 600) and `stage_dir/mitm-ca.crt`
+  (mode 644). The `.key` is copied into the mitmproxy container at
+  start; nothing else touches it.
+- **Bottle injection.** `provision_ca` copies only the public
+  `.crt` into the agent container at
+  `/usr/local/share/ca-certificates/claude-bottle-mitm.crt`, runs
+  `update-ca-certificates` as root inside the container, and sets
+  `NODE_EXTRA_CA_CERTS=/usr/local/share/ca-certificates/claude-bottle-mitm.crt`,
+  `SSL_CERT_FILE`, and `REQUESTS_CA_BUNDLE` for the agent process.
+  Belt-and-suspenders because some libraries honor only env vars.
+- **Teardown.** The mitmproxy sidecar container is destroyed; the
+  CA key vanishes with it. Nothing persists on the host outside
+  `stage_dir`, which the start command already deletes in its
+  finally block.
+- **Cost.** ECDSA-P256 CA + per-host leaf generation runs in
+  milliseconds; the per-bottle Docker pull and network plumbing
+  dominate startup time.
+
+### Data model changes
+
+None in v1. The manifest schema is unchanged.
+mitmproxy is always
+on for every bottle once this PRD ships.
+
+A future selective-bump knob (per `tls-mitm-for-pipelock.md` open
+question #5) would land on `bottle.egress.tls_bump_ignore` as a
+list of hostnames. The shape mirrors `egress.allowlist`. Adding it
+later is a strictly additive change.
+
+### Existing code touched
+
+- **`claude_bottle/backend/docker/launch.py`** — bring up the
+  mitmproxy sidecar after the pipelock sidecar but before the agent
+  container, repoint the agent's `HTTPS_PROXY` / `HTTP_PROXY` env
+  flags, register an `ExitStack` callback to stop mitmproxy on
+  teardown.
+- **`claude_bottle/backend/docker/prepare.py`** — call into
+  `MitmproxyProxy.prepare(...)` alongside the existing
+  `PipelockProxy.prepare(...)`, populate
+  `DockerBottlePlan.mitmproxy_plan`.
+- **`claude_bottle/backend/docker/backend.py`** — add the
+  `DockerMitmproxyProxy` instance attribute (`self._mitm`) and
+  thread it through `launch` + cleanup, mirroring the existing
+  `self._proxy` pattern.
+- **`claude_bottle/backend/docker/bottle_plan.py`** — new
+  `mitmproxy_plan: MitmproxyProxyPlan` field on
+  `DockerBottlePlan`. `print()` and `to_dict()` learn to render it.
+- **`claude_bottle/backend/__init__.py`** — abstract
+  `BottleBackend.provision_ca(plan, target)` joins the other four
+  provisioners. Default impl is a no-op (so a future fly backend
+  isn't forced to implement TLS interception in v1).
+- **`tests/integration/`** — two new tests as described above.
+- **`tests/unit/`** — config-builder unit tests; CA-helper unit
+  tests; updated dry-run-plan test pinning the mitmproxy entry.
+
+### External dependencies
+
+- **mitmproxy Docker image** pulled from
+  `mitmproxy/mitmproxy@sha256:`. The digest is pinned in
+  `claude_bottle/mitmproxy.py` and bumped deliberately, mirroring
+  the pipelock pin. Tag line `mitmproxy/mitmproxy:11.x` per
+  research §Image pin for mitmproxy.
+- No new host-side runtimes.
+  CA generation uses Python's `cryptography`
+  if it's already a transitive dep; otherwise use `openssl` shelled
+  out from the host-side prepare step. Decide at impl time after
+  confirming what's available on the runner without adding deps.
+
+## Open questions
+
+- **mitmproxy upstream-proxy mode mechanics.** Whether `upstream`
+  mode forwards decrypted plaintext to pipelock or re-wraps it in a
+  CONNECT. Documented behavior changed between mitmproxy 8 and 10.
+  Needs verification against the pinned version at impl time. If
+  `upstream` re-wraps, fall back to `regular` mode plus a chained
+  proxy directive routing plain HTTP to pipelock.
+- **Pipelock plain-HTTP scanning coverage.** Pipelock's
+  `forward_proxy.enabled: true` accepts both `GET http://…` and
+  `CONNECT host:443`. Confirm by reading
+  `github.com/luckyPipewrench/pipelock/blob/main/docs/configuration.md`
+  that the full DLP / MCP / subdomain-entropy pipeline runs on the
+  HTTP path; some pipelock layers may be gated on CONNECT only.
+- **CA installation in the Anthropic-provided Claude Code image.**
+  The base image determines whether `update-ca-certificates`
+  (Debian) or `update-ca-trust` (Red Hat) applies. Confirm against
+  the `Dockerfile` before writing the provisioner; v1 assumes
+  Debian (`node:22-slim`).
+- **HTTP/2 ALPN end-to-end.** Node's HTTP client negotiates `h2`
+  via ALPN. Confirm the pinned mitmproxy version speaks `h2` to
+  both halves without silently downgrading to `http/1.1`, which
+  would be a noticeable performance regression on bulk transfers.
+- **Selective-bump policy surface.** Where does the
+  "tunnel this hostname blindly" decision live when (not if) a
+  pinning host appears? Recommended shape per research:
+  `bottle.egress.tls_bump_ignore: ["example.com"]`, a list of
+  hostnames mitmproxy passes through via `ignore_hosts`. Defer
+  until needed; record the shape so the follow-up is mechanical.
+- **CA generation: Python `cryptography` vs.
+  shelled-out
+  `openssl`.** Adding `cryptography` brings a substantial transitive
+  graph; shelling to `openssl` keeps the host-side prepare step
+  dep-light. Decide at impl time based on what's already on the
+  runner. Either way, the CA is per-bottle and ephemeral.
+- **Domain-fronting verification.** Once pipelock sees the inner
+  `Host` / `:authority`, comparing it to the outer `CONNECT` target
+  catches domain fronting. Whether pipelock has a rule for this or
+  we need to add one is a follow-up; out of scope here.
+- **Dry-run preflight rendering of the CA.** Show the fingerprint
+  but never the private key. Confirm the exact dry-run JSON shape
+  during implementation; the field set is part of the CLI's
+  user-facing contract (per PRD 0003 §to_dict notes).
+
+## References
+
+- `docs/research/tls-mitm-for-pipelock.md` — primary source; this
+  PRD implements the recommendation in §Recommendation (Topology A).
+- `docs/research/pipelock-assessment.md` §Scope gaps — names the
+  TLS-inspection gap closed here.
+- `docs/prds/0001-per-agent-egress-proxy-via-pipelock.md` —
+  egress-proxy baseline this PRD extends.
+- `docs/prds/0003-bottle-backend-abstraction.md` — backend ABC
+  contract this PRD adds a `provision_ca` method to.
+- `docs/prds/0004-split-out-provisioners.md` — per-provisioner
+  module pattern reused for the new CA provisioner.
+- mitmproxy: ,
+
+- mitmproxy `upstream_proxy` mode:
+
+- mitmproxy CA cert installation:
+
+- Node `NODE_EXTRA_CA_CERTS`:
+
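For the dry-run open question (show the fingerprint, never the private key), one possible shape is sketched below using only the standard library. It is an illustration under assumptions, not the agreed CLI contract: the function names and the `cert_path` / `sha256_fingerprint` field names are hypothetical, and the real JSON field set is still to be confirmed during implementation.

```python
import hashlib
import ssl


def ca_fingerprint(cert_pem: str) -> str:
    """Return the SHA-256 fingerprint of a PEM certificate as AA:BB:... hex."""
    # The fingerprint is computed over the DER encoding, not the PEM text.
    der = ssl.PEM_cert_to_DER_cert(cert_pem)
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def render_ca_entry(cert_pem: str, cert_path: str) -> dict:
    """Build a dry-run entry for the CA (hypothetical field names).

    Only public material is accepted and emitted; the private key is never
    passed to this function, so it cannot leak into the rendered output.
    """
    return {
        "cert_path": cert_path,
        "sha256_fingerprint": ca_fingerprint(cert_pem),
    }
```

An operator could compare the rendered value against the fingerprint `openssl x509 -noout -fingerprint -sha256` prints for the cert installed in the bottle's trust store.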