# TLS interception for pipelock content scanning

Research into adding TLS termination ("MITM") to the egress path so that pipelock's scanning pipeline can see plaintext HTTP request and response bodies, instead of only the `CONNECT` host and opaque ciphertext.

## Summary

- Pipelock today sees `CONNECT` hostnames and the encrypted bytes that follow. Its DLP, subdomain-entropy, and MCP scanners cannot fire on TLS-encrypted bodies, which is the gap explicitly named under "Scope gaps" in `pipelock-assessment.md` ("Pipelock does not perform TLS inspection (no CA trust injection)").
- Closing that gap requires a TLS-terminating proxy that bumps `CONNECT`, presents a leaf certificate for the target hostname signed by a CA the bottle's trust store accepts, decrypts the inner HTTP, and re-establishes TLS to the real upstream.
- The mature open-source option is **mitmproxy**. Squid + `ssl_bump` is the heavier production-grade alternative. The Go ecosystem (`goproxy`, `gomitmproxy`, `martian`) is suitable only if we want a custom binary tightly coupled to pipelock.
- Recommended v1 topology: **mitmproxy in front of pipelock** on the same egress route. mitmproxy terminates client TLS, forwards plaintext to pipelock as its upstream HTTP proxy, and re-encrypts to the real upstream. Pipelock stays unchanged.
- Per-bottle ephemeral CA, generated at bottle start and destroyed on teardown. The CA private key lives only on the sidecar; the bottle's trust store only ever sees the public cert.
- Cert pinning is a known caveat but a small one given the narrow allowlist in this project. Selective bumping is the mitigation if a future allowlisted host turns out to pin.

---

## What pipelock cannot see today

The current egress topology (per `pipelock-assessment.md`):

```
agent --HTTPS_PROXY--> pipelock --CONNECT host:443--> internet
                       \____________________________ opaque TLS bytes
```

The agent's client (Claude Code, `curl`, an MCP server, a Python SDK) sends `CONNECT api.anthropic.com:443`.
Pipelock checks the hostname against its `api_allowlist`, replies `200 Connection Established`, and then blindly relays bytes between the two TCP halves. The TLS handshake and everything inside it happen end-to-end between the agent and the real upstream.

What pipelock can scan in this mode:

- `CONNECT` target hostname (SNI is not even needed).
- TLS record framing and lengths (useful for budgets, useless for DLP).
- Plain HTTP/1.1 to non-HTTPS destinations (irrelevant: there are none in `DEFAULT_ALLOWLIST`).

What pipelock cannot scan in this mode:

- Request URL, method, headers, body.
- Response status, headers, body.
- MCP JSON-RPC payloads inside the TLS session.
- WebSocket frames inside a TLS-wrapped upgrade.
- Whether the inner SNI or HTTP `Host` / `:authority` matches the outer `CONNECT` target (domain-fronting check).

The 48-pattern DLP layer, the subdomain-entropy check (insofar as it inspects URLs rather than DNS-resolver queries), the request-redaction feature added in v2.3.0, and bidirectional MCP scanning all require plaintext to operate on. Without TLS termination, those layers are inert against any HTTPS destination, which is every destination in the current allowlist.

---

## How TLS interception works

The mechanics of `CONNECT` bumping, end to end:

1. **Agent issues `CONNECT`.** The HTTP client sees `HTTPS_PROXY` set, so it opens a TCP connection to the proxy and sends `CONNECT api.anthropic.com:443 HTTP/1.1`.
2. **Proxy answers `200`.** Standard tunnel-established response.
3. **Proxy starts TLS as the server.** Instead of relaying bytes, the proxy itself performs a TLS handshake with the agent. It needs a server certificate for `api.anthropic.com`, so on first contact for that hostname the proxy generates a leaf certificate with `CN=api.anthropic.com` and a SAN for the same, signs it with its own CA private key, and presents that cert. Subsequent connections to the same hostname reuse the cached leaf.
4. **Agent verifies the cert.** The agent's TLS library walks the chain to a trusted root. Because the bottle's trust store contains the proxy's CA cert, validation succeeds. The agent has no way to tell it isn't talking to the real `api.anthropic.com`.
5. **Proxy opens its own TLS to the real upstream.** As a client this time, using the system root store, talking to the real `api.anthropic.com`. Real SNI, real cert chain validated normally.
6. **Proxy bridges the two TLS sessions.** Decrypts on the server side, re-encrypts on the client side, and scans the plaintext in between.

This is what every TLS-terminating egress proxy does. The trade-offs live in three places:

- **CA trust injection.** Step 4 only works if the bottle's trust store contains the proxy's CA. Mechanics covered under "CA lifecycle" below.
- **Cert generation cost.** Generating an RSA-2048 leaf cert takes ~50 ms; ECDSA P-256 is ~5 ms. Cache leaves per (hostname, SAN list) to keep this off the steady-state hot path.
- **Protocol coverage.** The proxy needs to speak HTTP/1.1, HTTP/2 (ALPN `h2`), and ideally WebSocket. HTTP/3 / QUIC is UDP and requires a separate code path; for v1, blocking UDP/443 at the iptables layer forces clients to fall back to HTTP/2, which we can inspect.

---

## Tools

### mitmproxy

- **What it is.** Python (with Rust crypto bits) interactive HTTPS proxy. Reference open-source implementation of the bump pattern. Ships as `mitmproxy` (TUI), `mitmweb` (browser UI), and `mitmdump` (headless).
- **Cert handling.** Generates a CA on first run under `~/.mitmproxy/`. Per-host leaves are generated on demand and cached in memory. Cert cache keyed by (hostname, SAN extensions inferred from upstream cert).
- **Protocols.** HTTP/1.1, HTTP/2, WebSocket fully supported. HTTP/3 exists as experimental. Raw TCP / non-HTTP TLS supported via `--mode reverse:` but not in CONNECT-bump mode.
- **Extensibility.** Python addon API.
  An addon module can inspect or modify any `request` / `response` / `tcp_message` flow. The pipelock integration in Topology D below uses this.
- **Selective bumping.** `ignore_hosts` regex; matching CONNECTs are tunneled blindly instead of bumped. Critical for the cert-pinning mitigation.
- **Docker image.** `mitmproxy/mitmproxy` on Docker Hub. Single binary for the CLI, ~80 MB image. Configurable via flags or `~/.mitmproxy/config.yaml`.
- **Project URL.** , .

Most mature, best-documented, lowest-effort integration. Default choice for v1.

### Squid + ssl_bump

- **What it is.** Squid is a long-established C++ caching proxy. `ssl_bump` is its TLS-interception feature, controlled by per-CONNECT actions: `splice` (tunnel blindly), `bump` (decrypt and re-encrypt), `peek` (look at TLS hello then decide), `stare` (look at server cert then decide), `terminate` (abort the connection).
- **Cert handling.** Configured via `sslcrtd_program`, a helper that generates and caches per-host certs. CA cert and key referenced by PEM paths in `squid.conf`.
- **Protocols.** HTTP/1.1 fully; HTTP/2 support toward clients is version-dependent and needs verification against the release we would pin; no scripted addons.
- **Extensibility.** ICAP (Internet Content Adaptation Protocol) for external scanners: Squid POSTs each request/response to an ICAP service that can modify or reject. This is the formal version of Topology D below.
- **Production track record.** Used at corporate-proxy scale (large enterprises, ISPs). Heavyweight for a single-bottle sidecar.
- **Project URL.** .

Right tool if pipelock grows an ICAP server endpoint. Otherwise, more config surface than this project needs.

### Go libraries: goproxy, gomitmproxy, martian

- **`goproxy`** (elazarl): long-lived Go library, basic CONNECT-bumping proxy with a handler API. Sparse on HTTP/2.
- **`gomitmproxy`** (AdGuard): newer, cleaner API; built for AdGuard Home / DNS-filtering products. HTTP/2 support is partial.
- **`martian`** (Google): request/response modifier framework with a JSON-configurable rule engine. Used internally at Google; public ecosystem thin.

These are relevant only if we decide to write a custom TLS-terminating binary that links pipelock's scanning packages directly (Topology C below). They are not faster than mitmproxy for the v1 sidecar shape; they are smaller and more direct, at the cost of writing more Go.

### Disqualified

- **Caddy, Envoy, HAProxy.** All can terminate TLS at a reverse-proxy vhost. None ship a "bump on CONNECT and forward plaintext to a downstream proxy" mode out of the box. Adapting any of them to this shape is more work than starting from mitmproxy.
- **Cloudflare Gateway, Zscaler, NetSkope, Forcepoint.** Managed cloud egress with TLS inspection. Wrong topology: they live outside the host, not as a per-bottle sidecar, and they require trusting a vendor with full plaintext.
- **Charles Proxy, Burp Suite.** Closed-source GUI tools for developer capture and security testing. Not appropriate as headless sidecars.
- **`mitmdump` standalone vs. embedding mitmproxy as a library.** Both are mitmproxy. Calling out only to note: the project ships both a CLI and a Python API; addons can be loaded either way.

---

## Topologies

Five candidate topologies, ordered roughly from least to most coupled between the two components.

### A: mitmproxy in front of pipelock (recommended)

```
agent --HTTPS_PROXY--> mitmproxy --HTTP_PROXY--> pipelock --> internet
                       (bump TLS)                (scan plain) (real TLS)
```

mitmproxy terminates the agent's TLS connection, decrypts, and then forwards the inner HTTP request to pipelock by treating pipelock as its own upstream HTTP forward proxy. Pipelock receives plaintext HTTP exactly as if the agent had used HTTP, applies its full scanning pipeline, and forwards to mitmproxy's upstream client half, which re-establishes TLS to the real destination.
Concretely, the agent's `HTTPS_PROXY` points at mitmproxy; mitmproxy's `upstream_proxy` config points at pipelock; pipelock's network reach includes the real internet.

- **Wins.** Pipelock unchanged. mitmproxy unchanged from default configuration. Each component has one job. Failure modes are clear per layer.
- **Costs.** Two sidecars per bottle instead of one. One extra decrypt / re-encrypt hop, ~5–15 ms per request in steady state.
- **Open question.** How exactly mitmproxy forwards to pipelock determines whether pipelock sees TLS again or only HTTP. mitmproxy's `upstream` mode wraps the decrypted request in another CONNECT if the destination is HTTPS, which would re-encrypt before pipelock sees it and defeat the point. The likely candidates are `upstream` with TLS re-origination disabled, or `regular` mode with a chained proxy. This behavior has changed across mitmproxy major releases (see "Open questions"), so it needs verification against the current docs at integration time.

### B: pipelock in front of mitmproxy (ruled out)

```
agent --HTTPS_PROXY--> pipelock --CONNECT?--> mitmproxy --> internet
                       (sees CONNECT only)    (bump TLS)
```

Pipelock would receive a `CONNECT` and decide to allow or deny based on hostname, then tunnel to mitmproxy. mitmproxy would terminate TLS and see plaintext, but pipelock would never see the plaintext, which is the whole point of the exercise. The scanning still happens (in mitmproxy), but it isn't pipelock doing it, so we'd need an entirely different rule engine. Ruled out.

### C: Extend pipelock itself to terminate TLS

Two sub-variants:

**C.1: Upstream a `tls_terminate` mode.** Submit a feature to pipelock that adds CONNECT bumping and per-host cert generation in Go, using `crypto/tls` and the existing scanning packages. Pipelock becomes a self-contained MITM proxy. License question matters here: the Apache 2.0 core can grow new features in-tree, but if upstream insists this belongs in `enterprise/` (ELv2), we either accept ELv2 or fork.

**C.2: Wrap pipelock in a thin Go binary in the same container.** A small Go program does the TLS half (`CONNECT` parsing, cert generation, TLS handshake) and pipes plaintext to pipelock over UDS or loopback. The wrapper is ours; pipelock is unmodified. No license question.

- **Wins.** Single component on the egress path. Pipelock owns the scanning end-to-end, including domain-fronting checks (SNI vs. `Host` vs. `CONNECT`).
- **Costs.** Real Go engineering effort. CA generation, cert caching, TLS handshake, HTTP/2 ALPN negotiation, WebSocket upgrade: all things mitmproxy already solves.
- **When.** Right shape for v2 or v3 once the v1 mitmproxy-in-front topology has proven the integration works and the scanning rules are stable.

### D: mitmproxy as the proxy, pipelock as a content-scan subroutine

```
agent --HTTPS_PROXY--> mitmproxy --> internet
                       (bump TLS)
                           |
                           v
                       POST /scan to pipelock
                       <- allow / block / redact
```

A Python addon in mitmproxy sends each decrypted request (and response) to a pipelock HTTP `/scan` endpoint and gates the flow on the verdict. mitmproxy handles all networking; pipelock is the rule engine only.

- **Wins.** Clean separation of concerns. Pipelock doesn't have to speak TLS at all. The addon is small, ~100 lines of Python.
- **Costs.** Requires pipelock to expose a scan API. The current Apache 2.0 core does not document one. If `/scan` lives in `enterprise/`, ELv2 applies. If it doesn't exist, we'd be asking pipelock for a new surface.
- **Variant.** Squid's ICAP path is the formalized version of the same pattern.

### E: Single container, two processes

mitmproxy and pipelock share a container, started by `supervisord` or `s6-overlay`. Networking simplifies to localhost. Lifecycle complicates: container restart now means restarting both; failure of one process is not visible at the Docker layer; logs interleave.

- **Wins.** Slightly less Docker plumbing in `cli.py`.
- **Costs.** Operational complexity not worth the savings.
  The two sidecars are independent processes with independent failure modes; separate Docker containers are the right tool for that.

Net: not recommended.

---

## CA lifecycle

The CA private key is the asset to defend. With it, anyone can issue certs that the bottle's trust store will accept for any hostname. So:

**Per-bottle ephemeral CA.** At bottle start, generate a fresh RSA-2048 or ECDSA-P256 CA inside the mitmproxy sidecar. Export only the public cert (PEM) into the bottle's trust store at one of:

- `/usr/local/share/ca-certificates/claude-bottle-mitm.crt` followed by `update-ca-certificates` (Debian/Ubuntu base images).
- `/etc/pki/ca-trust/source/anchors/` with `update-ca-trust` (Red-Hat-family).
- `$NODE_EXTRA_CA_CERTS` for Node-based agents (Claude Code).
- `$SSL_CERT_FILE` / `$REQUESTS_CA_BUNDLE` for Python SDKs.

The private key never leaves the sidecar's filesystem. The public cert is the only artifact that crosses into the bottle.

On bottle teardown, the sidecar container is destroyed; the CA dies with it. The next bottle gets a fresh CA. No long-lived MITM CA on disk.

**Why not a shared per-host CA.** A persistent CA shared across bottles is faster (no generation at start) but is a real liability: its private key becomes a long-lived secret that every bottle's trust store depends on. The public cert is not the risk (it is in the trust store by design and cannot sign anything), but if the key ever leaks, an attacker on the network path could in principle impersonate any host to any bottle for as long as that CA stays trusted. With a per-bottle CA, a leaked key gains little: the CA is bottle-local and dies in minutes.

**Generation cost.** RSA-2048 CA generation is ~200 ms; ECDSA-P256 is ~5 ms. Either is negligible next to the per-bottle Docker pull and network setup cost.

**Where the CA lives in the bottle's trust store.** Both: a distribution-standard path with `update-ca-certificates`, and the env-var path. Belt and suspenders, because some Node and Python libraries honor the env vars only, and some load only `/etc/ssl/certs/` directly.

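
The trust-store mechanics in this section can be sketched as a small planning helper. This is a minimal sketch, not the project's implementation: `plan_ca_trust`, `TrustPlan`, and the `distro_family` values are hypothetical names, reusing the `claude-bottle-mitm.crt` file name under the Red Hat anchors directory is an assumption, and the bundle paths are the usual Debian/Ubuntu and Red Hat locations that should be checked against the actual base image. One further assumption: `SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE` name the bundle to use rather than an addition to it, so the sketch points them at the regenerated system bundle, while the additive `NODE_EXTRA_CA_CERTS` points at the lone CA cert.

```python
from dataclasses import dataclass

# File name taken from the Debian path in the text; reusing it elsewhere is an assumption.
CA_FILENAME = "claude-bottle-mitm.crt"

# distro family -> (anchor directory, refresh command, regenerated system bundle)
_DISTROS = {
    "debian": (
        "/usr/local/share/ca-certificates/",
        ["update-ca-certificates"],
        "/etc/ssl/certs/ca-certificates.crt",
    ),
    "redhat": (
        "/etc/pki/ca-trust/source/anchors/",
        ["update-ca-trust"],
        "/etc/pki/tls/certs/ca-bundle.crt",
    ),
}


@dataclass(frozen=True)
class TrustPlan:
    """Where to copy the CA public cert, what to run, and which env vars to set."""
    anchor_path: str
    update_command: list
    env: dict


def plan_ca_trust(distro_family: str) -> TrustPlan:
    """Build the in-bottle trust plan for a hypothetical distro family label."""
    try:
        anchor_dir, command, bundle = _DISTROS[distro_family]
    except KeyError:
        raise ValueError(f"unsupported distro family: {distro_family!r}") from None
    anchor = anchor_dir + CA_FILENAME
    env = {
        # Additive in Node: the extra CA is trusted on top of the built-in roots.
        "NODE_EXTRA_CA_CERTS": anchor,
        # These select the bundle itself, so point them at the regenerated
        # system bundle (which includes the new anchor after the refresh).
        "SSL_CERT_FILE": bundle,
        "REQUESTS_CA_BUNDLE": bundle,
    }
    return TrustPlan(anchor, list(command), env)


if __name__ == "__main__":
    plan = plan_ca_trust("debian")
    print(plan.anchor_path)
    print(" ".join(plan.update_command))
    for name, value in sorted(plan.env.items()):
        print(f"{name}={value}")
```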

---

## Cert pinning (brief)

A client that pins ignores the trust store and refuses any cert whose public key isn't on a hardcoded list. Three observations for this project:

- The current `DEFAULT_ALLOWLIST` (`api.anthropic.com`, `statsig.anthropic.com`, `sentry.io`, `claude.ai`, `platform.claude.com`, `downloads.claude.ai`, `raw.githubusercontent.com`) does not appear to include any host that server-side SDKs pin against. Server-side SDKs (Node, Python) almost universally honor system trust and `NODE_EXTRA_CA_CERTS` / `SSL_CERT_FILE`. Mobile SDKs and Chromium pin; we don't run those.
- If a future allowlisted host turns out to pin, the mitigation is selective bumping via mitmproxy `ignore_hosts`: that specific hostname tunnels blindly and pipelock loses DLP coverage for it. Coverage on every other host is unaffected.
- The cost of finding out: a single 5-minute test before adding a host. Point mitmproxy at the host and observe whether the client succeeds.

Not a v1 blocker. Document the failure mode and the mitigation.

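
The selective-bumping mitigation can be sketched as a helper that turns a list of pinning hosts into `ignore_hosts` patterns. This is a sketch under stated assumptions: `ignore_hosts_patterns` and `is_tunneled` are hypothetical names, `pinned.example.com` is a placeholder host, and the code assumes mitmproxy matches `ignore_hosts` regexes against a `host:port` string, which should be confirmed against the docs for the version we pin. Anchoring and escaping each pattern keeps an allowlisted-but-pinned host from accidentally widening the set of blindly tunneled destinations.

```python
import re


def ignore_hosts_patterns(pinned_hosts, port=443):
    """Return one anchored regex per host, in host:port form.

    re.escape keeps the dots in a hostname from matching arbitrary
    characters; ^ and $ prevent prefix or suffix look-alikes from matching.
    """
    return [rf"^{re.escape(host)}:{port}$" for host in sorted(set(pinned_hosts))]


def is_tunneled(host, port, patterns):
    """Local check of which CONNECT targets the patterns would tunnel blindly."""
    target = f"{host}:{port}"
    return any(re.search(pattern, target) for pattern in patterns)


if __name__ == "__main__":
    # Placeholder host; no host in the current allowlist is known to pin.
    patterns = ignore_hosts_patterns(["pinned.example.com"])
    for candidate in ("pinned.example.com", "api.anthropic.com"):
        verdict = "tunnel" if is_tunneled(candidate, 443, patterns) else "bump"
        print(f"{candidate}:443 -> {verdict}")
```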

---

## Comparison table

| | A: mitmproxy → pipelock | B: pipelock → mitmproxy | C: TLS in pipelock | D: mitmproxy + scan API | E: one container |
|---|---|---|---|---|---|
| Pipelock sees plaintext | yes | no | yes | yes (via /scan) | yes |
| Code change to pipelock | none | none | substantial | adds /scan endpoint | none |
| Sidecar count | 2 | 2 | 1 | 2 | 1 |
| Cert generation owner | mitmproxy | mitmproxy | pipelock | mitmproxy | mitmproxy |
| Selective bumping | mitmproxy `ignore_hosts` | mitmproxy `ignore_hosts` | pipelock config | mitmproxy `ignore_hosts` | mitmproxy `ignore_hosts` |
| Failure isolation per process | yes | yes | n/a (one process) | yes | no (shared container) |
| License question | none | none | ELv2 risk | ELv2 risk | none |
| v1 effort | low | low (but pointless) | high | medium | low |
| Long-term shape | interim | n/a | best | possible | not recommended |

---

## Recommendation

**Adopt Topology A for v1.** Add a mitmproxy sidecar to the egress topology, in front of pipelock on the same per-bottle internal network. The agent's `HTTPS_PROXY` points at mitmproxy; mitmproxy's upstream is pipelock; pipelock's upstream is the real internet.

Concretely:

1. Add a `MitmproxyProxy` class alongside `PipelockProxy`, with the same `prepare` / `start` / `stop` lifecycle. The class generates a per-bottle CA in `stage_dir`, exports the public cert into a second file, and writes a mitmproxy config that:
   - bumps every CONNECT by default
   - uses `upstream_proxy = http://pipelock-:`
   - listens on a known port inside the per-bottle internal network
2. Extend the bottle launch step to copy the CA public cert into the agent container under `/usr/local/share/ca-certificates/claude-bottle-mitm.crt`, run `update-ca-certificates`, and set `NODE_EXTRA_CA_CERTS` / `SSL_CERT_FILE` / `REQUESTS_CA_BUNDLE` accordingly.
3. Repoint the agent's `HTTPS_PROXY` and `HTTP_PROXY` from the pipelock container to the mitmproxy container.
4. Verify mitmproxy's upstream-proxy mode forwards plaintext (not a re-wrapped CONNECT) to pipelock; if not, use `regular` mode with a chained proxy directive.
5. Test that pipelock's DLP, subdomain-entropy, and MCP scanners now fire on real request bodies for `api.anthropic.com` traffic.

**Defer Topologies C and D.** Topology C (extending pipelock to terminate TLS) is the cleanest long-term shape but is a substantial build and runs into the Apache 2.0 vs. ELv2 question. Topology D (mitmproxy with pipelock as a scan API) is attractive but requires a pipelock surface that doesn't exist today. Both are valid v2 targets; neither is the right starting point.

The `network-egress-guard.md` v1 iptables + dnsmasq layer remains necessary alongside this: TLS interception covers HTTP/HTTPS only; raw TCP, UDP/443 (QUIC), UDP/53 (DNS), and ICMP still need the IP-level default-deny.

---

## Open questions

1. **mitmproxy upstream-proxy mode mechanics.** Does mitmproxy in `upstream_proxy` mode forward decrypted HTTP plaintext to the upstream, or does it wrap it in a new CONNECT? The documented behavior changed between mitmproxy 8 and 10. Needs verification against the version we pin.
2. **Pipelock's behavior when receiving plain HTTP.** Pipelock's `forward_proxy.enabled: true` accepts both `GET http://...` (plain HTTP) and `CONNECT host:443` (HTTPS). After Topology A is wired up, pipelock will see only plain HTTP. Does its DLP / MCP scanning pipeline run the full set of layers, or are some gated on the CONNECT path? Confirm by reading `github.com/luckyPipewrench/pipelock/blob/main/docs/configuration.md`.
3. **CA installation in the Anthropic-provided Claude Code Docker image.** The base image's distribution determines whether `update-ca-certificates` (Debian/Ubuntu) or `update-ca-trust` (Red Hat) is the right command. The current `Dockerfile` should be inspected before assuming Debian.
4. **HTTP/2 over the agent → mitmproxy hop.** Node's HTTP client negotiates `h2` via ALPN. mitmproxy speaks `h2` to clients in recent versions. Confirm the version we pin supports `h2` end-to-end and doesn't downgrade to `http/1.1` (which would be a silent performance regression).
5. **Selective-bump policy surface.** Where does the "tunnel this hostname blindly" decision live? Options: a field on `bottle.egress` in the manifest, a fixed list of known-pinning hosts baked into the mitmproxy config, or pipelock-side opt-out. Manifest field is most consistent with the existing `bottle.egress.allowlist` shape.
6. **Image pin for mitmproxy.** The `pipelock-assessment.md` recommendation is to pin by digest. The mitmproxy Docker Hub image should be pinned the same way. Which release line? `mitmproxy/mitmproxy` ships rolling and tagged versions; the tagged `:11.x` line is the right baseline.
7. **CA generation in Python (mitmproxy) vs. as a separate step.** mitmproxy generates a CA on first launch if none is provided. For per-bottle ephemerality, we want the CA to be ours, not whatever mitmproxy chooses, so generate the CA in the host-side prepare step and inject it via `--certs *=...`. Mechanics need confirming; in particular, `--certs` appears to be documented for custom server (leaf) certificates, while a custom CA is read from `mitmproxy-ca.pem` in mitmproxy's configuration directory, so the injection path may need to change.
8. **Domain fronting verification.** Once pipelock sees plaintext, it has access to the inner `Host` / `:authority`. A new rule that compares it against the outer `CONNECT` target catches domain fronting. Worth a follow-up note on whether pipelock has such a rule or whether we add it.

---

## References

- mitmproxy: ,
- mitmproxy `upstream_proxy` mode:
- mitmproxy CA cert installation:
- Squid `ssl_bump`:
- Squid ICAP:
- `goproxy`:
- `gomitmproxy`:
- `martian`:
- Node TLS / `NODE_EXTRA_CA_CERTS`:
- Python `SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE`:
- Prior research (pipelock assessment): `docs/research/pipelock-assessment.md`
- Prior research (network egress guard): `docs/research/network-egress-guard.md`
- Prior research (secret exfil tripwire encodings): `docs/research/secret-exfil-tripwire-encodings.md`

Research conducted 2026-05-12.