From 8e261563dcf786779126b3cacd247accc2ac8792 Mon Sep 17 00:00:00 2001
From: didericis <eric@dideric.is>
Date: Tue, 12 May 2026 11:41:34 -0400
Subject: [PATCH] docs(research): TLS interception topologies for pipelock
 content scanning

Survey of TLS-MITM tools (mitmproxy, Squid+ssl_bump, Go libraries) and
five candidate topologies for adding TLS termination to the egress path
so pipelock's DLP, subdomain-entropy, and MCP scanners can fire on
plaintext bodies. Recommends mitmproxy in front of pipelock for v1
with a per-bottle ephemeral CA.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 docs/research/tls-mitm-for-pipelock.md | 508 +++++++++++++++++++++++++
 1 file changed, 508 insertions(+)
 create mode 100644 docs/research/tls-mitm-for-pipelock.md

diff --git a/docs/research/tls-mitm-for-pipelock.md b/docs/research/tls-mitm-for-pipelock.md
new file mode 100644
index 0000000..aa8c4d0
--- /dev/null
+++ b/docs/research/tls-mitm-for-pipelock.md
@@ -0,0 +1,508 @@
+# TLS interception for pipelock content scanning
+
+Research into adding TLS termination ("MITM") to the egress path so that
+pipelock's scanning pipeline can see plaintext HTTP request and response
+bodies, instead of only the `CONNECT` host and opaque ciphertext.
+
+## Summary
+
+- Pipelock today sees `CONNECT` hostnames and the encrypted bytes that follow.
+  Its DLP, subdomain-entropy, and MCP scanners cannot fire on TLS-encrypted
+  bodies, which is the gap explicitly named under "Scope gaps" in
+  `pipelock-assessment.md` ("Pipelock does not perform TLS inspection (no CA
+  trust injection)").
+- Closing that gap requires a TLS-terminating proxy that bumps `CONNECT`,
+  presents a leaf certificate for the target hostname signed by a CA the
+  bottle's trust store accepts, decrypts the inner HTTP, and re-establishes
+  TLS to the real upstream.
+- The mature open-source option is **mitmproxy**. Squid + `ssl_bump` is the
+  heavier production-grade alternative. The Go ecosystem (`goproxy`,
+  `gomitmproxy`, `martian`) is suitable only if we want a custom binary
+  tightly coupled to pipelock.
+- Recommended v1 topology: **mitmproxy in front of pipelock** on the same
+  egress route. mitmproxy terminates client TLS, forwards plaintext to
+  pipelock as its upstream HTTP proxy, and re-encrypts to the real upstream.
+  Pipelock stays unchanged.
+- Per-bottle ephemeral CA, generated at bottle start and destroyed on
+  teardown. The CA private key lives only on the sidecar; the bottle's
+  trust store only ever sees the public cert.
+- Cert pinning is a known caveat but a small one given the narrow allowlist
+  in this project. Selective bumping is the mitigation if a future
+  allowlisted host turns out to pin.
+
+---
+
+## What pipelock cannot see today
+
+The current egress topology (per `pipelock-assessment.md`):
+
+```
+agent --HTTPS_PROXY--> pipelock --CONNECT host:443--> internet
+                                  \____________________________
+                                       opaque TLS bytes
+```
+
+The agent's client (Claude Code, `curl`, an MCP server, a Python SDK)
+sends `CONNECT api.anthropic.com:443`. Pipelock checks the hostname
+against its `api_allowlist`, replies `200 Connection Established`, and
+then blindly relays bytes between the two TCP halves. The TLS handshake
+and everything inside it happens end-to-end between the agent and the
+real upstream.
+
+What pipelock can scan in this mode:
+
+- `CONNECT` target hostname (SNI is not even needed).
+- TLS record framing and lengths (useful for budgets, useless for DLP).
+- Plain HTTP/1.1 to non-HTTPS destinations (irrelevant — there are none
+  in `DEFAULT_ALLOWLIST`).
+
+What pipelock cannot scan in this mode:
+
+- Request URL, method, headers, body.
+- Response status, headers, body.
+- MCP JSON-RPC payloads inside the TLS session.
+- WebSocket frames inside a TLS-wrapped upgrade.
+- Whether the inner SNI or HTTP `Host` / `:authority` matches the
+  outer `CONNECT` target (domain-fronting check).
+
+The 48-pattern DLP layer, the subdomain-entropy check (insofar as it
+inspects URLs rather than DNS-resolver queries), the request-redaction
+feature added in v2.3.0, and bidirectional MCP scanning all require
+plaintext to operate on. Without TLS termination, those layers are
+inert against any HTTPS destination — which is every destination in
+the current allowlist.
+
+---
+
+## How TLS interception works
+
+The mechanics of `CONNECT` bumping, end to end:
+
+1. **Agent issues `CONNECT`.** The HTTP client sees `HTTPS_PROXY` set,
+   so it opens a TCP connection to the proxy and sends
+   `CONNECT api.anthropic.com:443 HTTP/1.1`.
+2. **Proxy answers `200`.** Standard tunnel-established response.
+3. **Proxy starts TLS as the server.** Instead of relaying bytes, the
+   proxy itself performs a TLS handshake with the agent. It needs a
+   server certificate for `api.anthropic.com` — so on first contact for
+   that hostname, the proxy generates a leaf certificate with
+   `CN=api.anthropic.com` and a SAN for the same, signs it with its
+   own CA private key, and presents that cert. Subsequent connections
+   to the same hostname reuse the cached leaf.
+4. **Agent verifies the cert.** The agent's TLS library walks the chain
+   to a trusted root. Because the bottle's trust store contains the
+   proxy's CA cert, validation succeeds. The agent has no way to tell
+   it isn't talking to the real `api.anthropic.com`.
+5. **Proxy opens its own TLS to the real upstream.** As a client this
+   time, using the system root store, talking to the real
+   `api.anthropic.com`. Real SNI, real cert chain validated normally.
+6. **Proxy bridges the two TLS sessions.** Decrypts on the server side,
+   re-encrypts on the client side, and scans the plaintext in between.
+
+This is what every TLS-terminating egress proxy does. The trade-offs
+live in three places:
+
+- **CA trust injection.** Step 4 only works if the bottle's trust
+  store contains the proxy's CA. Mechanics covered under "CA lifecycle"
+  below.
+- **Cert generation cost.** Key generation dominates: an RSA-2048 leaf
+  typically costs tens of ms and ECDSA P-256 far less (rough figures,
+  not measured here). Cache leaves per (hostname, SAN list) to stay off the hot path.
+- **Protocol coverage.** The proxy needs to speak HTTP/1.1, HTTP/2 (ALPN
+  `h2`), and ideally WebSocket. HTTP/3 / QUIC is UDP and requires a
+  separate code path; for v1, blocking UDP/443 at the iptables layer
+  forces clients to fall back to HTTP/2, which we can inspect.
+
+---
+
+## Tools
+
+### mitmproxy
+
+- **What it is.** Python (with some Rust components) interactive HTTPS proxy.
+  Reference open-source implementation of the bump pattern. Ships as
+  `mitmproxy` (TUI), `mitmweb` (browser UI), and `mitmdump` (headless).
+- **Cert handling.** Generates a CA on first run under `~/.mitmproxy/`.
+  Per-host leaves are generated on demand and cached in memory. Cert
+  cache keyed by (hostname, SAN extensions inferred from upstream cert).
+- **Protocols.** HTTP/1.1, HTTP/2, WebSocket fully supported. HTTP/3
+  support is newer and version-dependent. Raw TCP / non-HTTP TLS can be
+  relayed via the `tcp_hosts` option; verify against the pinned version.
+- **Extensibility.** Python addon API. An addon module can inspect or
+  modify any `request` / `response` / `tcp_message` flow. The pipelock
+  integration in Topology D below uses this.
+- **Selective bumping.** `ignore_hosts` regex; matching CONNECTs are
+  tunneled blindly instead of bumped. Critical for the cert-pinning
+  mitigation.
+- **Docker image.** `mitmproxy/mitmproxy` on Docker Hub. Single binary
+  for the CLI, ~80 MB image. Configurable via flags or `~/.mitmproxy/config.yaml`.
+- **Project URL.** <https://mitmproxy.org>, <https://github.com/mitmproxy/mitmproxy>.
+
+Most mature, best-documented, lowest-effort integration. Default choice
+for v1.
+
+### Squid + ssl_bump
+
+- **What it is.** Squid is a long-established C++ caching proxy.
+  `ssl_bump` is its TLS-interception feature, controlled by per-CONNECT
+  actions: `splice` (tunnel blindly), `bump` (decrypt and re-encrypt),
+  `peek` (inspect the handshake, keeping `splice` possible), `stare`
+  (inspect it, keeping `bump` possible), `terminate` (abort the connection).
+- **Cert handling.** Configured via `sslcrtd_program` — a helper that
+  generates and caches per-host certs. CA cert and key referenced by
+  PEM paths in `squid.conf`.
+- **Protocols.** HTTP/1.1 fully; no HTTP/2 in stable releases as far as
+  we can tell (verify against the pinned version); no scripted addons.
+- **Extensibility.** ICAP (Internet Content Adaptation Protocol) for
+  external scanners: Squid sends each request/response to an ICAP
+  service (`REQMOD` / `RESPMOD`) that can modify or reject it. This is
+  the formal version of Topology D below.
+- **Production track record.** Used at corporate-proxy scale (large
+  enterprises, ISPs). Heavyweight for a single-bottle sidecar.
+- **Project URL.** <https://wiki.squid-cache.org/Features/SslPeekAndSplice>.
+
+Right tool if pipelock grows an ICAP server endpoint. Otherwise, more
+config surface than this project needs.
+
+### Go libraries: goproxy, gomitmproxy, martian
+
+- **`goproxy`** (elazarl) — long-lived Go library, basic CONNECT-bumping
+  proxy with a handler API. Sparse on HTTP/2.
+  <https://github.com/elazarl/goproxy>
+- **`gomitmproxy`** (AdGuard) — newer, cleaner API; built for AdGuard
+  Home / DNS-filtering products. HTTP/2 support is partial.
+  <https://github.com/AdguardTeam/gomitmproxy>
+- **`martian`** (Google) — request/response modifier framework with a
+  JSON-configurable rule engine. Used internally at Google; public
+  ecosystem thin.
+  <https://github.com/google/martian>
+
+These are relevant only if we decide to write a custom TLS-terminating
+binary that links pipelock's scanning packages directly — Topology C
+below. They are not faster than mitmproxy for the v1 sidecar shape;
+they are smaller and more direct, at the cost of writing more Go.
+
+### Disqualified
+
+- **Caddy, Envoy, HAProxy.** All can terminate TLS at a reverse-proxy
+  vhost. None ship a "bump on CONNECT and forward plaintext to a
+  downstream proxy" mode out of the box. Adapting any of them to this
+  shape is more work than starting from mitmproxy.
+- **Cloudflare Gateway, Zscaler, NetSkope, Forcepoint.** Managed cloud
+  egress with TLS inspection. Wrong topology — they live outside the
+  host, not as a per-bottle sidecar, and they require trusting a vendor
+  with full plaintext.
+- **Charles Proxy, Burp Suite.** Closed-source GUI tools for developer
+  capture and security testing. Not appropriate as headless sidecars.
+- **`mitmdump` standalone vs. embedding mitmproxy as a library.** Both
+  are mitmproxy. Calling out only to note: the project ships both a CLI
+  and a Python API; addons can be loaded either way.
+
+---
+
+## Topologies
+
+Five candidate topologies, ordered roughly from least to most coupled
+between the two components.
+
+### A — mitmproxy in front of pipelock (recommended)
+
+```
+agent --HTTPS_PROXY--> mitmproxy --HTTP_PROXY--> pipelock --> internet
+                       (bump TLS)               (scan plain)  (real TLS)
+```
+
+mitmproxy terminates the agent's TLS connection, decrypts, and then
+forwards the inner HTTP request to pipelock by treating pipelock as
+its own upstream HTTP forward proxy. Pipelock receives plaintext HTTP
+exactly as if the agent had used HTTP and applies its full scanning
+pipeline. TLS to the real destination is then re-established on the
+outbound leg; which component does that is the open question below.
+
+Concretely, the agent's `HTTPS_PROXY` points at mitmproxy; mitmproxy
+runs in upstream mode (`--mode upstream:<pipelock URL>`) pointing at
+pipelock; pipelock's network reach includes the real internet.
+
+- **Wins.** Pipelock unchanged. mitmproxy needs configuration only, no
+  addon code. Each component has one job. Failure modes are clear
+  per layer.
+- **Costs.** Two sidecars per bottle instead of one. One extra
+  decrypt / re-encrypt hop, estimated at ~5–15 ms per request (unmeasured).
+- **Open question.** How exactly mitmproxy forwards to pipelock matters
+  for whether pipelock sees TLS again or only HTTP. mitmproxy's
+  `upstream` mode wraps the decrypted request in another CONNECT if the
+  destination is HTTPS, which would re-encrypt before pipelock sees
+  it, defeating the point. Candidate fixes are `upstream` mode with TLS
+  re-origination disabled, or `regular` mode with a chained proxy.
+  mitmproxy's behavior here has changed across major releases; needs
+  verification against the current docs at integration time.
+
+### B — pipelock in front of mitmproxy (ruled out)
+
+```
+agent --HTTPS_PROXY--> pipelock --CONNECT?--> mitmproxy --> internet
+                       (sees CONNECT only)   (bump TLS)
+```
+
+Pipelock would receive a `CONNECT` and decide to allow or deny based
+on hostname, then tunnel to mitmproxy. mitmproxy would terminate TLS
+and see plaintext — but pipelock would never see the plaintext, which
+is the whole point of the exercise. The scanning still happens (in
+mitmproxy), but it isn't pipelock doing it, so we'd need an entirely
+different rule engine. Ruled out.
+
+### C — Extend pipelock itself to terminate TLS
+
+Two sub-variants:
+
+**C.1 — Upstream a `tls_terminate` mode.** Submit a feature to
+pipelock that adds CONNECT bumping and per-host cert generation in Go,
+using `crypto/tls` and the existing scanning packages. Pipelock becomes
+a self-contained MITM proxy. License question matters here: the Apache
+2.0 core can grow new features in-tree, but if upstream insists this
+belongs in `enterprise/` (ELv2), we either accept ELv2 or fork.
+
+**C.2 — Wrap pipelock in a thin Go binary in the same container.** A
+small Go program does the TLS half (`CONNECT` parsing, cert generation,
+TLS handshake) and pipes plaintext to pipelock over UDS or loopback.
+The wrapper is ours; pipelock is unmodified. No license question.
+
+- **Wins.** Single component on the egress path. Pipelock owns the
+  scanning end-to-end, including domain-fronting checks (SNI vs.
+  `Host` vs. `CONNECT`).
+- **Costs.** Real Go engineering effort. CA generation, cert caching,
+  TLS handshake, HTTP/2 ALPN negotiation, WebSocket upgrade — all
+  things mitmproxy already solves.
+- **When.** Right shape for v2 or v3 once the v1 mitmproxy-in-front
+  topology has proven the integration works and the scanning rules are
+  stable.
+
+### D — mitmproxy as the proxy, pipelock as a content-scan subroutine
+
+```
+agent --HTTPS_PROXY--> mitmproxy --> internet
+                       (bump TLS)
+                          |
+                          v
+                       POST /scan to pipelock
+                       <- allow / block / redact
+```
+
+A Python addon in mitmproxy sends each decrypted request (and response)
+to a pipelock HTTP `/scan` endpoint and gates the flow on the verdict.
+mitmproxy handles all networking; pipelock is the rule engine only.
+
+- **Wins.** Clean separation of concerns. Pipelock doesn't have to
+  speak TLS at all. The addon is small, ~100 lines of Python.
+- **Costs.** Requires pipelock to expose a scan API. The current Apache
+  2.0 core does not document one. If `/scan` lives in `enterprise/`,
+  ELv2 applies. If it doesn't exist, we'd be asking pipelock for a new
+  surface.
+- **Variant.** Squid's ICAP path is the formalized version of the same
+  pattern.
+
+### E — Single container, two processes
+
+mitmproxy and pipelock share a container, started by `supervisord` or
+`s6-overlay`. Networking simplifies to localhost. Lifecycle complicates:
+container restart now means restarting both; failure of one process is
+not visible at the Docker layer; logs interleave.
+
+- **Wins.** Slightly less Docker plumbing in `cli.py`.
+- **Costs.** Operational complexity not worth the savings. The two
+  containers are independent processes with independent failure modes;
+  Docker is the right tool for that.
+
+Net: not recommended.
+
+---
+
+## CA lifecycle
+
+The CA private key is the asset to defend. With it, anyone can issue
+certs that the bottle's trust store will accept for any hostname. So:
+
+**Per-bottle ephemeral CA.** At bottle start, generate a fresh
+RSA-2048 or ECDSA-P256 CA inside the mitmproxy sidecar. Export only
+the public cert (PEM) into the bottle's trust store at one of:
+
+- `/usr/local/share/ca-certificates/claude-bottle-mitm.crt` followed by
+  `update-ca-certificates` (Debian/Ubuntu base images).
+- `/etc/pki/ca-trust/source/anchors/` with `update-ca-trust`
+  (Red-Hat-family).
+- `$NODE_EXTRA_CA_CERTS` for Node-based agents (Claude Code).
+- `$SSL_CERT_FILE` / `$REQUESTS_CA_BUNDLE` for Python SDKs.
+
+The private key never leaves the sidecar's filesystem. The CA cert
+public half is the only artifact that crosses into the bottle.
+
+On bottle teardown, the sidecar container is destroyed; the CA dies
+with it. The next bottle gets a fresh CA. No long-lived MITM CA on
+disk.
+
+**Why not a shared per-host CA.** A persistent CA across bottles is
+faster (no generation at start) but is a real liability: its private
+key sits on disk indefinitely, and anyone who obtains it can
+impersonate any host to every bottle that trusts it. (The public cert
+is harmless to leak; it is in the trust store by design.) With a
+per-bottle CA, a leaked key is bottle-local and dies in minutes.
+
+**Generation cost.** RSA-2048 CA key generation is typically well under
+a second and ECDSA P-256 is faster still (rough figures). Either is
+irrelevant against the per-bottle Docker pull and network setup cost.
+
+**Where the CA lives in the bottle's trust store.** Both: a
+distribution-standard path with `update-ca-certificates`, and the
+env-var path. Belt and suspenders, because some Node and Python
+libraries honor the env vars only, and some load only `/etc/ssl/certs/`
+directly.
+
+---
+
+## Cert pinning (brief)
+
+A client that pins ignores the trust store and refuses any cert whose
+public key isn't on a hardcoded list. Three observations for this
+project:
+
+- The current `DEFAULT_ALLOWLIST` (`api.anthropic.com`,
+  `statsig.anthropic.com`, `sentry.io`, `claude.ai`,
+  `platform.claude.com`, `downloads.claude.ai`,
+  `raw.githubusercontent.com`) does not appear to include any host
+  whose server-side clients pin. Server-side SDKs (Node, Python) almost
+  universally honor a configured trust store via `NODE_EXTRA_CA_CERTS` /
+  `SSL_CERT_FILE`. Mobile SDKs and browsers may pin; we don't run those.
+- If a future allowlisted host turns out to pin, the mitigation is
+  selective bumping via mitmproxy `ignore_hosts`: that specific
+  hostname tunnels blindly and pipelock loses DLP coverage for it.
+  Coverage on every other host is unaffected.
+- The cost of finding out: a single 5-minute test before adding a host
+  — point mitmproxy at the host, observe whether the client succeeds.
+
+Not a v1 blocker. Document the failure mode and the mitigation.
+
+---
+
+## Comparison table
+
+| | A: mitmproxy → pipelock | B: pipelock → mitmproxy | C: TLS in pipelock | D: mitmproxy + scan API | E: one container |
+|---|---|---|---|---|---|
+| Pipelock sees plaintext | yes | no | yes | yes (via /scan) | yes |
+| Code change to pipelock | none | none | substantial | adds /scan endpoint | none |
+| Sidecar count | 2 | 2 | 1 | 2 | 1 |
+| Cert generation owner | mitmproxy | mitmproxy | pipelock | mitmproxy | mitmproxy |
+| Selective bumping | mitmproxy `ignore_hosts` | mitmproxy `ignore_hosts` | pipelock config | mitmproxy `ignore_hosts` | mitmproxy `ignore_hosts` |
+| Failure isolation per process | yes | yes | n/a (one process) | yes | no (shared container) |
+| License question | none | none | ELv2 risk | ELv2 risk | none |
+| v1 effort | low | low (but pointless) | high | medium | low |
+| Long-term shape | interim | n/a | best | possible | not recommended |
+
+---
+
+## Recommendation
+
+**Adopt Topology A for v1.** Add a mitmproxy sidecar to the egress
+topology, in front of pipelock on the same per-bottle internal network.
+The agent's `HTTPS_PROXY` points at mitmproxy; mitmproxy's upstream is
+pipelock; pipelock's upstream is the real internet.
+
+Concretely:
+
+1. Add a `MitmproxyProxy` class alongside `PipelockProxy`, with the
+   same `prepare` / `start` / `stop` lifecycle. The class generates
+   a per-bottle CA in `stage_dir`, exports the public cert into a
+   second file, and writes a mitmproxy config that:
+   - bumps every CONNECT by default
+   - runs in upstream mode (`--mode upstream:http://pipelock-<slug>:<port>`)
+   - listens on a known port inside the per-bottle internal network
+2. Extend the bottle launch step to copy the CA public cert into the
+   agent container under
+   `/usr/local/share/ca-certificates/claude-bottle-mitm.crt`, run
+   `update-ca-certificates`, and set `NODE_EXTRA_CA_CERTS` /
+   `SSL_CERT_FILE` / `REQUESTS_CA_BUNDLE` accordingly.
+3. Repoint the agent's `HTTPS_PROXY` and `HTTP_PROXY` from the pipelock
+   container to the mitmproxy container.
+4. Verify mitmproxy's upstream-proxy mode forwards plaintext (not a
+   re-wrapped CONNECT) to pipelock; if not, use `regular` mode with a
+   chained proxy directive.
+5. Test that pipelock's DLP, subdomain-entropy, and MCP scanners now
+   fire on real request bodies for `api.anthropic.com` traffic.
+
+**Defer Topologies C and D.** Topology C (extending pipelock to
+terminate TLS) is the cleanest long-term shape but is a substantial
+build and runs into the Apache 2.0 vs. ELv2 question. Topology D
+(mitmproxy with pipelock as a scan API) is attractive but requires a
+pipelock surface that doesn't exist today. Both are valid v2 targets;
+neither is the right starting point.
+
+The `network-egress-guard.md` v1 iptables + dnsmasq layer remains
+necessary alongside this — TLS interception covers HTTP/HTTPS only;
+raw TCP, UDP/443 (QUIC), UDP/53 (DNS), and ICMP still need the
+IP-level default-deny.
+
+---
+
+## Open questions
+
+1. **mitmproxy upstream-proxy mode mechanics.** Does mitmproxy in
+   `upstream` mode forward decrypted HTTP plaintext to the
+   upstream, or does it wrap it in a new CONNECT? The documented
+   behavior changed between mitmproxy 8 and 10. Needs verification
+   against the version we pin.
+2. **Pipelock's behavior when receiving plain HTTP.** Pipelock's
+   `forward_proxy.enabled: true` accepts both `GET http://...` (plain
+   HTTP) and `CONNECT host:443` (HTTPS). After Topology A is wired up,
+   pipelock will see only plain HTTP — does its DLP / MCP scanning
+   pipeline run the full set of layers, or are some gated on the
+   CONNECT path? Confirm by reading
+   `github.com/luckyPipewrench/pipelock/blob/main/docs/configuration.md`.
+3. **CA installation in the Anthropic-provided Claude Code Docker image.**
+   The base image's distribution determines whether `update-ca-certificates`
+   (Debian/Ubuntu) or `update-ca-trust` (Red Hat) is the right command.
+   The current `Dockerfile` should be inspected before assuming Debian.
+4. **HTTP/2 over the agent → mitmproxy hop.** Agent HTTP clients may
+   negotiate `h2` via ALPN. mitmproxy speaks `h2` to clients in recent
+   versions. Confirm the version we pin supports `h2` end-to-end and
+   doesn't downgrade to `http/1.1` (which would be a silent
+   performance regression).
+5. **Selective-bump policy surface.** Where does the
+   "tunnel this hostname blindly" decision live? Options: a field on
+   `bottle.egress` in the manifest, a fixed list of known-pinning
+   hosts baked into the mitmproxy config, or pipelock-side opt-out.
+   Manifest field is most consistent with the existing
+   `bottle.egress.allowlist` shape.
+6. **Image pin for mitmproxy.** The `pipelock-assessment.md`
+   recommendation is to pin by digest. The mitmproxy Docker Hub image
+   should be pinned the same way. Which release line? `mitmproxy/mitmproxy`
+   ships rolling and tagged versions; the tagged `:11.x` line is the
+   right baseline.
+7. **CA generation in Python (mitmproxy) vs. as a separate step.**
+   mitmproxy generates a CA on first launch if none is provided. For
+   per-bottle ephemerality, we want the CA to be ours, so generate it in
+   the host-side prepare step and place it in mitmproxy's `confdir` as
+   `mitmproxy-ca.pem` (`--certs` sets leaf certs, not the CA). Needs confirming.
+8. **Domain fronting verification.** Once pipelock sees plaintext, it
+   has access to the inner `Host` / `:authority`. A new rule that
+   compares it against the outer `CONNECT` target catches domain
+   fronting. Worth a follow-up note on whether pipelock has such a
+   rule or whether we add it.
+
+---
+
+## References
+
+- mitmproxy: <https://mitmproxy.org>, <https://github.com/mitmproxy/mitmproxy>
+- mitmproxy `upstream_proxy` mode: <https://docs.mitmproxy.org/stable/concepts/modes/#upstream-proxy>
+- mitmproxy CA cert installation: <https://docs.mitmproxy.org/stable/concepts/certificates/>
+- Squid `ssl_bump`: <https://wiki.squid-cache.org/Features/SslPeekAndSplice>
+- Squid ICAP: <https://wiki.squid-cache.org/Features/ICAP>
+- `goproxy`: <https://github.com/elazarl/goproxy>
+- `gomitmproxy`: <https://github.com/AdguardTeam/gomitmproxy>
+- `martian`: <https://github.com/google/martian>
+- Node TLS / `NODE_EXTRA_CA_CERTS`: <https://nodejs.org/api/cli.html#node_extra_ca_certsfile>
+- Python `SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE`: <https://docs.python.org/3/library/ssl.html#ssl.SSLContext.load_verify_locations>
+- Prior research — pipelock assessment: `docs/research/pipelock-assessment.md`
+- Prior research — network egress guard: `docs/research/network-egress-guard.md`
+- Prior research — secret exfil tripwire encodings: `docs/research/secret-exfil-tripwire-encodings.md`
+
+Research conducted 2026-05-12.