docs(prd): add 0006, enable pipelock's native TLS interception

Supersedes the abandoned PR #8 (`mitmproxy-tls-interception`), which built a mitmproxy + addon chain on the (falsified) premise that pipelock could not MITM.

Empirical proof from the impl-time spike: with `tls_interception: { enabled: true, ca_cert, ca_key }` in pipelock's config, pipelock answered a credential POST over HTTPS with `STATUS=403 / body: blocked: request body contains secret: GitHub Token` and emitted both `scanner:"tls_intercept"` and `scanner:"body_dlp"` events. Standalone, no second proxy.

Net change vs PR #8: one sidecar instead of two, no vendored addon, no addon-verdict pattern matching, no HTTPS-trust / DNS / lookup workarounds. Same end-state behavior: pipelock's DLP fires on plaintext for HTTPS hosts in the allowlist.

Also cleaning up the now-stale TLS-research notes:

- `docs/research/tls-mitm-for-pipelock.md` is removed. Its entire premise (mitmproxy in front of pipelock) is moot now that pipelock does the work natively. The mechanics of CONNECT bumping and the CA-lifecycle considerations it documented are the same as what pipelock implements; the PRD restates the parts that matter for the integration.
- `docs/research/pipelock-assessment.md` had two stale claims corrected: the "Pipelock does not perform TLS inspection (no CA trust injection)" line in §Scope gaps and the "no TLS termination" cell in the comparison table. Both now point at the `tls_interception` config and `pipelock tls` CLI instead.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 14:15:44 -04:00
parent e45cd2fb07
commit 6716f091c1
3 changed files with 312 additions and 513 deletions
@@ -0,0 +1,303 @@
+# PRD 0006: pipelock native TLS interception
+
+- **Status:** Draft
+- **Author:** didericis
+- **Created:** 2026-05-12
+
+## Summary
+
+Turn on pipelock's built-in `tls_interception` so its DLP / URL /
+header / MCP scanners fire on the plaintext of HTTPS requests
+instead of only the outer `CONNECT` hostname. Pipelock generates a
+per-bottle ephemeral CA at launch (`pipelock tls init`); the
+public cert is installed into the agent container's trust store
+and the private key dies with the sidecar on teardown. The
+existing per-agent sidecar topology from PRD 0001 is otherwise
+unchanged — one container, no addon, no second proxy.
+
+This supersedes the closed PR #8 / branch `mitmproxy-tls-interception`,
+which built a mitmproxy + addon chain on the (falsified) premise
+that pipelock could not MITM. Empirical proof from the impl-time
+spike: with `tls_interception: { enabled: true, ca_cert, ca_key }`
+in the pipelock config, pipelock answered a credential POST over
+HTTPS with `STATUS=403 / body: blocked: request body contains
+secret: GitHub Token` and emitted both
+`scanner:"tls_intercept"` and `scanner:"body_dlp"` events.
+
+## Problem
+
+PRD 0001 wired pipelock onto every bottle's egress, but pipelock
+ran with its default `tls_interception.enabled: false`. The agent
+container's only egress route is pipelock, but pipelock only saw
+`CONNECT` hostnames and the encrypted bytes inside the tunnel.
+Pipelock's headline scanners — request body DLP (48 credential
+patterns), header DLP, URL DLP, subdomain entropy, MCP scanning,
+response-body scanning — all need plaintext to fire. Against the
+HTTPS-only hosts in `DEFAULT_ALLOWLIST` (`api.anthropic.com`,
+`raw.githubusercontent.com`, etc.) they are effectively disabled.
+
+The existing `tests/integration/test_pipelock_blocks_secret_post`
+test only passes because it forces the agent to send plain HTTP
+through pipelock's forward-proxy mode. Real Claude Code traffic
+uses HTTPS via CONNECT and slips past the scanner.
+
+## Goals / Success Criteria
+
+The feature works when all of the following are observable:
+
+- A Node / curl request from inside a launched bottle to a
+  CONNECT-bumped HTTPS host (e.g. `https://api.anthropic.com/dlp-probe`)
+  carrying a pipelock-recognized credential pattern in the body
+  returns 403 from pipelock with the documented
+  `blocked: request body contains secret: …` body. Pipelock's
+  `body_dlp` event fires on the decrypted request.
+- A clean HTTPS GET from inside the bottle to an allowlisted host
+  (e.g. `https://raw.githubusercontent.com/...`) returns the real
+  upstream response — TLS interception doesn't break legitimate
+  traffic.
+- The agent's TLS library trusts pipelock's bumped leaf certs
+  (per the bottle's installed CA); no TLS-trust errors.
+- Claude Code reaches `api.anthropic.com` end-to-end through the
+  bottle and completes a chat round-trip.
+
+The feature is **done** when all of the following ship:
+
+- `pipelock_build_config` / `pipelock_render_yaml` emit a
+  `tls_interception` block with `enabled: true` and the per-bottle
+  CA cert/key paths. The defaults
+  (`cert_ttl: 24h`, `cert_cache_size: 10000`,
+  `passthrough_domains: []`) are kept; only `enabled` and the
+  cert paths are populated.
+- The prepare step generates a per-bottle CA via `pipelock tls init`
+  in a one-shot container, which writes `ca.pem` and `ca-key.pem` to
+  `stage_dir`. The resulting paths land on the `DockerBottlePlan`.
+- `DockerPipelockProxy.start` mounts the stage dir into the
+  sidecar (read-only) so the running pipelock can read its CA.
+- `BottleBackend.provision_ca` (new) copies the CA public cert
+  into the agent at
+  `/usr/local/share/ca-certificates/claude-bottle-mitm.crt` and runs
+  `update-ca-certificates`. The base-class default is a no-op, so
+  other backends aren't forced to implement it. The
+  `NODE_EXTRA_CA_CERTS` / `SSL_CERT_FILE` / `REQUESTS_CA_BUNDLE` env
+  trio is set separately, on the agent's `docker run` in the launch
+  step, because a running container's env can't be changed.
+- The launch step prints a one-line stderr log with the SHA-256
+  fingerprint of the public CA cert (computed via stdlib
+  `ssl.PEM_cert_to_DER_cert` + `hashlib.sha256`).
+- On bottle teardown the sidecar is removed and the CA private
+  key is gone with it.
+- Two new integration tests under `tests/integration/`:
+  - HTTPS variant of the credential-post block test (proves the
+    `tls_intercept` + `body_dlp` chain fires end-to-end).
+  - Clean HTTPS GET test (proves the allow path doesn't break TLS
+    trust and returns real upstream content).
+- The dry-run preflight (`start --dry-run`) renders the new TLS
+  layer. Text: one line under the egress summary. JSON: a
+  reserved `egress.tls_interception: { enabled: true,
+  ca_fingerprint: null }` block — fingerprint is null at dry-run
+  because the CA only exists after launch.
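+
+A minimal sketch of the `tls_interception` fragment, assuming it is a
+top-level key in pipelock's YAML (as in the spike config) and that the
+sidecar sees the CA at `/h/ca.pem` / `/h/ca-key.pem`. The helper names
+are hypothetical, not the current `pipelock_build_config` /
+`pipelock_render_yaml` signatures, and the renderer only handles flat
+scalars:
+
+```python
+import json
+
+# Sidecar-side CA paths (the stage dir is mounted at /h).
+CA_CERT_IN_SIDECAR = "/h/ca.pem"
+CA_KEY_IN_SIDECAR = "/h/ca-key.pem"
+
+
+def tls_interception_block(ca_cert: str = CA_CERT_IN_SIDECAR,
+                           ca_key: str = CA_KEY_IN_SIDECAR) -> dict:
+    """Hypothetical builder: only `enabled` and the CA paths are set.
+
+    cert_ttl, cert_cache_size and passthrough_domains are omitted so
+    pipelock's own defaults apply.
+    """
+    return {"enabled": True, "ca_cert": ca_cert, "ca_key": ca_key}
+
+
+def render_tls_interception_yaml(block: dict) -> str:
+    """Hypothetical renderer for the block as a top-level YAML mapping.
+
+    json.dumps output for bool and str scalars (true, "quoted") is also
+    valid YAML, which avoids a PyYAML dependency in this sketch.
+    """
+    lines = ["tls_interception:"]
+    for key, value in block.items():
+        lines.append(f"  {key}: {json.dumps(value)}")
+    return "\n".join(lines) + "\n"
+
+
+print(render_tls_interception_yaml(tls_interception_block()), end="")
+```
+
+Leaving the tuning keys out, rather than writing their current default
+values, keeps the rendered config tracking whatever defaults the pinned
+pipelock image ships.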
+
+## Non-goals
+
+- A second proxy in the chain. Pipelock does the bumping
+  natively; the mitmproxy approach was based on a wrong premise
+  (closed PR #8).
+- Per-bottle override to disable interception. v1 always enables
+  `tls_interception`. The pipelock-side `passthrough_domains`
+  list is the right knob if a future allowlisted host turns out
+  to pin certs — exposing it through the manifest is a follow-up.
+- A long-lived / shared CA across bottles. Each bottle gets a
+  fresh CA generated by `pipelock tls init` and destroyed with the
+  sidecar.
+- Tuning `cert_ttl`, `cert_cache_size`, `max_response_bytes`,
+  `cross_request_detection`, or other pipelock advanced features.
+  Defaults from `pipelock generate config --preset strict` are
+  fine for v1.
+- Trust-store paths for non-Debian agent images.
+  `node:22-slim` is Debian; `update-ca-certificates` is the right
+  command. A Red-Hat-family base would need `update-ca-trust`.
+- HTTP/3 / QUIC. Pipelock's interception applies to TLS over TCP
+  (CONNECT tunnels); UDP/443 still needs an iptables layer (separate PRD).
+
+## Scope
+
+### In scope
+
+- **`claude_bottle/pipelock.py`** changes:
+  - Extend `pipelock_build_config` to include
+    `tls_interception: { enabled: true, ca_cert: <path>, ca_key:
+    <path> }`. Paths are populated from the plan; the function's
+    signature grows a `cert_path` / `key_path` pair or reads them
+    off `Bottle` once they're stored.
+  - Extend `pipelock_render_yaml` to emit the new block.
+- **`claude_bottle/backend/docker/pipelock.py`** changes:
+  - New helper `pipelock_tls_init(stage_dir)` runs the upstream
+    image as a one-shot:
+    `docker run --rm -v <stage>:/h -e PIPELOCK_HOME=/h pipelock tls init`,
+    leaving `ca.pem` and `ca-key.pem` under `stage_dir`. The files
+    are owned by the upstream image's user; the sidecar runs the same
+    image, so it reads them as that owner through the read-only mount.
+  - `DockerPipelockProxy.start` mounts the stage dir into the
+    sidecar at `/h:ro` and references the CA paths in the rendered
+    YAML.
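+- **`claude_bottle/backend/__init__.py`**: new method
+  `provision_ca(plan, target)` on `BottleBackend` with a default
+  no-op body (a concrete hook, not an `@abstractmethod`, so existing
+  backends keep working). `BottleBackend.provision` runs the
+  provisioners in the order `ca → prompt → skills → ssh → git`.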
+- **`claude_bottle/backend/docker/provision/ca.py`** (new):
+  - Reads the cert from `stage_dir` (already written by prepare).
+  - `docker cp` into the agent.
+  - `docker exec -u 0 ... chmod 644 ...` + `update-ca-certificates`.
+  - Computes the SHA-256 fingerprint with stdlib (`ssl` +
+    `hashlib`), emits one stderr log line.
+- **`claude_bottle/backend/docker/launch.py`**:
+  - Three new `-e` flags on the agent's `docker run`:
+    `NODE_EXTRA_CA_CERTS=/usr/local/share/ca-certificates/claude-bottle-mitm.crt`,
+    `SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt`,
+    `REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt`.
+  - `HTTPS_PROXY` / `HTTP_PROXY` continue to point at pipelock
+    (unchanged from PRD 0001 — the mitmproxy detour in PR #8 is
+    abandoned).
+- **`claude_bottle/backend/docker/bottle_plan.py`**:
+  - One new `info(...)` line in `print()` noting TLS interception
+    is on.
+  - `to_dict()` gains an `egress.tls_interception: { enabled:
+    true, ca_fingerprint: null }` block. Reserved for future
+    population.
+- **`claude_bottle/backend/docker/prepare.py`**: call
+  `pipelock_tls_init(stage_dir)` and write the resolved cert/key
+  paths onto the plan (either on the existing `proxy_plan` field
+  or on the parent `DockerBottlePlan`).
+- **Tests:**
+  - `tests/integration/test_pipelock_blocks_secret_https_post.py`
+    (new) — HTTPS variant of the existing block test.
+  - `tests/integration/test_pipelock_allows_normal_https.py`
+    (new) — clean HTTPS GET succeeds.
+  - `tests/unit/test_pipelock_yaml.py` updated to assert the new
+    `tls_interception` block in the rendered config.
+  - `tests/integration/test_dry_run_plan.py` updated to assert
+    the new `egress.tls_interception` JSON block.
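+
+The Docker-side changes reduce to three argv fragments. A sketch,
+assuming (as the `tls init` command above implies) that the pipelock
+image's entrypoint is the `pipelock` binary; the helper names are
+hypothetical and the image reference is passed in rather than pinned:
+
+```python
+from pathlib import Path
+
+# Container-side paths taken from this PRD.
+CA_INSTALL_PATH = "/usr/local/share/ca-certificates/claude-bottle-mitm.crt"
+DEBIAN_CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"
+
+
+def tls_init_argv(image: str, stage_dir: Path) -> list[str]:
+    """One-shot `tls init` that leaves ca.pem / ca-key.pem in stage_dir.
+
+    stage_dir should be an absolute host path: docker treats a relative
+    -v source as a named volume.
+    """
+    return [
+        "docker", "run", "--rm",
+        "-v", f"{stage_dir}:/h",
+        "-e", "PIPELOCK_HOME=/h",
+        image, "tls", "init",  # assumes the entrypoint is `pipelock`
+    ]
+
+
+def sidecar_ca_mount_args(stage_dir: Path) -> list[str]:
+    """Read-only bind mount; the rendered YAML points at /h/ca.pem and /h/ca-key.pem."""
+    return ["-v", f"{stage_dir}:/h:ro"]
+
+
+def agent_ca_env_args() -> list[str]:
+    """The CA env trio, passed as -e flags on the agent's `docker run`."""
+    env = {
+        "NODE_EXTRA_CA_CERTS": CA_INSTALL_PATH,  # Node adds this cert to its roots
+        "SSL_CERT_FILE": DEBIAN_CA_BUNDLE,  # OpenSSL-based clients
+        "REQUESTS_CA_BUNDLE": DEBIAN_CA_BUNDLE,  # Python `requests`
+    }
+    args: list[str] = []
+    for name, value in env.items():
+        args += ["-e", f"{name}={value}"]
+    return args
+```
+
+Each helper only builds a list, so callers can hand it to
+`subprocess.run(..., check=True)` or whatever runner the backend
+already uses.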
+
+### Out of scope
+
+- Modifying pipelock itself. We're using existing config knobs.
+- A manifest field to disable / customize interception per bottle.
+  Doable but premature.
+- Wiring `passthrough_domains`. The default `[]` is correct for
+  v1; add the manifest field when a pinning host shows up.
+- `cross_request_detection`, `entropy_budget`,
+  `fragment_reassembly`, `reverse_proxy`, `scan_api` — features
+  pipelock exposes but we don't need for the body-DLP gap.
+
+## Proposed Design
+
+### Topology
+
+```
+agent --HTTPS_PROXY--> pipelock --[bumps TLS]--> internet
+                       (sees plaintext: URL, headers, body)
+```
+
+Same single-sidecar shape as PRD 0001. The only addition is
+`tls_interception` in pipelock's config plus the per-bottle CA
+generated at prepare time.
+
+### CA lifecycle
+
+- **Generation.** Host-side, at prepare time, via a one-shot
+  `docker run --rm -v <stage>:/h -e PIPELOCK_HOME=/h pipelock tls init`.
+  Output is `<stage>/ca.pem` + `<stage>/ca-key.pem`, both mode 600.
+- **Sidecar mount.** `DockerPipelockProxy.start` adds
+  `-v <stage>:/h:ro` to the sidecar's `docker run`. The rendered
+  YAML references `/h/ca.pem` and `/h/ca-key.pem`. The private
+  key is read-only from pipelock's perspective; the host stage
+  dir is owned by the launching user.
+- **Bottle install.** `provision_ca` (Docker impl) does
+  `docker cp <stage>/ca.pem agent:/usr/local/share/ca-certificates/claude-bottle-mitm.crt`,
+  then `update-ca-certificates`. The CA env trio is set at
+  `docker run -e` time (Docker propagates run-time env into
+  `docker exec`, verified in PR #8's spike).
+- **Teardown.** The sidecar container is destroyed, the stage
+  dir is removed by `start.py`'s existing `finally` block, and
+  the CA dies with both.
+- **Fingerprint.** Computed via stdlib in `provision_ca` and
+  logged once to stderr (`claude-bottle: mitm ca fingerprint:
+  sha256:<hex>…`). The private key never appears in any log.
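+
+A sketch of the Docker `provision_ca` steps and the fingerprint log
+line, using only the stdlib calls named earlier. The function names are
+hypothetical, the log wording is approximated from the line quoted
+above (printing the full hex digest), and `ssl.PEM_cert_to_DER_cert`
+assumes `ca.pem` holds a single certificate block:
+
+```python
+import hashlib
+import ssl
+import sys
+from pathlib import Path
+
+CA_INSTALL_PATH = "/usr/local/share/ca-certificates/claude-bottle-mitm.crt"
+
+
+def provision_ca_commands(container: str, stage_dir: Path) -> list[list[str]]:
+    """Hypothetical argv sequence: copy the public cert in, fix its mode as
+    root, then rebuild the Debian trust bundle. ca-key.pem is never copied."""
+    return [
+        ["docker", "cp", str(stage_dir / "ca.pem"), f"{container}:{CA_INSTALL_PATH}"],
+        ["docker", "exec", "-u", "0", container, "chmod", "644", CA_INSTALL_PATH],
+        ["docker", "exec", "-u", "0", container, "update-ca-certificates"],
+    ]
+
+
+def ca_fingerprint(pem_text: str) -> str:
+    """SHA-256 over the DER bytes of one PEM certificate.
+
+    PEM_cert_to_DER_cert checks the BEGIN/END CERTIFICATE markers and
+    base64-decodes the body; it does not parse the certificate itself.
+    """
+    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
+    return "sha256:" + hashlib.sha256(der).hexdigest()
+
+
+def log_ca_fingerprint(stage_dir: Path) -> str:
+    """Read only the public cert and emit one stderr line."""
+    fingerprint = ca_fingerprint((stage_dir / "ca.pem").read_text())
+    print(f"claude-bottle: mitm ca fingerprint: {fingerprint}", file=sys.stderr)
+    return fingerprint
+```
+
+Because the digest is taken over the DER encoding, it should match
+`openssl x509 -noout -fingerprint -sha256 -in ca.pem`, which prints
+the same value as colon-separated uppercase hex.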
+
+### Data model changes
+
+None to the manifest schema. The dry-run JSON contract grows a
+reserved `egress.tls_interception` block; the fingerprint is
+always null at dry-run because the CA doesn't exist yet.
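+
+For reference, a sketch of the reserved block as the dry-run JSON would
+carry it. The helper name is hypothetical; the point is that
+`ca_fingerprint` serializes as `null` until a real launch fills it:
+
+```python
+import json
+from typing import Optional
+
+
+def tls_interception_dry_run(ca_fingerprint: Optional[str] = None) -> dict:
+    """Hypothetical fragment merged under `egress` by to_dict()."""
+    return {"tls_interception": {"enabled": True, "ca_fingerprint": ca_fingerprint}}
+
+
+print(json.dumps({"egress": tls_interception_dry_run()}, indent=2))
+```
+
+Keeping the key present with a null value, rather than omitting it,
+gives JSON consumers the same shape at dry-run and after launch.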
+
+### Existing code touched
+
+Surgical, all on the existing pipelock path:
+
+- `claude_bottle/pipelock.py` — config builder + YAML renderer.
+- `claude_bottle/backend/__init__.py` — `provision_ca` hook (default no-op).
+- `claude_bottle/backend/docker/pipelock.py` — `tls init` helper,
+  sidecar volume mount.
+- `claude_bottle/backend/docker/prepare.py` — CA paths on plan.
+- `claude_bottle/backend/docker/launch.py` — CA env trio on agent.
+- `claude_bottle/backend/docker/backend.py` — `provision_ca`
+  dispatch; `self._proxy` is threaded through prepare/launch with
+  its shape unchanged.
+- `claude_bottle/backend/docker/bottle_plan.py` — preflight
+  rendering.
+- `claude_bottle/backend/docker/provision/ca.py` (new).
+
+Net diff is meaningfully smaller than PR #8 because pipelock
+already does the work — no addon, no second sidecar, no second
+backend module.
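+
+A minimal sketch of the base-class shape, assuming `provision_ca` is a
+concrete no-op hook rather than an `@abstractmethod` (otherwise every
+backend would have to override it). Only `provision` and
+`provision_ca` come from this PRD; the other provisioner method names
+are hypothetical stand-ins for the PRD 0004 modules:
+
+```python
+from abc import ABC, abstractmethod
+
+
+class BottleBackend(ABC):
+    """Simplified stand-in for the backend ABC."""
+
+    @abstractmethod
+    def provision_prompt(self, plan, target) -> None: ...
+
+    @abstractmethod
+    def provision_skills(self, plan, target) -> None: ...
+
+    @abstractmethod
+    def provision_ssh(self, plan, target) -> None: ...
+
+    @abstractmethod
+    def provision_git(self, plan, target) -> None: ...
+
+    def provision_ca(self, plan, target) -> None:
+        """Default no-op; backends that intercept TLS override it."""
+
+    def provision(self, plan, target) -> None:
+        # Order from this PRD: ca -> prompt -> skills -> ssh -> git.
+        self.provision_ca(plan, target)
+        self.provision_prompt(plan, target)
+        self.provision_skills(plan, target)
+        self.provision_ssh(plan, target)
+        self.provision_git(plan, target)
+
+
+class RecordingBackend(BottleBackend):
+    """Toy backend that records call order and keeps the no-op CA hook."""
+
+    def __init__(self) -> None:
+        self.calls: list[str] = []
+
+    def provision_prompt(self, plan, target) -> None:
+        self.calls.append("prompt")
+
+    def provision_skills(self, plan, target) -> None:
+        self.calls.append("skills")
+
+    def provision_ssh(self, plan, target) -> None:
+        self.calls.append("ssh")
+
+    def provision_git(self, plan, target) -> None:
+        self.calls.append("git")
+
+
+backend = RecordingBackend()
+backend.provision(plan=None, target=None)
+```
+
+`RecordingBackend` can be instantiated without defining
+`provision_ca`, which is the property the default no-op is meant to
+preserve for non-Docker backends.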
+
+### External dependencies
+
+- **Pipelock image** — unchanged pin from PRD 0001
+  (`ghcr.io/luckypipewrench/pipelock@sha256:3b1a3941…`,
+  matching pipelock v2.3.0). No new image dependency.
+- **No host-side crypto deps.** CA generation uses the pipelock
+  image's own `tls init` command in a one-shot container.
+  Fingerprint uses Python stdlib `ssl` + `hashlib`.
+
+## Open questions
+
+- **Mount semantics for the stage dir.** The sidecar runs with a
+  `-v <host-stage>:/h:ro` bind mount. The CA files were written by
+  the one-shot `pipelock tls init` container with whatever UID
+  pipelock's image uses; the sidecar reads them as that same UID.
+  Should work, but confirm on first impl by inspecting the file
+  modes/owners and that the sidecar actually loads them. Also confirm
+  the host can read `ca.pem` for `docker cp` and the fingerprint: on
+  a Linux host, a mode-600 file owned by another UID is unreadable to
+  a non-root launching user. Fallback: `docker cp` the cert/key into
+  the sidecar after `docker create` (mirror PR #8's mitmproxy
+  lifecycle).
+- **Cert validity / TTL.** Defaults are `cert_ttl: 24h` for
+  per-host leaves; the CA validity from `pipelock tls init` is
+  10 years by default (`--validity 87600h`). The CA outlives the
+  bottle either way; per-bottle ephemerality is enforced by
+  *generating a fresh one each launch*, not by setting a short
+  CA validity. Document; no tuning in v1.
+- **`passthrough_domains` shape.** Once we expose this through
+  the manifest in a follow-up, the natural place is
+  `bottle.egress.tls_passthrough_domains: [host, ...]`, mirroring
+  the existing `egress.allowlist` shape.
+- **Stage-dir cleanup ordering.** The stage dir holds the CA
+  private key on the host for as long as the bottle runs, since the
+  sidecar's bind mount points at it. `start.py`'s existing `finally`
+  block `shutil.rmtree`s it. Confirm the rmtree fires after the
+  sidecar is stopped, so the sidecar isn't reading a deleted mount
+  when it shuts down. The current order looks correct (teardown
+  unwinds via ExitStack before the outer `finally` runs), but verify.
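+
+The ordering claim can be checked with a small model: a context
+manager standing in for the sidecar is entered on an `ExitStack`
+inside the `try` whose `finally` removes the stage dir. All names here
+are hypothetical; this models the shape described above, not the
+actual `start.py`:
+
+```python
+import shutil
+import tempfile
+from contextlib import ExitStack, contextmanager
+from pathlib import Path
+
+
+@contextmanager
+def fake_sidecar(stage_dir: Path, events: list[str]):
+    """Stand-in for the pipelock sidecar; 'stops' when the stack unwinds."""
+    events.append("sidecar started")
+    try:
+        yield
+    finally:
+        # Record whether the bind-mount source still exists at shutdown.
+        events.append(f"sidecar stopped (stage exists: {stage_dir.exists()})")
+
+
+def run_bottle(events: list[str], fail: bool = False) -> None:
+    """Hypothetical start flow: ExitStack inside try, rmtree in finally."""
+    stage_dir = Path(tempfile.mkdtemp(prefix="bottle-stage-"))
+    try:
+        with ExitStack() as stack:
+            stack.enter_context(fake_sidecar(stage_dir, events))
+            events.append("agent running")
+            if fail:
+                raise RuntimeError("agent step failed")
+        # By the time control leaves the with-block, the stack has unwound.
+    finally:
+        shutil.rmtree(stage_dir, ignore_errors=True)
+        events.append(f"stage removed (stage exists: {stage_dir.exists()})")
+```
+
+Because the `with` block exits, and unwinds the stack, before control
+reaches the enclosing `finally`, the sidecar stops while the mount
+source still exists, including when the agent step raises.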
+
+## References
+
+- `docs/research/pipelock-assessment.md` (now corrected) —
+  pipelock capability assessment including the
+  `tls_interception` block.
+- `docs/prds/0001-per-agent-egress-proxy-via-pipelock.md` —
+  egress-proxy baseline this PRD extends.
+- `docs/prds/0003-bottle-backend-abstraction.md` — backend ABC
+  contract this PRD adds a `provision_ca` method to.
+- `docs/prds/0004-split-out-provisioners.md` — per-provisioner
+  module pattern reused for the new CA provisioner.
+- Pipelock `tls` CLI (in-image help):
+  `pipelock tls init / install-ca / show-ca`.
+- Closed PR #8 — earlier mitmproxy-based design built on the
+  falsified "pipelock can't MITM" premise; archived for context.
@@ -222,10 +222,14 @@ The following threat-model items from `network-egress-guard.md` are
  intercept raw UDP 53 packets.
 - **Domain fronting**: an agent can send `CONNECT allowed-host.com:443`
  through the proxy but embed a different SNI inside the TLS session.
-  Pipelock does not perform TLS inspection (no CA trust injection) and
-  cannot verify SNI vs. CONNECT header. The same limitation is shared
-  with smokescreen and is documented in `network-egress-guard.md` as a
-  known gap for the non-TLS-terminating proxy approach.
+  Pipelock supports TLS interception via its `tls_interception` config
+  block (`enabled`, `ca_cert`, `ca_key`, `cert_ttl`, `cert_cache_size`,
+  `passthrough_domains`, `max_response_bytes`) plus the `pipelock tls
+  init` / `install-ca` / `show-ca` CLI; with interception on, the
+  body and inner Host header become visible to its scanner pipeline,
+  closing the domain-fronting gap. With interception off (default in
+  the generated config), pipelock relays the CONNECT as an opaque
+  tunnel and only sees the outer hostname.
 - **SSH egress content**: SSH sessions to permitted hosts are opaque.
  Same limitation noted in both prior research notes.
 - **Agent killing the proxy process**: if pipelock runs inside the same
@@ -385,7 +389,7 @@ pipelock's differentiators.
 | Blocks RFC 1918 by default | only if explicitly added to rules | yes | yes, + DNS rebinding | no |
 | Content-based DLP (credential patterns) | no | no | yes, 48 patterns + encoding normalization | no |
 | MCP / WebSocket scanning | no | no | yes, bidirectional | no |
-| Domain fronting bypass | possible | possible | possible (no TLS termination) | n/a |
+| Domain fronting bypass | possible | possible | mitigated when `tls_interception` is enabled (CA trust required in client) | n/a |
 | macOS Docker Desktop (sidecar mode) | yes | yes | yes | yes |
 | macOS Docker Desktop (in-container sandbox) | yes | n/a | degraded (--best-effort) | yes |
 | NET_ADMIN / NET_RAW required | yes | no | no (sidecar) | no |
@@ -1,508 +0,0 @@
-# TLS interception for pipelock content scanning
-
-Research into adding TLS termination ("MITM") to the egress path so that
-pipelock's scanning pipeline can see plaintext HTTP request and response
-bodies, instead of only the `CONNECT` host and opaque ciphertext.
-
-## Summary
-
- Pipelock today sees `CONNECT` hostnames and the encrypted bytes that follow.
-  Its DLP, subdomain-entropy, and MCP scanners cannot fire on TLS-encrypted
-  bodies, which is the gap explicitly named under "Scope gaps" in
-  `pipelock-assessment.md` ("Pipelock does not perform TLS inspection (no CA
-  trust injection)").
- Closing that gap requires a TLS-terminating proxy that bumps `CONNECT`,
-  presents a leaf certificate for the target hostname signed by a CA the
-  bottle's trust store accepts, decrypts the inner HTTP, and re-establishes
-  TLS to the real upstream.
- The mature open-source option is **mitmproxy**. Squid + `ssl_bump` is the
-  heavier production-grade alternative. The Go ecosystem (`goproxy`,
-  `gomitmproxy`, `martian`) is suitable only if we want a custom binary
-  tightly coupled to pipelock.
- Recommended v1 topology: **mitmproxy in front of pipelock** on the same
-  egress route. mitmproxy terminates client TLS, forwards plaintext to
-  pipelock as its upstream HTTP proxy, and re-encrypts to the real upstream.
-  Pipelock stays unchanged.
- Per-bottle ephemeral CA, generated at bottle start and destroyed on
-  teardown. The CA private key lives only on the sidecar; the bottle's
-  trust store only ever sees the public cert.
- Cert pinning is a known caveat but a small one given the narrow allowlist
-  in this project. Selective bumping is the mitigation if a future
-  allowlisted host turns out to pin.
-
---
-
-## What pipelock cannot see today
-
-The current egress topology (per `pipelock-assessment.md`):
-
-```
-agent --HTTPS_PROXY--> pipelock --CONNECT host:443--> internet
-                                  \____________________________
-                                       opaque TLS bytes
-```
-
-The agent's client (Claude Code, `curl`, an MCP server, a Python SDK)
-sends `CONNECT api.anthropic.com:443`. Pipelock checks the hostname
-against its `api_allowlist`, replies `200 Connection Established`, and
-then blindly relays bytes between the two TCP halves. The TLS handshake
-and everything inside it happens end-to-end between the agent and the
-real upstream.
-
-What pipelock can scan in this mode:
-
- `CONNECT` target hostname (SNI is not even needed).
- TLS record framing and lengths (useful for budgets, useless for DLP).
- Plain HTTP/1.1 to non-HTTPS destinations (irrelevant — there are none
-  in `DEFAULT_ALLOWLIST`).
-
-What pipelock cannot scan in this mode:
-
- Request URL, method, headers, body.
- Response status, headers, body.
- MCP JSON-RPC payloads inside the TLS session.
- WebSocket frames inside a TLS-wrapped upgrade.
- Whether the inner SNI or HTTP `Host` / `:authority` matches the
-  outer `CONNECT` target (domain-fronting check).
-
-The 48-pattern DLP layer, the subdomain-entropy check (insofar as it
-inspects URLs rather than DNS-resolver queries), the request-redaction
-feature added in v2.3.0, and bidirectional MCP scanning all require
-plaintext to operate on. Without TLS termination, those layers are
-inert against any HTTPS destination — which is every destination in
-the current allowlist.
-
---
-
-## How TLS interception works
-
-The mechanics of `CONNECT` bumping, end to end:
-
-1. **Agent issues `CONNECT`.** The HTTP client sees `HTTPS_PROXY` set,
-   so it opens a TCP connection to the proxy and sends
-   `CONNECT api.anthropic.com:443 HTTP/1.1`.
-2. **Proxy answers `200`.** Standard tunnel-established response.
-3. **Proxy starts TLS as the server.** Instead of relaying bytes, the
-   proxy itself performs a TLS handshake with the agent. It needs a
-   server certificate for `api.anthropic.com` — so on first contact for
-   that hostname, the proxy generates a leaf certificate with
-   `CN=api.anthropic.com` and a SAN for the same, signs it with its
-   own CA private key, and presents that cert. Subsequent connections
-   to the same hostname reuse the cached leaf.
-4. **Agent verifies the cert.** The agent's TLS library walks the chain
-   to a trusted root. Because the bottle's trust store contains the
-   proxy's CA cert, validation succeeds. The agent has no way to tell
-   it isn't talking to the real `api.anthropic.com`.
-5. **Proxy opens its own TLS to the real upstream.** As a client this
-   time, using the system root store, talking to the real
-   `api.anthropic.com`. Real SNI, real cert chain validated normally.
-6. **Proxy bridges the two TLS sessions.** Decrypts on the server side,
-   re-encrypts on the client side, and scans the plaintext in between.
-
-This is what every TLS-terminating egress proxy does. The trade-offs
-live in three places:
-
- **CA trust injection.** Step 4 only works if the bottle's trust
-  store contains the proxy's CA. Mechanics covered under "CA lifecycle"
-  below.
- **Cert generation cost.** Generating an RSA-2048 leaf cert takes
-  ~50 ms; ECDSA P-256 is ~5 ms. Cache leaves per (hostname, SAN list)
-  to keep this off the steady-state hot path.
- **Protocol coverage.** The proxy needs to speak HTTP/1.1, HTTP/2 (ALPN
-  `h2`), and ideally WebSocket. HTTP/3 / QUIC is UDP and requires a
-  separate code path; for v1, blocking UDP/443 at the iptables layer
-  forces clients to fall back to HTTP/2, which we can inspect.
-
---
-
-## Tools
-
-### mitmproxy
-
- **What it is.** Python (with Rust crypto bits) interactive HTTPS proxy.
-  Reference open-source implementation of the bump pattern. Ships as
-  `mitmproxy` (TUI), `mitmweb` (browser UI), and `mitmdump` (headless).
- **Cert handling.** Generates a CA on first run under `~/.mitmproxy/`.
-  Per-host leaves are generated on demand and cached in memory. Cert
-  cache keyed by (hostname, SAN extensions inferred from upstream cert).
- **Protocols.** HTTP/1.1, HTTP/2, WebSocket fully supported. HTTP/3
-  exists as experimental. Raw TCP / non-HTTP TLS supported via
-  `--mode reverse:` but not in CONNECT-bump mode.
- **Extensibility.** Python addon API. An addon module can inspect or
-  modify any `request` / `response` / `tcp_message` flow. The pipelock
-  integration in Topology D below uses this.
- **Selective bumping.** `ignore_hosts` regex; matching CONNECTs are
-  tunneled blindly instead of bumped. Critical for the cert-pinning
-  mitigation.
- **Docker image.** `mitmproxy/mitmproxy` on Docker Hub. Single binary
-  for the CLI, ~80 MB image. Configurable via flags or `~/.mitmproxy/config.yaml`.
- **Project URL.** <https://mitmproxy.org>, <https://github.com/mitmproxy/mitmproxy>.
-
-Most mature, best-documented, lowest-effort integration. Default choice
-for v1.
-
-### Squid + ssl_bump
-
- **What it is.** Squid is a long-running C++ caching proxy.
-  `ssl_bump` is its TLS-interception feature, controlled by per-CONNECT
-  actions: `splice` (tunnel blindly), `bump` (decrypt and re-encrypt),
-  `peek` (look at TLS hello then decide), `stare` (look at server cert
-  then decide), `terminate` (abort the connection).
- **Cert handling.** Configured via `sslcrtd_program` — a helper that
-  generates and caches per-host certs. CA cert and key referenced by
-  PEM paths in `squid.conf`.
- **Protocols.** HTTP/1.1 fully; HTTP/2 to clients via recent versions;
-  no scripted addons.
- **Extensibility.** ICAP (Internet Content Adaptation Protocol) for
-  external scanners — Squid POSTs each request/response to an ICAP
-  service that can modify or reject. This is the formal version of
-  Topology D below.
- **Production track record.** Used at corporate-proxy scale (large
-  enterprises, ISPs). Heavyweight for a single-bottle sidecar.
- **Project URL.** <https://wiki.squid-cache.org/Features/SslPeekAndSplice>.
-
-Right tool if pipelock grows an ICAP server endpoint. Otherwise, more
-config surface than this project needs.
-
-### Go libraries: goproxy, gomitmproxy, martian
-
- **`goproxy`** (elazarl) — long-lived Go library, basic CONNECT-bumping
-  proxy with a handler API. Sparse on HTTP/2.
-  <https://github.com/elazarl/goproxy>
- **`gomitmproxy`** (AdGuard) — newer, cleaner API; built for AdGuard
-  Home / DNS-filtering products. HTTP/2 support is partial.
-  <https://github.com/AdguardTeam/gomitmproxy>
- **`martian`** (Google) — request/response modifier framework with a
-  JSON-configurable rule engine. Used internally at Google; public
-  ecosystem thin.
-  <https://github.com/google/martian>
-
-These are relevant only if we decide to write a custom TLS-terminating
-binary that links pipelock's scanning packages directly — Topology C
-below. They are not faster than mitmproxy for the v1 sidecar shape;
-they are smaller and more direct, at the cost of writing more Go.
-
-### Disqualified
-
- **Caddy, Envoy, HAProxy.** All can terminate TLS at a reverse-proxy
-  vhost. None ship a "bump on CONNECT and forward plaintext to a
-  downstream proxy" mode out of the box. Adapting any of them to this
-  shape is more work than starting from mitmproxy.
- **Cloudflare Gateway, Zscaler, NetSkope, Forcepoint.** Managed cloud
-  egress with TLS inspection. Wrong topology — they live outside the
-  host, not as a per-bottle sidecar, and they require trusting a vendor
-  with full plaintext.
- **Charles Proxy, Burp Suite.** Closed-source GUI tools for developer
-  capture and security testing. Not appropriate as headless sidecars.
- **`mitmdump` standalone vs. embedding mitmproxy as a library.** Both
-  are mitmproxy. Calling out only to note: the project ships both a CLI
-  and a Python API; addons can be loaded either way.
-
---
-
-## Topologies
-
-Five candidate topologies, ordered roughly from least to most coupled
-between the two components.
-
-### A — mitmproxy in front of pipelock (recommended)
-
-```
-agent --HTTPS_PROXY--> mitmproxy --HTTP_PROXY--> pipelock --> internet
-                       (bump TLS)               (scan plain)  (real TLS)
-```
-
-mitmproxy terminates the agent's TLS connection, decrypts, and then
-forwards the inner HTTP request to pipelock by treating pipelock as
-its own upstream HTTP forward proxy. Pipelock receives plaintext HTTP
-exactly as if the agent had used HTTP, applies its full scanning
-pipeline, and forwards to mitmproxy's upstream client half — which
-re-establishes TLS to the real destination.
-
-Concretely the agent's `HTTPS_PROXY` points at mitmproxy; mitmproxy's
-`upstream_proxy` config points at pipelock; pipelock's network reach
-includes the real internet.
-
- **Wins.** Pipelock unchanged. mitmproxy unchanged from default
-  configuration. Each component has one job. Failure modes are clear
-  per layer.
- **Costs.** Two sidecars per bottle instead of one. One extra
-  decrypt / re-encrypt hop, ~5–15 ms per request in steady state.
- **Open question.** How exactly mitmproxy forwards to pipelock matters
-  for whether pipelock sees TLS again or only HTTP. mitmproxy's
-  `upstream` mode wraps the decrypted request in another CONNECT if the
-  destination is HTTPS — which would re-encrypt before pipelock sees
-  it, defeating the point. The correct mode is `upstream` with TLS
-  re-origination disabled, or `regular` mode with a chained proxy. The
-  v2 release of mitmproxy reworked this; needs verification against the
-  current docs at integration time.
-
-### B — pipelock in front of mitmproxy (ruled out)
-
-```
-agent --HTTPS_PROXY--> pipelock --CONNECT?--> mitmproxy --> internet
-                       (sees CONNECT only)   (bump TLS)
-```
-
-Pipelock would receive a `CONNECT` and decide to allow or deny based
-on hostname, then tunnel to mitmproxy. mitmproxy would terminate TLS
-and see plaintext — but pipelock would never see the plaintext, which
-is the whole point of the exercise. The scanning still happens (in
-mitmproxy), but it isn't pipelock doing it, so we'd need an entirely
-different rule engine. Ruled out.
-
-### C — Extend pipelock itself to terminate TLS
-
-Two sub-variants:
-
-**C.1 — Upstream a `tls_terminate` mode.** Submit a feature to
-pipelock that adds CONNECT bumping and per-host cert generation in Go,
-using `crypto/tls` and the existing scanning packages. Pipelock becomes
-a self-contained MITM proxy. License question matters here: the Apache
-2.0 core can grow new features in-tree, but if upstream insists this
-belongs in `enterprise/` (ELv2), we either accept ELv2 or fork.
-
-**C.2 — Wrap pipelock in a thin Go binary in the same container.** A
-small Go program does the TLS half (`CONNECT` parsing, cert generation,
-TLS handshake) and pipes plaintext to pipelock over UDS or loopback.
-The wrapper is ours; pipelock is unmodified. No license question.
-
- **Wins.** Single component on the egress path. Pipelock owns the
-  scanning end-to-end, including domain-fronting checks (SNI vs.
-  `Host` vs. `CONNECT`).
- **Costs.** Real Go engineering effort. CA generation, cert caching,
-  TLS handshake, HTTP/2 ALPN negotiation, WebSocket upgrade — all
-  things mitmproxy already solves.
- **When.** Right shape for v2 or v3 once the v1 mitmproxy-in-front
-  topology has proven the integration works and the scanning rules are
-  stable.
-
-### D — mitmproxy as the proxy, pipelock as a content-scan subroutine
-
-```
-agent --HTTPS_PROXY--> mitmproxy --> internet
-                       (bump TLS)
-                          |
-                          v
-                       POST /scan to pipelock
-                       <- allow / block / redact
-```
-
-A Python addon in mitmproxy sends each decrypted request (and response)
-to a pipelock HTTP `/scan` endpoint and gates the flow on the verdict.
-mitmproxy handles all networking; pipelock is the rule engine only.
-
-- **Wins.** Clean separation of concerns. Pipelock doesn't have to
-  speak TLS at all. The addon is small, ~100 lines of Python.
-- **Costs.** Requires pipelock to expose a scan API. The current Apache
-  2.0 core does not document one. If `/scan` lives in `enterprise/`,
-  ELv2 applies. If it doesn't exist, we'd be asking pipelock for a new
-  surface.
-- **Variant.** Squid's ICAP path is the formalized version of the same
-  pattern.
-
-### E — Single container, two processes
-
-mitmproxy and pipelock share a container, started by `supervisord` or
-`s6-overlay`. Networking simplifies to localhost. Lifecycle complicates:
-container restart now means restarting both; failure of one process is
-not visible at the Docker layer; logs interleave.
-
-- **Wins.** Slightly less Docker plumbing in `cli.py`.
-- **Costs.** Operational complexity not worth the savings. The two
-  processes have independent failure modes, and separate containers are
-  the right tool for isolating them.
-
-Net: not recommended.
-
----
-
-## CA lifecycle
-
-The CA private key is the asset to defend. With it, anyone can issue
-certs that the bottle's trust store will accept for any hostname. So:
-
-**Per-bottle ephemeral CA.** At bottle start, generate a fresh
-RSA-2048 or ECDSA-P256 CA inside the mitmproxy sidecar. Export only
-the public cert (PEM) into the bottle's trust store at one of:
-
-- `/usr/local/share/ca-certificates/claude-bottle-mitm.crt` followed by
-  `update-ca-certificates` (Debian/Ubuntu base images).
-- `/etc/pki/ca-trust/source/anchors/` followed by `update-ca-trust`
-  (Red Hat family).
-- `$NODE_EXTRA_CA_CERTS` for Node-based agents (Claude Code).
-- `$SSL_CERT_FILE` / `$REQUESTS_CA_BUNDLE` for Python SDKs.
-
-The private key never leaves the sidecar's filesystem. The public CA
-certificate is the only artifact that crosses into the bottle.
-
-On bottle teardown, the sidecar container is destroyed; the CA dies
-with it. The next bottle gets a fresh CA. No long-lived MITM CA on
-disk.
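-
-A minimal sketch of the key/public split follows. It assumes the CA is generated
-host-side in the bottle's `stage_dir` and handed to mitmproxy through its
-`confdir`. mitmproxy's documented convention for a custom CA is a
-`mitmproxy-ca.pem` file in that directory holding both the private key and the
-certificate in PEM form. The `stage_ca` helper and the `mitmproxy-conf`
-directory name are hypothetical. Only the second returned path is ever
-copied into the bottle.
-
-```python
-import os
-from pathlib import Path
-
-
-def stage_ca(stage_dir: Path, key_pem: str, cert_pem: str) -> tuple:
-    """Hypothetical helper: split a CA into (sidecar-only bundle, bottle-bound cert)."""
-    if "PRIVATE KEY" in cert_pem:
-        raise ValueError("cert_pem must hold only the public certificate")
-
-    confdir = stage_dir / "mitmproxy-conf"  # mounted into the sidecar only
-    confdir.mkdir(parents=True, exist_ok=True)
-
-    # Private key + certificate together, readable by the owner only.
-    bundle = confdir / "mitmproxy-ca.pem"
-    fd = os.open(bundle, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
-    with os.fdopen(fd, "w") as f:
-        f.write(key_pem.rstrip("\n") + "\n" + cert_pem.rstrip("\n") + "\n")
-    os.chmod(bundle, 0o600)  # also tighten a file left over from an earlier run
-
-    # Public half only: the one artifact that crosses into the bottle.
-    public = stage_dir / "claude-bottle-mitm.crt"
-    public.write_text(cert_pem.rstrip("\n") + "\n")
-    return bundle, public
-```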
-
-**Why not a shared per-host CA.** A persistent CA shared across bottles
-is faster (no generation at start) but is a real liability. Its private
-key lives on disk indefinitely, and an attacker on the host network who
-obtains it could impersonate any host to every bottle that trusts it.
-The public half is not the risk: every bottle has it in its trust store
-by design, and it cannot sign anything. With a per-bottle CA, even a
-leaked private key is bottle-local and dies with the bottle.
-
-**Generation cost.** RSA-2048 CA generation is ~200 ms; ECDSA-P256 is
-~5 ms. Either is negligible next to the per-bottle Docker pull and
-network setup cost.
-
-**Where the CA lives in the bottle.** In both places: the
-distribution-standard path with `update-ca-certificates`, and the
-env-var path. Belt and suspenders, because some Node and Python
-libraries honor only the env vars, and some load only `/etc/ssl/certs/`
-directly.
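-
-A sketch of choosing the install path and env vars from the image's
-`/etc/os-release` follows. The `trust_install_plan` helper is hypothetical and
-recognizes only the two families named above. It encodes one subtlety about the
-env vars. `NODE_EXTRA_CA_CERTS` is additive, so it can point at the single MITM
-cert. `SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE` replace the default bundle, so
-they should point at the merged system bundle that the update command
-regenerates. The bundle paths shown are the usual Debian and Red Hat ones.
-
-```python
-def parse_os_release(text: str) -> dict:
-    """Parse KEY=value lines from an os-release file, dropping surrounding quotes."""
-    fields = {}
-    for raw in text.splitlines():
-        line = raw.strip()
-        if not line or line.startswith("#") or "=" not in line:
-            continue
-        key, _, value = line.partition("=")
-        fields[key] = value.strip().strip("\"'")
-    return fields
-
-
-def trust_install_plan(os_release: str, ca_name: str = "claude-bottle-mitm.crt") -> tuple:
-    """Hypothetical helper: return (cert destination, update command, agent env vars)."""
-    info = parse_os_release(os_release)
-    family = {info.get("ID", "")} | set(info.get("ID_LIKE", "").split())
-
-    if family & {"debian", "ubuntu"}:
-        dest = f"/usr/local/share/ca-certificates/{ca_name}"
-        cmd = ["update-ca-certificates"]
-        system_bundle = "/etc/ssl/certs/ca-certificates.crt"
-    elif family & {"rhel", "fedora", "centos"}:
-        dest = f"/etc/pki/ca-trust/source/anchors/{ca_name}"
-        cmd = ["update-ca-trust"]
-        system_bundle = "/etc/pki/tls/certs/ca-bundle.crt"
-    else:
-        raise ValueError(f"unrecognized base image distribution: {info.get('ID')!r}")
-
-    env = {
-        # Additive: Node appends this file to its built-in CA list.
-        "NODE_EXTRA_CA_CERTS": dest,
-        # Replacing: these must name a full bundle that now includes our CA.
-        "SSL_CERT_FILE": system_bundle,
-        "REQUESTS_CA_BUNDLE": system_bundle,
-    }
-    return dest, cmd, env
-```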
-
----
-
-## Cert pinning (brief)
-
-A client that pins ignores the trust store and refuses any cert whose
-public key isn't on a hardcoded list. Three observations for this
-project:
-
-- The current `DEFAULT_ALLOWLIST` (`api.anthropic.com`,
-  `statsig.anthropic.com`, `sentry.io`, `claude.ai`,
-  `platform.claude.com`, `downloads.claude.ai`,
-  `raw.githubusercontent.com`) does not appear to include any host whose
-  server-side clients pin. Server-side SDKs (Node, Python) almost
-  universally honor system trust and `NODE_EXTRA_CA_CERTS` /
-  `SSL_CERT_FILE`. Mobile SDKs and Chromium pin; we don't run those.
-- If a future allowlisted host turns out to pin, the mitigation is
-  selective bumping via mitmproxy `ignore_hosts`: that specific
-  hostname tunnels blindly and pipelock loses DLP coverage for it.
-  Coverage on every other host is unaffected.
-- The cost of finding out is a single five-minute test before adding a
-  host: point mitmproxy at the host and observe whether the client succeeds.
-
-Not a v1 blocker. Document the failure mode and the mitigation.
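-
-A sketch of turning a list of known-pinning hosts into `ignore_hosts`
-patterns follows. It assumes that each pattern is a regular expression matched
-against `host:port`, as the examples in mitmproxy's documentation on
-ignoring domains suggest. The helper name and the `include_subdomains`
-switch are hypothetical. Anchoring and escaping matter here. An unescaped `.`
-or a missing `^` would let look-alike hostnames tunnel blindly and skip DLP.
-
-```python
-import re
-
-
-def ignore_hosts_patterns(hosts: list, include_subdomains: bool = False) -> list:
-    """Hypothetical helper: build anchored, escaped regexes for mitmproxy ignore_hosts."""
-    patterns = []
-    for host in hosts:
-        escaped = re.escape(host.strip().lower().rstrip("."))
-        # Optionally allow any subdomain prefix, but never an arbitrary suffix.
-        prefix = r"^(.+\.)?" if include_subdomains else "^"
-        patterns.append(f"{prefix}{escaped}:443$")
-    return patterns
-```
-
-Each resulting pattern would be passed to mitmproxy as its own `--ignore-hosts` argument.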
-
----
-
-## Comparison table
-
-| | A: mitmproxy → pipelock | B: pipelock → mitmproxy | C: TLS in pipelock | D: mitmproxy + scan API | E: one container |
-|---|---|---|---|---|---|
-| Pipelock sees plaintext | yes | no | yes | yes (via /scan) | yes |
-| Code change to pipelock | none | none | substantial | adds /scan endpoint | none |
-| Sidecar count | 2 | 2 | 1 | 2 | 1 |
-| Cert generation owner | mitmproxy | mitmproxy | pipelock | mitmproxy | mitmproxy |
-| Selective bumping | mitmproxy `ignore_hosts` | mitmproxy `ignore_hosts` | pipelock config | mitmproxy `ignore_hosts` | mitmproxy `ignore_hosts` |
-| Failure isolation per process | yes | yes | n/a (one process) | yes | no (shared container) |
-| License question | none | none | ELv2 risk | ELv2 risk | none |
-| v1 effort | low | low (but pointless) | high | medium | low |
-| Long-term shape | interim | n/a | best | possible | not recommended |
-
----
-
-## Recommendation
-
-**Adopt Topology A for v1.** Add a mitmproxy sidecar to the egress
-topology, in front of pipelock on the same per-bottle internal network.
-The agent's `HTTPS_PROXY` points at mitmproxy; mitmproxy's upstream is
-pipelock; pipelock's upstream is the real internet.
-
-Concretely:
-
-1. Add a `MitmproxyProxy` class alongside `PipelockProxy`, with the
-   same `prepare` / `start` / `stop` lifecycle. The class generates
-   a per-bottle CA in `stage_dir`, exports the public cert into a
-   second file, and writes a mitmproxy config that:
-   - bumps every CONNECT by default
-   - runs in upstream mode (`--mode upstream:http://pipelock-<slug>:<port>`)
-   - listens on a known port inside the per-bottle internal network
-2. Extend the bottle launch step to copy the CA public cert into the
-   agent container under
-   `/usr/local/share/ca-certificates/claude-bottle-mitm.crt`, run
-   `update-ca-certificates`, and set `NODE_EXTRA_CA_CERTS` /
-   `SSL_CERT_FILE` / `REQUESTS_CA_BUNDLE` accordingly.
-3. Repoint the agent's `HTTPS_PROXY` and `HTTP_PROXY` from the pipelock
-   container to the mitmproxy container.
-4. Verify mitmproxy's upstream-proxy mode forwards plaintext (not a
-   re-wrapped CONNECT) to pipelock; if not, use `regular` mode with a
-   chained proxy directive.
-5. Test that pipelock's DLP, subdomain-entropy, and MCP scanners now
-   fire on real request bodies for `api.anthropic.com` traffic.
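-
-Steps 1 and 3 can be sketched as two small builders. One produces the
-sidecar's `mitmdump` command line and the other the agent's proxy env.
-`--mode upstream:<url>`, `--listen-port`, `--ignore-hosts`, and
-`--set confdir=...` are existing mitmproxy options. The helper names, the
-`mitmproxy-<slug>` container name (chosen to mirror `pipelock-<slug>`), and the
-port values used with them are hypothetical. Step 4 still has to verify whether
-upstream mode actually delivers plaintext to pipelock.
-
-```python
-from typing import Iterable
-
-
-def mitmdump_argv(slug: str, pipelock_port: int, listen_port: int,
-                  confdir: str, ignore_patterns: Iterable = ()) -> list:
-    """Hypothetical helper: command line for the per-bottle mitmproxy sidecar."""
-    argv = [
-        "mitmdump",
-        # Chain every request to this bottle's pipelock container.
-        "--mode", f"upstream:http://pipelock-{slug}:{pipelock_port}",
-        "--listen-port", str(listen_port),
-        # confdir is expected to hold the per-bottle CA (mitmproxy-ca.pem).
-        "--set", f"confdir={confdir}",
-    ]
-    for pattern in ignore_patterns:
-        # Known-pinning hosts are tunneled without interception.
-        argv += ["--ignore-hosts", pattern]
-    return argv
-
-
-def agent_proxy_env(slug: str, listen_port: int) -> dict:
-    """Hypothetical helper: repoint the agent at mitmproxy instead of pipelock."""
-    proxy = f"http://mitmproxy-{slug}:{listen_port}"
-    return {"HTTPS_PROXY": proxy, "HTTP_PROXY": proxy}
-```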
-
-**Defer Topologies C and D.** Topology C (extending pipelock to
-terminate TLS) is the cleanest long-term shape but is a substantial
-build and runs into the Apache 2.0 vs. ELv2 question. Topology D
-(mitmproxy with pipelock as a scan API) is attractive but requires a
-pipelock surface that doesn't exist today. Both are valid v2 targets;
-neither is the right starting point.
-
-The `network-egress-guard.md` v1 iptables + dnsmasq layer remains
-necessary alongside this — TLS interception covers HTTP/HTTPS only;
-raw TCP, UDP/443 (QUIC), UDP/53 (DNS), and ICMP still need the
-IP-level default-deny.
-
----
-
-## Open questions
-
-1. **mitmproxy upstream-proxy mode mechanics.** Does mitmproxy in
-   `upstream_proxy` mode forward decrypted HTTP plaintext to the
-   upstream, or does it wrap it in a new CONNECT? The documented
-   behavior changed between mitmproxy 8 and 10. Needs verification
-   against the version we pin.
-2. **Pipelock's behavior when receiving plain HTTP.** Pipelock's
-   `forward_proxy.enabled: true` accepts both `GET http://...` (plain
-   HTTP) and `CONNECT host:443` (HTTPS). After Topology A is wired up,
-   pipelock will see only plain HTTP — does its DLP / MCP scanning
-   pipeline run the full set of layers, or are some gated on the
-   CONNECT path? Confirm by reading
-   `github.com/luckyPipewrench/pipelock/blob/main/docs/configuration.md`.
-3. **CA installation in the Anthropic-provided Claude Code Docker image.**
-   The base image's distribution determines whether `update-ca-certificates`
-   (Debian/Ubuntu) or `update-ca-trust` (Red Hat) is the right command.
-   The current `Dockerfile` should be inspected before assuming Debian.
-4. **HTTP/2 over the agent → mitmproxy hop.** Node's HTTP client
-   negotiates `h2` via ALPN. mitmproxy speaks `h2` to clients in recent
-   versions. Confirm the version we pin supports `h2` end-to-end and
-   doesn't downgrade to `http/1.1` (which would be a silent
-   performance regression).
-5. **Selective-bump policy surface.** Where does the
-   "tunnel this hostname blindly" decision live? Options: a field on
-   `bottle.egress` in the manifest, a fixed list of known-pinning
-   hosts baked into the mitmproxy config, or pipelock-side opt-out.
-   Manifest field is most consistent with the existing
-   `bottle.egress.allowlist` shape.
-6. **Image pin for mitmproxy.** The `pipelock-assessment.md`
-   recommendation is to pin by digest. The mitmproxy Docker Hub image
-   should be pinned the same way. Which release line? `mitmproxy/mitmproxy`
-   ships rolling and tagged versions; the tagged `:11.x` line is the
-   right baseline.
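-7. **CA generation in Python (mitmproxy) vs. as a separate step.**
-   mitmproxy generates a CA on first launch if none is provided. For
-   per-bottle ephemerality, we want the CA to be ours, not whatever
-   mitmproxy chooses, so generate the CA in the host-side prepare
-   step and hand it to mitmproxy. The likely mechanism is to write it as
-   `mitmproxy-ca.pem` (private key plus certificate) into the `confdir`
-   mitmproxy starts with. `--certs *=...` supplies per-domain leaf
-   certificates rather than a CA. Mechanics need confirming.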
-8. **Domain fronting verification.** Once pipelock sees plaintext, it
-   has access to the inner `Host` / `:authority`. A new rule that
-   compares it against the outer `CONNECT` target catches domain
-   fronting. Worth a follow-up note on whether pipelock has such a
-   rule or whether we add it.
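-
-A sketch of such a rule follows, independent of where it ends up living. It
-compares the outer `CONNECT` target against the TLS SNI and the inner
-`Host` / `:authority`. All three are first normalized for case, ports, and
-trailing dots. The function names are hypothetical. A real rule may also need
-explicit policy for legitimate mismatches, such as a `CONNECT` to an IP
-literal, rather than flagging them.
-
-```python
-from typing import Optional
-
-
-def host_only(authority: str) -> str:
-    """Lowercase, drop any port and trailing dot; handles bracketed IPv6 literals."""
-    authority = authority.strip().lower()
-    if authority.startswith("["):
-        host, sep, _ = authority[1:].partition("]")
-        if not sep:
-            raise ValueError(f"malformed IPv6 authority: {authority!r}")
-    elif authority.count(":") == 1:
-        host = authority.rsplit(":", 1)[0]  # "name:port" -> "name"
-    else:
-        host = authority  # bare name, or unbracketed IPv6 with no port
-    return host.rstrip(".")
-
-
-def is_domain_fronted(connect_target: str, sni: Optional[str], inner_host: str) -> bool:
-    """True if the SNI or inner Host names a different host than the CONNECT target."""
-    outer = host_only(connect_target)
-    inner_names = {host_only(inner_host)}
-    if sni:
-        inner_names.add(host_only(sni))
-    return any(name != outer for name in inner_names)
-```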
-
----
-
-## References
-
-- mitmproxy: <https://mitmproxy.org>, <https://github.com/mitmproxy/mitmproxy>
-- mitmproxy `upstream_proxy` mode: <https://docs.mitmproxy.org/stable/concepts/modes/#upstream-proxy>
-- mitmproxy CA cert installation: <https://docs.mitmproxy.org/stable/concepts/certificates/>
-- Squid `ssl_bump`: <https://wiki.squid-cache.org/Features/SslPeekAndSplice>
-- Squid ICAP: <https://wiki.squid-cache.org/Features/ICAP>
-- `goproxy`: <https://github.com/elazarl/goproxy>
-- `gomitmproxy`: <https://github.com/AdguardTeam/gomitmproxy>
-- `martian`: <https://github.com/google/martian>
-- Node TLS / `NODE_EXTRA_CA_CERTS`: <https://nodejs.org/api/cli.html#node_extra_ca_certsfile>
-- Python `SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE`: <https://docs.python.org/3/library/ssl.html#ssl.SSLContext.load_verify_locations>
-- Prior research — pipelock assessment: `docs/research/pipelock-assessment.md`
-- Prior research — network egress guard: `docs/research/network-egress-guard.md`
-- Prior research — secret exfil tripwire encodings: `docs/research/secret-exfil-tripwire-encodings.md`
-
-Research conducted 2026-05-12.