diff --git a/docs/prds/0006-pipelock-tls-interception.md b/docs/prds/0006-pipelock-tls-interception.md new file mode 100644 index 0000000..20bbe7b --- /dev/null +++ b/docs/prds/0006-pipelock-tls-interception.md @@ -0,0 +1,303 @@ +# PRD 0006: pipelock native TLS interception + +- **Status:** Draft +- **Author:** didericis +- **Created:** 2026-05-12 + +## Summary + +Turn on pipelock's built-in `tls_interception` so its DLP / URL / +header / MCP scanners fire on the plaintext of HTTPS requests +instead of only the outer `CONNECT` hostname. Pipelock generates a +per-bottle ephemeral CA at launch (`pipelock tls init`); the +public cert is installed into the agent container's trust store +and the private key dies with the sidecar on teardown. The +existing per-agent sidecar topology from PRD 0001 is otherwise +unchanged — one container, no addon, no second proxy. + +This supersedes the closed PR #8 / branch `mitmproxy-tls-interception`, +which built a mitmproxy + addon chain on the (falsified) premise +that pipelock could not MITM. Empirical proof from the impl-time +spike: with `tls_interception: { enabled: true, ca_cert, ca_key }` +in the pipelock config, pipelock answered a credential POST over +HTTPS with `STATUS=403 / body: blocked: request body contains +secret: GitHub Token` and emitted both +`scanner:"tls_intercept"` and `scanner:"body_dlp"` events. + +## Problem + +PRD 0001 wired pipelock onto every bottle's egress, but pipelock +ran with its default `tls_interception.enabled: false`. The agent +container's only egress route is pipelock, but pipelock only saw +`CONNECT` hostnames and the encrypted bytes inside the tunnel. +Pipelock's headline scanners — request body DLP (48 credential +patterns), header DLP, URL DLP, subdomain entropy, MCP scanning, +response-body scanning — all need plaintext to fire. Against the +HTTPS-only hosts in `DEFAULT_ALLOWLIST` (`api.anthropic.com`, +`raw.githubusercontent.com`, etc.) they are effectively disabled. 
+ +The existing `tests/integration/test_pipelock_blocks_secret_post` +test only fires because it forces the agent to send plain HTTP +through pipelock's forward-proxy mode. Real Claude Code traffic +uses HTTPS via CONNECT and slips past the scanner. + +## Goals / Success Criteria + +The feature works when all of the following are observable: + +- A Node / curl request from inside a launched bottle to a + CONNECT-bumped HTTPS host (e.g. `https://api.anthropic.com/dlp-probe`) + carrying a pipelock-recognized credential pattern in the body + returns 403 from pipelock with the documented + `blocked: request body contains secret: …` body. Pipelock's + `body_dlp` event fires on the decrypted request. +- A clean HTTPS GET from inside the bottle to an allowlisted host + (e.g. `https://raw.githubusercontent.com/...`) returns the real + upstream response — TLS interception doesn't break legitimate + traffic. +- The agent's TLS library trusts pipelock's bumped leaf certs + (per the bottle's installed CA); no TLS-trust errors. +- Claude Code reaches `api.anthropic.com` end-to-end through the + bottle and completes a chat round-trip. + +The feature is **done** when all of the following ship: + +- `pipelock_build_config` / `pipelock_render_yaml` emit a + `tls_interception` block with `enabled: true` and the per-bottle + CA cert/key paths. The defaults + (`cert_ttl: 24h`, `cert_cache_size: 10000`, + `passthrough_domains: []`) are kept; only `enabled` and the + cert paths are populated. +- The prepare step generates a per-bottle CA via `pipelock tls init` + in a one-shot container, writes `ca.pem` and `ca-key.pem` to + `stage_dir`. Paths land on the `DockerBottlePlan`. +- `DockerPipelockProxy.start` mounts the stage dir into the + sidecar (read-only) so the running pipelock can read its CA. 
+- `BottleBackend.provision_ca` (new) copies the CA public cert + into the agent at + `/usr/local/share/ca-certificates/claude-bottle-mitm.crt`, runs + `update-ca-certificates`, and sets the `NODE_EXTRA_CA_CERTS` / + `SSL_CERT_FILE` / `REQUESTS_CA_BUNDLE` env trio on the agent + container's runtime env. Default no-op on the abstract base so + other backends aren't forced to implement. +- The launch step prints a one-line stderr log with the SHA-256 + fingerprint of the public CA cert (computed via stdlib + `ssl.PEM_cert_to_DER_cert` + `hashlib.sha256`). +- On bottle teardown the sidecar is removed and the CA private + key is gone with it. +- Two new integration tests under `tests/integration/`: + - HTTPS variant of the credential-post block test (proves the + `tls_intercept` + `body_dlp` chain fires end-to-end). + - Clean HTTPS GET test (proves the allow path doesn't break TLS + trust and returns real upstream content). +- The dry-run preflight (`start --dry-run`) renders the new TLS + layer. Text: one line under the egress summary. JSON: a + reserved `egress.tls_interception: { enabled: true, + ca_fingerprint: null }` block — fingerprint is null at dry-run + because the CA only exists after launch. + +## Non-goals + +- A second proxy in the chain. Pipelock does the bumping + natively; the mitmproxy approach was based on a wrong premise + (closed PR #8). +- Per-bottle override to disable interception. v1 always enables + `tls_interception`. The pipelock-side `passthrough_domains` + list is the right knob if a future allowlisted host turns out + to pin certs — exposing it through the manifest is a follow-up. +- A long-lived / shared CA across bottles. Each bottle gets a + fresh CA generated by `pipelock tls init` and destroyed with the + sidecar. +- Tuning `cert_ttl`, `cert_cache_size`, `max_response_bytes`, + `cross_request_detection`, or other pipelock advanced features. + Defaults from `pipelock generate config --preset strict` are + fine for v1. 
+- Trust-store paths for non-Debian agent images. + `node:22-slim` is Debian; `update-ca-certificates` is the right + command. A Red-Hat-family base would need `update-ca-trust`. +- HTTP/3 / QUIC. Pipelock's interception is HTTP/HTTPS-over-TLS; + UDP/443 still needs an iptables layer (separate PRD). + +## Scope + +### In scope + +- **`claude_bottle/pipelock.py`** changes: + - Extend `pipelock_build_config` to include + `tls_interception: { enabled: true, ca_cert: , ca_key: + }`. Paths are populated from the plan; the function's + signature grows a `cert_path` / `key_path` pair or reads them + off `Bottle` once they're stored. + - Extend `pipelock_render_yaml` to emit the new block. +- **`claude_bottle/backend/docker/pipelock.py`** changes: + - New helper `pipelock_tls_init(stage_dir)` runs the upstream + image as a one-shot: + `docker run --rm -v :/h -e PIPELOCK_HOME=/h pipelock tls init`, + leaving `ca.pem` and `ca-key.pem` under `stage_dir`. The host + file owner is whatever the upstream image's user is; the + sidecar mount is read-only so this is fine. + - `DockerPipelockProxy.start` mounts the stage dir into the + sidecar at `/h:ro` and references the CA paths in the rendered + YAML. +- **`claude_bottle/backend/__init__.py`**: new abstract method + `provision_ca(plan, target)` on `BottleBackend`, default no-op. + `BottleBackend.provision` orchestrates `ca → prompt → skills → + ssh → git`. +- **`claude_bottle/backend/docker/provision/ca.py`** (new): + - Reads the cert from `stage_dir` (already written by prepare). + - `docker cp` into the agent. + - `docker exec -u 0 ... chmod 644 ...` + `update-ca-certificates`. + - Computes the SHA-256 fingerprint with stdlib (`ssl` + + `hashlib`), emits one stderr log line. 
+- **`claude_bottle/backend/docker/launch.py`**: + - Three new `-e` flags on the agent's `docker run`: + `NODE_EXTRA_CA_CERTS=/usr/local/share/ca-certificates/claude-bottle-mitm.crt`, + `SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt`, + `REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt`. + - `HTTPS_PROXY` / `HTTP_PROXY` continue to point at pipelock + (unchanged from PRD 0001 — the mitmproxy detour in PR #8 is + abandoned). +- **`claude_bottle/backend/docker/bottle_plan.py`**: + - One new `info(...)` line in `print()` noting TLS interception + is on. + - `to_dict()` gains an `egress.tls_interception: { enabled: + true, ca_fingerprint: null }` block. Reserved for future + population. +- **`claude_bottle/backend/docker/prepare.py`**: call + `pipelock_tls_init(stage_dir)` and write the resolved cert/key + paths onto the plan (either on the existing `proxy_plan` field + or on the parent `DockerBottlePlan`). +- **Tests:** + - `tests/integration/test_pipelock_blocks_secret_https_post.py` + (new) — HTTPS variant of the existing block test. + - `tests/integration/test_pipelock_allows_normal_https.py` + (new) — clean HTTPS GET succeeds. + - `tests/unit/test_pipelock_yaml.py` updated to assert the new + `tls_interception` block in the rendered config. + - `tests/integration/test_dry_run_plan.py` updated to assert + the new `egress.tls_interception` JSON block. + +### Out of scope + +- Modifying pipelock itself. We're using existing config knobs. +- A manifest field to disable / customize interception per bottle. + Doable but premature. +- Wiring `passthrough_domains`. The default `[]` is correct for + v1; add the manifest field when a pinning host shows up. +- `cross_request_detection`, `entropy_budget`, + `fragment_reassembly`, `reverse_proxy`, `scan_api` — features + pipelock exposes but we don't need for the body-DLP gap. 
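The fingerprint step listed under `provision/ca.py` in the scope above can be sketched with the two stdlib calls the PRD names, `ssl.PEM_cert_to_DER_cert` and `hashlib.sha256`. This is a minimal sketch, not the shipped provisioner: the function names `ca_fingerprint` and `fingerprint_log_line` are hypothetical, and it assumes `ca.pem` holds exactly one certificate. The log prefix is the one quoted in the CA lifecycle section.

```python
import hashlib
import ssl


def ca_fingerprint(pem_text: str) -> str:
    """Return 'sha256:<hex>' for a PEM string holding exactly one certificate.

    Hypothetical helper name; only the two stdlib calls come from the PRD.
    """
    # PEM_cert_to_DER_cert requires the text to start with the BEGIN line
    # and end with the END line, so trim surrounding whitespace first.
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    # The fingerprint is the SHA-256 digest of the DER encoding, not of
    # the PEM text, so it is independent of line wrapping in the file.
    return "sha256:" + hashlib.sha256(der).hexdigest()


def fingerprint_log_line(pem_text: str) -> str:
    """Format the one-line stderr message (hypothetical helper name)."""
    return f"claude-bottle: mitm ca fingerprint: {ca_fingerprint(pem_text)}"
```

Only the public cert is read here, so nothing derived from `ca-key.pem` can reach the log. A caller would pass the text of the staged `ca.pem` and write the returned line to stderr once per launch.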
+ +## Proposed Design + +### Topology + +``` +agent --HTTPS_PROXY--> pipelock --[bumps TLS]--> internet + (sees plaintext: URL, headers, body) +``` + +Same single-sidecar shape as PRD 0001. The only addition is +`tls_interception` in pipelock's config plus the per-bottle CA +generated at prepare time. + +### CA lifecycle + +- **Generation.** Host-side, at prepare time, via a one-shot + `docker run --rm -v :/h pipelock tls init`. Output is + `/ca.pem` + `/ca-key.pem`, both mode 600. +- **Sidecar mount.** `DockerPipelockProxy.start` adds + `-v :/h:ro` to the sidecar's `docker run`. The rendered + YAML references `/h/ca.pem` and `/h/ca-key.pem`. The private + key is read-only from pipelock's perspective; the host stage + dir is owned by the launching user. +- **Bottle install.** `provision_ca` (Docker impl) does + `docker cp /ca.pem agent:/usr/local/share/ca-certificates/claude-bottle-mitm.crt`, + then `update-ca-certificates`. The CA env trio is set at + `docker run -e` time (Docker propagates run-time env into + `docker exec`, verified in PR #8's spike). +- **Teardown.** The sidecar container is destroyed, the stage + dir is removed by `start.py`'s existing `finally` block, and + the CA dies with both. +- **Fingerprint.** Computed via stdlib in `provision_ca` and + logged once to stderr (`claude-bottle: mitm ca fingerprint: + sha256:…`). The private key never appears in any log. + +### Data model changes + +None to the manifest schema. The dry-run JSON contract grows a +reserved `egress.tls_interception` block; the fingerprint is +always null at dry-run because the CA doesn't exist yet. + +### Existing code touched + +Surgical, all on the existing pipelock path: + +- `claude_bottle/pipelock.py` — config builder + YAML renderer. +- `claude_bottle/backend/__init__.py` — abstract `provision_ca`. +- `claude_bottle/backend/docker/pipelock.py` — `tls init` helper, + sidecar volume mount. +- `claude_bottle/backend/docker/prepare.py` — CA paths on plan. 
+- `claude_bottle/backend/docker/launch.py` — CA env trio on agent. +- `claude_bottle/backend/docker/backend.py` — `provision_ca` + dispatch + thread `self._proxy` through prepare/launch unchanged + shape. +- `claude_bottle/backend/docker/bottle_plan.py` — preflight + rendering. +- `claude_bottle/backend/docker/provision/ca.py` (new). + +Net diff is meaningfully smaller than PR #8 because pipelock +already does the work — no addon, no second sidecar, no second +backend module. + +### External dependencies + +- **Pipelock image** — unchanged pin from PRD 0001 + (`ghcr.io/luckypipewrench/pipelock@sha256:3b1a3941…`, + matching pipelock v2.3.0). No new image dependency. +- **No host-side crypto deps.** CA generation uses the pipelock + image's own `tls init` command in a one-shot container. + Fingerprint uses Python stdlib `ssl` + `hashlib`. + +## Open questions + +- **Mount semantics for the stage dir.** The sidecar runs with a + `-v :/h:ro` bind mount. The CA files were written by + the one-shot `pipelock tls init` container with whatever UID + pipelock's image uses; the sidecar reads them as that same UID. + Should work, but confirm on first impl by inspecting the file + modes/owners and that the sidecar actually loads them. Fallback: + `docker cp` the cert/key into the running sidecar after `docker + create` (mirror PR #8's mitmproxy lifecycle). +- **Cert validity / TTL.** Defaults are `cert_ttl: 24h` for + per-host leaves; the CA validity from `pipelock tls init` is + 10 years by default (`--validity 87600h`). The CA outlives the + bottle either way; per-bottle ephemerality is enforced by + *generating a fresh one each launch*, not by setting a short + CA validity. Document; no tuning in v1. +- **`passthrough_domains` shape.** Once we expose this through + the manifest in a follow-up, the natural place is + `bottle.egress.tls_passthrough_domains: [host, ...]`, mirroring + the existing `egress.allowlist` shape. 
+- **Stage-dir cleanup ordering.** The stage dir holds the CA + private key briefly. `start.py`'s existing `finally` block + `shutil.rmtree`s it. Confirm the rmtree fires after the sidecar + is stopped, so the sidecar isn't reading a deleted mount when + it shuts down. The current order is correct (teardown unwinds + via ExitStack before the outer `finally` runs); verify. + +## References + +- `docs/research/pipelock-assessment.md` (now corrected) — + pipelock capability assessment including the + `tls_interception` block. +- `docs/prds/0001-per-agent-egress-proxy-via-pipelock.md` — + egress-proxy baseline this PRD extends. +- `docs/prds/0003-bottle-backend-abstraction.md` — backend ABC + contract this PRD adds a `provision_ca` method to. +- `docs/prds/0004-split-out-provisioners.md` — per-provisioner + module pattern reused for the new CA provisioner. +- Pipelock `tls` CLI (in-image help): + `pipelock tls init / install-ca / show-ca`. +- Closed PR #8 — earlier mitmproxy-based design built on the + falsified "pipelock can't MITM" premise; archived for context. diff --git a/docs/research/pipelock-assessment.md b/docs/research/pipelock-assessment.md index e547afc..5c6c1d2 100644 --- a/docs/research/pipelock-assessment.md +++ b/docs/research/pipelock-assessment.md @@ -222,10 +222,14 @@ The following threat-model items from `network-egress-guard.md` are intercept raw UDP 53 packets. - **Domain fronting**: an agent can send `CONNECT allowed-host.com:443` through the proxy but embed a different SNI inside the TLS session. - Pipelock does not perform TLS inspection (no CA trust injection) and - cannot verify SNI vs. CONNECT header. The same limitation is shared - with smokescreen and is documented in `network-egress-guard.md` as a - known gap for the non-TLS-terminating proxy approach. 
+ Pipelock supports TLS interception via its `tls_interception` config + block (`enabled`, `ca_cert`, `ca_key`, `cert_ttl`, `cert_cache_size`, + `passthrough_domains`, `max_response_bytes`) plus the `pipelock tls + init` / `install-ca` / `show-ca` CLI; with interception on, the + body and inner Host header become visible to its scanner pipeline, + closing the domain-fronting gap. With interception off (default in + the generated config), pipelock relays the CONNECT as an opaque + tunnel and only sees the outer hostname. - **SSH egress content**: SSH sessions to permitted hosts are opaque. Same limitation noted in both prior research notes. - **Agent killing the proxy process**: if pipelock runs inside the same @@ -385,7 +389,7 @@ pipelock's differentiators. | Blocks RFC 1918 by default | only if explicitly added to rules | yes | yes, + DNS rebinding | no | | Content-based DLP (credential patterns) | no | no | yes, 48 patterns + encoding normalization | no | | MCP / WebSocket scanning | no | no | yes, bidirectional | no | -| Domain fronting bypass | possible | possible | possible (no TLS termination) | n/a | +| Domain fronting bypass | possible | possible | mitigated when `tls_interception` is enabled (CA trust required in client) | n/a | | macOS Docker Desktop (sidecar mode) | yes | yes | yes | yes | | macOS Docker Desktop (in-container sandbox) | yes | n/a | degraded (--best-effort) | yes | | NET_ADMIN / NET_RAW required | yes | no | no (sidecar) | no | diff --git a/docs/research/tls-mitm-for-pipelock.md b/docs/research/tls-mitm-for-pipelock.md deleted file mode 100644 index aa8c4d0..0000000 --- a/docs/research/tls-mitm-for-pipelock.md +++ /dev/null @@ -1,508 +0,0 @@ -# TLS interception for pipelock content scanning - -Research into adding TLS termination ("MITM") to the egress path so that -pipelock's scanning pipeline can see plaintext HTTP request and response -bodies, instead of only the `CONNECT` host and opaque ciphertext. 
- -## Summary - -- Pipelock today sees `CONNECT` hostnames and the encrypted bytes that follow. - Its DLP, subdomain-entropy, and MCP scanners cannot fire on TLS-encrypted - bodies, which is the gap explicitly named under "Scope gaps" in - `pipelock-assessment.md` ("Pipelock does not perform TLS inspection (no CA - trust injection)"). -- Closing that gap requires a TLS-terminating proxy that bumps `CONNECT`, - presents a leaf certificate for the target hostname signed by a CA the - bottle's trust store accepts, decrypts the inner HTTP, and re-establishes - TLS to the real upstream. -- The mature open-source option is **mitmproxy**. Squid + `ssl_bump` is the - heavier production-grade alternative. The Go ecosystem (`goproxy`, - `gomitmproxy`, `martian`) is suitable only if we want a custom binary - tightly coupled to pipelock. -- Recommended v1 topology: **mitmproxy in front of pipelock** on the same - egress route. mitmproxy terminates client TLS, forwards plaintext to - pipelock as its upstream HTTP proxy, and re-encrypts to the real upstream. - Pipelock stays unchanged. -- Per-bottle ephemeral CA, generated at bottle start and destroyed on - teardown. The CA private key lives only on the sidecar; the bottle's - trust store only ever sees the public cert. -- Cert pinning is a known caveat but a small one given the narrow allowlist - in this project. Selective bumping is the mitigation if a future - allowlisted host turns out to pin. - ---- - -## What pipelock cannot see today - -The current egress topology (per `pipelock-assessment.md`): - -``` -agent --HTTPS_PROXY--> pipelock --CONNECT host:443--> internet - \____________________________ - opaque TLS bytes -``` - -The agent's client (Claude Code, `curl`, an MCP server, a Python SDK) -sends `CONNECT api.anthropic.com:443`. Pipelock checks the hostname -against its `api_allowlist`, replies `200 Connection Established`, and -then blindly relays bytes between the two TCP halves. 
The TLS handshake -and everything inside it happens end-to-end between the agent and the -real upstream. - -What pipelock can scan in this mode: - -- `CONNECT` target hostname (SNI is not even needed). -- TLS record framing and lengths (useful for budgets, useless for DLP). -- Plain HTTP/1.1 to non-HTTPS destinations (irrelevant — there are none - in `DEFAULT_ALLOWLIST`). - -What pipelock cannot scan in this mode: - -- Request URL, method, headers, body. -- Response status, headers, body. -- MCP JSON-RPC payloads inside the TLS session. -- WebSocket frames inside a TLS-wrapped upgrade. -- Whether the inner SNI or HTTP `Host` / `:authority` matches the - outer `CONNECT` target (domain-fronting check). - -The 48-pattern DLP layer, the subdomain-entropy check (insofar as it -inspects URLs rather than DNS-resolver queries), the request-redaction -feature added in v2.3.0, and bidirectional MCP scanning all require -plaintext to operate on. Without TLS termination, those layers are -inert against any HTTPS destination — which is every destination in -the current allowlist. - ---- - -## How TLS interception works - -The mechanics of `CONNECT` bumping, end to end: - -1. **Agent issues `CONNECT`.** The HTTP client sees `HTTPS_PROXY` set, - so it opens a TCP connection to the proxy and sends - `CONNECT api.anthropic.com:443 HTTP/1.1`. -2. **Proxy answers `200`.** Standard tunnel-established response. -3. **Proxy starts TLS as the server.** Instead of relaying bytes, the - proxy itself performs a TLS handshake with the agent. It needs a - server certificate for `api.anthropic.com` — so on first contact for - that hostname, the proxy generates a leaf certificate with - `CN=api.anthropic.com` and a SAN for the same, signs it with its - own CA private key, and presents that cert. Subsequent connections - to the same hostname reuse the cached leaf. -4. **Agent verifies the cert.** The agent's TLS library walks the chain - to a trusted root. 
Because the bottle's trust store contains the - proxy's CA cert, validation succeeds. The agent has no way to tell - it isn't talking to the real `api.anthropic.com`. -5. **Proxy opens its own TLS to the real upstream.** As a client this - time, using the system root store, talking to the real - `api.anthropic.com`. Real SNI, real cert chain validated normally. -6. **Proxy bridges the two TLS sessions.** Decrypts on the server side, - re-encrypts on the client side, and scans the plaintext in between. - -This is what every TLS-terminating egress proxy does. The trade-offs -live in three places: - -- **CA trust injection.** Step 4 only works if the bottle's trust - store contains the proxy's CA. Mechanics covered under "CA lifecycle" - below. -- **Cert generation cost.** Generating an RSA-2048 leaf cert takes - ~50 ms; ECDSA P-256 is ~5 ms. Cache leaves per (hostname, SAN list) - to keep this off the steady-state hot path. -- **Protocol coverage.** The proxy needs to speak HTTP/1.1, HTTP/2 (ALPN - `h2`), and ideally WebSocket. HTTP/3 / QUIC is UDP and requires a - separate code path; for v1, blocking UDP/443 at the iptables layer - forces clients to fall back to HTTP/2, which we can inspect. - ---- - -## Tools - -### mitmproxy - -- **What it is.** Python (with Rust crypto bits) interactive HTTPS proxy. - Reference open-source implementation of the bump pattern. Ships as - `mitmproxy` (TUI), `mitmweb` (browser UI), and `mitmdump` (headless). -- **Cert handling.** Generates a CA on first run under `~/.mitmproxy/`. - Per-host leaves are generated on demand and cached in memory. Cert - cache keyed by (hostname, SAN extensions inferred from upstream cert). -- **Protocols.** HTTP/1.1, HTTP/2, WebSocket fully supported. HTTP/3 - exists as experimental. Raw TCP / non-HTTP TLS supported via - `--mode reverse:` but not in CONNECT-bump mode. -- **Extensibility.** Python addon API. An addon module can inspect or - modify any `request` / `response` / `tcp_message` flow. 
The pipelock - integration in Topology D below uses this. -- **Selective bumping.** `ignore_hosts` regex; matching CONNECTs are - tunneled blindly instead of bumped. Critical for the cert-pinning - mitigation. -- **Docker image.** `mitmproxy/mitmproxy` on Docker Hub. Single binary - for the CLI, ~80 MB image. Configurable via flags or `~/.mitmproxy/config.yaml`. -- **Project URL.** , . - -Most mature, best-documented, lowest-effort integration. Default choice -for v1. - -### Squid + ssl_bump - -- **What it is.** Squid is a long-running C++ caching proxy. - `ssl_bump` is its TLS-interception feature, controlled by per-CONNECT - actions: `splice` (tunnel blindly), `bump` (decrypt and re-encrypt), - `peek` (look at TLS hello then decide), `stare` (look at server cert - then decide), `terminate` (abort the connection). -- **Cert handling.** Configured via `sslcrtd_program` — a helper that - generates and caches per-host certs. CA cert and key referenced by - PEM paths in `squid.conf`. -- **Protocols.** HTTP/1.1 fully; HTTP/2 to clients via recent versions; - no scripted addons. -- **Extensibility.** ICAP (Internet Content Adaptation Protocol) for - external scanners — Squid POSTs each request/response to an ICAP - service that can modify or reject. This is the formal version of - Topology D below. -- **Production track record.** Used at corporate-proxy scale (large - enterprises, ISPs). Heavyweight for a single-bottle sidecar. -- **Project URL.** . - -Right tool if pipelock grows an ICAP server endpoint. Otherwise, more -config surface than this project needs. - -### Go libraries: goproxy, gomitmproxy, martian - -- **`goproxy`** (elazarl) — long-lived Go library, basic CONNECT-bumping - proxy with a handler API. Sparse on HTTP/2. - -- **`gomitmproxy`** (AdGuard) — newer, cleaner API; built for AdGuard - Home / DNS-filtering products. HTTP/2 support is partial. - -- **`martian`** (Google) — request/response modifier framework with a - JSON-configurable rule engine. 
Used internally at Google; public - ecosystem thin. - - -These are relevant only if we decide to write a custom TLS-terminating -binary that links pipelock's scanning packages directly — Topology C -below. They are not faster than mitmproxy for the v1 sidecar shape; -they are smaller and more direct, at the cost of writing more Go. - -### Disqualified - -- **Caddy, Envoy, HAProxy.** All can terminate TLS at a reverse-proxy - vhost. None ship a "bump on CONNECT and forward plaintext to a - downstream proxy" mode out of the box. Adapting any of them to this - shape is more work than starting from mitmproxy. -- **Cloudflare Gateway, Zscaler, NetSkope, Forcepoint.** Managed cloud - egress with TLS inspection. Wrong topology — they live outside the - host, not as a per-bottle sidecar, and they require trusting a vendor - with full plaintext. -- **Charles Proxy, Burp Suite.** Closed-source GUI tools for developer - capture and security testing. Not appropriate as headless sidecars. -- **`mitmdump` standalone vs. embedding mitmproxy as a library.** Both - are mitmproxy. Calling out only to note: the project ships both a CLI - and a Python API; addons can be loaded either way. - ---- - -## Topologies - -Five candidate topologies, ordered roughly from least to most coupled -between the two components. - -### A — mitmproxy in front of pipelock (recommended) - -``` -agent --HTTPS_PROXY--> mitmproxy --HTTP_PROXY--> pipelock --> internet - (bump TLS) (scan plain) (real TLS) -``` - -mitmproxy terminates the agent's TLS connection, decrypts, and then -forwards the inner HTTP request to pipelock by treating pipelock as -its own upstream HTTP forward proxy. Pipelock receives plaintext HTTP -exactly as if the agent had used HTTP, applies its full scanning -pipeline, and forwards to mitmproxy's upstream client half — which -re-establishes TLS to the real destination. 
- -Concretely the agent's `HTTPS_PROXY` points at mitmproxy; mitmproxy's -`upstream_proxy` config points at pipelock; pipelock's network reach -includes the real internet. - -- **Wins.** Pipelock unchanged. mitmproxy unchanged from default - configuration. Each component has one job. Failure modes are clear - per layer. -- **Costs.** Two sidecars per bottle instead of one. One extra - decrypt / re-encrypt hop, ~5–15 ms per request in steady state. -- **Open question.** How exactly mitmproxy forwards to pipelock matters - for whether pipelock sees TLS again or only HTTP. mitmproxy's - `upstream` mode wraps the decrypted request in another CONNECT if the - destination is HTTPS — which would re-encrypt before pipelock sees - it, defeating the point. The correct mode is `upstream` with TLS - re-origination disabled, or `regular` mode with a chained proxy. The - v2 release of mitmproxy reworked this; needs verification against the - current docs at integration time. - -### B — pipelock in front of mitmproxy (ruled out) - -``` -agent --HTTPS_PROXY--> pipelock --CONNECT?--> mitmproxy --> internet - (sees CONNECT only) (bump TLS) -``` - -Pipelock would receive a `CONNECT` and decide to allow or deny based -on hostname, then tunnel to mitmproxy. mitmproxy would terminate TLS -and see plaintext — but pipelock would never see the plaintext, which -is the whole point of the exercise. The scanning still happens (in -mitmproxy), but it isn't pipelock doing it, so we'd need an entirely -different rule engine. Ruled out. - -### C — Extend pipelock itself to terminate TLS - -Two sub-variants: - -**C.1 — Upstream a `tls_terminate` mode.** Submit a feature to -pipelock that adds CONNECT bumping and per-host cert generation in Go, -using `crypto/tls` and the existing scanning packages. Pipelock becomes -a self-contained MITM proxy. 
License question matters here: the Apache -2.0 core can grow new features in-tree, but if upstream insists this -belongs in `enterprise/` (ELv2), we either accept ELv2 or fork. - -**C.2 — Wrap pipelock in a thin Go binary in the same container.** A -small Go program does the TLS half (`CONNECT` parsing, cert generation, -TLS handshake) and pipes plaintext to pipelock over UDS or loopback. -The wrapper is ours; pipelock is unmodified. No license question. - -- **Wins.** Single component on the egress path. Pipelock owns the - scanning end-to-end, including domain-fronting checks (SNI vs. - `Host` vs. `CONNECT`). -- **Costs.** Real Go engineering effort. CA generation, cert caching, - TLS handshake, HTTP/2 ALPN negotiation, WebSocket upgrade — all - things mitmproxy already solves. -- **When.** Right shape for v2 or v3 once the v1 mitmproxy-in-front - topology has proven the integration works and the scanning rules are - stable. - -### D — mitmproxy as the proxy, pipelock as a content-scan subroutine - -``` -agent --HTTPS_PROXY--> mitmproxy --> internet - (bump TLS) - | - v - POST /scan to pipelock - <- allow / block / redact -``` - -A Python addon in mitmproxy sends each decrypted request (and response) -to a pipelock HTTP `/scan` endpoint and gates the flow on the verdict. -mitmproxy handles all networking; pipelock is the rule engine only. - -- **Wins.** Clean separation of concerns. Pipelock doesn't have to - speak TLS at all. The addon is small, ~100 lines of Python. -- **Costs.** Requires pipelock to expose a scan API. The current Apache - 2.0 core does not document one. If `/scan` lives in `enterprise/`, - ELv2 applies. If it doesn't exist, we'd be asking pipelock for a new - surface. -- **Variant.** Squid's ICAP path is the formalized version of the same - pattern. - -### E — Single container, two processes - -mitmproxy and pipelock share a container, started by `supervisord` or -`s6-overlay`. Networking simplifies to localhost. 
Lifecycle complicates: -container restart now means restarting both; failure of one process is -not visible at the Docker layer; logs interleave. - -- **Wins.** Slightly less Docker plumbing in `cli.py`. -- **Costs.** Operational complexity not worth the savings. The two - containers are independent processes with independent failure modes; - Docker is the right tool for that. - -Net: not recommended. - ---- - -## CA lifecycle - -The CA private key is the asset to defend. With it, anyone can issue -certs that the bottle's trust store will accept for any hostname. So: - -**Per-bottle ephemeral CA.** At bottle start, generate a fresh -RSA-2048 or ECDSA-P256 CA inside the mitmproxy sidecar. Export only -the public cert (PEM) into the bottle's trust store at one of: - -- `/usr/local/share/ca-certificates/claude-bottle-mitm.crt` followed by - `update-ca-certificates` (Debian/Ubuntu base images). -- `/etc/pki/ca-trust/source/anchors/` with `update-ca-trust` - (Red-Hat-family). -- `$NODE_EXTRA_CA_CERTS` for Node-based agents (Claude Code). -- `$SSL_CERT_FILE` / `$REQUESTS_CA_BUNDLE` for Python SDKs. - -The private key never leaves the sidecar's filesystem. The CA cert -public half is the only artifact that crosses into the bottle. - -On bottle teardown, the sidecar container is destroyed; the CA dies -with it. The next bottle gets a fresh CA. No long-lived MITM CA on -disk. - -**Why not a shared per-host CA.** A persistent CA across bottles is -faster (no generation at start) but is a real liability: if any bottle -exfiltrates the CA cert public half (which it can — it's in the trust -store by design), an attacker on the host network could in principle -impersonate any host to any bottle. With a per-bottle CA, the exfil -gains nothing: the CA is bottle-local and dies in minutes. - -**Generation cost.** RSA-2048 CA generation is ~200 ms; ECDSA-P256 is -~5 ms. Either is irrelevant against the per-bottle Docker pull and -network setup cost. 
-
-**Where the CA lives in the bottle's trust store.** Both: a
-distribution-standard path with `update-ca-certificates`, and the
-env-var path. Belt and suspenders, because some Node and Python
-libraries honor the env vars only, and some load only `/etc/ssl/certs/`
-directly.
-
----
-
-## Cert pinning (brief)
-
-A client that pins ignores the trust store and refuses any cert whose
-public key isn't on a hardcoded list. Three observations for this
-project:
-
-- The current `DEFAULT_ALLOWLIST` (`api.anthropic.com`,
-  `statsig.anthropic.com`, `sentry.io`, `claude.ai`,
-  `platform.claude.com`, `downloads.claude.ai`,
-  `raw.githubusercontent.com`) does not appear to include any host that
-  pins against server-side SDKs. Server-side SDKs (Node, Python) almost
-  universally honor system trust and `NODE_EXTRA_CA_CERTS` /
-  `SSL_CERT_FILE`. Mobile SDKs and Chromium pin; we don't run those.
-- If a future allowlisted host turns out to pin, the mitigation is
-  selective bumping via mitmproxy `ignore_hosts`: that specific
-  hostname tunnels blindly and pipelock loses DLP coverage for it.
-  Coverage on every other host is unaffected.
-- The cost of finding out: a single 5-minute test before adding a host
-  — point mitmproxy at the host, observe whether the client succeeds.
-
-Not a v1 blocker. Document the failure mode and the mitigation.
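If an allowlisted host does turn out to pin, the selective-bumping mitigation reduces to generating `ignore_hosts` entries. The sketch below assumes mitmproxy treats each entry as a regular expression matched against `host:port`, which should be verified against the pinned version. `ignore_host_patterns`, `is_tunnelled`, and the example hostname are hypothetical.

```python
from __future__ import annotations

import re


def ignore_host_patterns(hosts: list[str], port: int = 443) -> list[str]:
    """Build one anchored regex per pinned host for mitmproxy `ignore_hosts`."""
    # Anchoring and escaping keep `pinned.example.com` from also matching
    # look-alikes such as `evil-pinned.example.com` or `pinnedXexample.com`.
    return [rf"^{re.escape(host.lower())}:{port}$" for host in hosts]


def is_tunnelled(authority: str, patterns: list[str]) -> bool:
    """Local check mirroring the assumed matching: regex search on 'host:port'."""
    return any(re.search(pattern, authority.lower()) for pattern in patterns)
```

Anchored patterns fail closed: a host that is not listed exactly keeps being bumped, so a typo costs a broken connection rather than a silent loss of DLP coverage.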
-
----
-
-## Comparison table
-
-| | A: mitmproxy → pipelock | B: pipelock → mitmproxy | C: TLS in pipelock | D: mitmproxy + scan API | E: one container |
-|---|---|---|---|---|---|
-| Pipelock sees plaintext | yes | no | yes | yes (via /scan) | yes |
-| Code change to pipelock | none | none | substantial | adds /scan endpoint | none |
-| Sidecar count | 2 | 2 | 1 | 2 | 1 |
-| Cert generation owner | mitmproxy | mitmproxy | pipelock | mitmproxy | mitmproxy |
-| Selective bumping | mitmproxy `ignore_hosts` | mitmproxy `ignore_hosts` | pipelock config | mitmproxy `ignore_hosts` | mitmproxy `ignore_hosts` |
-| Failure isolation per process | yes | yes | n/a (one process) | yes | no (shared container) |
-| License question | none | none | ELv2 risk | ELv2 risk | none |
-| v1 effort | low | low (but pointless) | high | medium | low |
-| Long-term shape | interim | n/a | best | possible | not recommended |
-
----
-
-## Recommendation
-
-**Adopt Topology A for v1.** Add a mitmproxy sidecar to the egress
-topology, in front of pipelock on the same per-bottle internal network.
-The agent's `HTTPS_PROXY` points at mitmproxy; mitmproxy's upstream is
-pipelock; pipelock's upstream is the real internet.
-
-Concretely:
-
-1. Add a `MitmproxyProxy` class alongside `PipelockProxy`, with the
-   same `prepare` / `start` / `stop` lifecycle. The class generates
-   a per-bottle CA in `stage_dir`, exports the public cert into a
-   second file, and writes a mitmproxy config that:
-   - bumps every CONNECT by default
-   - uses `upstream_proxy = http://pipelock-:`
-   - listens on a known port inside the per-bottle internal network
-2. Extend the bottle launch step to copy the CA public cert into the
-   agent container under
-   `/usr/local/share/ca-certificates/claude-bottle-mitm.crt`, run
-   `update-ca-certificates`, and set `NODE_EXTRA_CA_CERTS` /
-   `SSL_CERT_FILE` / `REQUESTS_CA_BUNDLE` accordingly.
-3.
Repoint the agent's `HTTPS_PROXY` and `HTTP_PROXY` from the pipelock
-   container to the mitmproxy container.
-4. Verify mitmproxy's upstream-proxy mode forwards plaintext (not a
-   re-wrapped CONNECT) to pipelock; if not, use `regular` mode with a
-   chained proxy directive.
-5. Test that pipelock's DLP, subdomain-entropy, and MCP scanners now
-   fire on real request bodies for `api.anthropic.com` traffic.
-
-**Defer Topologies C and D.** Topology C (extending pipelock to
-terminate TLS) is the cleanest long-term shape but is a substantial
-build and runs into the Apache 2.0 vs. ELv2 question. Topology D
-(mitmproxy with pipelock as a scan API) is attractive but requires a
-pipelock surface that doesn't exist today. Both are valid v2 targets;
-neither is the right starting point.
-
-The `network-egress-guard.md` v1 iptables + dnsmasq layer remains
-necessary alongside this — TLS interception covers HTTP/HTTPS only;
-raw TCP, UDP/443 (QUIC), UDP/53 (DNS), and ICMP still need the
-IP-level default-deny.
-
----
-
-## Open questions
-
-1. **mitmproxy upstream-proxy mode mechanics.** Does mitmproxy in
-   `upstream_proxy` mode forward decrypted HTTP plaintext to the
-   upstream, or does it wrap it in a new CONNECT? The documented
-   behavior changed between mitmproxy 8 and 10. Needs verification
-   against the version we pin.
-2. **Pipelock's behavior when receiving plain HTTP.** Pipelock's
-   `forward_proxy.enabled: true` accepts both `GET http://...` (plain
-   HTTP) and `CONNECT host:443` (HTTPS). After Topology A is wired up,
-   pipelock will see only plain HTTP — does its DLP / MCP scanning
-   pipeline run the full set of layers, or are some gated on the
-   CONNECT path? Confirm by reading
-   `github.com/luckyPipewrench/pipelock/blob/main/docs/configuration.md`.
-3.
**CA installation in the Anthropic-provided Claude Code Docker image.**
-   The base image's distribution determines whether `update-ca-certificates`
-   (Debian/Ubuntu) or `update-ca-trust` (Red Hat) is the right command.
-   The current `Dockerfile` should be inspected before assuming Debian.
-4. **HTTP/2 over the agent → mitmproxy hop.** Node HTTP clients may
-   negotiate `h2` via ALPN. mitmproxy speaks `h2` to clients in recent
-   versions. Confirm the version we pin supports `h2` end-to-end and
-   doesn't downgrade to `http/1.1` (which would be a silent
-   performance regression).
-5. **Selective-bump policy surface.** Where does the
-   "tunnel this hostname blindly" decision live? Options: a field on
-   `bottle.egress` in the manifest, a fixed list of known-pinning
-   hosts baked into the mitmproxy config, or pipelock-side opt-out.
-   Manifest field is most consistent with the existing
-   `bottle.egress.allowlist` shape.
-6. **Image pin for mitmproxy.** The `pipelock-assessment.md`
-   recommendation is to pin by digest. The mitmproxy Docker Hub image
-   should be pinned the same way. Which release line? `mitmproxy/mitmproxy`
-   ships rolling and tagged versions; the tagged `:11.x` line is the
-   right baseline.
-7. **CA generation in Python (mitmproxy) vs. as a separate step.**
-   mitmproxy generates a CA on first launch if none is provided. For
-   per-bottle ephemerality, we want the CA to be ours, not whatever
-   mitmproxy chooses, so generate the CA in the host-side prepare
-   step and inject it as `mitmproxy-ca.pem` in the sidecar's
-   `confdir`. Note that `--certs *=...` sets the server certificate
-   mitmproxy presents for a domain, not the signing CA. Mechanics
-   need confirming against the version we pin.
-8. **Domain fronting verification.** Once pipelock sees plaintext, it
-   has access to the inner `Host` / `:authority`. A new rule that
-   compares it against the outer `CONNECT` target catches domain
-   fronting. Worth a follow-up note on whether pipelock has such a
-   rule or whether we add it.
-
----
-
-## References
-
-- mitmproxy: ,
-- mitmproxy `upstream_proxy` mode:
-- mitmproxy CA cert installation:
-- Squid `ssl_bump`:
-- Squid ICAP:
-- `goproxy`:
-- `gomitmproxy`:
-- `martian`:
-- Node TLS / `NODE_EXTRA_CA_CERTS`:
-- Python `SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE`:
-- Prior research — pipelock assessment: `docs/research/pipelock-assessment.md`
-- Prior research — network egress guard: `docs/research/network-egress-guard.md`
-- Prior research — secret exfil tripwire encodings: `docs/research/secret-exfil-tripwire-encodings.md`
-
-Research conducted 2026-05-12.