docs(research): TLS interception topologies for pipelock content scanning

Survey of TLS-MITM tools (mitmproxy, Squid+ssl_bump, Go libraries) and five candidate topologies for adding TLS termination to the egress path so pipelock's DLP, subdomain-entropy, and MCP scanners can fire on plaintext bodies. Recommends mitmproxy in front of pipelock for v1 with a per-bottle ephemeral CA.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -0,0 +1,508 @@

# TLS interception for pipelock content scanning

Research into adding TLS termination ("MITM") to the egress path so that
pipelock's scanning pipeline can see plaintext HTTP request and response
bodies, instead of only the `CONNECT` host and opaque ciphertext.

## Summary

- Pipelock today sees `CONNECT` hostnames and the encrypted bytes that follow.
  Its DLP, subdomain-entropy, and MCP scanners cannot fire on TLS-encrypted
  bodies, which is the gap explicitly named under "Scope gaps" in
  `pipelock-assessment.md` ("Pipelock does not perform TLS inspection (no CA
  trust injection)").
- Closing that gap requires a TLS-terminating proxy that bumps `CONNECT`,
  presents a leaf certificate for the target hostname signed by a CA the
  bottle's trust store accepts, decrypts the inner HTTP, and re-establishes
  TLS to the real upstream.
- The mature open-source option is **mitmproxy**. Squid + `ssl_bump` is the
  heavier production-grade alternative. The Go ecosystem (`goproxy`,
  `gomitmproxy`, `martian`) is suitable only if we want a custom binary
  tightly coupled to pipelock.
- Recommended v1 topology: **mitmproxy in front of pipelock** on the same
  egress route. mitmproxy terminates client TLS and forwards plaintext to
  pipelock as its upstream HTTP proxy; TLS to the real upstream is
  re-established after scanning. Pipelock stays unchanged.
- Per-bottle ephemeral CA, generated at bottle start and destroyed on
  teardown. The CA private key lives only on the sidecar; the bottle's
  trust store only ever sees the public cert.
- Cert pinning is a known caveat but a small one given the narrow allowlist
  in this project. Selective bumping is the mitigation if a future
  allowlisted host turns out to pin.

---

## What pipelock cannot see today

The current egress topology (per `pipelock-assessment.md`):

```
agent --HTTPS_PROXY--> pipelock --CONNECT host:443--> internet
                              \____________________________
                                    opaque TLS bytes
```

The agent's client (Claude Code, `curl`, an MCP server, a Python SDK)
sends `CONNECT api.anthropic.com:443`. Pipelock checks the hostname
against its `api_allowlist`, replies `200 Connection Established`, and
then blindly relays bytes between the two TCP halves. The TLS handshake
and everything inside it happens end-to-end between the agent and the
real upstream.
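
The allow-or-deny decision described above can be sketched in a few lines. This is an illustration only, not pipelock's code: the function names, the exact-match allowlist semantics, and the 443-only port rule are assumptions made for the example.

```python
def parse_connect_target(request_line: str) -> tuple[str, int]:
    """Parse 'CONNECT host:port HTTP/1.1' into (host, port)."""
    method, target, version = request_line.strip().split(" ")
    if method.upper() != "CONNECT" or not version.startswith("HTTP/"):
        raise ValueError("not a CONNECT request line")
    host, sep, port = target.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("CONNECT target must be host:port")
    return host.strip("[]").lower(), int(port)


def decide(request_line: str, allowlist: set[str]) -> str:
    """Return the status line a tunnel-only proxy would answer with.

    Assumption: exact hostname match and port 443 only.
    """
    try:
        host, port = parse_connect_target(request_line)
    except ValueError:
        return "HTTP/1.1 400 Bad Request\r\n\r\n"
    if host in allowlist and port == 443:
        # From here on the proxy only relays opaque TLS bytes.
        return "HTTP/1.1 200 Connection Established\r\n\r\n"
    return "HTTP/1.1 403 Forbidden\r\n\r\n"
```

Nothing in this decision touches the request URL, headers, or body, which is why the hostname is the only content-level signal available in tunnel mode.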

What pipelock can scan in this mode:

- `CONNECT` target hostname (SNI is not even needed).
- TLS record framing and lengths (useful for budgets, useless for DLP).
- Plain HTTP/1.1 to non-HTTPS destinations (irrelevant: there are none
  in `DEFAULT_ALLOWLIST`).

What pipelock cannot scan in this mode:

- Request URL, method, headers, body.
- Response status, headers, body.
- MCP JSON-RPC payloads inside the TLS session.
- WebSocket frames inside a TLS-wrapped upgrade.
- Whether the inner SNI or HTTP `Host` / `:authority` matches the
  outer `CONNECT` target (domain-fronting check).

The 48-pattern DLP layer, the subdomain-entropy check (insofar as it
inspects URLs rather than DNS-resolver queries), the request-redaction
feature added in v2.3.0, and bidirectional MCP scanning all require
plaintext to operate on. Without TLS termination, those layers are
inert against any HTTPS destination, which is every destination in
the current allowlist.

---

## How TLS interception works

The mechanics of `CONNECT` bumping, end to end:

1. **Agent issues `CONNECT`.** The HTTP client sees `HTTPS_PROXY` set,
   so it opens a TCP connection to the proxy and sends
   `CONNECT api.anthropic.com:443 HTTP/1.1`.
2. **Proxy answers `200`.** Standard tunnel-established response.
3. **Proxy starts TLS as the server.** Instead of relaying bytes, the
   proxy itself performs a TLS handshake with the agent. It needs a
   server certificate for `api.anthropic.com`, so on first contact for
   that hostname, the proxy generates a leaf certificate with
   `CN=api.anthropic.com` and a SAN for the same, signs it with its
   own CA private key, and presents that cert. Subsequent connections
   to the same hostname reuse the cached leaf.
4. **Agent verifies the cert.** The agent's TLS library walks the chain
   to a trusted root. Because the bottle's trust store contains the
   proxy's CA cert, validation succeeds. The agent has no way to tell
   it isn't talking to the real `api.anthropic.com`.
5. **Proxy opens its own TLS to the real upstream.** As a client this
   time, using the system root store, talking to the real
   `api.anthropic.com`. Real SNI, real cert chain validated normally.
6. **Proxy bridges the two TLS sessions.** It decrypts what arrives on
   one session, scans the plaintext, and re-encrypts it onto the other.

This is what every TLS-terminating egress proxy does. The trade-offs
live in three places:

- **CA trust injection.** Step 4 only works if the bottle's trust
  store contains the proxy's CA. Mechanics covered under "CA lifecycle"
  below.
- **Cert generation cost.** Generating an RSA-2048 leaf cert takes
  roughly 50 ms; ECDSA P-256 is roughly 5 ms or less (order-of-magnitude
  estimates, hardware-dependent). Cache leaves per (hostname, SAN list)
  to keep this off the steady-state hot path.
- **Protocol coverage.** The proxy needs to speak HTTP/1.1, HTTP/2 (ALPN
  `h2`), and ideally WebSocket. HTTP/3 / QUIC is UDP and requires a
  separate code path; for v1, blocking UDP/443 at the iptables layer
  forces clients to fall back to HTTP/2 or HTTP/1.1 over TCP, which we
  can inspect.
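
The leaf-caching advice in the cost bullet can be illustrated with a small in-memory cache. This is a minimal sketch, not how mitmproxy or any other tool implements it: `LeafCertCache` and the `issue` callback are hypothetical names, and the callback stands in for whatever actually generates and signs the leaf.

```python
from typing import Callable, Dict, Iterable, Tuple

CacheKey = Tuple[str, Tuple[str, ...]]


class LeafCertCache:
    """Cache leaf certs per (hostname, sorted SAN list). Not thread-safe."""

    def __init__(self, issue: Callable[[str, Tuple[str, ...]], bytes]):
        self._issue = issue          # hypothetical signer: (hostname, sans) -> PEM bytes
        self._leaves: Dict[CacheKey, bytes] = {}
        self.misses = 0              # number of times a new leaf had to be issued

    def get(self, hostname: str, sans: Iterable[str] = ()) -> bytes:
        host = hostname.lower()
        # Normalize so SAN order and letter case do not create duplicate entries.
        san_key = tuple(sorted({s.lower() for s in sans} | {host}))
        key = (host, san_key)
        if key not in self._leaves:
            self.misses += 1
            self._leaves[key] = self._issue(host, san_key)
        return self._leaves[key]
```

With this shape, the expensive key generation and signing happens once per distinct (hostname, SAN list) pair; every later handshake for the same pair is a dictionary lookup.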

---

## Tools

### mitmproxy

- **What it is.** Python (with some Rust components) interactive HTTPS
  proxy. Reference open-source implementation of the bump pattern. Ships
  as `mitmproxy` (TUI), `mitmweb` (browser UI), and `mitmdump` (headless).
- **Cert handling.** Generates a CA on first run under `~/.mitmproxy/`.
  Per-host leaves are generated on demand and cached in memory. Cert
  cache keyed by (hostname, SAN extensions inferred from upstream cert).
- **Protocols.** HTTP/1.1, HTTP/2, WebSocket fully supported. HTTP/3
  support is experimental. Raw TCP / non-HTTP TLS is supported via
  `--mode reverse:`; whether it is also usable in CONNECT-bump mode
  (for example through the `tcp_hosts` option) needs checking against
  the version we pin.
- **Extensibility.** Python addon API. An addon module can inspect or
  modify any `request` / `response` / `tcp_message` flow. The pipelock
  integration in Topology D below uses this.
- **Selective bumping.** `ignore_hosts` regex; matching CONNECTs are
  tunneled blindly instead of bumped. Critical for the cert-pinning
  mitigation.
- **Docker image.** `mitmproxy/mitmproxy` on Docker Hub. Single binary
  for the CLI, ~80 MB image. Configurable via flags or `~/.mitmproxy/config.yaml`.
- **Project URL.** <https://mitmproxy.org>, <https://github.com/mitmproxy/mitmproxy>.

Most mature, best-documented, lowest-effort integration. Default choice
for v1.

### Squid + ssl_bump

- **What it is.** Squid is a long-established C++ caching proxy.
  `ssl_bump` is its TLS-interception feature, controlled by per-CONNECT
  actions: `splice` (tunnel blindly), `bump` (decrypt and re-encrypt),
  `peek` (look at TLS hello then decide), `stare` (look at server cert
  then decide), `terminate` (abort the connection).
- **Cert handling.** Configured via `sslcrtd_program`, a helper that
  generates and caches per-host certs. CA cert and key referenced by
  PEM paths in `squid.conf`.
- **Protocols.** HTTP/1.1 fully. HTTP/2 support should be verified
  against the Squid version in use rather than assumed. No scripted
  addons.
- **Extensibility.** ICAP (Internet Content Adaptation Protocol) for
  external scanners: Squid hands each request/response to an ICAP
  service (`REQMOD` / `RESPMOD`) that can modify or reject it. This is
  the formal version of Topology D below.
- **Production track record.** Used at corporate-proxy scale (large
  enterprises, ISPs). Heavyweight for a single-bottle sidecar.
- **Project URL.** <https://wiki.squid-cache.org/Features/SslPeekAndSplice>.

Right tool if pipelock grows an ICAP server endpoint. Otherwise, more
config surface than this project needs.

### Go libraries: goproxy, gomitmproxy, martian

- **`goproxy`** (elazarl): long-lived Go library, basic CONNECT-bumping
  proxy with a handler API. Sparse on HTTP/2.
  <https://github.com/elazarl/goproxy>
- **`gomitmproxy`** (AdGuard): newer, cleaner API; built for AdGuard
  Home / DNS-filtering products. HTTP/2 support is partial.
  <https://github.com/AdguardTeam/gomitmproxy>
- **`martian`** (Google): request/response modifier framework with a
  JSON-configurable rule engine. Used internally at Google; public
  ecosystem thin.
  <https://github.com/google/martian>

These are relevant only if we decide to write a custom TLS-terminating
binary that links pipelock's scanning packages directly (Topology C
below). They are not a faster route to the v1 sidecar than mitmproxy;
they are smaller and more direct, at the cost of writing more Go.

### Disqualified

- **Caddy, Envoy, HAProxy.** All can terminate TLS at a reverse-proxy
  vhost. None ship a "bump on CONNECT and forward plaintext to a
  downstream proxy" mode out of the box. Adapting any of them to this
  shape is more work than starting from mitmproxy.
- **Cloudflare Gateway, Zscaler, Netskope, Forcepoint.** Managed cloud
  egress with TLS inspection. Wrong topology: they live outside the
  host, not as a per-bottle sidecar, and they require trusting a vendor
  with full plaintext.
- **Charles Proxy, Burp Suite.** Closed-source GUI tools for developer
  capture and security testing. Not appropriate as headless sidecars.
- **`mitmdump` standalone vs. embedding mitmproxy as a library.** Both
  are mitmproxy. Calling out only to note: the project ships both a CLI
  and a Python API; addons can be loaded either way.

---

## Topologies

Five candidate topologies, ordered roughly from least to most coupled
between the two components.

### A: mitmproxy in front of pipelock (recommended)

```
agent --HTTPS_PROXY--> mitmproxy --HTTP_PROXY--> pipelock --> internet
                       (bump TLS)                (scan plain)  (real TLS)
```

mitmproxy terminates the agent's TLS connection, decrypts, and then
forwards the inner HTTP request to pipelock by treating pipelock as
its own upstream HTTP forward proxy. Pipelock receives plaintext HTTP
exactly as if the agent had used HTTP, applies its full scanning
pipeline, and forwards the request onward. Which component then
re-establishes TLS to the real destination (the diagram shows pipelock
doing so) depends on the forwarding mode discussed under "Open
question" below.

Concretely the agent's `HTTPS_PROXY` points at mitmproxy; mitmproxy's
upstream-proxy setting (`mode: upstream:http://...`) points at
pipelock; pipelock's network reach includes the real internet.

- **Wins.** Pipelock unchanged. mitmproxy unchanged from default
  configuration. Each component has one job. Failure modes are clear
  per layer.
- **Costs.** Two sidecars per bottle instead of one. One extra
  decrypt / re-encrypt hop, an estimated 5 to 15 ms per request in
  steady state.
- **Open question.** How exactly mitmproxy forwards to pipelock matters
  for whether pipelock sees TLS again or only HTTP. If mitmproxy's
  `upstream` mode wraps the decrypted request in another CONNECT when
  the destination is HTTPS, the traffic is re-encrypted before pipelock
  sees it, defeating the point. The candidate configurations are
  `upstream` with TLS re-origination disabled, or `regular` mode with a
  chained proxy. This behavior has changed across mitmproxy major
  releases; it needs verification against the docs for the version we
  pin at integration time.

### B: pipelock in front of mitmproxy (ruled out)

```
agent --HTTPS_PROXY--> pipelock --CONNECT?--> mitmproxy --> internet
                       (sees CONNECT only)    (bump TLS)
```

Pipelock would receive a `CONNECT` and decide to allow or deny based
on hostname, then tunnel to mitmproxy. mitmproxy would terminate TLS
and see plaintext, but pipelock would never see the plaintext, which
is the whole point of the exercise. Any scanning would have to happen
in mitmproxy rather than pipelock, so we'd need an entirely different
rule engine. Ruled out.

### C: Extend pipelock itself to terminate TLS

Two sub-variants:

**C.1: Upstream a `tls_terminate` mode.** Submit a feature to
pipelock that adds CONNECT bumping and per-host cert generation in Go,
using `crypto/tls` and the existing scanning packages. Pipelock becomes
a self-contained MITM proxy. License question matters here: the Apache
2.0 core can grow new features in-tree, but if upstream insists this
belongs in `enterprise/` (ELv2), we either accept ELv2 or fork.

**C.2: Wrap pipelock in a thin Go binary in the same container.** A
small Go program does the TLS half (`CONNECT` parsing, cert generation,
TLS handshake) and pipes plaintext to pipelock over UDS or loopback.
The wrapper is ours; pipelock is unmodified. No license question.

- **Wins.** Single component on the egress path. Pipelock owns the
  scanning end-to-end, including domain-fronting checks (SNI vs.
  `Host` vs. `CONNECT`).
- **Costs.** Real Go engineering effort. CA generation, cert caching,
  TLS handshake, HTTP/2 ALPN negotiation, WebSocket upgrade: all
  things mitmproxy already solves.
- **When.** Right shape for v2 or v3 once the v1 mitmproxy-in-front
  topology has proven the integration works and the scanning rules are
  stable.

### D: mitmproxy as the proxy, pipelock as a content-scan subroutine

```
agent --HTTPS_PROXY--> mitmproxy --> internet
                       (bump TLS)
                           |
                           v
                 POST /scan to pipelock
                 <- allow / block / redact
```

A Python addon in mitmproxy sends each decrypted request (and response)
to a pipelock HTTP `/scan` endpoint and gates the flow on the verdict.
mitmproxy handles all networking; pipelock is the rule engine only.

- **Wins.** Clean separation of concerns. Pipelock doesn't have to
  speak TLS at all. The addon is small, ~100 lines of Python.
- **Costs.** Requires pipelock to expose a scan API. The current Apache
  2.0 core does not document one. If `/scan` lives in `enterprise/`,
  ELv2 applies. If it doesn't exist, we'd be asking pipelock for a new
  surface.
- **Variant.** Squid's ICAP path is the formalized version of the same
  pattern.
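
To make the shape of the Topology D addon concrete, here is a rough sketch of the request half. It is hypothetical throughout on the pipelock side: the `/scan` URL, the JSON payload, and the `allow` / `redact` / `block` verdict format are invented for illustration, since no such API is documented. On the mitmproxy side it relies only on the `request` hook, `flow.request.method`, `flow.request.pretty_url`, `flow.request.get_text()`, the `flow.request.text` setter, and `flow.kill()`.

```python
import json
import urllib.request

SCAN_URL = "http://pipelock:8888/scan"  # hypothetical endpoint, not a documented pipelock API


def http_scan(payload: dict, url: str = SCAN_URL, timeout: float = 2.0) -> dict:
    """POST the payload as JSON and return the decoded verdict."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


class PipelockGate:
    """mitmproxy addon: gate each decrypted request on a scan verdict."""

    def __init__(self, scan=http_scan):
        self._scan = scan

    def request(self, flow) -> None:
        payload = {
            "method": flow.request.method,
            "url": flow.request.pretty_url,
            "body": flow.request.get_text(strict=False) or "",
        }
        try:
            verdict = self._scan(payload)
        except Exception:
            verdict = {"action": "block"}  # fail closed if the scanner is unreachable
        action = verdict.get("action")
        if action == "allow":
            return
        if action == "redact" and "body" in verdict:
            flow.request.text = verdict["body"]  # forward the scrubbed body instead
            return
        flow.kill()  # block, or any verdict we do not understand


addons = [PipelockGate()]
```

Loaded with `mitmdump -s <file>`, this would gate outbound requests only; a matching `response` hook would be needed for the response direction the text mentions.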

### E: Single container, two processes

mitmproxy and pipelock share a container, started by `supervisord` or
`s6-overlay`. Networking simplifies to localhost. Lifecycle complicates:
container restart now means restarting both; failure of one process is
not visible at the Docker layer; logs interleave.

- **Wins.** Slightly less Docker plumbing in `cli.py`.
- **Costs.** Operational complexity not worth the savings. The two
  proxies are independent processes with independent failure modes;
  separate Docker containers are the right tool for that.

Net: not recommended.

---

## CA lifecycle

The CA private key is the asset to defend. With it, anyone can issue
certs that the bottle's trust store will accept for any hostname. So:

**Per-bottle ephemeral CA.** At bottle start, generate a fresh
RSA-2048 or ECDSA-P256 CA inside the mitmproxy sidecar. Export only
the public cert (PEM) into the bottle's trust store via one or more of:

- `/usr/local/share/ca-certificates/claude-bottle-mitm.crt` followed by
  `update-ca-certificates` (Debian/Ubuntu base images).
- `/etc/pki/ca-trust/source/anchors/` with `update-ca-trust`
  (Red-Hat-family).
- `$NODE_EXTRA_CA_CERTS` for Node-based agents (Claude Code).
- `$SSL_CERT_FILE` / `$REQUESTS_CA_BUNDLE` for Python SDKs.

The private key never leaves the sidecar's filesystem. The CA's public
certificate is the only artifact that crosses into the bottle.

On bottle teardown, the sidecar container is destroyed; the CA dies
with it. The next bottle gets a fresh CA. No long-lived MITM CA on
disk.
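
The split between what stays on the sidecar and what crosses into the bottle can be sketched as follows. Key generation itself is left to a caller-supplied `generate` function (for example a wrapper around `openssl` or the `cryptography` package); `stage_ca`, `destroy_ca`, and the directory layout are hypothetical. The file name `mitmproxy-ca.pem` (key plus cert in one PEM file) follows mitmproxy's documented CA layout.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Tuple


def stage_ca(
    sidecar_dir: Path,
    export_dir: Path,
    generate: Callable[[], Tuple[bytes, bytes]],
) -> Path:
    """Write key+cert for the sidecar and the public cert only for the bottle.

    `generate` is a hypothetical callback returning (key_pem, cert_pem).
    Returns the path of the exported public certificate.
    """
    key_pem, cert_pem = generate()
    sidecar_dir.mkdir(parents=True, exist_ok=True)
    export_dir.mkdir(parents=True, exist_ok=True)

    combined = sidecar_dir / "mitmproxy-ca.pem"
    # O_EXCL refuses to reuse an existing CA; 0o600 keeps the key owner-only.
    fd = os.open(combined, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(key_pem + cert_pem)

    public = export_dir / "claude-bottle-mitm.crt"
    public.write_bytes(cert_pem)  # the private key is never written here
    return public


def destroy_ca(sidecar_dir: Path) -> None:
    """Teardown: remove the sidecar's CA material along with its directory."""
    shutil.rmtree(sidecar_dir, ignore_errors=True)
```

Only `export_dir` would be mounted or copied into the agent container; `sidecar_dir` belongs to the mitmproxy sidecar and disappears with it.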

**Why not a shared per-host CA.** A persistent CA shared across bottles
(one per Docker host) is faster (no generation at start) but is a real
liability. The CA's public certificate is not the risk: every bottle
holds it in its trust store by design, and it cannot sign anything.
The risk is the private key. A long-lived key on disk is a standing
target, and anyone who obtains it can impersonate any host to every
bottle that trusts that CA. With a per-bottle CA, a leaked key is
useful against one bottle only, and only until that bottle is torn
down.

**Generation cost.** RSA-2048 CA generation is roughly 200 ms;
ECDSA-P256 is roughly 5 ms or less (hardware-dependent estimates).
Either is negligible next to the per-bottle Docker pull and network
setup cost.

**Where the CA lives in the bottle's trust store.** Both: a
distribution-standard path with `update-ca-certificates`, and the
env-var path. Belt and suspenders, because some Node and Python
libraries honor the env vars only, and some load only `/etc/ssl/certs/`
directly.
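
One detail behind the env-var path is worth spelling out: `NODE_EXTRA_CA_CERTS` adds to Node's built-in roots, whereas `SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE` replace the default bundle, so they should point at a file that contains the system roots plus the bottle CA. A minimal sketch, with hypothetical helper names and paths chosen by the caller:

```python
from pathlib import Path


def build_combined_bundle(system_bundle: Path, ca_cert: Path, out: Path) -> Path:
    """Concatenate the system root bundle and the bottle CA into one PEM file."""
    roots = system_bundle.read_bytes().rstrip(b"\n")
    out.write_bytes(roots + b"\n" + ca_cert.read_bytes())
    return out


def trust_env(ca_cert: Path, combined_bundle: Path) -> dict:
    """Environment variables for the agent container."""
    return {
        # Additive: Node keeps its built-in roots and also trusts this cert.
        "NODE_EXTRA_CA_CERTS": str(ca_cert),
        # Replacing: OpenSSL-based clients read only this file.
        "SSL_CERT_FILE": str(combined_bundle),
        # Replacing: the `requests` library reads this instead of certifi.
        "REQUESTS_CA_BUNDLE": str(combined_bundle),
    }
```

On Debian-family images, the bundle that `update-ca-certificates` regenerates already includes the added CA, so pointing the two replacing variables at that file would be an alternative to building a separate one.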

---

## Cert pinning (brief)

A client that pins ignores the trust store and refuses any cert whose
public key isn't on a hardcoded list. Three observations for this
project:

- The current `DEFAULT_ALLOWLIST` (`api.anthropic.com`,
  `statsig.anthropic.com`, `sentry.io`, `claude.ai`,
  `platform.claude.com`, `downloads.claude.ai`,
  `raw.githubusercontent.com`) does not appear to include any host
  whose server-side clients pin. Server-side SDKs (Node, Python) almost
  universally honor system trust and `NODE_EXTRA_CA_CERTS` /
  `SSL_CERT_FILE`. Pinning is mostly found in mobile SDKs and some
  browsers; we don't run those.
- If a future allowlisted host turns out to pin, the mitigation is
  selective bumping via mitmproxy `ignore_hosts`: that specific
  hostname tunnels blindly and pipelock loses DLP coverage for it.
  Coverage on every other host is unaffected.
- The cost of finding out: a single 5-minute test before adding a host.
  Point mitmproxy at the host and observe whether the client succeeds.

Not a v1 blocker. Document the failure mode and the mitigation.
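
For the selective-bumping mitigation, the `ignore_hosts` entries are regular expressions matched against `host:port`, so a hostname should be escaped and anchored rather than pasted in raw. A small sketch; the helper name and the example hostname are made up:

```python
import re


def ignore_hosts_pattern(hostname: str, port: int = 443) -> str:
    """Build an anchored regex that matches exactly one host:port pair."""
    return rf"^{re.escape(hostname)}:{port}$"


# Example: tunnel one (hypothetical) pinning host blindly, bump everything else.
PATTERN = ignore_hosts_pattern("pinned.example.com")
```

Without the escaping, the dots would match any character; without the anchors, the pattern would also match longer hostnames that merely contain the pinned one, silently widening the set of hosts that skip inspection.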

---

## Comparison table

| | A: mitmproxy → pipelock | B: pipelock → mitmproxy | C: TLS in pipelock | D: mitmproxy + scan API | E: one container |
|---|---|---|---|---|---|
| Pipelock sees plaintext | yes | no | yes | yes (via /scan) | yes |
| Code change to pipelock | none | none | substantial | adds /scan endpoint | none |
| Sidecar count | 2 | 2 | 1 | 2 | 1 |
| Cert generation owner | mitmproxy | mitmproxy | pipelock | mitmproxy | mitmproxy |
| Selective bumping | mitmproxy `ignore_hosts` | mitmproxy `ignore_hosts` | pipelock config | mitmproxy `ignore_hosts` | mitmproxy `ignore_hosts` |
| Failure isolation per process | yes | yes | n/a (one process) | yes | no (shared container) |
| License question | none | none | ELv2 risk | ELv2 risk | none |
| v1 effort | low | low (but pointless) | high | medium | low |
| Long-term shape | interim | n/a | best | possible | not recommended |

---

## Recommendation

**Adopt Topology A for v1.** Add a mitmproxy sidecar to the egress
topology, in front of pipelock on the same per-bottle internal network.
The agent's `HTTPS_PROXY` points at mitmproxy; mitmproxy's upstream is
pipelock; pipelock's upstream is the real internet.

Concretely:

1. Add a `MitmproxyProxy` class alongside `PipelockProxy`, with the
   same `prepare` / `start` / `stop` lifecycle. The class generates
   a per-bottle CA in `stage_dir`, exports the public cert into a
   second file, and writes a mitmproxy config that:
   - bumps every CONNECT by default
   - uses `mode: upstream:http://pipelock-<slug>:<port>`
   - listens on a known port inside the per-bottle internal network
2. Extend the bottle launch step to copy the CA public cert into the
   agent container under
   `/usr/local/share/ca-certificates/claude-bottle-mitm.crt`, run
   `update-ca-certificates`, and set `NODE_EXTRA_CA_CERTS` /
   `SSL_CERT_FILE` / `REQUESTS_CA_BUNDLE` accordingly.
3. Repoint the agent's `HTTPS_PROXY` and `HTTP_PROXY` from the pipelock
   container to the mitmproxy container.
4. Verify mitmproxy's upstream-proxy mode forwards plaintext (not a
   re-wrapped CONNECT) to pipelock; if not, use `regular` mode with a
   chained proxy directive.
5. Test that pipelock's DLP, subdomain-entropy, and MCP scanners now
   fire on real request bodies for `api.anthropic.com` traffic.
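
Step 1 could start from something like the sketch below, which covers only the config-writing part of `prepare`. Everything here is an assumption to be checked against the mitmproxy version we pin: that `config.yaml` accepts `mode` as a list of strings, that `listen_port` and `ignore_hosts` are the right option names, and that the class takes these constructor arguments. It also does not settle step 4; whether `upstream:` mode delivers plaintext to pipelock still has to be verified.

```python
import json
from pathlib import Path


def render_mitmproxy_config(pipelock_url: str, listen_port: int, ignore_hosts=()) -> str:
    """Render a config.yaml body for the Topology A sidecar."""
    options = {
        "mode": [f"upstream:{pipelock_url}"],
        "listen_port": listen_port,
        "ignore_hosts": list(ignore_hosts),
    }
    # JSON scalars and flow sequences are also valid YAML, so json.dumps
    # gives safe quoting without needing a YAML library.
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in options.items())


class MitmproxyProxy:
    """Hypothetical counterpart to PipelockProxy; only `prepare` is sketched."""

    def __init__(self, slug: str, pipelock_port: int, listen_port: int = 8080):
        self.pipelock_url = f"http://pipelock-{slug}:{pipelock_port}"
        self.listen_port = listen_port

    def prepare(self, stage_dir: Path) -> Path:
        """Write config.yaml into stage_dir and return its path."""
        stage_dir.mkdir(parents=True, exist_ok=True)
        config = stage_dir / "config.yaml"
        config.write_text(
            render_mitmproxy_config(self.pipelock_url, self.listen_port),
            encoding="utf-8",
        )
        return config
```

CA generation, `start`, and `stop` are omitted; they would wrap whatever Docker plumbing `PipelockProxy` already uses.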

**Defer Topologies C and D.** Topology C (extending pipelock to
terminate TLS) is the cleanest long-term shape but is a substantial
build and runs into the Apache 2.0 vs. ELv2 question. Topology D
(mitmproxy with pipelock as a scan API) is attractive but requires a
pipelock surface that doesn't exist today. Both are valid v2 targets;
neither is the right starting point.

The `network-egress-guard.md` v1 iptables + dnsmasq layer remains
necessary alongside this: TLS interception covers HTTP/HTTPS only;
raw TCP, UDP/443 (QUIC), UDP/53 (DNS), and ICMP still need the
IP-level default-deny.

---

## Open questions

1. **mitmproxy upstream-proxy mode mechanics.** Does mitmproxy in
   upstream mode (`--mode upstream:...`) forward decrypted HTTP
   plaintext to the upstream, or does it wrap it in a new CONNECT? The
   documented behavior changed between mitmproxy 8 and 10. Needs
   verification against the version we pin.
2. **Pipelock's behavior when receiving plain HTTP.** Pipelock's
   `forward_proxy.enabled: true` accepts both `GET http://...` (plain
   HTTP) and `CONNECT host:443` (HTTPS). After Topology A is wired up,
   pipelock will see only plain HTTP. Does its DLP / MCP scanning
   pipeline run the full set of layers, or are some gated on the
   CONNECT path? Confirm by reading
   `github.com/luckyPipewrench/pipelock/blob/main/docs/configuration.md`.
3. **CA installation in the Anthropic-provided Claude Code Docker image.**
   The base image's distribution determines whether `update-ca-certificates`
   (Debian/Ubuntu) or `update-ca-trust` (Red Hat) is the right command.
   The current `Dockerfile` should be inspected before assuming Debian.
4. **HTTP/2 over the agent → mitmproxy hop.** If the agent's HTTP
   client negotiates `h2` via ALPN (not every Node client does by
   default), mitmproxy must speak `h2` to it; recent versions do.
   Confirm the version we pin supports `h2` end-to-end and doesn't
   downgrade to `http/1.1` (which would be a silent performance
   regression).
5. **Selective-bump policy surface.** Where does the
   "tunnel this hostname blindly" decision live? Options: a field on
   `bottle.egress` in the manifest, a fixed list of known-pinning
   hosts baked into the mitmproxy config, or pipelock-side opt-out.
   Manifest field is most consistent with the existing
   `bottle.egress.allowlist` shape.
6. **Image pin for mitmproxy.** The `pipelock-assessment.md`
   recommendation is to pin by digest. The mitmproxy Docker Hub image
   should be pinned the same way. Which release line? `mitmproxy/mitmproxy`
   ships rolling and tagged versions; the tagged `:11.x` line is the
   right baseline.
7. **CA generation in Python (mitmproxy) vs. as a separate step.**
   mitmproxy generates a CA on first launch if none is provided. For
   per-bottle ephemerality, we want the CA to be ours, not whatever
   mitmproxy chooses, so generate the CA in the host-side prepare
   step and inject it by placing it as `mitmproxy-ca.pem` in the
   directory mitmproxy uses as its `confdir`. (The `--certs` option
   supplies custom leaf certificates, not the CA.) Mechanics need
   confirming.
8. **Domain fronting verification.** Once pipelock sees plaintext, it
   has access to the inner `Host` / `:authority`. A new rule that
   compares it against the outer `CONNECT` target catches domain
   fronting. Worth a follow-up note on whether pipelock has such a
   rule or whether we add it.
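
The comparison behind question 8 is small enough to sketch. This is an illustration, not an existing pipelock rule: the function name is made up, and it assumes both the outer `CONNECT` target and the inner `Host` / `:authority` value are available to whichever component runs the check.

```python
def is_domain_fronted(connect_target: str, inner_authority: str, default_port: int = 443) -> bool:
    """True when the inner Host/:authority names a different host:port than CONNECT."""

    def normalize(value: str) -> tuple:
        value = value.strip().lower()
        host, sep, port = value.rpartition(":")
        if sep and port.isdigit():
            return host.rstrip("."), int(port)
        # No explicit port: assume the default and treat the whole value as the host.
        return value.rstrip("."), default_port

    return normalize(connect_target) != normalize(inner_authority)
```

In Topology A the outer `CONNECT` is consumed by mitmproxy, so this check would either live there or need the outer target passed along to pipelock by some means not covered here.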

---

## References

- mitmproxy: <https://mitmproxy.org>, <https://github.com/mitmproxy/mitmproxy>
- mitmproxy `upstream_proxy` mode: <https://docs.mitmproxy.org/stable/concepts/modes/#upstream-proxy>
- mitmproxy CA cert installation: <https://docs.mitmproxy.org/stable/concepts/certificates/>
- Squid `ssl_bump`: <https://wiki.squid-cache.org/Features/SslPeekAndSplice>
- Squid ICAP: <https://wiki.squid-cache.org/Features/ICAP>
- `goproxy`: <https://github.com/elazarl/goproxy>
- `gomitmproxy`: <https://github.com/AdguardTeam/gomitmproxy>
- `martian`: <https://github.com/google/martian>
- Node TLS / `NODE_EXTRA_CA_CERTS`: <https://nodejs.org/api/cli.html#node_extra_ca_certsfile>
- Python `SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE`: <https://docs.python.org/3/library/ssl.html#ssl.SSLContext.load_verify_locations>
- Prior research, pipelock assessment: `docs/research/pipelock-assessment.md`
- Prior research, network egress guard: `docs/research/network-egress-guard.md`
- Prior research, secret exfil tripwire encodings: `docs/research/secret-exfil-tripwire-encodings.md`

Research conducted 2026-05-12.