docs(prd): add 0006, enable pipelock's native TLS interception
test / unit (pull_request) Successful in 12s
test / integration (pull_request) Successful in 13s

Supersedes the abandoned PR #8 (`mitmproxy-tls-interception`),
which built a mitmproxy + addon chain on the (falsified) premise
that pipelock could not MITM. Empirical proof from the impl-time
spike: with `tls_interception: { enabled: true, ca_cert, ca_key }`
in pipelock's config, pipelock answered a credential POST over
HTTPS with `STATUS=403 / body: blocked: request body contains
secret: GitHub Token` and emitted both `scanner:"tls_intercept"`
and `scanner:"body_dlp"` events. Standalone, no second proxy.

Net change vs PR #8: one sidecar instead of two, no vendored
addon, no addon-verdict pattern matching, no HTTPS-trust /
DNS / lookup workarounds. Same end-state behavior — pipelock's
DLP fires on plaintext for HTTPS hosts in the allowlist.

Also cleaning up the now-stale TLS-research notes:

- `docs/research/tls-mitm-for-pipelock.md` is removed. Its
  entire premise (mitmproxy in front of pipelock) is moot now
  that pipelock does the work natively. The mechanics of CONNECT
  bumping and the CA-lifecycle considerations it documented are
  the same as what pipelock implements; the PRD restates the
  parts that matter for the integration.
- `docs/research/pipelock-assessment.md` had two stale claims
  corrected: the "Pipelock does not perform TLS inspection (no
  CA trust injection)" line in §Scope gaps and the
  "no TLS termination" cell in the comparison table. Both now
  point at the `tls_interception` config and `pipelock tls`
  CLI instead.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 14:15:44 -04:00
parent e45cd2fb07
commit 6716f091c1
3 changed files with 312 additions and 513 deletions
+303
@@ -0,0 +1,303 @@
# PRD 0006: pipelock native TLS interception
- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-12
## Summary
Turn on pipelock's built-in `tls_interception` so its DLP / URL /
header / MCP scanners fire on the plaintext of HTTPS requests
instead of only the outer `CONNECT` hostname. Pipelock generates a
per-bottle ephemeral CA at launch (`pipelock tls init`); the
public cert is installed into the agent container's trust store
and the private key dies with the sidecar on teardown. The
existing per-agent sidecar topology from PRD 0001 is otherwise
unchanged — one container, no addon, no second proxy.
This supersedes the closed PR #8 / branch `mitmproxy-tls-interception`,
which built a mitmproxy + addon chain on the (falsified) premise
that pipelock could not MITM. Empirical proof from the impl-time
spike: with `tls_interception: { enabled: true, ca_cert, ca_key }`
in the pipelock config, pipelock answered a credential POST over
HTTPS with `STATUS=403 / body: blocked: request body contains
secret: GitHub Token` and emitted both
`scanner:"tls_intercept"` and `scanner:"body_dlp"` events.
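The config shape quoted above can be sketched in a few lines of Python. This is an illustration only: the helper names `build_tls_interception` and `render_tls_interception_yaml` are hypothetical stand-ins for whatever `pipelock_build_config` / `pipelock_render_yaml` end up doing, the YAML is rendered by hand rather than with a YAML library, and the `/h/...` paths are the in-sidecar paths named in the CA lifecycle section.

```python
def build_tls_interception(ca_cert: str, ca_key: str) -> dict:
    """Return the tls_interception mapping with only the v1 fields populated."""
    return {"enabled": True, "ca_cert": ca_cert, "ca_key": ca_key}


def render_tls_interception_yaml(block: dict) -> str:
    """Render the mapping as a YAML fragment by hand (no third-party library)."""
    lines = ["tls_interception:"]
    for key, value in block.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"  # YAML booleans are lowercase
        else:
            rendered = f'"{value}"'  # quote paths so they stay plain strings
        lines.append(f"  {key}: {rendered}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical usage with the in-sidecar CA paths from this PRD.
    block = build_tls_interception("/h/ca.pem", "/h/ca-key.pem")
    print(render_tls_interception_yaml(block), end="")
```

Leaving `cert_ttl`, `cert_cache_size`, and `passthrough_domains` out of the mapping matches the stated intent of keeping pipelock's defaults for everything except `enabled` and the cert paths.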
## Problem
PRD 0001 wired pipelock onto every bottle's egress, but pipelock
ran with its default `tls_interception.enabled: false`. The agent
container's only egress route is pipelock, but pipelock only saw
`CONNECT` hostnames and the encrypted bytes inside the tunnel.
Pipelock's headline scanners — request body DLP (48 credential
patterns), header DLP, URL DLP, subdomain entropy, MCP scanning,
response-body scanning — all need plaintext to fire. Against the
HTTPS-only hosts in `DEFAULT_ALLOWLIST` (`api.anthropic.com`,
`raw.githubusercontent.com`, etc.) they are effectively disabled.
The existing `tests/integration/test_pipelock_blocks_secret_post`
test only fires because it forces the agent to send plain HTTP
through pipelock's forward-proxy mode. Real Claude Code traffic
uses HTTPS via CONNECT and slips past the scanner.
## Goals / Success Criteria
The feature works when all of the following are observable:
- A Node / curl request from inside a launched bottle to a
CONNECT-bumped HTTPS host (e.g. `https://api.anthropic.com/dlp-probe`)
carrying a pipelock-recognized credential pattern in the body
returns 403 from pipelock with the documented
`blocked: request body contains secret: …` body. Pipelock's
`body_dlp` event fires on the decrypted request.
- A clean HTTPS GET from inside the bottle to an allowlisted host
(e.g. `https://raw.githubusercontent.com/...`) returns the real
upstream response — TLS interception doesn't break legitimate
traffic.
- The agent's TLS library trusts pipelock's bumped leaf certs
(per the bottle's installed CA); no TLS-trust errors.
- Claude Code reaches `api.anthropic.com` end-to-end through the
bottle and completes a chat round-trip.
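To make the first criterion concrete, here is a toy body scanner that only works once it has plaintext to look at. It is a sketch, not pipelock's implementation: the regex is an illustrative approximation of the classic GitHub personal-access-token shape (`ghp_` plus 36 alphanumerics), pipelock's real patterns are not reproduced, and the `(200, "ok")` allow result is a placeholder for "forward upstream".

```python
import re

# Illustrative pattern only, not one of pipelock's credential patterns.
GITHUB_TOKEN = re.compile(r"ghp_[A-Za-z0-9]{36}")


def scan_body(body: str) -> tuple[int, str]:
    """Return (status, message) for a decrypted request body."""
    if GITHUB_TOKEN.search(body):
        # Message format copied from the block response observed in the spike.
        return 403, "blocked: request body contains secret: GitHub Token"
    return 200, "ok"  # placeholder: a real proxy would forward the request


if __name__ == "__main__":
    fake_token = "ghp_" + "a" * 36  # synthetic value, not a real credential
    print(scan_body('{"token": "' + fake_token + '"}'))
    print(scan_body('{"message": "hello"}'))
```

Without interception the scanner would only ever be handed TLS ciphertext, so a check like this has nothing to match against.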
The feature is **done** when all of the following ship:
- `pipelock_build_config` / `pipelock_render_yaml` emit a
`tls_interception` block with `enabled: true` and the per-bottle
CA cert/key paths. The defaults
(`cert_ttl: 24h`, `cert_cache_size: 10000`,
`passthrough_domains: []`) are kept; only `enabled` and the
cert paths are populated.
- The prepare step generates a per-bottle CA via `pipelock tls init`
in a one-shot container and writes `ca.pem` and `ca-key.pem` to
`stage_dir`. The resulting paths land on the `DockerBottlePlan`.
- `DockerPipelockProxy.start` mounts the stage dir into the
sidecar (read-only) so the running pipelock can read its CA.
- `BottleBackend.provision_ca` (new) copies the CA public cert
into the agent at
`/usr/local/share/ca-certificates/claude-bottle-mitm.crt`, runs
`update-ca-certificates`, and sets the `NODE_EXTRA_CA_CERTS` /
`SSL_CERT_FILE` / `REQUESTS_CA_BUNDLE` env trio on the agent
container's runtime env. The abstract base provides a default no-op
so other backends aren't forced to implement it.
- The launch step prints a one-line stderr log with the SHA-256
fingerprint of the public CA cert (computed via stdlib
`ssl.PEM_cert_to_DER_cert` + `hashlib.sha256`).
- On bottle teardown the sidecar is removed and the CA private
key is gone with it.
- Two new integration tests under `tests/integration/`:
  - HTTPS variant of the credential-post block test (proves the
    `tls_intercept` + `body_dlp` chain fires end-to-end).
  - Clean HTTPS GET test (proves the allow path doesn't break TLS
    trust and returns real upstream content).
- The dry-run preflight (`start --dry-run`) renders the new TLS
layer. Text: one line under the egress summary. JSON: a
reserved `egress.tls_interception: { enabled: true,
ca_fingerprint: null }` block — fingerprint is null at dry-run
because the CA only exists after launch.
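A minimal sketch of the reserved dry-run JSON block, assuming the plan is serialized with the stdlib `json` module. The function name `tls_interception_preflight` is hypothetical; only the key names and the null fingerprint come from this PRD.

```python
import json


def tls_interception_preflight(ca_fingerprint=None) -> dict:
    """Build the reserved egress.tls_interception block for the dry-run JSON."""
    return {
        "egress": {
            "tls_interception": {
                "enabled": True,
                # None serializes to JSON null; no CA exists at dry-run time.
                "ca_fingerprint": ca_fingerprint,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(tls_interception_preflight(), indent=2))
```

Keeping the `ca_fingerprint` key present (as null) rather than omitting it lets consumers of the dry-run JSON rely on a stable shape if the field is populated later.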
## Non-goals
- A second proxy in the chain. Pipelock does the bumping
natively; the mitmproxy approach was based on a wrong premise
(closed PR #8).
- Per-bottle override to disable interception. v1 always enables
`tls_interception`. The pipelock-side `passthrough_domains`
list is the right knob if a future allowlisted host turns out
to pin certs — exposing it through the manifest is a follow-up.
- A long-lived / shared CA across bottles. Each bottle gets a
fresh CA generated by `pipelock tls init` and destroyed with the
sidecar.
- Tuning `cert_ttl`, `cert_cache_size`, `max_response_bytes`,
`cross_request_detection`, or other pipelock advanced features.
Defaults from `pipelock generate config --preset strict` are
fine for v1.
- Trust-store paths for non-Debian agent images.
`node:22-slim` is Debian; `update-ca-certificates` is the right
command. A Red-Hat-family base would need `update-ca-trust`.
- HTTP/3 / QUIC. Pipelock's interception is HTTP/HTTPS-over-TLS;
UDP/443 still needs an iptables layer (separate PRD).
## Scope
### In scope
- **`claude_bottle/pipelock.py`** changes:
  - Extend `pipelock_build_config` to include
    `tls_interception: { enabled: true, ca_cert: <path>, ca_key:
    <path> }`. Paths are populated from the plan; the function's
    signature grows a `cert_path` / `key_path` pair or reads them
    off `Bottle` once they're stored.
  - Extend `pipelock_render_yaml` to emit the new block.
- **`claude_bottle/backend/docker/pipelock.py`** changes:
  - New helper `pipelock_tls_init(stage_dir)` runs the upstream
    image as a one-shot:
    `docker run --rm -v <stage>:/h -e PIPELOCK_HOME=/h pipelock tls init`,
    leaving `ca.pem` and `ca-key.pem` under `stage_dir`. The host
    file owner is whatever the upstream image's user is; the
    sidecar mount is read-only so this is fine.
  - `DockerPipelockProxy.start` mounts the stage dir into the
    sidecar at `/h:ro` and references the CA paths in the rendered
    YAML.
- **`claude_bottle/backend/__init__.py`**: new abstract method
  `provision_ca(plan, target)` on `BottleBackend`, default no-op.
  `BottleBackend.provision` orchestrates `ca → prompt → skills →
  ssh → git`.
- **`claude_bottle/backend/docker/provision/ca.py`** (new):
  - Reads the cert from `stage_dir` (already written by prepare).
  - `docker cp` into the agent.
  - `docker exec -u 0 ... chmod 644 ...` + `update-ca-certificates`.
  - Computes the SHA-256 fingerprint with stdlib (`ssl` +
    `hashlib`), emits one stderr log line.
- **`claude_bottle/backend/docker/launch.py`**:
  - Three new `-e` flags on the agent's `docker run`:
    `NODE_EXTRA_CA_CERTS=/usr/local/share/ca-certificates/claude-bottle-mitm.crt`,
    `SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt`,
    `REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt`.
  - `HTTPS_PROXY` / `HTTP_PROXY` continue to point at pipelock
    (unchanged from PRD 0001; the mitmproxy detour in PR #8 is
    abandoned).
- **`claude_bottle/backend/docker/bottle_plan.py`**:
  - One new `info(...)` line in `print()` noting TLS interception
    is on.
  - `to_dict()` gains an `egress.tls_interception: { enabled:
    true, ca_fingerprint: null }` block. Reserved for future
    population.
- **`claude_bottle/backend/docker/prepare.py`**: call
  `pipelock_tls_init(stage_dir)` and write the resolved cert/key
  paths onto the plan (either on the existing `proxy_plan` field
  or on the parent `DockerBottlePlan`).
- **Tests:**
  - `tests/integration/test_pipelock_blocks_secret_https_post.py`
    (new): HTTPS variant of the existing block test.
  - `tests/integration/test_pipelock_allows_normal_https.py`
    (new): clean HTTPS GET succeeds.
  - `tests/unit/test_pipelock_yaml.py` updated to assert the new
    `tls_interception` block in the rendered config.
  - `tests/integration/test_dry_run_plan.py` updated to assert
    the new `egress.tls_interception` JSON block.
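The two Docker invocations listed above can be sketched as argv builders. This is a hedged illustration: `tls_init_argv` and `agent_ca_env_flags` are hypothetical names, the code only constructs argument lists (it does not call Docker), and it assumes the image's entrypoint is the pipelock binary so that `tls init` follows the image name directly, as in the command quoted in this PRD.

```python
CA_CERT_IN_AGENT = "/usr/local/share/ca-certificates/claude-bottle-mitm.crt"
CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def tls_init_argv(stage_dir: str, image: str) -> list[str]:
    """argv for the one-shot CA generation container (assumed entrypoint: pipelock)."""
    return [
        "docker", "run", "--rm",
        "-v", f"{stage_dir}:/h",     # CA files land in the host stage dir
        "-e", "PIPELOCK_HOME=/h",
        image, "tls", "init",
    ]


def agent_ca_env_flags() -> list[str]:
    """The three `-e` flags that point agent TLS libraries at the installed CA."""
    env = {
        "NODE_EXTRA_CA_CERTS": CA_CERT_IN_AGENT,
        "SSL_CERT_FILE": CA_BUNDLE,
        "REQUESTS_CA_BUNDLE": CA_BUNDLE,
    }
    flags: list[str] = []
    for name, value in env.items():
        flags += ["-e", f"{name}={value}"]
    return flags


if __name__ == "__main__":
    print(" ".join(tls_init_argv("/tmp/stage", "pipelock")))
    print(" ".join(agent_ca_env_flags()))
```

Building argv lists instead of shell strings keeps paths with spaces safe when the list is later handed to something like `subprocess.run`.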
### Out of scope
- Modifying pipelock itself. We're using existing config knobs.
- A manifest field to disable / customize interception per bottle.
Doable but premature.
- Wiring `passthrough_domains`. The default `[]` is correct for
v1; add the manifest field when a pinning host shows up.
- `cross_request_detection`, `entropy_budget`,
`fragment_reassembly`, `reverse_proxy`, `scan_api` — features
pipelock exposes but we don't need for the body-DLP gap.
## Proposed Design
### Topology
```
agent --HTTPS_PROXY--> pipelock --[bumps TLS]--> internet
                       (sees plaintext: URL, headers, body)
```
Same single-sidecar shape as PRD 0001. The only addition is
`tls_interception` in pipelock's config plus the per-bottle CA
generated at prepare time.
### CA lifecycle
- **Generation.** Host-side, at prepare time, via a one-shot
`docker run --rm -v <stage>:/h pipelock tls init`. Output is
`<stage>/ca.pem` + `<stage>/ca-key.pem`, both mode 600.
- **Sidecar mount.** `DockerPipelockProxy.start` adds
`-v <stage>:/h:ro` to the sidecar's `docker run`. The rendered
YAML references `/h/ca.pem` and `/h/ca-key.pem`. The private
key is read-only from pipelock's perspective; the host stage
dir is owned by the launching user.
- **Bottle install.** `provision_ca` (Docker impl) does
`docker cp <stage>/ca.pem agent:/usr/local/share/ca-certificates/claude-bottle-mitm.crt`,
then `update-ca-certificates`. The CA env trio is set at
`docker run -e` time (Docker propagates run-time env into
`docker exec`, verified in PR #8's spike).
- **Teardown.** The sidecar container is destroyed, the stage
dir is removed by `start.py`'s existing `finally` block, and
the CA dies with both.
- **Fingerprint.** Computed via stdlib in `provision_ca` and
logged once to stderr (`claude-bottle: mitm ca fingerprint:
sha256:<hex>…`). The private key never appears in any log.
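The fingerprint step might look like the following sketch, using only the stdlib calls this PRD names (`ssl.PEM_cert_to_DER_cert` and `hashlib.sha256`). The function names are hypothetical, and the code assumes `ca.pem` contains exactly one certificate with no leading text, since `PEM_cert_to_DER_cert` expects the string to start with the PEM header. The sketch logs the full hex digest; whether the real log line truncates it is not specified here.

```python
import hashlib
import ssl
import sys


def ca_fingerprint(pem_text: str) -> str:
    """SHA-256 over the DER encoding of a single PEM certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_text)
    return "sha256:" + hashlib.sha256(der).hexdigest()


def log_ca_fingerprint(pem_text: str) -> None:
    """Emit the one-line stderr log; only the public cert is ever touched."""
    print(f"claude-bottle: mitm ca fingerprint: {ca_fingerprint(pem_text)}",
          file=sys.stderr)


if __name__ == "__main__":
    # Stand-in bytes wrapped as PEM; a real run would read <stage>/ca.pem.
    demo_pem = ssl.DER_cert_to_PEM_cert(b"not-a-real-certificate")
    log_ca_fingerprint(demo_pem)
```

Hashing the DER bytes rather than the PEM text means the fingerprint does not change with line wrapping or trailing whitespace in the PEM file.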
### Data model changes
None to the manifest schema. The dry-run JSON contract grows a
reserved `egress.tls_interception` block; the fingerprint is
always null at dry-run because the CA doesn't exist yet.
### Existing code touched
Surgical, all on the existing pipelock path:
- `claude_bottle/pipelock.py` — config builder + YAML renderer.
- `claude_bottle/backend/__init__.py` — abstract `provision_ca`.
- `claude_bottle/backend/docker/pipelock.py` — `tls init` helper,
sidecar volume mount.
- `claude_bottle/backend/docker/prepare.py` — CA paths on plan.
- `claude_bottle/backend/docker/launch.py` — CA env trio on agent.
- `claude_bottle/backend/docker/backend.py` — `provision_ca`
dispatch; `self._proxy` is threaded through prepare/launch with its
shape unchanged.
- `claude_bottle/backend/docker/bottle_plan.py` — preflight
rendering.
- `claude_bottle/backend/docker/provision/ca.py` (new).
Net diff is meaningfully smaller than PR #8 because pipelock
already does the work — no addon, no second sidecar, no second
backend module.
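The backend change can be pictured with a reduced sketch of the base class. Everything here is an assumption for illustration: the real `BottleBackend` has more surface, the `provision_prompt` / `provision_skills` / `provision_ssh` / `provision_git` method names are hypothetical, and a plain method with a no-op body stands in for the "abstract method, default no-op" described in this PRD.

```python
class BottleBackend:
    """Hypothetical reduction of the backend base class to its provisioning hooks."""

    # Order taken from this PRD: ca first, then prompt, skills, ssh, git.
    PROVISION_ORDER = ("ca", "prompt", "skills", "ssh", "git")

    def provision(self, plan, target) -> None:
        for step in self.PROVISION_ORDER:
            getattr(self, f"provision_{step}")(plan, target)

    def provision_ca(self, plan, target) -> None:
        """Default no-op so backends without an intercepting proxy need no override."""
        return None

    # Placeholder hooks; the real method names are not given in this PRD.
    def provision_prompt(self, plan, target) -> None:
        return None

    def provision_skills(self, plan, target) -> None:
        return None

    def provision_ssh(self, plan, target) -> None:
        return None

    def provision_git(self, plan, target) -> None:
        return None


class RecordingBackend(BottleBackend):
    """Test double that records which hooks ran, in order."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def provision_ca(self, plan, target) -> None:
        self.calls.append("ca")

    def provision_prompt(self, plan, target) -> None:
        self.calls.append("prompt")

    def provision_skills(self, plan, target) -> None:
        self.calls.append("skills")

    def provision_ssh(self, plan, target) -> None:
        self.calls.append("ssh")

    def provision_git(self, plan, target) -> None:
        self.calls.append("git")
```

Running the CA step first means the agent's trust store is in place before any later provisioner makes an HTTPS request through pipelock.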
### External dependencies
- **Pipelock image** — unchanged pin from PRD 0001
(`ghcr.io/luckypipewrench/pipelock@sha256:3b1a3941`,
matching pipelock v2.3.0). No new image dependency.
- **No host-side crypto deps.** CA generation uses the pipelock
image's own `tls init` command in a one-shot container.
Fingerprint uses Python stdlib `ssl` + `hashlib`.
## Open questions
- **Mount semantics for the stage dir.** The sidecar runs with a
`-v <host-stage>:/h:ro` bind mount. The CA files were written by
the one-shot `pipelock tls init` container with whatever UID
pipelock's image uses; the sidecar reads them as that same UID.
Should work, but confirm on first impl by inspecting the file
modes/owners and that the sidecar actually loads them. Fallback:
`docker cp` the cert/key into the running sidecar after `docker
create` (mirror PR #8's mitmproxy lifecycle).
- **Cert validity / TTL.** Defaults are `cert_ttl: 24h` for
per-host leaves; the CA validity from `pipelock tls init` is
10 years by default (`--validity 87600h`). The CA outlives the
bottle either way; per-bottle ephemerality is enforced by
*generating a fresh one each launch*, not by setting a short
CA validity. Document; no tuning in v1.
- **`passthrough_domains` shape.** Once we expose this through
the manifest in a follow-up, the natural place is
`bottle.egress.tls_passthrough_domains: [host, ...]`, mirroring
the existing `egress.allowlist` shape.
- **Stage-dir cleanup ordering.** The stage dir holds the CA
private key briefly. `start.py`'s existing `finally` block
`shutil.rmtree`s it. Confirm the rmtree fires after the sidecar
is stopped, so the sidecar isn't reading a deleted mount when
it shuts down. The current order is correct (teardown unwinds
via ExitStack before the outer `finally` runs); verify.
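The cleanup-ordering claim in the last question can be checked in isolation. The sketch below is not `start.py`; it is a self-contained model with hypothetical names that registers a stand-in "stop sidecar" callback on an `ExitStack` and removes a temporary stage dir in an outer `finally`, recording whether the key file still exists at each step.

```python
import shutil
import tempfile
from contextlib import ExitStack
from pathlib import Path


def launch_and_teardown() -> list:
    """Model the teardown order: ExitStack unwinds first, then the outer finally."""
    events = []
    stage_dir = Path(tempfile.mkdtemp(prefix="bottle-stage-"))
    key_path = stage_dir / "ca-key.pem"
    key_path.write_text("placeholder, not a real key")
    try:
        with ExitStack() as stack:
            # Stand-in for stopping the sidecar; records whether the key is still there.
            stack.callback(lambda: events.append(("sidecar stopped", key_path.exists())))
            events.append(("bottle running", key_path.exists()))
        # The ExitStack has already unwound by the time control reaches here.
    finally:
        shutil.rmtree(stage_dir)
        events.append(("stage dir removed", key_path.exists()))
    return events


if __name__ == "__main__":
    for name, key_present in launch_and_teardown():
        print(f"{name}: key present = {key_present}")
```

In this model the sidecar callback always observes the key file, because the `with` block exits before the `finally` clause runs; the real code path still needs the verification the question asks for.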
## References
- `docs/research/pipelock-assessment.md` (now corrected) —
pipelock capability assessment including the
`tls_interception` block.
- `docs/prds/0001-per-agent-egress-proxy-via-pipelock.md` —
egress-proxy baseline this PRD extends.
- `docs/prds/0003-bottle-backend-abstraction.md` — backend ABC
contract this PRD adds a `provision_ca` method to.
- `docs/prds/0004-split-out-provisioners.md` — per-provisioner
module pattern reused for the new CA provisioner.
- Pipelock `tls` CLI (in-image help):
`pipelock tls init / install-ca / show-ca`.
- Closed PR #8 — earlier mitmproxy-based design built on the
falsified "pipelock can't MITM" premise; archived for context.
+9 -5
@@ -222,10 +222,14 @@ The following threat-model items from `network-egress-guard.md` are
intercept raw UDP 53 packets.
- **Domain fronting**: an agent can send `CONNECT allowed-host.com:443`
through the proxy but embed a different SNI inside the TLS session.
Pipelock does not perform TLS inspection (no CA trust injection) and
cannot verify SNI vs. CONNECT header. The same limitation is shared
with smokescreen and is documented in `network-egress-guard.md` as a
known gap for the non-TLS-terminating proxy approach.
Pipelock supports TLS interception via its `tls_interception` config
block (`enabled`, `ca_cert`, `ca_key`, `cert_ttl`, `cert_cache_size`,
`passthrough_domains`, `max_response_bytes`) plus the `pipelock tls
init` / `install-ca` / `show-ca` CLI; with interception on, the
body and inner Host header become visible to its scanner pipeline,
closing the domain-fronting gap. With interception off (default in
the generated config), pipelock relays the CONNECT as an opaque
tunnel and only sees the outer hostname.
- **SSH egress content**: SSH sessions to permitted hosts are opaque.
Same limitation noted in both prior research notes.
- **Agent killing the proxy process**: if pipelock runs inside the same
@@ -385,7 +389,7 @@ pipelock's differentiators.
| Blocks RFC 1918 by default | only if explicitly added to rules | yes | yes, + DNS rebinding | no |
| Content-based DLP (credential patterns) | no | no | yes, 48 patterns + encoding normalization | no |
| MCP / WebSocket scanning | no | no | yes, bidirectional | no |
| Domain fronting bypass | possible | possible | possible (no TLS termination) | n/a |
| Domain fronting bypass | possible | possible | mitigated when `tls_interception` is enabled (CA trust required in client) | n/a |
| macOS Docker Desktop (sidecar mode) | yes | yes | yes | yes |
| macOS Docker Desktop (in-container sandbox) | yes | n/a | degraded (--best-effort) | yes |
| NET_ADMIN / NET_RAW required | yes | no | no (sidecar) | no |
-508
@@ -1,508 +0,0 @@
# TLS interception for pipelock content scanning
Research into adding TLS termination ("MITM") to the egress path so that
pipelock's scanning pipeline can see plaintext HTTP request and response
bodies, instead of only the `CONNECT` host and opaque ciphertext.
## Summary
- Pipelock today sees `CONNECT` hostnames and the encrypted bytes that follow.
Its DLP, subdomain-entropy, and MCP scanners cannot fire on TLS-encrypted
bodies, which is the gap explicitly named under "Scope gaps" in
`pipelock-assessment.md` ("Pipelock does not perform TLS inspection (no CA
trust injection)").
- Closing that gap requires a TLS-terminating proxy that bumps `CONNECT`,
presents a leaf certificate for the target hostname signed by a CA the
bottle's trust store accepts, decrypts the inner HTTP, and re-establishes
TLS to the real upstream.
- The mature open-source option is **mitmproxy**. Squid + `ssl_bump` is the
heavier production-grade alternative. The Go ecosystem (`goproxy`,
`gomitmproxy`, `martian`) is suitable only if we want a custom binary
tightly coupled to pipelock.
- Recommended v1 topology: **mitmproxy in front of pipelock** on the same
egress route. mitmproxy terminates client TLS, forwards plaintext to
pipelock as its upstream HTTP proxy, and re-encrypts to the real upstream.
Pipelock stays unchanged.
- Per-bottle ephemeral CA, generated at bottle start and destroyed on
teardown. The CA private key lives only on the sidecar; the bottle's
trust store only ever sees the public cert.
- Cert pinning is a known caveat but a small one given the narrow allowlist
in this project. Selective bumping is the mitigation if a future
allowlisted host turns out to pin.
---
## What pipelock cannot see today
The current egress topology (per `pipelock-assessment.md`):
```
agent --HTTPS_PROXY--> pipelock --CONNECT host:443--> internet
                                \____________________________
                                      opaque TLS bytes
```
The agent's client (Claude Code, `curl`, an MCP server, a Python SDK)
sends `CONNECT api.anthropic.com:443`. Pipelock checks the hostname
against its `api_allowlist`, replies `200 Connection Established`, and
then blindly relays bytes between the two TCP halves. The TLS handshake
and everything inside it happens end-to-end between the agent and the
real upstream.
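The hostname-only decision described above amounts to parsing one request line. The following is a simplified sketch, not pipelock's code: `parse_connect` and `connect_response` are hypothetical names, bracketed IPv6 targets are not handled, and the 403 reply for a denied host is an assumption (only the `200 Connection Established` reply is stated in this note).

```python
def parse_connect(request_line: str) -> tuple[str, int]:
    """Split 'CONNECT host:port HTTP/1.1' into (host, port)."""
    method, target, _version = request_line.split()
    if method != "CONNECT":
        raise ValueError(f"not a CONNECT request: {method}")
    host, _, port = target.rpartition(":")
    return host.lower(), int(port)


def connect_response(request_line: str, allowlist: set[str]) -> str:
    """Decide on the hostname alone; nothing inside the tunnel is visible yet."""
    host, _port = parse_connect(request_line)
    if host in allowlist:
        return "HTTP/1.1 200 Connection Established\r\n\r\n"
    return "HTTP/1.1 403 Forbidden\r\n\r\n"  # assumed deny response


if __name__ == "__main__":
    allow = {"api.anthropic.com"}
    print(connect_response("CONNECT api.anthropic.com:443 HTTP/1.1", allow).strip())
    print(connect_response("CONNECT example.org:443 HTTP/1.1", allow).strip())
```

The function never sees a URL path, header, or body, which is exactly the limitation the rest of this section describes.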
What pipelock can scan in this mode:
- `CONNECT` target hostname (SNI is not even needed).
- TLS record framing and lengths (useful for budgets, useless for DLP).
- Plain HTTP/1.1 to non-HTTPS destinations (irrelevant — there are none
in `DEFAULT_ALLOWLIST`).
What pipelock cannot scan in this mode:
- Request URL, method, headers, body.
- Response status, headers, body.
- MCP JSON-RPC payloads inside the TLS session.
- WebSocket frames inside a TLS-wrapped upgrade.
- Whether the inner SNI or HTTP `Host` / `:authority` matches the
outer `CONNECT` target (domain-fronting check).
The 48-pattern DLP layer, the subdomain-entropy check (insofar as it
inspects URLs rather than DNS-resolver queries), the request-redaction
feature added in v2.3.0, and bidirectional MCP scanning all require
plaintext to operate on. Without TLS termination, those layers are
inert against any HTTPS destination — which is every destination in
the current allowlist.
---
## How TLS interception works
The mechanics of `CONNECT` bumping, end to end:
1. **Agent issues `CONNECT`.** The HTTP client sees `HTTPS_PROXY` set,
so it opens a TCP connection to the proxy and sends
`CONNECT api.anthropic.com:443 HTTP/1.1`.
2. **Proxy answers `200`.** Standard tunnel-established response.
3. **Proxy starts TLS as the server.** Instead of relaying bytes, the
proxy itself performs a TLS handshake with the agent. It needs a
server certificate for `api.anthropic.com` — so on first contact for
that hostname, the proxy generates a leaf certificate with
`CN=api.anthropic.com` and a SAN for the same, signs it with its
own CA private key, and presents that cert. Subsequent connections
to the same hostname reuse the cached leaf.
4. **Agent verifies the cert.** The agent's TLS library walks the chain
to a trusted root. Because the bottle's trust store contains the
proxy's CA cert, validation succeeds. The agent has no way to tell
it isn't talking to the real `api.anthropic.com`.
5. **Proxy opens its own TLS to the real upstream.** As a client this
time, using the system root store, talking to the real
`api.anthropic.com`. Real SNI, real cert chain validated normally.
6. **Proxy bridges the two TLS sessions.** Decrypts on the server side,
re-encrypts on the client side, and scans the plaintext in between.
This is what every TLS-terminating egress proxy does. The trade-offs
live in three places:
- **CA trust injection.** Step 4 only works if the bottle's trust
store contains the proxy's CA. Mechanics covered under "CA lifecycle"
below.
- **Cert generation cost.** Generating an RSA-2048 leaf cert takes
~50 ms; ECDSA P-256 is ~5 ms. Cache leaves per (hostname, SAN list)
to keep this off the steady-state hot path.
- **Protocol coverage.** The proxy needs to speak HTTP/1.1, HTTP/2 (ALPN
`h2`), and ideally WebSocket. HTTP/3 / QUIC is UDP and requires a
separate code path; for v1, blocking UDP/443 at the iptables layer
forces clients to fall back to HTTP/2, which we can inspect.
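The leaf-caching idea from the cert-generation bullet can be sketched without any crypto at all. This is a minimal illustration under stated assumptions: `LeafCache` is a hypothetical name, the `issue` callable stands in for real certificate signing, and the cache is an unbounded in-memory dict (a real proxy would cap its size and honor a leaf TTL).

```python
from typing import Callable

LeafKey = tuple[str, tuple[str, ...]]


class LeafCache:
    """Cache issued leaves per (hostname, SAN list) so signing runs once per key."""

    def __init__(self, issue: Callable[[str, tuple[str, ...]], bytes]) -> None:
        self._issue = issue
        self._leaves: dict[LeafKey, bytes] = {}

    def get(self, hostname: str, sans: tuple[str, ...] = ()) -> bytes:
        # Normalize so equivalent requests share one cache entry.
        key = (hostname.lower(), tuple(sorted(sans)))
        if key not in self._leaves:
            self._leaves[key] = self._issue(*key)  # slow path: sign a new leaf
        return self._leaves[key]


if __name__ == "__main__":
    issued = []

    def fake_issue(hostname: str, sans: tuple[str, ...]) -> bytes:
        issued.append(hostname)  # stand-in for real key generation and signing
        return f"leaf for {hostname}".encode()

    cache = LeafCache(fake_issue)
    cache.get("api.anthropic.com")
    cache.get("API.anthropic.com")  # same key after normalization
    print(f"leaves issued: {len(issued)}")
```

Only the first connection to a hostname pays the signing cost; later handshakes for the same key are a dictionary lookup.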
---
## Tools
### mitmproxy
- **What it is.** Python (with Rust crypto bits) interactive HTTPS proxy.
Reference open-source implementation of the bump pattern. Ships as
`mitmproxy` (TUI), `mitmweb` (browser UI), and `mitmdump` (headless).
- **Cert handling.** Generates a CA on first run under `~/.mitmproxy/`.
Per-host leaves are generated on demand and cached in memory. Cert
cache keyed by (hostname, SAN extensions inferred from upstream cert).
- **Protocols.** HTTP/1.1, HTTP/2, WebSocket fully supported. HTTP/3
exists as experimental. Raw TCP / non-HTTP TLS supported via
`--mode reverse:` but not in CONNECT-bump mode.
- **Extensibility.** Python addon API. An addon module can inspect or
modify any `request` / `response` / `tcp_message` flow. The pipelock
integration in Topology D below uses this.
- **Selective bumping.** `ignore_hosts` regex; matching CONNECTs are
tunneled blindly instead of bumped. Critical for the cert-pinning
mitigation.
- **Docker image.** `mitmproxy/mitmproxy` on Docker Hub. Single binary
for the CLI, ~80 MB image. Configurable via flags or `~/.mitmproxy/config.yaml`.
- **Project URL.** <https://mitmproxy.org>, <https://github.com/mitmproxy/mitmproxy>.
Most mature, best-documented, lowest-effort integration. Default choice
for v1.
### Squid + ssl_bump
- **What it is.** Squid is a long-running C++ caching proxy.
`ssl_bump` is its TLS-interception feature, controlled by per-CONNECT
actions: `splice` (tunnel blindly), `bump` (decrypt and re-encrypt),
`peek` (look at TLS hello then decide), `stare` (look at server cert
then decide), `terminate` (abort the connection).
- **Cert handling.** Configured via `sslcrtd_program` — a helper that
generates and caches per-host certs. CA cert and key referenced by
PEM paths in `squid.conf`.
- **Protocols.** HTTP/1.1 fully; HTTP/2 to clients via recent versions;
no scripted addons.
- **Extensibility.** ICAP (Internet Content Adaptation Protocol) for
external scanners — Squid POSTs each request/response to an ICAP
service that can modify or reject. This is the formal version of
Topology D below.
- **Production track record.** Used at corporate-proxy scale (large
enterprises, ISPs). Heavyweight for a single-bottle sidecar.
- **Project URL.** <https://wiki.squid-cache.org/Features/SslPeekAndSplice>.
Right tool if pipelock grows an ICAP server endpoint. Otherwise, more
config surface than this project needs.
### Go libraries: goproxy, gomitmproxy, martian
- **`goproxy`** (elazarl) — long-lived Go library, basic CONNECT-bumping
proxy with a handler API. Sparse on HTTP/2.
<https://github.com/elazarl/goproxy>
- **`gomitmproxy`** (AdGuard) — newer, cleaner API; built for AdGuard
Home / DNS-filtering products. HTTP/2 support is partial.
<https://github.com/AdguardTeam/gomitmproxy>
- **`martian`** (Google) — request/response modifier framework with a
JSON-configurable rule engine. Used internally at Google; public
ecosystem thin.
<https://github.com/google/martian>
These are relevant only if we decide to write a custom TLS-terminating
binary that links pipelock's scanning packages directly — Topology C
below. They are not faster than mitmproxy for the v1 sidecar shape;
they are smaller and more direct, at the cost of writing more Go.
### Disqualified
- **Caddy, Envoy, HAProxy.** All can terminate TLS at a reverse-proxy
vhost. None ship a "bump on CONNECT and forward plaintext to a
downstream proxy" mode out of the box. Adapting any of them to this
shape is more work than starting from mitmproxy.
- **Cloudflare Gateway, Zscaler, NetSkope, Forcepoint.** Managed cloud
egress with TLS inspection. Wrong topology — they live outside the
host, not as a per-bottle sidecar, and they require trusting a vendor
with full plaintext.
- **Charles Proxy, Burp Suite.** Closed-source GUI tools for developer
capture and security testing. Not appropriate as headless sidecars.
- **`mitmdump` standalone vs. embedding mitmproxy as a library.** Both
are mitmproxy. Calling out only to note: the project ships both a CLI
and a Python API; addons can be loaded either way.
---
## Topologies
Five candidate topologies, ordered roughly from least to most coupled
between the two components.
### A — mitmproxy in front of pipelock (recommended)
```
agent --HTTPS_PROXY--> mitmproxy --HTTP_PROXY--> pipelock --> internet
                       (bump TLS)                (scan plain) (real TLS)
```
mitmproxy terminates the agent's TLS connection, decrypts, and then
forwards the inner HTTP request to pipelock by treating pipelock as
its own upstream HTTP forward proxy. Pipelock receives plaintext HTTP
exactly as if the agent had used HTTP, applies its full scanning
pipeline, and forwards to mitmproxy's upstream client half — which
re-establishes TLS to the real destination.
Concretely the agent's `HTTPS_PROXY` points at mitmproxy; mitmproxy's
`upstream_proxy` config points at pipelock; pipelock's network reach
includes the real internet.
- **Wins.** Pipelock unchanged. mitmproxy unchanged from default
configuration. Each component has one job. Failure modes are clear
per layer.
- **Costs.** Two sidecars per bottle instead of one. One extra
decrypt / re-encrypt hop, ~5-15 ms per request in steady state.
- **Open question.** How exactly mitmproxy forwards to pipelock matters
for whether pipelock sees TLS again or only HTTP. mitmproxy's
`upstream` mode wraps the decrypted request in another CONNECT if the
destination is HTTPS — which would re-encrypt before pipelock sees
it, defeating the point. The correct mode is `upstream` with TLS
re-origination disabled, or `regular` mode with a chained proxy. The
v2 release of mitmproxy reworked this; needs verification against the
current docs at integration time.
### B — pipelock in front of mitmproxy (ruled out)
```
agent --HTTPS_PROXY--> pipelock --CONNECT?--> mitmproxy --> internet
                       (sees CONNECT only)    (bump TLS)
```
Pipelock would receive a `CONNECT` and decide to allow or deny based
on hostname, then tunnel to mitmproxy. mitmproxy would terminate TLS
and see plaintext — but pipelock would never see the plaintext, which
is the whole point of the exercise. The scanning still happens (in
mitmproxy), but it isn't pipelock doing it, so we'd need an entirely
different rule engine. Ruled out.
### C — Extend pipelock itself to terminate TLS
Two sub-variants:
**C.1 — Upstream a `tls_terminate` mode.** Submit a feature to
pipelock that adds CONNECT bumping and per-host cert generation in Go,
using `crypto/tls` and the existing scanning packages. Pipelock becomes
a self-contained MITM proxy. License question matters here: the Apache
2.0 core can grow new features in-tree, but if upstream insists this
belongs in `enterprise/` (ELv2), we either accept ELv2 or fork.
**C.2 — Wrap pipelock in a thin Go binary in the same container.** A
small Go program does the TLS half (`CONNECT` parsing, cert generation,
TLS handshake) and pipes plaintext to pipelock over UDS or loopback.
The wrapper is ours; pipelock is unmodified. No license question.
- **Wins.** Single component on the egress path. Pipelock owns the
scanning end-to-end, including domain-fronting checks (SNI vs.
`Host` vs. `CONNECT`).
- **Costs.** Real Go engineering effort. CA generation, cert caching,
TLS handshake, HTTP/2 ALPN negotiation, WebSocket upgrade — all
things mitmproxy already solves.
- **When.** Right shape for v2 or v3 once the v1 mitmproxy-in-front
topology has proven the integration works and the scanning rules are
stable.
### D — mitmproxy as the proxy, pipelock as a content-scan subroutine
```
agent --HTTPS_PROXY--> mitmproxy --> internet
                       (bump TLS)
                           |
                           v
                  POST /scan to pipelock
                  <- allow / block / redact
```
A Python addon in mitmproxy sends each decrypted request (and response)
to a pipelock HTTP `/scan` endpoint and gates the flow on the verdict.
mitmproxy handles all networking; pipelock is the rule engine only.
- **Wins.** Clean separation of concerns. Pipelock doesn't have to
speak TLS at all. The addon is small, ~100 lines of Python.
- **Costs.** Requires pipelock to expose a scan API. The current Apache
2.0 core does not document one. If `/scan` lives in `enterprise/`,
ELv2 applies. If it doesn't exist, we'd be asking pipelock for a new
surface.
- **Variant.** Squid's ICAP path is the formalized version of the same
pattern.
### E — Single container, two processes
mitmproxy and pipelock share a container, started by `supervisord` or
`s6-overlay`. Networking simplifies to localhost. Lifecycle complicates:
container restart now means restarting both; failure of one process is
not visible at the Docker layer; logs interleave.
- **Wins.** Slightly less Docker plumbing in `cli.py`.
- **Costs.** Operational complexity not worth the savings. The two
containers are independent processes with independent failure modes;
Docker is the right tool for that.
Net: not recommended.
---
## CA lifecycle
The CA private key is the asset to defend. With it, anyone can issue
certs that the bottle's trust store will accept for any hostname. So:
**Per-bottle ephemeral CA.** At bottle start, generate a fresh
RSA-2048 or ECDSA-P256 CA inside the mitmproxy sidecar. Export only
the public cert (PEM) into the bottle's trust store at one of:
- `/usr/local/share/ca-certificates/claude-bottle-mitm.crt` followed by
`update-ca-certificates` (Debian/Ubuntu base images).
- `/etc/pki/ca-trust/source/anchors/` with `update-ca-trust`
(Red-Hat-family).
- `$NODE_EXTRA_CA_CERTS` for Node-based agents (Claude Code).
- `$SSL_CERT_FILE` / `$REQUESTS_CA_BUNDLE` for Python SDKs.
The private key never leaves the sidecar's filesystem. The CA's public
certificate is the only artifact that crosses into the bottle.
On bottle teardown, the sidecar container is destroyed; the CA dies
with it. The next bottle gets a fresh CA. No long-lived MITM CA on
disk.
**Why not a shared per-host CA.** A persistent CA across bottles is
faster (no generation at start) but is a real liability. The public
certificate is not the risk: every bottle holds it by design, and it
cannot sign anything. The risk is the private key, which would sit on
the host for as long as the CA lives; if it leaks once, an attacker on
the host network could in principle impersonate any host to any bottle
that trusts it. With a per-bottle CA, a leaked key is useful against
one bottle only, and that CA dies in minutes.
**Generation cost.** RSA-2048 CA generation is ~200 ms; ECDSA-P256 is
~5 ms. Either is irrelevant against the per-bottle Docker pull and
network setup cost.
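
One way to generate such a CA is to shell out to the `openssl` CLI. The sketch below only builds the command line; the file names `ca.key` / `ca.crt`, the subject CN, and the one-day validity are assumptions made for illustration, and it presumes an `openssl` binary recent enough to support `-addext` (1.1.1 or later). How `-addext` interacts with extensions already set in `openssl.cnf` differs between OpenSSL releases, so the resulting cert should be inspected with `openssl x509 -noout -text` before relying on it.

```python
from pathlib import Path
from typing import List


def ca_generation_argv(stage_dir: Path, days: int = 1) -> List[str]:
    """Build an `openssl req` command for a self-signed ECDSA-P256 CA.

    File names, subject, and validity are illustrative assumptions.
    """
    return [
        "openssl", "req", "-x509",
        "-newkey", "ec", "-pkeyopt", "ec_paramgen_curve:prime256v1",
        "-nodes",                              # no passphrase on the key
        "-keyout", str(stage_dir / "ca.key"),  # private key: must not reach the bottle
        "-out", str(stage_dir / "ca.crt"),     # public cert: the only exported file
        "-days", str(days),                    # short-lived, matching bottle lifetime
        "-subj", "/CN=claude-bottle ephemeral MITM CA",
        "-addext", "basicConstraints=critical,CA:TRUE",
        "-addext", "keyUsage=critical,keyCertSign,cRLSign",
    ]
```

The list can be passed to `subprocess.run(argv, check=True)`. Building the argv separately from running it keeps the command easy to log and to unit-test.
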
**Where the CA lives in the bottle's trust store.** Both: a
distribution-standard path with `update-ca-certificates`, and the
env-var path. Belt and suspenders, because some Node and Python
libraries honor the env vars only, and some load only `/etc/ssl/certs/`
directly.
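
The env-var half of that can be sketched as a small helper. It assumes a Debian/Ubuntu base image, where `update-ca-certificates` regenerates `/etc/ssl/certs/ca-certificates.crt`; the function name is hypothetical. One detail worth encoding: `NODE_EXTRA_CA_CERTS` adds to Node's built-in roots, whereas `SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE` replace the default bundle, so those two point at the regenerated system bundle and not at the lone CA file.

```python
from typing import Dict

# Cert path taken from the text above; the bundle path is the Debian/Ubuntu
# default and is an assumption about the base image.
CA_CERT_PATH = "/usr/local/share/ca-certificates/claude-bottle-mitm.crt"
SYSTEM_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def trust_env(ca_cert: str = CA_CERT_PATH, bundle: str = SYSTEM_BUNDLE) -> Dict[str, str]:
    """Environment variables that make Node and Python clients trust the CA."""
    return {
        # Additive: Node keeps its built-in roots and appends this file.
        "NODE_EXTRA_CA_CERTS": ca_cert,
        # Replacing: both must name a bundle that already contains the CA
        # plus the public roots (needed for any host that is not bumped).
        "SSL_CERT_FILE": bundle,
        "REQUESTS_CA_BUNDLE": bundle,
    }
```
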
---
## Cert pinning (brief)
A client that pins ignores the trust store and refuses any cert whose
public key isn't on a hardcoded list. Three observations for this
project:
- The current `DEFAULT_ALLOWLIST` (`api.anthropic.com`,
`statsig.anthropic.com`, `sentry.io`, `claude.ai`,
`platform.claude.com`, `downloads.claude.ai`,
`raw.githubusercontent.com`) does not appear to include any host whose
server-side SDK clients pin. Server-side SDKs (Node, Python) almost
universally honor system trust and `NODE_EXTRA_CA_CERTS` /
`SSL_CERT_FILE`. Mobile SDKs and Chromium pin; we don't run those.
- If a future allowlisted host turns out to pin, the mitigation is
selective bumping via mitmproxy `ignore_hosts`: that specific
hostname tunnels blindly and pipelock loses DLP coverage for it.
Coverage on every other host is unaffected.
- The cost of finding out: a single 5-minute test before adding a host
— point mitmproxy at the host, observe whether the client succeeds.
Not a v1 blocker. Document the failure mode and the mitigation.
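
The selective-bumping mitigation could be driven from a plain list of hostnames. mitmproxy's `ignore_hosts` option takes regular expressions matched against `host:port`, so the sketch below escapes each hostname and anchors the pattern; the function name and the idea of feeding it from a list of known-pinning hosts are assumptions made for illustration.

```python
import re
from typing import Iterable, List


def ignore_host_patterns(hostnames: Iterable[str], port: int = 443) -> List[str]:
    """Return one anchored regex per hostname, matching exactly `host:port`."""
    unique = sorted({h.strip().lower() for h in hostnames})
    # re.escape keeps the dots literal so `a.b` cannot match `axb`.
    return [rf"^{re.escape(h)}:{port}$" for h in unique]
```

Each returned pattern would be passed to mitmproxy as a separate `--ignore-hosts` argument. Anchoring matters here: an unanchored pattern would also exempt any hostname that merely contains the pinned one, silently widening the DLP blind spot.
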
---
## Comparison table
| | A: mitmproxy → pipelock | B: pipelock → mitmproxy | C: TLS in pipelock | D: mitmproxy + scan API | E: one container |
|---|---|---|---|---|---|
| Pipelock sees plaintext | yes | no | yes | yes (via /scan) | yes |
| Code change to pipelock | none | none | substantial | adds /scan endpoint | none |
| Sidecar count | 2 | 2 | 1 | 2 | 1 |
| Cert generation owner | mitmproxy | mitmproxy | pipelock | mitmproxy | mitmproxy |
| Selective bumping | mitmproxy `ignore_hosts` | mitmproxy `ignore_hosts` | pipelock config | mitmproxy `ignore_hosts` | mitmproxy `ignore_hosts` |
| Failure isolation per process | yes | yes | n/a (one process) | yes | no (shared container) |
| License question | none | none | ELv2 risk | ELv2 risk | none |
| v1 effort | low | low (but pointless) | high | medium | low |
| Long-term shape | interim | n/a | best | possible | not recommended |
---
## Recommendation
**Adopt Topology A for v1.** Add a mitmproxy sidecar to the egress
topology, in front of pipelock on the same per-bottle internal network.
The agent's `HTTPS_PROXY` points at mitmproxy; mitmproxy's upstream is
pipelock; pipelock's upstream is the real internet.
Concretely:
1. Add a `MitmproxyProxy` class alongside `PipelockProxy`, with the
same `prepare` / `start` / `stop` lifecycle. The class generates
a per-bottle CA in `stage_dir`, exports the public cert into a
second file, and writes a mitmproxy config that:
- bumps every CONNECT by default
- uses mitmproxy's upstream mode (`--mode upstream:http://pipelock-<slug>:<port>`)
- listens on a known port inside the per-bottle internal network
2. Extend the bottle launch step to copy the CA public cert into the
agent container under
`/usr/local/share/ca-certificates/claude-bottle-mitm.crt`, run
`update-ca-certificates`, and set `NODE_EXTRA_CA_CERTS` /
`SSL_CERT_FILE` / `REQUESTS_CA_BUNDLE` accordingly.
3. Repoint the agent's `HTTPS_PROXY` and `HTTP_PROXY` from the pipelock
container to the mitmproxy container.
4. Verify mitmproxy's upstream-proxy mode forwards plaintext (not a
re-wrapped CONNECT) to pipelock; if not, use `regular` mode with a
chained proxy directive.
5. Test that pipelock's DLP, subdomain-entropy, and MCP scanners now
fire on real request bodies for `api.anthropic.com` traffic.
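
Steps 1 and 3 can be sketched as two pure helpers that a `MitmproxyProxy.prepare` step might call. The function names, the `mitmproxy-<slug>` hostname convention, and the choice of `mitmdump` are assumptions made for illustration; the flags themselves (`--mode`, `--listen-host`, `--listen-port`, `--set confdir=`, `--ignore-hosts`) are standard mitmproxy options. Whether upstream mode actually delivers plaintext to pipelock is still the open question from step 4.

```python
from typing import Dict, List, Sequence


def mitmdump_argv(pipelock_host: str, pipelock_port: int, listen_port: int,
                  confdir: str, ignore_hosts: Sequence[str] = ()) -> List[str]:
    """Command line for a mitmproxy sidecar that chains to pipelock."""
    argv = [
        "mitmdump",
        "--mode", f"upstream:http://{pipelock_host}:{pipelock_port}",
        "--listen-host", "0.0.0.0",         # reachable on the per-bottle network
        "--listen-port", str(listen_port),
        "--set", f"confdir={confdir}",      # directory holding the CA files
    ]
    for pattern in ignore_hosts:            # hosts to tunnel without bumping
        argv += ["--ignore-hosts", pattern]
    return argv


def agent_proxy_env(mitm_host: str, listen_port: int) -> Dict[str, str]:
    """Proxy variables for the agent container, pointing at mitmproxy."""
    url = f"http://{mitm_host}:{listen_port}"
    return {"HTTPS_PROXY": url, "HTTP_PROXY": url}
```

For example, `mitmdump_argv("pipelock-demo", 8888, 8080, "/ca")` yields a `--mode` value of `upstream:http://pipelock-demo:8888`, and `agent_proxy_env("mitmproxy-demo", 8080)` sets both proxy variables to `http://mitmproxy-demo:8080`.
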
**Defer Topologies C and D.** Topology C (extending pipelock to
terminate TLS) is the cleanest long-term shape but is a substantial
build and runs into the Apache 2.0 vs. ELv2 question. Topology D
(mitmproxy with pipelock as a scan API) is attractive but requires a
pipelock surface that doesn't exist today. Both are valid v2 targets;
neither is the right starting point.
The `network-egress-guard.md` v1 iptables + dnsmasq layer remains
necessary alongside this — TLS interception covers HTTP/HTTPS only;
raw TCP, UDP/443 (QUIC), UDP/53 (DNS), and ICMP still need the
IP-level default-deny.
---
## Open questions
1. **mitmproxy upstream-proxy mode mechanics.** Does mitmproxy in
upstream mode (`--mode upstream:...`) forward decrypted HTTP plaintext
to the upstream, or does it wrap it in a new CONNECT? The documented
behavior changed between mitmproxy 8 and 10. Needs verification
against the version we pin.
2. **Pipelock's behavior when receiving plain HTTP.** Pipelock's
`forward_proxy.enabled: true` accepts both `GET http://...` (plain
HTTP) and `CONNECT host:443` (HTTPS). After Topology A is wired up,
pipelock will see only plain HTTP — does its DLP / MCP scanning
pipeline run the full set of layers, or are some gated on the
CONNECT path? Confirm by reading
`github.com/luckyPipewrench/pipelock/blob/main/docs/configuration.md`.
3. **CA installation in the Anthropic-provided Claude Code Docker image.**
The base image's distribution determines whether `update-ca-certificates`
(Debian/Ubuntu) or `update-ca-trust` (Red Hat) is the right command.
The current `Dockerfile` should be inspected before assuming Debian.
4. **HTTP/2 over the agent → mitmproxy hop.** Node's HTTP client
negotiates `h2` via ALPN. mitmproxy speaks `h2` to clients in recent
versions. Confirm the version we pin supports `h2` end-to-end and
doesn't downgrade to `http/1.1` (which would be a silent
performance regression).
5. **Selective-bump policy surface.** Where does the
"tunnel this hostname blindly" decision live? Options: a field on
`bottle.egress` in the manifest, a fixed list of known-pinning
hosts baked into the mitmproxy config, or pipelock-side opt-out.
Manifest field is most consistent with the existing
`bottle.egress.allowlist` shape.
6. **Image pin for mitmproxy.** The `pipelock-assessment.md`
recommendation is to pin by digest. The mitmproxy Docker Hub image
should be pinned the same way. Which release line? `mitmproxy/mitmproxy`
ships rolling and tagged versions; the tagged `:11.x` line is the
right baseline.
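7. **CA generation in Python (mitmproxy) vs. as a separate step.**
mitmproxy generates a CA on first launch if none is provided. For
per-bottle ephemerality, we want the CA to be ours, not whatever
mitmproxy chooses, so generate the CA in the host-side prepare step
and hand it to mitmproxy. Per the mitmproxy certificate docs, the
signing CA is read from `mitmproxy-ca.pem` in the directory set by
`confdir`, whereas `--certs *=...` sets a fixed server certificate
and not a CA. Mechanics need confirming.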
8. **Domain fronting verification.** Once pipelock sees plaintext, it
has access to the inner `Host` / `:authority`. A new rule that
compares it against the outer `CONNECT` target catches domain
fronting. Worth a follow-up note on whether pipelock has such a
rule or whether we add it.
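
The comparison described in question 8 is small enough to sketch. This is an illustration only, with hypothetical function names; it is not a description of any existing pipelock rule. It normalizes case, port, and a trailing root dot before comparing, and expects IPv6 literals in bracketed form, as they appear in `CONNECT` targets and `Host` headers.

```python
def _hostname(authority: str) -> str:
    """Lower-cased host part of `host`, `host:port`, or `[v6]:port`."""
    authority = authority.strip().lower()
    if authority.startswith("["):                  # bracketed IPv6 literal
        return authority[1:authority.find("]")]
    host, sep, port = authority.rpartition(":")
    # Treat the suffix as a port only if it is numeric.
    name = host if sep and port.isdigit() else authority
    return name.rstrip(".")                        # drop a trailing root dot


def is_fronted(connect_target: str, inner_authority: str) -> bool:
    """True when the inner Host / :authority names a different host
    than the outer CONNECT target."""
    return _hostname(connect_target) != _hostname(inner_authority)
```

An exact-match rule like this is deliberately strict. A production rule might need an allowance for legitimate cases where the two differ, which is part of what the follow-up note would have to settle.
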
---
## References
- mitmproxy: <https://mitmproxy.org>, <https://github.com/mitmproxy/mitmproxy>
- mitmproxy upstream proxy mode: <https://docs.mitmproxy.org/stable/concepts/modes/#upstream-proxy>
- mitmproxy CA cert installation: <https://docs.mitmproxy.org/stable/concepts/certificates/>
- Squid `ssl_bump`: <https://wiki.squid-cache.org/Features/SslPeekAndSplice>
- Squid ICAP: <https://wiki.squid-cache.org/Features/ICAP>
- `goproxy`: <https://github.com/elazarl/goproxy>
- `gomitmproxy`: <https://github.com/AdguardTeam/gomitmproxy>
- `martian`: <https://github.com/google/martian>
- Node TLS / `NODE_EXTRA_CA_CERTS`: <https://nodejs.org/api/cli.html#node_extra_ca_certsfile>
- Python `SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE`: <https://docs.python.org/3/library/ssl.html#ssl.SSLContext.load_verify_locations>
- Prior research — pipelock assessment: `docs/research/pipelock-assessment.md`
- Prior research — network egress guard: `docs/research/network-egress-guard.md`
- Prior research — secret exfil tripwire encodings: `docs/research/secret-exfil-tripwire-encodings.md`
Research conducted 2026-05-12.