didericis c2eacac49f
docs(prd): update 0005 after open-question walkthrough
Re-grounds the design after walking the eight original open
questions interactively. Two structural changes:

- Topology A → A'. A spike confirmed mitmproxy's `upstream` mode
  re-wraps decrypted flows in a new CONNECT to the upstream proxy,
  which would have left pipelock seeing only ciphertext (the very
  gap this PRD set out to close). The fix is to run mitmproxy in
  `regular` mode and ship a vendored Python addon that forwards
  each decrypted request to pipelock as a plain HTTP forward-proxy
  call. Pipelock is unchanged.
- mitmproxy owns CA generation. The research note's preference
  for a host-side openssl / cryptography CA turned out to be
  unnecessary — mitmproxy generates a fresh CA on startup; the
  public cert is `docker cp`'d into the agent. No new host-side
  crypto deps. Dry-run can't render a fingerprint (CA doesn't
  exist yet); launches print it once to stderr.

Other Q3–Q8 resolutions folded in: Debian-base `update-ca-certificates`
confirmed, mitmproxy 12 verified to speak h2 on both halves,
selective-bump deferred to v2, response-body and MCP scanning
deferred to v2, domain-fronting deferred to v2.

Open questions rewritten — what remains is addon-implementation
specifics (pipelock 403-body fingerprint, env-var inheritance
through docker exec, addon test fixtures).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 12:54:27 -04:00

# PRD 0005: mitmproxy TLS interception for pipelock content scanning
- **Status:** Draft (updated 2026-05-12 after open-question walkthrough)
- **Author:** didericis
- **Created:** 2026-05-12
## Summary
Add a per-bottle **mitmproxy** sidecar in front of pipelock on the
egress path. mitmproxy bumps the agent's TLS CONNECT, decrypts the
inner HTTP, and hands each request to a vendored Python addon. The
addon forwards the decrypted request to pipelock as a plain HTTP
forward-proxy call so pipelock's DLP, URL-scan, and header-scan
layers fire on real bodies. On the verdict, the addon either
short-circuits the flow with a 403 (block) or lets mitmproxy
proceed to the real upstream (allow). mitmproxy itself generates
the ephemeral per-bottle CA on startup; the public cert is copied
into the agent's trust store and the private key dies with the
sidecar on teardown.
This is Topology A' from `docs/research/tls-mitm-for-pipelock.md`,
a variant of the research note's Topology A adopted after a spike showed
mitmproxy's `upstream` mode re-wraps decrypted flows in a new
CONNECT to the upstream proxy (which would defeat the entire
point). The addon recovers the design by emitting plain HTTP to
pipelock explicitly instead of relying on mitmproxy's `upstream`
chaining.
## Problem
PRD 0001 wired pipelock onto every bottle's egress, but the current
topology only sees `CONNECT` hostnames and opaque TLS bytes:
```
agent --HTTPS_PROXY--> pipelock --CONNECT host:443--> internet
                       \____________________________
                             opaque TLS bytes
```
What pipelock cannot scan in this mode is documented in
`docs/research/tls-mitm-for-pipelock.md` §What pipelock cannot see
today: request URLs and methods, request and response headers,
request and response bodies, MCP JSON-RPC payloads, inner-vs-outer
hostname (the domain-fronting check), and WebSocket frames inside a
TLS-wrapped upgrade. The 48-pattern DLP layer this project relies on
in PRD 0001 is therefore inert against every host in the current
`DEFAULT_ALLOWLIST` — all of which are HTTPS-only.
The integration test added in `tests/integration/test_pipelock_blocks_secret_post.py`
demonstrates the gap concretely: pipelock's body-scan layer only
fires when the agent is forced to send plain HTTP. Real Claude Code
traffic to `api.anthropic.com` goes over CONNECT-tunneled TLS and
slips past the scanner.
`pipelock-assessment.md` §Scope gaps names this as a known
limitation of the proxy-without-TLS-inspection shape. Closing it is
the explicit motivation for `tls-mitm-for-pipelock.md`, whose
recommendation this PRD implements (with the addon adjustment
forced by the upstream-mode spike).
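To make the gap concrete, the sketch below contrasts the two request forms a proxy can receive. It is illustrative only: the helper names are hypothetical, and the `/dlp-probe` path is borrowed from the success criteria. With `CONNECT`, the proxy sees a host and port and nothing else; with the absolute-URI form, it sees the method, full URL, headers, and body.

```python
def connect_request(host: str, port: int = 443) -> bytes:
    """What a forward proxy receives for tunneled HTTPS: host and port only."""
    return (
        f"CONNECT {host}:{port} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "\r\n"
    ).encode("ascii")


def absolute_form_request(method: str, url: str, host: str, body: bytes) -> bytes:
    """What a forward proxy receives for plain HTTP: everything, body included."""
    head = (
        f"{method} {url} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


tunnel = connect_request("api.anthropic.com")
plain = absolute_form_request(
    "POST", "http://api.anthropic.com/dlp-probe", "api.anthropic.com", b"secret"
)
```

Everything after the `CONNECT` exchange is TLS ciphertext, so a body scanner has nothing to match against in the first form.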
## Goals / Success Criteria
The feature works when all of the following are observable:
- A Node request from inside a launched bottle to a CONNECT-bumped
HTTPS host (e.g. `https://api.anthropic.com/dlp-probe`) carrying a
pipelock-recognized credential pattern in the body returns 403
from the bottle's egress chain — not a response from the upstream.
The existing `test_pipelock_blocks_secret_post` test path becomes
the HTTPS variant of this assertion.
- A plain HTTPS GET from inside the bottle to an allowlisted host
with no credential pattern (e.g. `GET https://raw.githubusercontent.com/...`)
returns the real upstream response — the addon doesn't break
clean traffic.
- Claude Code itself reaches `api.anthropic.com` end-to-end through
the bottle and completes a chat round-trip. No TLS-trust errors
in the agent process.
- mitmproxy's flow log and pipelock's `body_dlp` / `header_dlp` /
`core_dlp` event lines both appear for the same outbound request,
confirming the two-stage path is active.
The feature is **done** when all of the following ship:
- A new `MitmproxyProxy` class with the same `prepare` / `start` /
`stop` lifecycle shape as `PipelockProxy`, wired into the Docker
backend's launch step.
- A vendored Python addon at `claude_bottle/mitmproxy/addon.py`
that mitmproxy loads on startup via `mitmdump -s ...`. The sidecar
runs in `regular` mode (default), not `upstream` mode.
- The bottle launch step starts the mitmproxy sidecar, waits for
the sidecar-internal CA to be generated, copies the CA public
cert into the agent at `/usr/local/share/ca-certificates/claude-bottle-mitm.crt`,
runs `update-ca-certificates` inside the agent, and threads the
`NODE_EXTRA_CA_CERTS` / `SSL_CERT_FILE` / `REQUESTS_CA_BUNDLE`
env trio onto the agent container's runtime env.
- The agent's `HTTPS_PROXY` / `HTTP_PROXY` point at the mitmproxy
sidecar (where they pointed at pipelock under PRD 0001).
- pipelock is otherwise unchanged. It continues to load the YAML
PRD 0001 generates and runs its existing scanning pipeline; the
addon talks to it via the same forward-proxy interface today's
`test_pipelock_blocks_secret_post` uses.
- On bottle teardown the mitmproxy sidecar is removed and the
ephemeral CA private key is gone with it.
- An HTTPS variant of `test_pipelock_blocks_secret_post` proves
pipelock now blocks a credential POST over HTTPS rather than
plain HTTP.
- An integration test proves a non-credential HTTPS GET through
the chain returns the upstream's real response.
- The dry-run preflight (`start --dry-run`) shows the mitmproxy
sidecar in both text and `--format=json` output. The JSON
contract gains a reserved `egress.mitm: { "enabled": true, "ca_fingerprint": null }`
block; fingerprint is always null at dry-run because the CA
doesn't exist yet. Real launches emit a one-line stderr log:
`claude-bottle: mitm ca fingerprint: <sha256-first-16>...`.
## Non-goals
- **Topology C** — extending pipelock itself to terminate TLS. The
research note's recommended long-term shape, but substantial Go
work plus the Apache-2.0-vs-ELv2 question. Deferred.
- **Topology D as canonical** — mitmproxy with a pipelock `/scan`
HTTP endpoint. The addon in this PRD talks to pipelock via its
existing forward-proxy interface; no upstream pipelock change
needed.
- **Persistent or shared CA across bottles.** Each bottle gets a
fresh CA generated by its own mitmproxy at startup.
- **Selective bumping ("ignore_hosts") as a v1 manifest field.**
v1 bumps every CONNECT. If a future allowlisted host turns out
to pin (Mobile / Chromium-style cert pinning), a follow-up PRD
adds the per-host opt-out via `bottle.egress.tls_bump_ignore`.
Strictly additive.
- **HTTP/3 / QUIC.** mitmproxy's HTTP/3 support is experimental.
v1 relies on the v1-egress iptables layer blocking UDP/443 to
force clients onto HTTP/2 over TCP, which mitmproxy 12 inspects
natively (verified by spike).
- **Raw TCP / non-HTTP TLS interception.** mitmproxy supports it
via `--mode reverse:`, not in CONNECT-bump mode. SSH and any
future raw-TCP egress route around mitmproxy entirely.
- **Trust-store rewiring for non-Debian agent images.** The
current `Dockerfile` is `node:22-slim` (Debian). If a future base
switches to Red-Hat-family, the `update-ca-certificates` step
becomes `update-ca-trust`. Out of scope until the base changes.
- **Response-body scanning.** Pipelock supports it; we don't wire
it in v1 because the addon would need to ferry the upstream
response back through pipelock's scanner, which the forward-
proxy interface doesn't support cleanly. v2 candidate.
- **MCP scanning on the bumped path.** Only fires on MCP-formatted
JSON-RPC payloads inside tool calls. Not relevant to plain HTTPS
agent traffic and out of v1 scope.
- **Domain-fronting verification.** Once the addon sees the inner
`Host` / `:authority`, comparing it to the outer CONNECT target
catches domain fronting. Worth ~10 lines in the addon, but
defer until the rest of v1 is settled.
- **Host-side openssl / `cryptography` for CA generation.** The
research note's open question on this is resolved by letting
mitmproxy itself generate the CA (it does so on first launch).
No new host-side crypto.
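For reference, the deferred domain-fronting check might look roughly like the sketch below. It is not part of v1. The function names are hypothetical, and the normalization rules (lowercasing, dropping the port and a trailing dot) are assumptions about what counts as "the same host".

```python
def normalize_host(value: str) -> str:
    """Lowercase a host, dropping any :port suffix and trailing dot."""
    host = value.strip().lower()
    # Bracketed IPv6 literal such as "[::1]:443": keep only the address.
    if host.startswith("[") and "]" in host:
        return host[1:host.index("]")]
    # A single colon means host:port; more than one would be bare IPv6.
    if host.count(":") == 1:
        host = host.split(":", 1)[0]
    return host.rstrip(".")


def is_domain_fronted(connect_target: str, inner_host: str) -> bool:
    """True when the inner Host / :authority differs from the CONNECT target."""
    return normalize_host(connect_target) != normalize_host(inner_host)
```

For example, `is_domain_fronted("api.anthropic.com:443", "api.anthropic.com")` is `False`, while a CONNECT to one host carrying an inner `Host` for another returns `True`.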
## Scope
### In scope
- New `claude_bottle/mitmproxy/` package:
  - `__init__.py`: backend-agnostic. Constants (sidecar port,
    image-pin digest, the in-container addon path), the abstract
    `MitmproxyProxy` class with `prepare` / `start` / `stop` shape
    mirroring `PipelockProxy`, and the small helper that reads the
    CA fingerprint from a PEM file by shelling out to
    `openssl x509 -fingerprint`.
  - `addon.py`: the Python addon mitmproxy loads, roughly 80 to 150
    lines. For each `request` event: forward the decrypted request to
    pipelock at `http://claude-bottle-pipelock-<slug>:8888` as a
    plain HTTP forward-proxy call (absolute-URI form). Inspect
    pipelock's response. If status is 403 *and* the body matches
    pipelock's known block-event shape, set the flow's response to
    a 403 with pipelock's body and short-circuit. Otherwise,
    discard pipelock's response (and any wasted upstream-leg
    response from pipelock's forwarder) and let mitmproxy proceed
    to the real upstream.
- New `claude_bottle/backend/docker/mitmproxy.py`:
  `DockerMitmproxyProxy(MitmproxyProxy)` with the Docker-specific
start/stop lifecycle. `start(plan)` does `docker create` /
`docker cp addon.py …` / `docker network connect` / `docker start`,
analogous to the existing `DockerPipelockProxy.start`. Injects
`CLAUDE_BOTTLE_PIPELOCK_URL` into the sidecar env so the addon
knows where pipelock lives.
- New provisioner `claude_bottle/backend/docker/provision/ca.py`.
Polls mitmproxy for the cert file, copies it through a host
stage dir into the agent, runs `update-ca-certificates` inside
the agent, computes the SHA-256 fingerprint, and prints the
one-line stderr log.
- `BottleBackend.provision_ca(plan, target)` joins the four
existing provisioner methods on the abstract base. Default impl
is no-op so other backends don't break when they don't yet
implement TLS interception.
- `DockerBottlePlan` grows a `mitmproxy_plan` field mirroring the
existing `proxy_plan`.
- Agent container `docker run` invocation:
  - `HTTPS_PROXY` / `HTTP_PROXY` change from the pipelock service
    name to the mitmproxy service name.
  - Three `-e` flags set the CA env trio so they're inherited by
    the eventual `docker exec claude` (Docker propagates run-time
    env into exec by default; the fallback is covered by the
    env-var inheritance item under Open questions).
- Dry-run preflight rendering of the mitmproxy entry (text + JSON).
JSON gains `egress.mitm: { "enabled": true, "ca_fingerprint": null }`.
- One stderr log line at launch with the CA fingerprint.
- Two new integration tests under `tests/integration/`:
  - `test_mitmproxy_blocks_secret_https_post.py`: HTTPS variant
    of the existing block-secret test. Asserts pipelock's body
    DLP fires on a credential POST tunneled through CONNECT.
  - `test_mitmproxy_allows_normal_https.py`: confirms a plain
    HTTPS GET on an allowlisted host returns the upstream response,
    isolating the addon's pass-through path from the block path.
- Unit tests for the addon's verdict logic (block vs allow on
status + body shape, edge cases) using mitmproxy's `mitmproxy.test`
flow fixtures. Unit tests for the proxy config builder
(mirroring `tests/unit/test_pipelock_yaml.py`).
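The CA provisioner's polling step could be sketched as below. This is a minimal sketch under stated assumptions, not the planned implementation: `wait_until` and `cert_exists` are hypothetical names, the 30-second timeout is arbitrary, and the probe assumes the mitmproxy image ships a `test` binary. The clock and sleep functions are injectable so the loop can be unit-tested without real delays.

```python
import subprocess
import time
from typing import Callable


def wait_until(
    probe: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call probe() every `interval` seconds until it succeeds or time runs out."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def cert_exists(container: str, path: str) -> bool:
    """Probe via `docker exec`; assumes `test` is available inside the sidecar."""
    result = subprocess.run(
        ["docker", "exec", container, "test", "-f", path],
        capture_output=True,
    )
    return result.returncode == 0
```

A caller would pass something like `lambda: cert_exists(sidecar_name, cert_path)` as the probe and treat a `False` return as a launch failure.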
### Out of scope
- The v1 iptables + dnsmasq layer (separate PRD; see
`network-egress-guard.md`). mitmproxy covers HTTP/HTTPS only;
raw TCP, UDP, ICMP, and direct DNS still need the IP-level layer.
- Pipelock config changes. Pipelock continues to load the YAML
PRD 0001 generates; the addon talks to it via the existing
forward-proxy interface.
- A bottle-level toggle to skip mitmproxy entirely. v1 always
wires it in.
- Pinning-host detection automation. The cost of finding out (per
research) is a single 5-minute test before adding a host; it
stays a manual step.
- Pipelock upstream contributions for an `X-Pipelock-Verdict` header.
Possible follow-up. Until then the addon distinguishes blocks
from passes via status + body fingerprint.
## Proposed Design
### Topology
```
agent --HTTPS_PROXY--> mitmproxy --addon--> pipelock (scan)
                       (bump TLS)               |
                           ^                    | (verdict via status code)
                           |                    v
                           +-- on allow ----- real upstream
                                              (mitmproxy as client)
```
All three containers live on the same per-bottle internal Docker
network. mitmproxy and pipelock are both attached to the per-bottle
egress bridge for real-internet reach; the agent has no default
route.
Concretely:
- Agent sets `HTTPS_PROXY=http://claude-bottle-mitm-<slug>:<port>`.
PRD 0001 had this pointing at pipelock; the hostname swap is the
only agent-side env change.
- mitmproxy runs in **`regular`** mode (default; no `--mode` flag).
It bumps every CONNECT, generates fake leaf certs signed by its
own CA, and presents them to the agent.
- The addon, loaded via `mitmdump -s /addon/addon.py`, intercepts
each decrypted `request` event. It forwards the request to
pipelock at `http://claude-bottle-pipelock-<slug>:8888` as a
plain HTTP forward-proxy call (absolute-URI form), so pipelock
sees the full URL, headers, and body.
- The addon inspects pipelock's response. If status is 403 *and*
the response body matches pipelock's known block-event shape,
the addon sets the mitmproxy flow's response to a 403 with
pipelock's body and short-circuits. Otherwise — including the
case where pipelock's forwarder attempted the upstream and got
a 4xx — the addon discards pipelock's response and lets
mitmproxy proceed to the real upstream.
- mitmproxy completes the outbound TLS to the real destination
using its built-in trust store, just like any other forward
proxy. Pipelock is only involved as a scanner.
The trade-off: pipelock makes a wasted upstream forward attempt
for every allowed request. It tries to forward over plain HTTP to
a real HTTPS-only host, which fails with the upstream's 4xx. This
is benign: the scan completes before forwarding, the verdict
reaches the addon, and the upstream-side request dies in
pipelock's forwarder rather than reaching the agent. That is an
acceptable cost for the visibility win. A pipelock-side improvement
(skip the forward when the addon only needs the scan verdict) is a
future optimization.
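The block/allow decision described in this section can be kept as a pure function, which makes it easy to test without a running proxy. The sketch below is hedged: the `event` / `scanner` field names follow the most-likely shape noted under Open questions and are not yet confirmed against pipelock's real 403 body, and `is_pipelock_block` is a hypothetical name.

```python
import json

# ASSUMPTION: keys a pipelock block body is expected to carry. The exact
# fingerprint still needs to be pinned against pipelock's actual 403 bytes.
BLOCK_BODY_KEYS = ("event", "scanner")


def is_pipelock_block(status: int, body: bytes) -> bool:
    """Return True only for a 403 whose body looks like a pipelock block event."""
    if status != 403:
        return False
    try:
        payload = json.loads((body or b"").decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        # Not JSON at all, e.g. an HTML error page relayed from the upstream.
        return False
    if not isinstance(payload, dict):
        return False
    return any(key in payload for key in BLOCK_BODY_KEYS)
```

Matching on any one of the keys errs toward blocking: misreading an upstream 403 as a block only changes which 403 body the agent sees, whereas missing a real block would let the request through.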
### New components
- `claude_bottle/mitmproxy/__init__.py` — backend-agnostic
abstract base, constants, the `openssl x509 -fingerprint` helper.
- `claude_bottle/mitmproxy/addon.py` — the scanning addon.
Reads pipelock's URL from `CLAUDE_BOTTLE_PIPELOCK_URL` (injected
  into the sidecar env by the proxy's `start`). For each
  `request` flow: synchronously forward the request to pipelock,
  inspect the status and body, and either short-circuit with 403
  or fall through.
- `claude_bottle/backend/docker/mitmproxy.py`:
  `DockerMitmproxyProxy(MitmproxyProxy)` with start/stop, the
`docker cp` of the addon into the sidecar before `docker start`,
and the `CLAUDE_BOTTLE_PIPELOCK_URL` wiring.
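The addon's scan call can be sketched with the standard library alone. This is an illustrative sketch, not the vendored addon: `scan_via_pipelock` is a hypothetical name, the 5-second timeout is an assumption, and hop-by-hop header filtering is left out. Passing a full URL as the request target to `http.client` yields the absolute-URI form a forward proxy expects.

```python
import http.client
from typing import Mapping, Optional, Tuple
from urllib.parse import urlsplit


def scan_via_pipelock(
    pipelock_url: str,
    method: str,
    target_url: str,
    headers: Mapping[str, str],
    body: Optional[bytes],
    timeout: float = 5.0,
) -> Tuple[int, bytes]:
    """Send one request through pipelock as a plain HTTP forward-proxy call."""
    proxy = urlsplit(pipelock_url)
    conn = http.client.HTTPConnection(
        proxy.hostname, proxy.port or 80, timeout=timeout
    )
    try:
        # The request line becomes "<METHOD> <target_url> HTTP/1.1".
        conn.request(method, target_url, body=body, headers=dict(headers))
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

Inside the sidecar, `pipelock_url` would come from `os.environ["CLAUDE_BOTTLE_PIPELOCK_URL"]`, and the returned status and body feed the block/allow decision.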
### CA lifecycle
Simplified by letting mitmproxy own the generation:
- **Generation.** mitmproxy generates a fresh CA on startup
inside its container at `/home/mitmproxy/.mitmproxy/mitmproxy-ca-cert.pem`
(public) + `mitmproxy-ca.pem` (private). No host-side openssl
for *generation*; no host-side Python `cryptography` dep.
- **Volume strategy.** Container-internal only. No host bind
mount means the CA dies with the container.
- **Extraction.** `provision_ca` polls (~1s) for the cert file
via `docker exec`, then `docker cp` to host stage dir, then
`docker cp` into the agent. Host stage dir gets cleaned up by
the existing `start.py` `finally` block.
- **Bottle install.**
  1. `docker cp <host stage>/mitm-ca.crt agent-<slug>:/usr/local/share/ca-certificates/claude-bottle-mitm.crt`
  2. `docker exec -u 0 agent-<slug> chmod 644 …`
  3. `docker exec -u 0 agent-<slug> update-ca-certificates`
  4. Three `-e` flags on `docker run` set the env trio
     (`NODE_EXTRA_CA_CERTS=…/claude-bottle-mitm.crt`,
     `SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt`,
     `REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt`) so
     `docker exec claude` inherits them.
- **Teardown.** Sidecar container removed; CA private key gone.
- **Fingerprint.** Computed post-extraction via shelled-out
`openssl x509 -fingerprint -sha256 -noout`. Logged once to
stderr at launch; never the private key.
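The fingerprint step could be sketched as follows. The `openssl x509` flags are the ones named above; the function names are hypothetical, and reading "first-16" as the first 16 hex characters is an assumption. Parsing is split from the subprocess call so it can be tested without `openssl` installed.

```python
import subprocess


def parse_fingerprint(openssl_output: str) -> str:
    """Extract hex digits from a line like 'sha256 Fingerprint=AB:CD:...'."""
    _, sep, value = openssl_output.strip().partition("=")
    if not sep or not value:
        raise ValueError(f"unexpected openssl output: {openssl_output!r}")
    return value.replace(":", "").lower()


def ca_fingerprint(pem_path: str) -> str:
    """Shell out to openssl for the SHA-256 fingerprint of a PEM certificate."""
    completed = subprocess.run(
        ["openssl", "x509", "-in", pem_path, "-noout", "-fingerprint", "-sha256"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_fingerprint(completed.stdout)


def fingerprint_log_line(fingerprint: str) -> str:
    """Format the one-line stderr message, truncated to 16 hex characters."""
    return f"claude-bottle: mitm ca fingerprint: {fingerprint[:16]}..."
```

The launch step would then write `fingerprint_log_line(ca_fingerprint(path))` to `sys.stderr` once the public cert has been extracted.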
### Data model changes
None to the manifest schema. The dry-run JSON contract gains a
reserved `egress.mitm: { "enabled": true, "ca_fingerprint": null }`
block. Fingerprint is always null at dry-run (CA doesn't exist
yet) but the field is reserved so future schema additions stay
non-breaking.
A future selective-bump knob would add
`bottle.egress.tls_bump_ignore: [host, ...]` per the research
note. Strictly additive when it lands.
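As a small illustration of the reserved block, the sketch below builds the `egress.mitm` fragment as a plain dict. The helper name is hypothetical and the surrounding dry-run JSON is omitted; only the keys shown in this section are used.

```python
import json
from typing import Optional


def egress_mitm_block(ca_fingerprint: Optional[str] = None) -> dict:
    """Build the reserved block; the fingerprint stays None at dry-run."""
    return {
        "egress": {
            "mitm": {"enabled": True, "ca_fingerprint": ca_fingerprint},
        }
    }


# Python's None serializes to JSON null, matching the dry-run contract.
print(json.dumps(egress_mitm_block(), indent=2))
```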
### Existing code touched
- **`claude_bottle/backend/docker/launch.py`** — bring up the
mitmproxy sidecar between pipelock and the agent. Repoint the
agent's `HTTPS_PROXY` / `HTTP_PROXY` env flags to mitmproxy.
Register an `ExitStack` callback for mitmproxy teardown. Print
the CA fingerprint once the sidecar reports ready.
- **`claude_bottle/backend/docker/prepare.py`** — call into
`MitmproxyProxy.prepare(...)` alongside `PipelockProxy.prepare(...)`,
populate `DockerBottlePlan.mitmproxy_plan`.
- **`claude_bottle/backend/docker/backend.py`** — add the
`DockerMitmproxyProxy` instance attribute (`self._mitm`) and
thread it through `launch` + cleanup, mirroring `self._proxy`.
- **`claude_bottle/backend/docker/bottle_plan.py`** — new
`mitmproxy_plan` field. `print()` and `to_dict()` learn to
render the mitmproxy entry and the `egress.mitm` JSON block.
- **`claude_bottle/backend/__init__.py`** — abstract
`BottleBackend.provision_ca` joins the four existing
provisioners; default no-op.
- **`tests/integration/`** — two new tests as described above.
- **`tests/unit/`** — addon-verdict tests, mitmproxy-config
builder tests, dry-run-plan test updated for the new
`egress.mitm` block.
### External dependencies
- **mitmproxy Docker image** pinned by digest on the `12.x` line.
Bumped deliberately, mirroring the pipelock pin. Verified by
spike to speak h2 on both halves.
- No new host-side runtimes. mitmproxy generates the CA;
fingerprint via the `openssl` already present on Debian / macOS
/ ubuntu-latest runners.
## Open questions
(rewritten — most of the original v1 questions are now closed by
the walkthrough spikes; what remains is addon-implementation
specifics worth pinning during the first impl turn.)
- **Pipelock's 403-body fingerprint.** The addon needs to
distinguish a pipelock block (DLP / host) from a real-upstream
4xx that pipelock's forwarder relayed back. Most likely shape:
pipelock's 403 response carries a JSON body with `event` /
`scanner` fields, whereas a real-upstream 4xx carries whatever
the upstream sent. Pin the exact fingerprint by inspecting
pipelock's actual 403 body bytes at impl time. Long-term
cleanup: file an upstream feature request for an
`X-Pipelock-Verdict: block` response header so the addon can
read a structured signal instead of pattern-matching the body.
- **Docker run env-var inheritance through docker exec.** Plan
assumes `docker run -e VAR=value` propagates to subsequent
`docker exec` invocations. The Docker docs say so; not yet
empirically pinned on this project's runner setup. Verify in
the first impl turn. Trivial fallback: thread the three `-e`
flags onto every `DockerBottle.exec*` call.
- **Addon synchronous-call latency.** The addon makes a sync HTTP
call to pipelock per outbound flow. Pipelock is on the same
internal Docker network; expected per-call latency is well
under 10ms. Confirm under the parallel-request load Claude Code
generates (most likely a non-issue — Claude is single-stream
request-wise).
- **Addon test fixtures.** mitmproxy ships `mitmproxy.test` with
flow fixtures; addons can be unit-tested without a running
proxy. Confirm the import path and recommended fixture shape at
impl time; structure the addon so the verdict-decision is a
pure function that's trivially testable in isolation from any
HTTP I/O.
- **Pipelock allowing the addon's forwarded request through.**
pipelock will see the addon's request as coming from the
mitmproxy sidecar's IP on the internal network. Confirm
pipelock has no client-IP allowlist that would reject these.
Likely fine — pipelock's `client_ip` is informational in the
scan event, not a gate.
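If the env-var inheritance check fails, the fallback of passing the CA trio on every exec could be sketched as below. The helper names are hypothetical. The bundle paths are the ones listed under CA lifecycle; pointing `NODE_EXTRA_CA_CERTS` at the installed cert path is an assumption, since that value is abbreviated there.

```python
from typing import List, Sequence

# ASSUMPTION: NODE_EXTRA_CA_CERTS points at the cert copied into the agent.
AGENT_CA_PATH = "/usr/local/share/ca-certificates/claude-bottle-mitm.crt"
SYSTEM_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def ca_env_flags(
    ca_path: str = AGENT_CA_PATH, bundle: str = SYSTEM_BUNDLE
) -> List[str]:
    """Return the three `-e NAME=value` pairs as a flat argv fragment."""
    env = {
        "NODE_EXTRA_CA_CERTS": ca_path,
        "SSL_CERT_FILE": bundle,
        "REQUESTS_CA_BUNDLE": bundle,
    }
    flags: List[str] = []
    for name, value in env.items():
        flags += ["-e", f"{name}={value}"]
    return flags


def docker_exec_argv(
    container: str, command: Sequence[str], env_flags: Sequence[str]
) -> List[str]:
    """Build a `docker exec` argv that sets the env explicitly."""
    return ["docker", "exec", *env_flags, container, *command]
```

The same `ca_env_flags()` output can be spliced into the `docker run` argv, so both paths share one definition of the trio.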
## References
- `docs/research/tls-mitm-for-pipelock.md` — primary source. This
PRD implements a variant of §Recommendation (Topology A) after
the spike documented under "Open questions" §1 falsified the
`upstream` mode assumption.
- `docs/research/pipelock-assessment.md` §Scope gaps — names the
TLS-inspection gap closed here.
- `docs/prds/0001-per-agent-egress-proxy-via-pipelock.md`:
  egress-proxy baseline this PRD extends.
- `docs/prds/0003-bottle-backend-abstraction.md` — backend ABC
contract this PRD adds a `provision_ca` method to.
- `docs/prds/0004-split-out-provisioners.md` — per-provisioner
module pattern reused for the new CA provisioner.
- mitmproxy: <https://mitmproxy.org>,
<https://github.com/mitmproxy/mitmproxy>
- mitmproxy modes: <https://docs.mitmproxy.org/stable/concepts/modes/>
- mitmproxy CA cert installation:
<https://docs.mitmproxy.org/stable/concepts/certificates/>
- mitmproxy addon API: <https://docs.mitmproxy.org/stable/addons-overview/>
- Node `NODE_EXTRA_CA_CERTS`:
<https://nodejs.org/api/cli.html#node_extra_ca_certsfile>