bot-bottle/docs/prds/0005-mitmproxy-tls-interception.md
didericis fe71249005
docs(prd): add 0005 mitmproxy TLS interception
Captures the design for putting a mitmproxy sidecar in front of
pipelock on the egress path so pipelock's body / header / MCP
scanners see plaintext for the HTTPS hosts in the default allowlist.
Implements Topology A from docs/research/tls-mitm-for-pipelock.md
with a per-bottle ephemeral CA, no manifest schema change in v1,
and selective-bumping deferred until a pinning host appears.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 12:20:24 -04:00


PRD 0005: mitmproxy TLS interception for pipelock content scanning

  • Status: Draft
  • Author: didericis
  • Created: 2026-05-12

Summary

Add a per-bottle mitmproxy sidecar in front of pipelock on the egress path so pipelock's DLP, subdomain-entropy, and MCP scanners fire on the plaintext bodies of HTTPS requests instead of seeing only the opaque ciphertext that follows a CONNECT. mitmproxy terminates the agent's TLS, hands plaintext HTTP to pipelock as an upstream forward proxy, and re-establishes TLS to the real destination. A fresh ephemeral CA is minted per bottle. The CA private key lives only in the host-side stage_dir and the mitmproxy sidecar and never enters the agent container; only the public cert is wired into the agent container's trust store at launch.

Problem

PRD 0001 wired pipelock onto every bottle's egress, but the current topology only sees CONNECT hostnames and opaque TLS bytes:

agent --HTTPS_PROXY--> pipelock --CONNECT host:443--> internet
                                  \____________________________
                                       opaque TLS bytes

What pipelock cannot scan in this mode is documented in docs/research/tls-mitm-for-pipelock.md §What pipelock cannot see today: request URLs and methods, request and response headers, request and response bodies, MCP JSON-RPC payloads, inner-vs-outer hostname (the domain-fronting check), and WebSocket frames inside a TLS-wrapped upgrade. The 48-pattern DLP layer this project relies on in PRD 0001 is therefore inert against every host in the current DEFAULT_ALLOWLIST — all of which are HTTPS-only.

The integration test added in tests/integration/test_pipelock_blocks_secret_post.py demonstrates the gap concretely: pipelock's body-scan layer only fires when the agent is forced to send plain HTTP. Real Claude Code traffic to api.anthropic.com goes over CONNECT-tunneled TLS and slips past the scanner.

pipelock-assessment.md §Scope gaps names this as a known limitation of the proxy-without-TLS-inspection shape. Closing it is the explicit motivation for tls-mitm-for-pipelock.md, whose recommendation this PRD implements.

Goals / Success Criteria

The feature works when all of the following are observable:

  • A Node request from inside a launched bottle to a CONNECT-bumped HTTPS host (e.g. https://api.anthropic.com/dlp-probe) carrying a pipelock-recognized credential pattern in the body gets a 403 from pipelock instead of a response from the upstream. A new HTTPS variant of the existing test_pipelock_blocks_secret_post test carries this assertion.
  • Claude Code itself reaches api.anthropic.com end-to-end through the bottle and completes a chat round-trip. No TLS-trust errors in the agent process.
  • mitmproxy's TLS-handshake log lines and pipelock's body_dlp event lines both appear for the same outbound request, confirming the two-stage path is active.

The feature is done when all of the following ship:

  • A new MitmproxyProxy class with the same prepare / start / stop lifecycle shape as PipelockProxy, wired into the Docker backend's launch step.
  • The bottle launch step generates a per-bottle ephemeral CA in stage_dir, starts the mitmproxy sidecar with that CA on the per-bottle internal network, copies the CA public cert into the agent container's trust store, and points the agent's HTTPS_PROXY / HTTP_PROXY at mitmproxy.
  • mitmproxy's upstream is the existing pipelock sidecar; pipelock sees plaintext HTTP from mitmproxy for every previously-HTTPS request.
  • On bottle teardown the mitmproxy sidecar is removed and stage_dir is deleted, so no copy of the ephemeral CA private key survives.
  • An integration test (variant of test_pipelock_blocks_secret_post) proves pipelock now blocks a credential POST that goes out over HTTPS rather than plain HTTP.
  • An integration test proves a non-credential HTTPS request to an allowlisted host (e.g. CONNECT-then-GET on raw.githubusercontent.com) succeeds end-to-end with mitmproxy in the path (no TLS-trust errors, response body received).
  • The dry-run preflight (start --dry-run) shows the mitmproxy sidecar in both the text and --format=json output alongside the existing pipelock entry.

Non-goals

  • Topology C: extending pipelock itself to terminate TLS. The research note identifies this as the cleanest long-term shape, but it is substantial Go work and raises the Apache-2.0-vs-ELv2 licensing question. Deferred.
  • Topology D: driving mitmproxy with a pipelock /scan HTTP endpoint. Requires a pipelock surface that doesn't exist today. Deferred.
  • Persistent or shared CA across bottles. Each bottle gets a fresh CA generated at start and destroyed at teardown. No CA storage on the host, no cross-bottle reuse.
  • Selective bumping ("ignore_hosts") as a v1 manifest field. v1 bumps every CONNECT. If the client for a future allowlisted host turns out to pin certificates (mobile- or Chromium-style cert pinning), a follow-up PRD adds the per-host opt-out, likely as a bottle.egress.tls_bump_ignore field. See Open questions.
  • HTTP/3 / QUIC. mitmproxy's HTTP/3 support is experimental. v1 relies on the v1-egress iptables layer (separate PRD) blocking UDP/443 to force clients onto HTTP/2 over TCP, which mitmproxy inspects normally.
  • Raw TCP / non-HTTP TLS interception. mitmproxy supports it via --mode reverse:, not in CONNECT-bump mode. SSH and any future raw-TCP egress route around mitmproxy entirely.
  • Trust-store rewiring for non-Debian agent base images. The current Dockerfile uses node:22-slim (Debian). If a future base switches to a Red Hat-family image, the update-ca-certificates step becomes update-ca-trust and the cert moves to /etc/pki/ca-trust/source/anchors/. Out of scope until the base changes.

Scope

In scope

  • New claude_bottle/mitmproxy.py mirroring claude_bottle/pipelock.py: config helpers (no backend-specific Docker calls), the MitmproxyProxy abstract class, and the per-bottle CA generation helpers.
  • New claude_bottle/backend/docker/mitmproxy.py mirroring claude_bottle/backend/docker/pipelock.py: DockerMitmproxyProxy with the Docker-specific start / stop lifecycle, the sidecar container name scheme, and the image pin.
  • New provisioner: claude_bottle/backend/docker/provision/ca.py, installing the CA public cert into the agent container at /usr/local/share/ca-certificates/claude-bottle-mitm.crt, running update-ca-certificates, and exporting NODE_EXTRA_CA_CERTS / SSL_CERT_FILE / REQUESTS_CA_BUNDLE env vars to the agent process. The provisioner runs from BottleBackend.provision in the same orchestration as prompt, skills, ssh, git.
  • Per-agent network reshuffle in DockerBottleBackend.launch:
    • the internal network keeps its role as the agent's only network; mitmproxy joins pipelock and the agent on it
    • agent's HTTPS_PROXY / HTTP_PROXY change from pointing at the pipelock service name to the mitmproxy service name
    • mitmproxy's upstream_proxy config points at the pipelock service name on the internal network
  • DockerBottlePlan grows a mitmproxy_plan field analogous to the existing proxy_plan (the pipelock one) so prepare-time state rides on the plan.
  • Dry-run preflight (start --dry-run text + JSON) renders the mitmproxy line and surfaces the fingerprint of the CA certificate installed in the bottle's trust store, so the operator can verify what was installed.
  • Two new integration tests under tests/integration/:
    • test_mitmproxy_blocks_secret_https_post.py — the HTTPS variant of the existing _blocks_secret_post test.
    • test_mitmproxy_allows_normal_https.py — confirms a plain HTTPS GET to a non-credential-bearing path through mitmproxy + pipelock returns the upstream response, asserting no trust / handshake breakage.
  • Unit tests for the new config builder (mirroring the pipelock YAML unit tests) and for the CA generation helper.
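The container-side steps of the CA provisioner above come down to two docker commands plus an env map. This is a minimal sketch. It assumes the provisioner shells out to the docker CLI the same way the existing sidecar code does. The function names are hypothetical. Pointing SSL_CERT_FILE / REQUESTS_CA_BUNDLE at Debian's regenerated bundle is an assumed choice that this PRD does not fix.

```python
"""Hypothetical sketch of the docker commands behind provision_ca (Debian images)."""
from pathlib import PurePosixPath

# Install path named in this PRD; update-ca-certificates only picks up *.crt here.
CA_DEST = PurePosixPath("/usr/local/share/ca-certificates/claude-bottle-mitm.crt")
# Bundle that update-ca-certificates regenerates on Debian-family images.
DEBIAN_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def ca_install_commands(container: str, host_crt: str) -> list[list[str]]:
    """Argv lists that copy the public cert in and rebuild the trust store."""
    return [
        # Only the public .crt is copied; the CA key never enters the agent.
        ["docker", "cp", host_crt, f"{container}:{CA_DEST}"],
        # update-ca-certificates writes under /etc/ssl/certs, so run it as root.
        ["docker", "exec", "--user", "root", container, "update-ca-certificates"],
    ]


def ca_env() -> dict[str, str]:
    """CA-trust env vars for the agent process (assumed values)."""
    return {
        # Node appends this file to its bundled roots.
        "NODE_EXTRA_CA_CERTS": str(CA_DEST),
        # These replace, rather than extend, a client's bundle, so point them
        # at the full system bundle, which now includes the bottle's CA.
        "SSL_CERT_FILE": DEBIAN_BUNDLE,
        "REQUESTS_CA_BUNDLE": DEBIAN_BUNDLE,
    }
```

Each argv would run in order via subprocess.run(..., check=True), and the env map would be merged into the agent process's environment.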

Out of scope

  • The v1 iptables + dnsmasq layer (separate PRD; see network-egress-guard.md). mitmproxy covers HTTP/HTTPS only. Raw TCP, UDP, ICMP, and direct DNS still need the IP-level layer.
  • Pipelock config changes. Pipelock continues to load the YAML PRD 0001 already generates. mitmproxy is opaque to it; pipelock just sees plain HTTP from a forward-proxy client.
  • A bottle-level toggle to skip mitmproxy entirely. v1 always wires it in. If a use case appears for an unintercepted bottle (e.g. testing pipelock's CONNECT-mode behavior in isolation), that's a follow-up.
  • Pinning-host detection automation. The cost of finding out (per the research note) is a single 5-minute test before adding a host; it stays a manual step.

Proposed Design

Topology

agent --HTTPS_PROXY--> mitmproxy --HTTP_PROXY--> pipelock --> internet
                       (bump TLS)               (scan plain)  (real TLS)

All three containers live on the same per-bottle internal Docker network. mitmproxy and pipelock are both attached to the per-bottle egress bridge so they can reach the host network; the agent has no default route, exactly as today.

Concretely:

  • agent sets HTTPS_PROXY and HTTP_PROXY to http://claude-bottle-mitm-<slug>:<port>; today they point at claude-bottle-pipelock-<slug>. Apart from the CA-trust variables that provision_ca sets, this proxy-URL swap is the only agent-side env change.
  • mitmproxy runs with --mode upstream:http://claude-bottle-pipelock-<slug>:<pipelock-port> so its decrypted plaintext is forwarded to pipelock as a regular upstream forward-proxy request. (Research open question #1 calls this out: mitmproxy 10+ documentation says upstream mode forwards the original request shape; verify against the pinned version at implementation time. If forwarding wraps a new CONNECT, fall back to regular mode with a chained proxy declared in mitmproxy's config and route plain HTTP to pipelock by hand.)
  • pipelock continues to listen on its existing port and receives plain HTTP from mitmproxy. No pipelock config change.
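The wiring above can be sketched as two pure helpers: one builds the agent's env, the other builds the sidecar's command line. The sketch assumes the sidecar runs mitmdump (mitmproxy's non-interactive CLI) listening on 8080, which is mitmproxy's default port. PIPELOCK_PORT is a placeholder for whatever port constant PRD 0001 already defines, and the helper names are hypothetical.

```python
"""Hypothetical helpers for the agent -> mitmproxy -> pipelock wiring."""

MITM_PORT = 8080      # assumed listen port (mitmproxy's default)
PIPELOCK_PORT = 8888  # placeholder; reuse pipelock's real port constant


def mitm_host(slug: str) -> str:
    return f"claude-bottle-mitm-{slug}"


def pipelock_host(slug: str) -> str:
    return f"claude-bottle-pipelock-{slug}"


def agent_proxy_env(slug: str) -> dict[str, str]:
    """Agent-side proxy vars: same keys as today, new target host and port."""
    url = f"http://{mitm_host(slug)}:{MITM_PORT}"
    return {"HTTPS_PROXY": url, "HTTP_PROXY": url}


def mitmdump_argv(slug: str) -> list[str]:
    """Sidecar command: terminate TLS, then forward to pipelock as upstream."""
    return [
        "mitmdump",
        "--mode", f"upstream:http://{pipelock_host(slug)}:{PIPELOCK_PORT}",
        "--listen-host", "0.0.0.0",
        "--listen-port", str(MITM_PORT),
    ]
```

Deriving both host names from the slug in one place keeps the agent's proxy URL and mitmproxy's upstream target from drifting apart if the naming scheme changes.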

New components

Two new modules, matching PRD 0001's split between backend-agnostic config and backend-specific lifecycle:

  • claude_bottle/mitmproxy.py: backend-agnostic. It holds the config builder (mitmproxy reads options from config.yaml in its confdir, so YAML is the likely format; confirm against the pinned version), the abstract MitmproxyProxy class with prepare(...) writing the config and the ephemeral CA into stage_dir, the CA generation helper (RSA-2048 or ECDSA-P256, picked at impl time; the research suggests ECDSA for cert-generation speed), and constants for the sidecar's internal-network port and image pin.
  • claude_bottle/backend/docker/mitmproxy.py: Docker implementation. DockerMitmproxyProxy(MitmproxyProxy) with start(plan) doing docker create / docker cp / docker network connect / docker start, analogous to DockerPipelockProxy.start. stop(target) removes the sidecar idempotently.

The provisioner that installs the CA cert into the agent's trust store lives at claude_bottle/backend/docker/provision/ca.py and plugs into the existing BottleBackend.provision orchestration. A new BottleBackend.provision_ca method joins provision_prompt / provision_skills / provision_ssh / provision_git on the base class (PRD 0004's pattern). Unlike those four, it is not abstract: it ships a concrete no-op default, so backends that don't yet implement TLS interception keep working.
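Here is one way to add provision_ca to the base class without breaking other backends. The sketch assumes the existing provisioners are abstractmethods taking (plan, target) and that provision calls them in sequence; the class bodies are illustrative, not the project's code. The idea is to keep the four existing provisioners abstract and give provision_ca a concrete no-op body. A subclass that never mentions TLS still instantiates, while one that skips an abstract provisioner still fails at construction. Running provision_ca last is also an assumption.

```python
"""Sketch of the provisioner pattern on BottleBackend (hypothetical bodies)."""
from abc import ABC, abstractmethod
from typing import Any


class BottleBackend(ABC):
    # Existing provisioners stay abstract: every backend must implement them.
    @abstractmethod
    def provision_prompt(self, plan: Any, target: Any) -> None: ...

    @abstractmethod
    def provision_skills(self, plan: Any, target: Any) -> None: ...

    @abstractmethod
    def provision_ssh(self, plan: Any, target: Any) -> None: ...

    @abstractmethod
    def provision_git(self, plan: Any, target: Any) -> None: ...

    # New provisioner: concrete no-op default, so a backend without TLS
    # interception (e.g. a future fly backend) still instantiates.
    def provision_ca(self, plan: Any, target: Any) -> None:
        return None

    def provision(self, plan: Any, target: Any) -> None:
        self.provision_prompt(plan, target)
        self.provision_skills(plan, target)
        self.provision_ssh(plan, target)
        self.provision_git(plan, target)
        self.provision_ca(plan, target)
```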

CA lifecycle

Per tls-mitm-for-pipelock.md §CA lifecycle:

  • Generation. Host-side in MitmproxyProxy.prepare, written to stage_dir/mitm-ca.key (mode 600) and stage_dir/mitm-ca.crt (mode 644). The .key is copied into the mitmproxy container at start; nothing else touches it.
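  • Bottle injection. provision_ca copies only the public .crt into the agent container at /usr/local/share/ca-certificates/claude-bottle-mitm.crt and runs update-ca-certificates as root inside the container. It also sets three env vars for the agent process: NODE_EXTRA_CA_CERTS=/usr/local/share/ca-certificates/claude-bottle-mitm.crt, plus SSL_CERT_FILE and REQUESTS_CA_BUNDLE. The latter two point at the regenerated system bundle (/etc/ssl/certs/ca-certificates.crt on Debian), because they replace a client's bundle rather than extend it. Setting the env vars on top of the trust store is belt-and-suspenders, because several clients ignore the system store: Node by default trusts only its bundled roots plus NODE_EXTRA_CA_CERTS, and Python's requests uses certifi unless REQUESTS_CA_BUNDLE is set.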
  • Teardown. The mitmproxy sidecar container is destroyed; the CA key vanishes with it. Nothing persists on the host outside stage_dir, which the start command already deletes in its finally block.
  • Cost. ECDSA-P256 CA generation and per-host leaf generation run in milliseconds. Startup time is dominated by the first-run image pull and the per-bottle network plumbing.
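The file handling in the generation step can be sketched independently of how the key pair is produced. This minimal sketch assumes the key and certificate arrive as PEM bytes from whichever generator is chosen; stage_ca is a hypothetical name. It also writes mitmproxy-ca.pem, the key-plus-certificate file mitmproxy loads from its confdir, on the assumption that this is the form the sidecar would consume.

```python
"""Hypothetical sketch of staging the ephemeral CA files in stage_dir."""
import os
from pathlib import Path


def stage_ca(stage_dir: Path, key_pem: bytes, cert_pem: bytes) -> dict[str, Path]:
    """Write the key/cert pair with the modes this PRD specifies."""
    paths = {
        "key": stage_dir / "mitm-ca.key",
        "crt": stage_dir / "mitm-ca.crt",
        # mitmproxy reads key + cert, concatenated, from <confdir>/mitmproxy-ca.pem.
        "mitm_pem": stage_dir / "mitmproxy-ca.pem",
    }
    _write(paths["key"], key_pem, 0o600)
    _write(paths["crt"], cert_pem, 0o644)
    _write(paths["mitm_pem"], key_pem + cert_pem, 0o600)
    return paths


def _write(path: Path, data: bytes, mode: int) -> None:
    # Create with a restrictive mode so key files are never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    # os.open's mode is filtered by the umask; pin the final mode explicitly.
    os.chmod(path, mode)
```

Because everything lands in stage_dir, the existing finally-block cleanup removes every host-side copy of the key at teardown.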

Data model changes

None in v1. The manifest schema is unchanged. mitmproxy is always on for every bottle once this PRD ships.

A future selective-bump knob (per tls-mitm-for-pipelock.md open question #5) would land on bottle.egress.tls_bump_ignore as a list of hostnames. The shape mirrors egress.allowlist. Adding it later is a strictly additive change.
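If the tls_bump_ignore knob lands, the manifest-to-mitmproxy translation is small but easy to get subtly wrong, because mitmproxy's ignore_hosts entries are regular expressions, not hostnames. The sketch below assumes patterns are matched against host:port, as in mitmproxy's documented ^example\.com:443$ example. The function name is hypothetical.

```python
"""Hypothetical translation from bottle.egress.tls_bump_ignore to ignore_hosts."""
import re


def ignore_hosts_patterns(tls_bump_ignore: list[str], port: int = 443) -> list[str]:
    """Escape and anchor each hostname so it matches exactly one host:port."""
    return [rf"^{re.escape(host)}:{port}$" for host in tls_bump_ignore]
```

Escaping and anchoring matter here. An unanchored pattern could also match look-alike hosts such as example.com.attacker.net, silently exempting them from pipelock's body scanning.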

Existing code touched

  • claude_bottle/backend/docker/launch.py — bring up the mitmproxy sidecar after the pipelock sidecar but before the agent container, repoint the agent's HTTPS_PROXY / HTTP_PROXY env flags, register an ExitStack callback to stop mitmproxy on teardown.
  • claude_bottle/backend/docker/prepare.py — call into MitmproxyProxy.prepare(...) alongside the existing PipelockProxy.prepare(...), populate DockerBottlePlan.mitmproxy_plan.
  • claude_bottle/backend/docker/backend.py — add the DockerMitmproxyProxy instance attribute (self._mitm) and thread it through launch + cleanup, mirroring the existing self._proxy pattern.
  • claude_bottle/backend/docker/bottle_plan.py — new mitmproxy_plan: MitmproxyProxyPlan field on DockerBottlePlan. print() and to_dict() learn to render it.
  • claude_bottle/backend/__init__.py — BottleBackend.provision_ca(plan, target) joins the other four provisioners on the base class. Unlike them it is not abstract: its default implementation is a no-op, so a future fly backend isn't forced to implement TLS interception in v1.
  • tests/integration/ — two new tests as described above.
  • tests/unit/ — config-builder unit tests; CA-helper unit tests; updated dry-run-plan test pinning the mitmproxy entry.
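Here is a sketch of the new plan field and how to_dict() might render it, assuming dataclass-based plans. The field names and the "mitmproxy" key are hypothetical, since the exact JSON shape is still an open question. The sketch shows that the plan can carry the key path start() needs while the rendered dict drops it, so the dry-run output exposes only the public fingerprint.

```python
"""Hypothetical shape for DockerBottlePlan.mitmproxy_plan and its dry-run dict."""
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class MitmproxyProxyPlan:
    container_name: str         # e.g. "claude-bottle-mitm-<slug>"
    image: str                  # digest-pinned mitmproxy image reference
    listen_port: int
    upstream: str               # pipelock's URL on the internal network
    ca_fingerprint_sha256: str  # public; safe to show
    ca_key_path: str            # needed by start(); never rendered

    def to_dict(self) -> dict[str, Any]:
        d = asdict(self)
        del d["ca_key_path"]  # dry-run output is user-facing; keep key info out
        return d


@dataclass
class DockerBottlePlan:
    proxy_plan: dict[str, Any]  # stand-in for the existing pipelock plan
    mitmproxy_plan: Optional[MitmproxyProxyPlan] = None

    def to_dict(self) -> dict[str, Any]:
        return {
            "proxy": self.proxy_plan,
            # None until prepare() has populated the field.
            "mitmproxy": self.mitmproxy_plan.to_dict() if self.mitmproxy_plan else None,
        }
```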

External dependencies

  • mitmproxy Docker image pulled from mitmproxy/mitmproxy@sha256:<digest>. The digest is pinned in claude_bottle/mitmproxy.py and bumped deliberately, mirroring the pipelock pin. The pin tracks the mitmproxy/mitmproxy:11.x tag line, per research §Image pin for mitmproxy.
  • No new host-side runtimes. CA generation uses Python's cryptography if it's already a transitive dep; otherwise use openssl shelled out from the host-side prepare step. Decide at impl time after confirming what's available on the runner without adding deps.

Open questions

  • mitmproxy upstream-proxy mode mechanics. Whether upstream mode forwards decrypted plaintext to pipelock or re-wraps it in a CONNECT. Documented behavior changed between mitmproxy 8 and 10. Needs verification against the pinned version at impl time. If upstream re-wraps, fall back to regular mode plus a chained proxy directive routing plain HTTP to pipelock.
  • Pipelock plain-HTTP scanning coverage. Pipelock's forward_proxy.enabled: true accepts both GET http://… and CONNECT host:443. Confirm by reading github.com/luckyPipewrench/pipelock/blob/main/docs/configuration.md that the full DLP / MCP / subdomain-entropy pipeline runs on the HTTP path; some pipelock layers may be gated on CONNECT only.
  • CA installation in the Anthropic-provided Claude Code image. The base image determines whether update-ca-certificates (Debian) or update-ca-trust (Red Hat) applies. Confirm against the Dockerfile before writing the provisioner; v1 assumes Debian (node:22-slim).
  • HTTP/2 ALPN end-to-end. Node's HTTP client negotiates h2 via ALPN. Confirm the pinned mitmproxy version speaks h2 to both halves without silently downgrading to http/1.1, which would be a noticeable performance regression on bulk transfers.
  • Selective-bump policy surface. Where does the "tunnel this hostname blindly" decision live when (not if) a pinning host appears? Recommended shape per research: bottle.egress.tls_bump_ignore: ["example.com"], a list of hostnames mitmproxy passes through via ignore_hosts. Defer until needed; record the shape so the follow-up is mechanical.
  • CA generation: Python cryptography vs. shelled-out openssl. Adding cryptography brings a substantial transitive graph; shelling to openssl keeps the host-side prepare step dep-light. Decide at impl time based on what's already on the runner. Either way, the CA is per-bottle and ephemeral.
  • Domain-fronting verification. Once pipelock sees the inner Host / :authority, comparing it to the outer CONNECT target catches domain fronting. Whether pipelock has a rule for this or we need to add one is a follow-up; out of scope here.
  • Dry-run preflight rendering of the CA. Show the fingerprint but never the private key. Confirm the exact dry-run JSON shape during implementation; the field set is part of the CLI's user-facing contract (per PRD 0003 §to_dict notes).
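For the dry-run fingerprint, here is a stdlib-only sketch. It hashes the certificate's DER encoding with SHA-256 and prints it colon-separated, the layout openssl x509 -noout -fingerprint -sha256 uses, so an operator can compare the dry-run value with the cert inside a running bottle. It reads only the public PEM. ca_fingerprint is a hypothetical name, and SHA-256 is an assumed choice of digest.

```python
"""Hypothetical fingerprint helper for the dry-run CA line."""
import hashlib
import ssl


def ca_fingerprint(cert_pem: str) -> str:
    """SHA-256 over the DER bytes, as colon-separated uppercase hex pairs."""
    # Expects a single CERTIFICATE block with no leading whitespace.
    der = ssl.PEM_cert_to_DER_cert(cert_pem)
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```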

References