
# PRD 0005: mitmproxy TLS interception for pipelock content scanning

- **Status:** Draft (updated 2026-05-12 after open-question walkthrough)
- **Author:** didericis
- **Created:** 2026-05-12

## Summary

Add a per-bottle **mitmproxy** sidecar in front of pipelock on the
egress path. mitmproxy bumps the agent's TLS CONNECT, decrypts the
inner HTTP, and hands each request to a vendored Python addon. The
addon forwards the decrypted request to pipelock as a plain HTTP
forward-proxy call so pipelock's DLP, URL-scan, and header-scan
layers fire on real bodies. On the verdict, the addon either
short-circuits the flow with a 403 (block) or lets mitmproxy
proceed to the real upstream (allow). mitmproxy itself generates
the ephemeral per-bottle CA on startup; the public cert is copied
into the agent's trust store and the private key dies with the
sidecar on teardown.

This is Topology A' from `docs/research/tls-mitm-for-pipelock.md` —
a variant of the research note's Topology A after a spike showed
mitmproxy's `upstream` mode re-wraps decrypted flows in a new
CONNECT to the upstream proxy (which would defeat the entire
point). The addon recovers the design by emitting plain HTTP to
pipelock explicitly instead of relying on mitmproxy's `upstream`
chaining.

## Problem

PRD 0001 wired pipelock onto every bottle's egress, but the current
topology only sees `CONNECT` hostnames and opaque TLS bytes:

```
agent --HTTPS_PROXY--> pipelock --CONNECT host:443--> internet
                                  \____________________________
                                       opaque TLS bytes
```

What pipelock cannot scan in this mode is documented in
`docs/research/tls-mitm-for-pipelock.md` §What pipelock cannot see
today: request URLs and methods, request and response headers,
request and response bodies, MCP JSON-RPC payloads, inner-vs-outer
hostname (the domain-fronting check), and WebSocket frames inside a
TLS-wrapped upgrade. The 48-pattern DLP layer this project relies on
in PRD 0001 is therefore inert against every host in the current
`DEFAULT_ALLOWLIST` — all of which are HTTPS-only.

The integration test added in `tests/integration/test_pipelock_blocks_secret_post.py`
demonstrates the gap concretely: pipelock's body-scan layer only
fires when the agent is forced to send plain HTTP. Real Claude Code
traffic to `api.anthropic.com` goes over CONNECT-tunneled TLS and
slips past the scanner.

`pipelock-assessment.md` §Scope gaps names this as a known
limitation of the proxy-without-TLS-inspection shape. Closing it is
the explicit motivation for `tls-mitm-for-pipelock.md`, whose
recommendation this PRD implements (with the addon adjustment
forced by the upstream-mode spike).

## Goals / Success Criteria

The feature works when all of the following are observable:

- A Node request from inside a launched bottle to a CONNECT-bumped
  HTTPS host (e.g. `https://api.anthropic.com/dlp-probe`) carrying a
  pipelock-recognized credential pattern in the body returns 403
  from the bottle's egress chain — not a response from the upstream.
  The existing `test_pipelock_blocks_secret_post` test path becomes
  the HTTPS variant of this assertion.
- A plain HTTPS GET from inside the bottle to an allowlisted host
  with no credential pattern (e.g. `GET https://raw.githubusercontent.com/...`)
  returns the real upstream response — the addon doesn't break
  clean traffic.
- Claude Code itself reaches `api.anthropic.com` end-to-end through
  the bottle and completes a chat round-trip. No TLS-trust errors
  in the agent process.
- mitmproxy's flow log and pipelock's `body_dlp` / `header_dlp` /
  `core_dlp` event lines both appear for the same outbound request,
  confirming the two-stage path is active.

The feature is **done** when all of the following ship:

- A new `MitmproxyProxy` class with the same `prepare` / `start` /
  `stop` lifecycle shape as `PipelockProxy`, wired into the Docker
  backend's launch step.
- A vendored Python addon at `claude_bottle/mitmproxy/addon.py`
  that mitmproxy loads on startup via `mitmdump -s ...`. The sidecar
  runs in `regular` mode (default), not `upstream` mode.
- The bottle launch step starts the mitmproxy sidecar, waits for
  the sidecar-internal CA to be generated, copies the CA public
  cert into the agent at `/usr/local/share/ca-certificates/claude-bottle-mitm.crt`,
  runs `update-ca-certificates` inside the agent, and threads the
  `NODE_EXTRA_CA_CERTS` / `SSL_CERT_FILE` / `REQUESTS_CA_BUNDLE`
  env trio onto the agent container's runtime env.
- The agent's `HTTPS_PROXY` / `HTTP_PROXY` point at the mitmproxy
  sidecar (where they pointed at pipelock under PRD 0001).
- pipelock is otherwise unchanged. It continues to load the YAML
  PRD 0001 generates and runs its existing scanning pipeline; the
  addon talks to it via the same forward-proxy interface today's
  `test_pipelock_blocks_secret_post` uses.
- On bottle teardown the mitmproxy sidecar is removed and the
  ephemeral CA private key is gone with it.
- An HTTPS variant of `test_pipelock_blocks_secret_post` proves
  pipelock now blocks a credential POST over HTTPS rather than
  plain HTTP.
- An integration test proves a non-credential HTTPS GET through
  the chain returns the upstream's real response.
- The dry-run preflight (`start --dry-run`) shows the mitmproxy
  sidecar in both text and `--format=json` output. The JSON
  contract gains a reserved `egress.mitm: { "enabled": true, "ca_fingerprint": null }`
  block; fingerprint is always null at dry-run because the CA
  doesn't exist yet. Real launches emit a one-line stderr log:
  `claude-bottle: mitm ca fingerprint: <sha256-first-16>...`.
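
A minimal sketch of how `DockerBottlePlan.to_dict()` might splice the reserved `egress.mitm` block into the dry-run JSON. Only the block's shape comes from this PRD. The helper names `add_mitm_block` and `dry_run_json` are hypothetical, and so is the assumption that `to_dict()` already emits an `egress` object.

```python
import json
from typing import Any, Dict


def add_mitm_block(plan_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Add the reserved `egress.mitm` block to a dry-run plan dict (in place).

    Hypothetical helper; the real logic would live in DockerBottlePlan.to_dict().
    """
    egress = plan_dict.setdefault("egress", {})
    # The CA only exists once the sidecar starts, so at dry-run the
    # fingerprint field is present but always null.
    egress["mitm"] = {"enabled": True, "ca_fingerprint": None}
    return plan_dict


def dry_run_json(plan_dict: Dict[str, Any]) -> str:
    """Render the plan as the `--format=json` output would (sketch)."""
    return json.dumps(add_mitm_block(plan_dict), indent=2, sort_keys=True)
```

Reserving the key now means a later launch-time report can fill `ca_fingerprint` without changing the JSON shape consumers already parse.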

## Non-goals

- **Topology C** — extending pipelock itself to terminate TLS. The
  research note's recommended long-term shape, but substantial Go
  work plus the Apache-2.0-vs-ELv2 question. Deferred.
- **Topology D as canonical** — mitmproxy with a pipelock `/scan`
  HTTP endpoint. The addon in this PRD talks to pipelock via its
  existing forward-proxy interface; no upstream pipelock change
  needed.
- **Persistent or shared CA across bottles.** Each bottle gets a
  fresh CA generated by its own mitmproxy at startup.
- **Selective bumping ("ignore_hosts") as a v1 manifest field.**
  v1 bumps every CONNECT. If a future allowlisted host turns out
  to use certificate pinning (mobile- or Chromium-style), a
  follow-up PRD adds the per-host opt-out via
  `bottle.egress.tls_bump_ignore`. Strictly additive.
- **HTTP/3 / QUIC.** mitmproxy's HTTP/3 support is experimental.
  v1 relies on the v1-egress iptables layer blocking UDP/443 to
  force clients onto HTTP/2 over TCP, which mitmproxy 12 inspects
  natively (verified by spike).
- **Raw TCP / non-HTTP TLS interception.** mitmproxy supports it
  via `--mode reverse:`, not in CONNECT-bump mode. SSH and any
  future raw-TCP egress route around mitmproxy entirely.
- **Trust-store rewiring for non-Debian agent images.** The
  current `Dockerfile` is `node:22-slim` (Debian). If a future base
  switches to Red-Hat-family, the `update-ca-certificates` step
  becomes `update-ca-trust`. Out of scope until the base changes.
- **Response-body scanning.** Pipelock supports it; we don't wire
  it in v1 because the addon would need to ferry the upstream
  response back through pipelock's scanner, which the
  forward-proxy interface doesn't support cleanly. v2 candidate.
- **MCP scanning on the bumped path.** Pipelock's MCP scanner only
  fires on MCP-formatted JSON-RPC payloads inside tool calls, so it
  isn't relevant to plain HTTPS agent traffic. Out of v1 scope.
- **Domain-fronting verification.** Once the addon sees the inner
  `Host` / `:authority`, comparing it to the outer CONNECT target
  catches domain fronting. Worth ~10 lines in the addon, but
  defer until the rest of v1 is settled.
- **Host-side openssl / `cryptography` for CA generation.** The
  research note's open question on this is resolved by letting
  mitmproxy itself generate the CA (it does so on first launch).
  No new host-side crypto.
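
The deferred domain-fronting check is small enough to sketch. This minimal version assumes the addon can read the outer CONNECT target and the inner `Host` / `:authority` value as plain strings. Extracting them from mitmproxy's flow object is not shown. Both values are normalized and then compared. Treating a missing inner host as "not fronted" is a policy assumption; a stricter addon could block instead.

```python
from typing import Optional


def normalize_host(value: str) -> str:
    """Lowercase a host, drop any :port, unwrap [IPv6], drop a trailing dot."""
    host = value.strip().lower()
    if host.startswith("["):
        end = host.find("]")
        return host[1:end] if end != -1 else host
    if host.count(":") == 1:  # host:port; a bare IPv6 literal has several colons
        host = host.split(":", 1)[0]
    return host.rstrip(".")


def is_domain_fronted(connect_target: str, inner_host: Optional[str]) -> bool:
    """True when the inner Host / :authority names a different host than CONNECT.

    ASSUMPTION: a missing inner host is treated as not fronted (fail open).
    """
    if not inner_host:
        return False
    return normalize_host(connect_target) != normalize_host(inner_host)
```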

## Scope

### In scope

- New `claude_bottle/mitmproxy/` package:
  - `__init__.py` — backend-agnostic. Constants (sidecar port,
    image-pin digest, the in-container addon path), the abstract
    `MitmproxyProxy` class with `prepare` / `start` / `stop` shape
    mirroring `PipelockProxy`, and the small helper that reads the
    CA fingerprint from a PEM file via `openssl x509 -fingerprint`
    shelled out.
  - `addon.py` — the Python addon mitmproxy loads. ~80–150 lines.
    For each `request` event: forward the decrypted request to
    pipelock at `http://claude-bottle-pipelock-<slug>:8888` as a
    plain HTTP forward-proxy call (absolute-URI form). Inspect
    pipelock's response. If status is 403 *and* the body matches
    pipelock's known block-event shape, set the flow's response to
    a 403 with pipelock's body and short-circuit. Otherwise,
    discard pipelock's response (and any wasted upstream-leg
    response from pipelock's forwarder) and let mitmproxy proceed
    to the real upstream.
- New `claude_bottle/backend/docker/mitmproxy.py` —
  `DockerMitmproxyProxy(MitmproxyProxy)` with the Docker-specific
  start/stop lifecycle. `start(plan)` does `docker create` /
  `docker cp addon.py …` / `docker network connect` / `docker start`,
  analogous to the existing `DockerPipelockProxy.start`. Injects
  `CLAUDE_BOTTLE_PIPELOCK_URL` into the sidecar env so the addon
  knows where pipelock lives.
- New provisioner `claude_bottle/backend/docker/provision/ca.py`.
  Polls mitmproxy for the cert file, copies it through a host
  stage dir into the agent, runs `update-ca-certificates` inside
  the agent, computes the SHA-256 fingerprint, and prints the
  one-line stderr log.
- `BottleBackend.provision_ca(plan, target)` joins the four
  existing provisioner methods on the abstract base. Default impl
  is no-op so other backends don't break when they don't yet
  implement TLS interception.
- `DockerBottlePlan` grows a `mitmproxy_plan` field mirroring the
  existing `proxy_plan`.
- Agent container `docker run` invocation:
  - `HTTPS_PROXY` / `HTTP_PROXY` change from the pipelock service
    name to the mitmproxy service name.
  - Three `-e` flags set the CA env trio so they're inherited by
    the eventual `docker exec claude` (Docker propagates run-time
    env into exec by default; see the env-var inheritance item under
    Open questions for the fallback).
- Dry-run preflight rendering of the mitmproxy entry (text + JSON).
  JSON gains `egress.mitm: { "enabled": true, "ca_fingerprint": null }`.
- One stderr log line at launch with the CA fingerprint.
- Two new integration tests under `tests/integration/`:
  - `test_mitmproxy_blocks_secret_https_post.py` — HTTPS variant
    of the existing block-secret test. Asserts pipelock's body
    DLP fires on a credential POST tunneled through CONNECT.
  - `test_mitmproxy_allows_normal_https.py` — confirms a plain
    HTTPS GET on an allowlisted host returns the upstream response,
    isolating the addon's pass-through path from the block path.
- Unit tests for the addon's verdict logic (block vs allow on
  status + body shape, edge cases) using mitmproxy's `mitmproxy.test`
  flow fixtures. Unit tests for the proxy config builder
  (mirroring `tests/unit/test_pipelock_yaml.py`).
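
The `ca.py` provisioner's first step is waiting for mitmproxy to write its CA. It can be sketched as a bounded poll. This sketch assumes `docker exec <sidecar> test -f <path>` as the existence probe and a 30-attempt, 1-second budget. Neither the probe nor the numbers are pinned by this PRD. The probe is injected so the loop can be tested without Docker.

```python
import subprocess
import time
from typing import Callable

# Path from the CA lifecycle section: mitmproxy writes its public CA here.
MITM_CA_CERT = "/home/mitmproxy/.mitmproxy/mitmproxy-ca-cert.pem"


def docker_file_exists(container: str, path: str = MITM_CA_CERT) -> bool:
    """Probe for a file inside a running container via `docker exec ... test -f`."""
    result = subprocess.run(
        ["docker", "exec", container, "test", "-f", path],
        capture_output=True,
        check=False,
    )
    return result.returncode == 0


def wait_for_ca_cert(
    cert_exists: Callable[[], bool],
    attempts: int = 30,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the CA cert exists; return how many probes it took.

    Raises TimeoutError if the cert never appears within `attempts` probes.
    """
    for attempt in range(1, attempts + 1):
        if cert_exists():
            return attempt
        if attempt < attempts:  # no pointless sleep after the last probe
            sleep(interval_s)
    raise TimeoutError(f"mitmproxy CA not written after {attempts} probes")
```

In the provisioner this would be called as `wait_for_ca_cert(lambda: docker_file_exists("claude-bottle-mitm-<slug>"))`, followed by the two `docker cp` hops.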

### Out of scope

- The v1 iptables + dnsmasq layer (separate PRD; see
  `network-egress-guard.md`). mitmproxy covers HTTP/HTTPS only;
  raw TCP, UDP, ICMP, and direct DNS still need the IP-level layer.
- Pipelock config changes. Pipelock continues to load the YAML
  PRD 0001 generates; the addon talks to it via the existing
  forward-proxy interface.
- A bottle-level toggle to skip mitmproxy entirely. v1 always
  wires it in.
- Pinning-host detection automation. The cost of finding out (per
  the research note) is a single 5-minute test before adding a host;
  it stays a manual step.
- Pipelock upstream contributions for an `X-Pipelock-Verdict` header.
  Possible follow-up. Until then the addon distinguishes blocks
  from passes via status + body fingerprint.

## Proposed Design

### Topology

```
agent --HTTPS_PROXY--> mitmproxy ---(1) addon: plain HTTP------> pipelock
                      (bump TLS) <--(2) verdict: status + body-- (scan)
                           |
                           +--(3) on allow: mitmproxy opens TLS--> real upstream
```

All three containers live on the same per-bottle internal Docker
network. mitmproxy and pipelock are both attached to the per-bottle
egress bridge for real-internet reach; the agent has no default
route.

Concretely:

- Agent sets `HTTPS_PROXY=http://claude-bottle-mitm-<slug>:<port>`.
  PRD 0001 had this pointing at pipelock; the hostname swap is the
  only agent-side env change.
- mitmproxy runs in **`regular`** mode (default; no `--mode` flag).
  It bumps every CONNECT, generates fake leaf certs signed by its
  own CA, and presents them to the agent.
- The addon, loaded via `mitmdump -s /addon/addon.py`, intercepts
  each decrypted `request` event. It forwards the request to
  pipelock at `http://claude-bottle-pipelock-<slug>:8888` as a
  plain HTTP forward-proxy call (absolute-URI form), so pipelock
  sees the full URL, headers, and body.
- The addon inspects pipelock's response. If status is 403 *and*
  the response body matches pipelock's known block-event shape,
  the addon sets the mitmproxy flow's response to a 403 with
  pipelock's body and short-circuits. Otherwise — including the
  case where pipelock's forwarder attempted the upstream and got
  a 4xx — the addon discards pipelock's response and lets
  mitmproxy proceed to the real upstream.
- mitmproxy completes the outbound TLS to the real destination
  using its built-in trust store, just like any other forward
  proxy. Pipelock is only involved as a scanner.
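
The block-vs-allow rule above reduces to a pure function. That is also how Open questions suggests structuring the addon for testing. This minimal sketch assumes pipelock's block body is a JSON object carrying `event` and `scanner` keys. That fingerprint is still the unconfirmed guess from Open questions and must be pinned against real pipelock output. In the addon's `request` hook, a blocking verdict would become a synthesized 403 response on the flow, for example via mitmproxy's `http.Response.make`. Setting that response stops mitmproxy from contacting the upstream. The hook wiring is not shown here.

```python
import json
from dataclasses import dataclass

# ASSUMPTION: keys guessed in Open questions; pin against pipelock's real 403 body.
BLOCK_BODY_KEYS = frozenset({"event", "scanner"})


@dataclass(frozen=True)
class Verdict:
    block: bool
    body: bytes = b""  # pipelock's body, relayed to the agent on block


def looks_like_pipelock_block(body: bytes) -> bool:
    """True if `body` parses as a JSON object carrying every block-event key."""
    try:
        doc = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False
    return isinstance(doc, dict) and BLOCK_BODY_KEYS.issubset(doc)


def decide(status: int, body: bytes) -> Verdict:
    """Block only on 403 *and* a pipelock-shaped body.

    Everything else, including a 4xx relayed from pipelock's wasted upstream
    leg, is discarded and the flow falls through to the real upstream.
    """
    if status == 403 and looks_like_pipelock_block(body):
        return Verdict(block=True, body=body)
    return Verdict(block=False)
```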

The trade-off: pipelock makes a wasted upstream forward attempt
for every allowed request. It tries to forward over plain HTTP to
a real HTTPS-only host, which typically comes back as a redirect
or 4xx. This is benign: the scan completes before pipelock
forwards, the verdict still reaches the addon, and the addon
discards whatever the wasted leg returned instead of relaying it
to the agent. Acceptable cost for the visibility win. A
pipelock-side improvement (skip the forward when the addon only
needs the scan verdict) is a future optimization.
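
The addon's side of that forward attempt can be sketched with the standard library. `http.client` writes whatever URL it is given straight into the request line. Passing an absolute `http://` URI to a connection opened against pipelock therefore yields a forward-proxy request such as `POST http://api.anthropic.com/v1/messages HTTP/1.1`. Downgrading the scheme from `https` to `http` is an assumption drawn from the description above. The function names, the 5-second timeout, and collapsing repeated headers into a dict are simplifications for this sketch.

```python
import http.client
from typing import Mapping, Tuple
from urllib.parse import urlsplit, urlunsplit


def absolute_http_uri(url: str) -> str:
    """Rewrite a decrypted request URL into plain-HTTP absolute-URI form.

    ASSUMPTION: https is downgraded to http so pipelock handles the call as
    an ordinary forward-proxy request rather than a CONNECT.
    """
    scheme, netloc, path, query, _fragment = urlsplit(url)
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    # Fragments never go on the wire; an empty path becomes "/".
    return urlunsplit(("http", netloc, path or "/", query, ""))


def ask_pipelock(
    pipelock_url: str,
    method: str,
    url: str,
    headers: Mapping[str, str],
    body: bytes,
    timeout_s: float = 5.0,
) -> Tuple[int, bytes]:
    """Send one forward-proxy request to pipelock; return (status, body)."""
    proxy = urlsplit(pipelock_url)  # e.g. the value of CLAUDE_BOTTLE_PIPELOCK_URL
    if not proxy.hostname:
        raise ValueError(f"bad pipelock URL: {pipelock_url!r}")
    conn = http.client.HTTPConnection(proxy.hostname, proxy.port or 8888, timeout=timeout_s)
    try:
        # http.client puts the URL verbatim in the request line, so an
        # absolute URI produces "METHOD http://host/path HTTP/1.1".
        conn.request(method, absolute_http_uri(url), body=body, headers=dict(headers))
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```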

### New components

- `claude_bottle/mitmproxy/__init__.py` — backend-agnostic
  abstract base, constants, the `openssl x509 -fingerprint` helper.
- `claude_bottle/mitmproxy/addon.py` — the scanning addon.
  Reads pipelock's URL from `CLAUDE_BOTTLE_PIPELOCK_URL` (injected
  into the sidecar env by the proxy's `start`). For each
  `request` flow: synchronously replay the request (same method,
  headers, and body) to pipelock as a forward-proxy call; inspect
  the status and body; either short-circuit with 403 or fall through.
- `claude_bottle/backend/docker/mitmproxy.py` —
  `DockerMitmproxyProxy(MitmproxyProxy)` with start/stop, the
  `docker cp` of the addon into the sidecar before `docker start`,
  and the `CLAUDE_BOTTLE_PIPELOCK_URL` wiring.
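
`DockerMitmproxyProxy.start` follows the `docker create` / `docker cp` / `docker network connect` / `docker start` order listed under Scope. Below is a sketch of the argv it might run. It makes three assumptions. The listen port is 8080 (mitmproxy's default). The network names come from the plan. The addon is staged alone in a host directory, so that copying `<dir>/.` to a not-yet-existing `/addon` creates that directory inside the container. The dataclass and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

ADDON_CONTAINER_DIR = "/addon"
ADDON_CONTAINER_PATH = f"{ADDON_CONTAINER_DIR}/addon.py"


@dataclass(frozen=True)
class MitmSidecarSpec:
    """Hypothetical slice of the mitmproxy plan needed to start the sidecar."""

    slug: str
    image: str  # digest-pinned mitmproxy 12.x image reference
    internal_network: str  # per-bottle network shared with agent and pipelock
    egress_network: str  # per-bottle bridge with real-internet reach
    addon_stage_dir: str  # host dir containing only addon.py
    listen_port: int = 8080  # mitmproxy's default listen port

    @property
    def container(self) -> str:
        return f"claude-bottle-mitm-{self.slug}"

    @property
    def pipelock_url(self) -> str:
        return f"http://claude-bottle-pipelock-{self.slug}:8888"


def start_commands(spec: MitmSidecarSpec) -> List[List[str]]:
    """argv lists for create -> cp -> network connect -> start, in order."""
    return [
        [
            "docker", "create",
            "--name", spec.container,
            "--network", spec.internal_network,
            "-e", f"CLAUDE_BOTTLE_PIPELOCK_URL={spec.pipelock_url}",
            spec.image,
            # regular mode is mitmproxy's default, so no --mode flag.
            "mitmdump", "--listen-port", str(spec.listen_port),
            "-s", ADDON_CONTAINER_PATH,
        ],
        # Copy the staged directory's contents; docker cp creates /addon.
        ["docker", "cp", f"{spec.addon_stage_dir.rstrip('/')}/.",
         f"{spec.container}:{ADDON_CONTAINER_DIR}"],
        ["docker", "network", "connect", spec.egress_network, spec.container],
        ["docker", "start", spec.container],
    ]


def stop_command(spec: MitmSidecarSpec) -> List[str]:
    # Removing the container also removes the CA private key inside it.
    return ["docker", "rm", "-f", spec.container]
```

Each argv would be run with `subprocess.run(argv, check=True)`, and `stop_command` registered on the launch `ExitStack`.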

### CA lifecycle

Simplified by letting mitmproxy own the generation:

- **Generation.** mitmproxy generates a fresh CA on startup
  inside its container at `/home/mitmproxy/.mitmproxy/mitmproxy-ca-cert.pem`
  (public) + `mitmproxy-ca.pem` (private). No host-side openssl
  for *generation*; no host-side Python `cryptography` dep.
- **Volume strategy.** Container-internal only. No host bind
  mount means the CA dies with the container.
- **Extraction.** `provision_ca` polls (~1s) for the cert file
  via `docker exec`, then `docker cp` to host stage dir, then
  `docker cp` into the agent. Host stage dir gets cleaned up by
  the existing `start.py` `finally` block.
- **Bottle install.**
  1. `docker cp <host stage>/mitm-ca.crt agent-<slug>:/usr/local/share/ca-certificates/claude-bottle-mitm.crt`
  2. `docker exec -u 0 agent-<slug> chmod 644 …`
  3. `docker exec -u 0 agent-<slug> update-ca-certificates`
  4. Three `-e` flags on `docker run` set the env trio
     (`NODE_EXTRA_CA_CERTS=…/claude-bottle-mitm.crt`,
     `SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt`,
     `REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt`) so
     `docker exec claude` inherits them.
- **Teardown.** Sidecar container removed; CA private key gone.
- **Fingerprint.** Computed post-extraction via shelled-out
  `openssl x509 -fingerprint -sha256 -noout`. Logged once to
  stderr at launch; never the private key.
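
The PRD computes the fingerprint by shelling out to `openssl x509 -fingerprint -sha256 -noout`, which prints the SHA-256 digest of the certificate's DER encoding. This sketch swaps in a standard-library equivalent (`ssl.PEM_cert_to_DER_cert` plus `hashlib`) so it runs without openssl. It also keeps a parser for openssl's `<digest> Fingerprint=AB:CD:...` output line and formats the stderr log. Reading `<sha256-first-16>` as the first 16 lowercase hex digits is an assumption.

```python
import hashlib
import ssl


def ca_fingerprint_sha256(pem_text: str) -> str:
    """SHA-256 over the cert's DER bytes, formatted AB:CD:... like openssl."""
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def parse_openssl_fingerprint(stdout: str) -> str:
    """Extract AB:CD:... from openssl's '<digest> Fingerprint=AB:CD:...' line."""
    label, sep, value = stdout.strip().partition("=")
    if not sep or "fingerprint" not in label.lower():
        raise ValueError(f"unexpected openssl output: {stdout!r}")
    return value.strip().upper()


def fingerprint_log_line(fingerprint: str) -> str:
    """One-line stderr log; ASSUMPTION: first 16 hex digits, lowercase."""
    hex_digits = fingerprint.replace(":", "").lower()
    return f"claude-bottle: mitm ca fingerprint: {hex_digits[:16]}..."
```

Only the public cert is ever read here; the log line is printed with `print(..., file=sys.stderr)` once per launch.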

### Data model changes

None to the manifest schema. The dry-run JSON contract gains a
reserved `egress.mitm: { "enabled": true, "ca_fingerprint": null }`
block. Fingerprint is always null at dry-run (CA doesn't exist
yet) but the field is reserved so future schema additions stay
non-breaking.

A future selective-bump knob would add
`bottle.egress.tls_bump_ignore: [host, ...]` per the research
note. Strictly additive when it lands.

### Existing code touched

- **`claude_bottle/backend/docker/launch.py`** — bring up the
  mitmproxy sidecar between pipelock and the agent. Repoint the
  agent's `HTTPS_PROXY` / `HTTP_PROXY` env flags to mitmproxy.
  Register an `ExitStack` callback for mitmproxy teardown. Print
  the CA fingerprint once the sidecar reports ready.
- **`claude_bottle/backend/docker/prepare.py`** — call into
  `MitmproxyProxy.prepare(...)` alongside `PipelockProxy.prepare(...)`,
  populate `DockerBottlePlan.mitmproxy_plan`.
- **`claude_bottle/backend/docker/backend.py`** — add the
  `DockerMitmproxyProxy` instance attribute (`self._mitm`) and
  thread it through `launch` + cleanup, mirroring `self._proxy`.
- **`claude_bottle/backend/docker/bottle_plan.py`** — new
  `mitmproxy_plan` field. `print()` and `to_dict()` learn to
  render the mitmproxy entry and the `egress.mitm` JSON block.
- **`claude_bottle/backend/__init__.py`** — abstract
  `BottleBackend.provision_ca` joins the four existing
  provisioners; default no-op.
- **`tests/integration/`** — two new tests as described above.
- **`tests/unit/`** — addon-verdict tests, mitmproxy-config
  builder tests, dry-run-plan test updated for the new
  `egress.mitm` block.
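
This is a minimal sketch of the no-op default on the abstract base. It assumes the existing ABC uses `abc.abstractmethod` for its required methods. `launch` stands in for those methods with a simplified signature. The two subclasses are hypothetical stand-ins. One shows a backend without TLS interception. The other overrides `provision_ca`; the real Docker override would delegate to `provision/ca.py`.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class BottleBackend(ABC):
    @abstractmethod
    def launch(self, plan: Any) -> Any:
        """Stand-in for the existing abstract methods (signature simplified)."""

    def provision_ca(self, plan: Any, target: Any) -> None:
        """Install the egress-MITM CA into `target`.

        Default is a no-op so backends without TLS interception keep working.
        """
        return None


class NoTlsBackend(BottleBackend):
    """Hypothetical backend that hasn't implemented TLS interception yet."""

    def launch(self, plan: Any) -> str:
        return f"launched {plan}"


class RecordingDockerBackend(BottleBackend):
    """Hypothetical stand-in for the Docker backend's override."""

    def __init__(self) -> None:
        self.ca_targets: List[Any] = []

    def launch(self, plan: Any) -> str:
        return f"launched {plan}"

    def provision_ca(self, plan: Any, target: Any) -> None:
        # The real override would delegate to provision/ca.py.
        self.ca_targets.append(target)
```

Because the default is concrete rather than abstract, launch code can call `backend.provision_ca(plan, target)` unconditionally.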

### External dependencies

- **mitmproxy Docker image** pinned by digest on the `12.x` line.
  Bumped deliberately, mirroring the pipelock pin. Verified by
  spike to speak h2 on both halves.
- No new host-side runtimes. mitmproxy generates the CA;
  fingerprint via the `openssl` already present on Debian / macOS
  / ubuntu-latest runners.

## Open questions

(rewritten — most of the original v1 questions are now closed by
the walkthrough spikes; what remains is addon-implementation
specifics worth pinning during the first impl turn.)

- **Pipelock's 403-body fingerprint.** The addon needs to
  distinguish a pipelock block (DLP / host) from a real-upstream
  4xx that pipelock's forwarder relayed back. Most likely shape:
  pipelock's 403 response carries a JSON body with `event` /
  `scanner` fields, whereas a real-upstream 4xx carries whatever
  the upstream sent. Pin the exact fingerprint by inspecting
  pipelock's actual 403 body bytes at impl time. Long-term
  cleanup: file an upstream feature request for an
  `X-Pipelock-Verdict: block` response header so the addon can
  read a structured signal instead of pattern-matching the body.
- **`docker run` env-var inheritance through `docker exec`.** Plan
  assumes `docker run -e VAR=value` propagates to subsequent
  `docker exec` invocations. The Docker docs say so; not yet
  empirically pinned on this project's runner setup. Verify in
  the first impl turn. Trivial fallback: thread the three `-e`
  flags onto every `DockerBottle.exec*` call.
- **Addon synchronous-call latency.** The addon makes a sync HTTP
  call to pipelock per outbound flow. Pipelock is on the same
  internal Docker network; expected per-call latency is well
  under 10ms. Confirm under the parallel-request load Claude Code
  generates (most likely a non-issue, since Claude Code mostly
  issues one request at a time).
- **Addon test fixtures.** mitmproxy ships `mitmproxy.test` with
  flow fixtures; addons can be unit-tested without a running
  proxy. Confirm the import path and recommended fixture shape at
  impl time; structure the addon so the verdict-decision is a
  pure function that's trivially testable in isolation from any
  HTTP I/O.
- **Pipelock allowing the addon's forwarded request through.**
  pipelock will see the addon's request as coming from the
  mitmproxy sidecar's IP on the internal network. Confirm
  pipelock has no client-IP allowlist that would reject these.
  Likely fine — pipelock's `client_ip` is informational in the
  scan event, not a gate.
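
The env-var inheritance question and its fallback can both be expressed with one table of the CA env trio. This sketch builds the `-e` flags, which work on `docker run` or, as the fallback, on each `docker exec`. It also checks the output of `docker exec <agent> env` for the verification step. The values come from the CA lifecycle section; the helper names are hypothetical.

```python
from typing import Dict, List, Mapping, Sequence

CA_CERT_PATH = "/usr/local/share/ca-certificates/claude-bottle-mitm.crt"
SYSTEM_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"

# The env trio from the CA lifecycle section.
CA_ENV: Dict[str, str] = {
    "NODE_EXTRA_CA_CERTS": CA_CERT_PATH,
    "SSL_CERT_FILE": SYSTEM_BUNDLE,
    "REQUESTS_CA_BUNDLE": SYSTEM_BUNDLE,
}


def env_flags(env: Mapping[str, str] = CA_ENV) -> List[str]:
    """Render env as repeated `-e KEY=VALUE` flags for docker run / exec."""
    flags: List[str] = []
    for key, value in env.items():
        flags += ["-e", f"{key}={value}"]
    return flags


def exec_argv(container: str, command: Sequence[str],
              env: Mapping[str, str] = CA_ENV) -> List[str]:
    """Fallback: pass the trio explicitly on every `docker exec`."""
    return ["docker", "exec", *env_flags(env), container, *command]


def env_inherited(env_output: str, env: Mapping[str, str] = CA_ENV) -> bool:
    """Verification: does `docker exec <agent> env` output carry the trio?"""
    seen = dict(line.split("=", 1) for line in env_output.splitlines() if "=" in line)
    return all(seen.get(key) == value for key, value in env.items())
```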

## References

- `docs/research/tls-mitm-for-pipelock.md` — primary source. This
  PRD implements a variant of §Recommendation (Topology A) after
  the spike documented under "Open questions" §1 falsified the
  `upstream` mode assumption.
- `docs/research/pipelock-assessment.md` §Scope gaps — names the
  TLS-inspection gap closed here.
- `docs/prds/0001-per-agent-egress-proxy-via-pipelock.md` —
  egress-proxy baseline this PRD extends.
- `docs/prds/0003-bottle-backend-abstraction.md` — backend ABC
  contract this PRD adds a `provision_ca` method to.
- `docs/prds/0004-split-out-provisioners.md` — per-provisioner
  module pattern reused for the new CA provisioner.
- mitmproxy: <https://mitmproxy.org>,
  <https://github.com/mitmproxy/mitmproxy>
- mitmproxy modes: <https://docs.mitmproxy.org/stable/concepts/modes/>
- mitmproxy CA cert installation:
  <https://docs.mitmproxy.org/stable/concepts/certificates/>
- mitmproxy addon API: <https://docs.mitmproxy.org/stable/addons-overview/>
- Node `NODE_EXTRA_CA_CERTS`:
  <https://nodejs.org/api/cli.html#node_extra_ca_certsfile>