didericis/bot-bottle

didericis c2eacac49f

docs(prd): update 0005 after open-question walkthrough

Re-grounds the design after walking the eight original open
questions interactively. Two structural changes:

- Topology A → A'. A spike confirmed mitmproxy's `upstream` mode
  re-wraps decrypted flows in a new CONNECT to the upstream proxy,
  which would have left pipelock seeing only ciphertext (the very
  gap this PRD set out to close). The fix is to run mitmproxy in
  `regular` mode and ship a vendored Python addon that forwards
  each decrypted request to pipelock as a plain HTTP forward-proxy
  call. Pipelock is unchanged.
- mitmproxy owns CA generation. The research note's preference
  for a host-side openssl / cryptography CA turned out to be
  unnecessary — mitmproxy generates a fresh CA on startup; the
  public cert is `docker cp`'d into the agent. No new host-side
  crypto deps. Dry-run can't render a fingerprint (CA doesn't
  exist yet); launches print it once to stderr.

Other Q3–Q8 resolutions folded in: Debian-base `update-ca-certificates`
confirmed, mitmproxy 12 verified to speak h2 on both halves,
selective-bump deferred to v2, response-body and MCP scanning
deferred to v2, domain-fronting deferred to v2.

Open questions rewritten — what remains is addon-implementation
specifics (pipelock 403-body fingerprint, env-var inheritance
through docker exec, addon test fixtures).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-12 12:54:27 -04:00

PRD 0005: mitmproxy TLS interception for pipelock content scanning

Status: Draft (updated 2026-05-12 after open-question walkthrough)
Author: didericis
Created: 2026-05-12

Summary

Add a per-bottle mitmproxy sidecar in front of pipelock on the egress path. mitmproxy bumps the agent's TLS CONNECT, decrypts the inner HTTP, and hands each request to a vendored Python addon. The addon forwards the decrypted request to pipelock as a plain HTTP forward-proxy call so pipelock's DLP, URL-scan, and header-scan layers fire on real bodies. Depending on pipelock's verdict, the addon either short-circuits the flow with a 403 (block) or lets mitmproxy proceed to the real upstream (allow). mitmproxy itself generates the ephemeral per-bottle CA on startup; the public cert is copied into the agent's trust store and the private key dies with the sidecar on teardown.

This is Topology A' from docs/research/tls-mitm-for-pipelock.md — a variant of the research note's Topology A after a spike showed mitmproxy's upstream mode re-wraps decrypted flows in a new CONNECT to the upstream proxy (which would defeat the entire point). The addon recovers the design by emitting plain HTTP to pipelock explicitly instead of relying on mitmproxy's upstream chaining.

Problem

PRD 0001 wired pipelock onto every bottle's egress, but the current topology only sees CONNECT hostnames and opaque TLS bytes:

agent --HTTPS_PROXY--> pipelock --CONNECT host:443--> internet
                                  \____________________________
                                       opaque TLS bytes

What pipelock cannot scan in this mode is documented in docs/research/tls-mitm-for-pipelock.md §What pipelock cannot see today: request URLs and methods, request and response headers, request and response bodies, MCP JSON-RPC payloads, inner-vs-outer hostname (the domain-fronting check), and WebSocket frames inside a TLS-wrapped upgrade. The 48-pattern DLP layer this project relies on in PRD 0001 is therefore inert against every host in the current DEFAULT_ALLOWLIST — all of which are HTTPS-only.

The integration test added in tests/integration/test_pipelock_blocks_secret_post.py demonstrates the gap concretely: pipelock's body-scan layer only fires when the agent is forced to send plain HTTP. Real Claude Code traffic to api.anthropic.com goes over CONNECT-tunneled TLS and slips past the scanner.

pipelock-assessment.md §Scope gaps names this as a known limitation of the proxy-without-TLS-inspection shape. Closing it is the explicit motivation for tls-mitm-for-pipelock.md, whose recommendation this PRD implements (with the addon adjustment forced by the upstream-mode spike).

Goals / Success Criteria

The feature works when all of the following are observable:

A Node request from inside a launched bottle to a CONNECT-bumped HTTPS host (e.g. https://api.anthropic.com/dlp-probe) carrying a pipelock-recognized credential pattern in the body returns 403 from the bottle's egress chain — not a response from the upstream. The existing test_pipelock_blocks_secret_post test path becomes the HTTPS variant of this assertion.
A plain HTTPS GET from inside the bottle to an allowlisted host with no credential pattern (e.g. GET https://raw.githubusercontent.com/...) returns the real upstream response — the addon doesn't break clean traffic.
Claude Code itself reaches api.anthropic.com end-to-end through the bottle and completes a chat round-trip. No TLS-trust errors in the agent process.
mitmproxy's flow log and pipelock's body_dlp / header_dlp / core_dlp event lines both appear for the same outbound request, confirming the two-stage path is active.

The feature is done when all of the following ship:

A new MitmproxyProxy class with the same prepare / start / stop lifecycle shape as PipelockProxy, wired into the Docker backend's launch step.
A vendored Python addon at claude_bottle/mitmproxy/addon.py that mitmproxy loads on startup via mitmdump -s .... The sidecar runs in regular mode (default), not upstream mode.
The bottle launch step starts the mitmproxy sidecar, waits for the sidecar-internal CA to be generated, copies the CA public cert into the agent at /usr/local/share/ca-certificates/claude-bottle-mitm.crt, runs update-ca-certificates inside the agent, and threads the NODE_EXTRA_CA_CERTS / SSL_CERT_FILE / REQUESTS_CA_BUNDLE env trio onto the agent container's runtime env.
The agent's HTTPS_PROXY / HTTP_PROXY point at the mitmproxy sidecar (where they pointed at pipelock under PRD 0001).
pipelock is otherwise unchanged. It continues to load the YAML PRD 0001 generates and runs its existing scanning pipeline; the addon talks to it via the same forward-proxy interface today's test_pipelock_blocks_secret_post uses.
On bottle teardown the mitmproxy sidecar is removed and the ephemeral CA private key is gone with it.
An HTTPS variant of test_pipelock_blocks_secret_post proves pipelock now blocks a credential POST over HTTPS rather than plain HTTP.
An integration test proves a non-credential HTTPS GET through the chain returns the upstream's real response.
The dry-run preflight (start --dry-run) shows the mitmproxy sidecar in both text and --format=json output. The JSON contract gains a reserved egress.mitm: { "enabled": true, "ca_fingerprint": null } block; fingerprint is always null at dry-run because the CA doesn't exist yet. Real launches emit a one-line stderr log: claude-bottle: mitm ca fingerprint: <sha256-first-16>....

Non-goals

Topology C — extending pipelock itself to terminate TLS. The research note's recommended long-term shape, but substantial Go work plus the Apache-2.0-vs-ELv2 question. Deferred.
Topology D as canonical — mitmproxy with a pipelock /scan HTTP endpoint. The addon in this PRD talks to pipelock via its existing forward-proxy interface; no upstream pipelock change needed.
Persistent or shared CA across bottles. Each bottle gets a fresh CA generated by its own mitmproxy at startup.
Selective bumping ("ignore_hosts") as a v1 manifest field. v1 bumps every CONNECT. If a future allowlisted host turns out to pin (Mobile / Chromium-style cert pinning), a follow-up PRD adds the per-host opt-out via bottle.egress.tls_bump_ignore. Strictly additive.
HTTP/3 / QUIC. mitmproxy's HTTP/3 support is experimental. v1 relies on the v1-egress iptables layer blocking UDP/443 to force clients onto HTTP/2 over TCP, which mitmproxy 12 inspects natively (verified by spike).
Raw TCP / non-HTTP TLS interception. mitmproxy supports it via --mode reverse:, not in CONNECT-bump mode. SSH and any future raw-TCP egress route around mitmproxy entirely.
Trust-store rewiring for non-Debian agent images. The current Dockerfile is node:22-slim (Debian). If a future base switches to Red-Hat-family, the update-ca-certificates step becomes update-ca-trust. Out of scope until the base changes.
Response-body scanning. Pipelock supports it; we don't wire it in v1 because the addon would need to ferry the upstream response back through pipelock's scanner, which the forward-proxy interface doesn't support cleanly. v2 candidate.
MCP scanning on the bumped path. Only fires on MCP-formatted JSON-RPC payloads inside tool calls. Not relevant to plain HTTPS agent traffic and out of v1 scope.
Domain-fronting verification. Once the addon sees the inner Host / :authority, comparing it to the outer CONNECT target catches domain fronting. Worth ~10 lines in the addon, but defer until the rest of v1 is settled.
Host-side openssl / cryptography for CA generation. The research note's open question on this is resolved by letting mitmproxy itself generate the CA (it does so on first launch). No new host-side crypto.
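Although domain-fronting verification is deferred, the comparison described above is small enough to sketch. The following is a minimal illustration, not the planned addon code: `is_domain_fronted` and `_normalize_host` are hypothetical names, and the normalization rules (case-folding, dropping a port suffix and a trailing dot) are assumptions about what "same host" should mean.

```python
def _normalize_host(value: str) -> str:
    """Lower-case a host, dropping any :port suffix and trailing dot."""
    host = value.strip().lower()
    if host.startswith("["):
        # Bracketed IPv6 literal such as [::1]:443 -> ::1
        host = host[1:].split("]", 1)[0]
    elif host.count(":") == 1:
        # host:port -> host
        host = host.rsplit(":", 1)[0]
    return host.rstrip(".")


def is_domain_fronted(connect_target: str, inner_host: str) -> bool:
    """Return True when the decrypted request names a different host
    (Host / :authority) than the outer CONNECT target."""
    return _normalize_host(connect_target) != _normalize_host(inner_host)
```

In an addon, the outer value would come from the CONNECT the agent issued and the inner value from the decrypted request's Host or :authority; how those are read from the flow is left open here.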

Scope

In scope

New claude_bottle/mitmproxy/ package:
- __init__.py — backend-agnostic. Constants (sidecar port, image-pin digest, the in-container addon path), the abstract MitmproxyProxy class with prepare / start / stop shape mirroring PipelockProxy, and the small helper that reads the CA fingerprint from a PEM file via openssl x509 -fingerprint shelled out.
- addon.py — the Python addon mitmproxy loads. ~80–150 lines. For each request event: forward the decrypted request to pipelock at http://claude-bottle-pipelock-<slug>:8888 as a plain HTTP forward-proxy call (absolute-URI form). Inspect pipelock's response. If status is 403 and the body matches pipelock's known block-event shape, set the flow's response to a 403 with pipelock's body and short-circuit. Otherwise, discard pipelock's response (and any wasted upstream-leg response from pipelock's forwarder) and let mitmproxy proceed to the real upstream.
New claude_bottle/backend/docker/mitmproxy.py — DockerMitmproxyProxy(MitmproxyProxy) with the Docker-specific start/stop lifecycle. start(plan) does docker create / docker cp addon.py … / docker network connect / docker start, analogous to the existing DockerPipelockProxy.start. Injects CLAUDE_BOTTLE_PIPELOCK_URL into the sidecar env so the addon knows where pipelock lives.
New provisioner claude_bottle/backend/docker/provision/ca.py. Polls mitmproxy for the cert file, copies it through a host stage dir into the agent, runs update-ca-certificates inside the agent, computes the SHA-256 fingerprint, and prints the one-line stderr log.
BottleBackend.provision_ca(plan, target) joins the four existing provisioner methods on the abstract base. Default impl is no-op so other backends don't break when they don't yet implement TLS interception.
DockerBottlePlan grows a mitmproxy_plan field mirroring the existing proxy_plan.
Agent container docker run invocation:
- HTTPS_PROXY / HTTP_PROXY change from the pipelock service name to the mitmproxy service name.
- Three -e flags set the CA env trio so they're inherited by the eventual docker exec claude (Docker propagates run-time env into exec by default; fallback in the docker exec open question below).
Dry-run preflight rendering of the mitmproxy entry (text + JSON). JSON gains egress.mitm: { "enabled": true, "ca_fingerprint": null }.
One stderr log line at launch with the CA fingerprint.
Two new integration tests under tests/integration/:
- test_mitmproxy_blocks_secret_https_post.py — HTTPS variant of the existing block-secret test. Asserts pipelock's body DLP fires on a credential POST tunneled through CONNECT.
- test_mitmproxy_allows_normal_https.py — confirms a plain HTTPS GET on an allowlisted host returns the upstream response, isolating the addon's pass-through path from the block path.
Unit tests for the addon's verdict logic (block vs allow on status + body shape, edge cases) using mitmproxy's mitmproxy.test flow fixtures. Unit tests for the proxy config builder (mirroring tests/unit/test_pipelock_yaml.py).
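The agent-container env wiring listed above (proxy repoint plus the CA env trio) can be sketched as a small flag builder. This is an illustration under stated assumptions: `agent_env_flags` is a hypothetical helper, the port is a parameter because the PRD leaves it as `<port>`, and `NODE_EXTRA_CA_CERTS` is assumed to point at the install path used in the bottle-install step.

```python
from __future__ import annotations

# Assumption: the cert path matches the install location named in this PRD.
CA_CERT_PATH = "/usr/local/share/ca-certificates/claude-bottle-mitm.crt"
# Debian's merged bundle, regenerated by update-ca-certificates.
SYSTEM_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def agent_env_flags(slug: str, port: int) -> list[str]:
    """Build the `-e KEY=VALUE` flags for the agent's `docker run`."""
    proxy_url = f"http://claude-bottle-mitm-{slug}:{port}"
    env = {
        "HTTPS_PROXY": proxy_url,
        "HTTP_PROXY": proxy_url,
        "NODE_EXTRA_CA_CERTS": CA_CERT_PATH,
        "SSL_CERT_FILE": SYSTEM_BUNDLE,
        "REQUESTS_CA_BUNDLE": SYSTEM_BUNDLE,
    }
    flags: list[str] = []
    for key, value in env.items():
        flags += ["-e", f"{key}={value}"]
    return flags
```

For example, `agent_env_flags("demo", 8080)` yields ten argv entries that could be spliced into the `docker run` command line; 8080 is mitmproxy's default listen port and is used here only as a placeholder.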

Out of scope

The v1 iptables + dnsmasq layer (separate PRD; see network-egress-guard.md). mitmproxy covers HTTP/HTTPS only; raw TCP, UDP, ICMP, and direct DNS still need the IP-level layer.
Pipelock config changes. Pipelock continues to load the YAML PRD 0001 generates; the addon talks to it via the existing forward-proxy interface.
A bottle-level toggle to skip mitmproxy entirely. v1 always wires it in.
Pinning-host detection automation. The cost of finding out (per research) is a single 5-minute test before adding a host; it stays a manual step.
Pipelock upstream contributions for an X-Pipelock-Verdict header. Possible follow-up. Until then the addon distinguishes blocks from passes via status + body fingerprint.

Proposed Design

Topology

agent --HTTPS_PROXY--> mitmproxy --addon--> pipelock     (scan)
                       (bump TLS)              |
                          ^                    | (verdict via status code)
                          |                    v
                          +-- on allow ----- real upstream
                                              (mitmproxy as client)

All three containers live on the same per-bottle internal Docker network. mitmproxy and pipelock are both attached to the per-bottle egress bridge for real-internet reach; the agent has no default route.

Concretely:

Agent sets HTTPS_PROXY=http://claude-bottle-mitm-<slug>:<port>. PRD 0001 had this pointing at pipelock; the hostname swap is the only agent-side env change.
mitmproxy runs in regular mode (default; no --mode flag). It bumps every CONNECT, generates fake leaf certs signed by its own CA, and presents them to the agent.
The addon, loaded via mitmdump -s /addon/addon.py, intercepts each decrypted request event. It forwards the request to pipelock at http://claude-bottle-pipelock-<slug>:8888 as a plain HTTP forward-proxy call (absolute-URI form), so pipelock sees the full URL, headers, and body.
The addon inspects pipelock's response. If status is 403 and the response body matches pipelock's known block-event shape, the addon sets the mitmproxy flow's response to a 403 with pipelock's body and short-circuits. Otherwise — including the case where pipelock's forwarder attempted the upstream and got a 4xx — the addon discards pipelock's response and lets mitmproxy proceed to the real upstream.
mitmproxy completes the outbound TLS to the real destination using its built-in trust store, just like any other forward proxy. Pipelock is only involved as a scanner.
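The block-vs-allow decision in the steps above can be isolated as a pure function, which also makes it easy to unit-test. The sketch below is hedged: pipelock's real 403 body shape is still an open question in this PRD, so the `event` / `scanner` field check is a placeholder assumption, and `is_pipelock_block` is a hypothetical name.

```python
import json

# ASSUMPTION: pipelock's block response is a JSON object carrying at least
# one of these fields. The real fingerprint must be pinned at impl time.
BLOCK_MARKER_FIELDS = ("event", "scanner")


def is_pipelock_block(status: int, body: bytes) -> bool:
    """Return True only for a 403 whose body looks like a pipelock block.

    Anything else, including a 4xx relayed from pipelock's failed upstream
    forward, is treated as "allow" so mitmproxy proceeds to the upstream.
    """
    if status != 403:
        return False
    try:
        payload = json.loads(body.decode("utf-8"))
    except ValueError:
        # Covers both invalid UTF-8 and invalid JSON.
        return False
    if not isinstance(payload, dict):
        return False
    return any(field in payload for field in BLOCK_MARKER_FIELDS)
```

When this returns True, the addon would set the flow's response to a 403 carrying pipelock's body (mitmproxy's `http.Response.make` is one way to build it); otherwise it leaves the flow untouched.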

The trade-off: pipelock makes a wasted upstream forward attempt for every allowed request (it tries to forward over plain HTTP to a real HTTPS-only host, which fails with the upstream's 4xx). This is benign: the scan completes before forwarding, the verdict reaches the addon, and the wasted upstream attempt dies in pipelock's forwarder without its response ever reaching the agent. This is an acceptable cost for the visibility win. A pipelock-side improvement (skip the forward when the addon only needs the scan verdict) is a future optimization.

New components

claude_bottle/mitmproxy/__init__.py — backend-agnostic abstract base, constants, the openssl x509 -fingerprint helper.
claude_bottle/mitmproxy/addon.py — the scanning addon. Reads pipelock's URL from CLAUDE_BOTTLE_PIPELOCK_URL (injected into the sidecar env by the proxy's start). For each request flow: synchronously forward the request to pipelock; inspect status + body; either short-circuit with 403 or fall through.
claude_bottle/backend/docker/mitmproxy.py — DockerMitmproxyProxy(MitmproxyProxy) with start/stop, the docker cp of the addon into the sidecar before docker start, and the CLAUDE_BOTTLE_PIPELOCK_URL wiring.
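The addon's synchronous forward to pipelock can be sketched with the standard library alone. This is a minimal sketch, not the vendored addon: `scan_via_pipelock` is a hypothetical name, and it relies on `http.client` sending the request target verbatim, so passing a full URL produces the absolute-URI form a forward proxy expects.

```python
from __future__ import annotations

import http.client
import os
from urllib.parse import urlsplit


def scan_via_pipelock(
    method: str,
    url: str,
    headers: dict[str, str],
    body: bytes | None,
    pipelock_url: str | None = None,
    timeout: float = 10.0,
) -> tuple[int, bytes]:
    """Send one request to pipelock in absolute-URI form.

    Returns (status, body) of pipelock's response for the verdict check.
    """
    # The sidecar env is expected to carry pipelock's address.
    pipelock_url = pipelock_url or os.environ["CLAUDE_BOTTLE_PIPELOCK_URL"]
    proxy = urlsplit(pipelock_url)
    conn = http.client.HTTPConnection(
        proxy.hostname, proxy.port or 80, timeout=timeout
    )
    try:
        # `url` is the full target URL, not just the path, so pipelock
        # sees it as a forward-proxy request.
        conn.request(method, url, body=body, headers=headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

The returned pair is intended to feed a verdict function; connection reuse, retries, and error handling for an unreachable pipelock are deliberately left out of this sketch.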

CA lifecycle

Simplified by letting mitmproxy own the generation:

Generation. mitmproxy generates a fresh CA on startup inside its container at /home/mitmproxy/.mitmproxy/mitmproxy-ca-cert.pem (public) + mitmproxy-ca.pem (private). No host-side openssl for generation; no host-side Python cryptography dep.
Volume strategy. Container-internal only. No host bind mount means the CA dies with the container.
Extraction. provision_ca polls (~1s) for the cert file via docker exec, then docker cp to host stage dir, then docker cp into the agent. Host stage dir gets cleaned up by the existing start.py finally block.
Bottle install.
1. docker cp <host stage>/mitm-ca.crt agent-<slug>:/usr/local/share/ca-certificates/claude-bottle-mitm.crt
2. docker exec -u 0 agent-<slug> chmod 644 …
3. docker exec -u 0 agent-<slug> update-ca-certificates
4. Three -e flags on docker run set the env trio (NODE_EXTRA_CA_CERTS=…/claude-bottle-mitm.crt, SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt, REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt) so docker exec claude inherits them.
Teardown. Sidecar container removed; CA private key gone.
Fingerprint. Computed post-extraction via shelled-out openssl x509 -fingerprint -sha256 -noout. Logged once to stderr at launch; never the private key.
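The extraction poll and the fingerprint step above can be sketched as follows. All function names are hypothetical, and the sketch assumes the log line shows the first 16 hex characters of the SHA-256 fingerprint followed by an ellipsis. The parser accepts both `sha256 Fingerprint=` and `SHA256 Fingerprint=` prefixes, since the casing differs between OpenSSL versions.

```python
import re
import subprocess
import time
from typing import Callable


def wait_until(
    probe: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


def parse_fingerprint(openssl_output: str) -> str:
    """Extract the hex digest from `openssl x509 -fingerprint` output."""
    match = re.search(r"Fingerprint=([0-9A-Fa-f:]+)", openssl_output)
    if match is None:
        raise ValueError("no fingerprint found in openssl output")
    return match.group(1).replace(":", "").lower()


def ca_fingerprint(pem_path: str) -> str:
    """Shell out to openssl for the SHA-256 fingerprint of a PEM cert."""
    result = subprocess.run(
        ["openssl", "x509", "-in", pem_path,
         "-fingerprint", "-sha256", "-noout"],
        check=True, capture_output=True, text=True,
    )
    return parse_fingerprint(result.stdout)


def fingerprint_log_line(fingerprint: str) -> str:
    """Render the one-line stderr message (assumed format)."""
    return f"claude-bottle: mitm ca fingerprint: {fingerprint[:16]}..."
```

In the provisioner, `probe` would wrap a `docker exec` existence check on the cert file inside the sidecar; injecting `sleep` keeps the polling loop testable without real delays.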

Data model changes

None to the manifest schema. The dry-run JSON contract gains a reserved egress.mitm: { "enabled": true, "ca_fingerprint": null } block. Fingerprint is always null at dry-run (CA doesn't exist yet) but the field is reserved so future schema additions stay non-breaking.
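As a rough illustration of the reserved block, the following sketch shows how a plan's `to_dict()` might merge it into an existing `egress` mapping. The helper names are hypothetical, and the surrounding `egress` keys are not specified by this PRD.

```python
import json


def egress_mitm_block(ca_fingerprint=None) -> dict:
    """Build the reserved egress.mitm block; fingerprint is None at dry-run."""
    return {"enabled": True, "ca_fingerprint": ca_fingerprint}


def with_mitm(egress: dict) -> dict:
    """Return a copy of an egress mapping with the mitm block added."""
    merged = dict(egress)
    merged["mitm"] = egress_mitm_block()
    return merged


if __name__ == "__main__":
    # Serializes Python None as JSON null, matching the dry-run contract.
    print(json.dumps(with_mitm({}), indent=2))
```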

A future selective-bump knob would add bottle.egress.tls_bump_ignore: [host, ...] per the research note. Strictly additive when it lands.

Existing code touched

claude_bottle/backend/docker/launch.py — bring up the mitmproxy sidecar between pipelock and the agent. Repoint the agent's HTTPS_PROXY / HTTP_PROXY env flags to mitmproxy. Register an ExitStack callback for mitmproxy teardown. Print the CA fingerprint once the sidecar reports ready.
claude_bottle/backend/docker/prepare.py — call into MitmproxyProxy.prepare(...) alongside PipelockProxy.prepare(...), populate DockerBottlePlan.mitmproxy_plan.
claude_bottle/backend/docker/backend.py — add the DockerMitmproxyProxy instance attribute (self._mitm) and thread it through launch + cleanup, mirroring self._proxy.
claude_bottle/backend/docker/bottle_plan.py — new mitmproxy_plan field. print() and to_dict() learn to render the mitmproxy entry and the egress.mitm JSON block.
claude_bottle/backend/__init__.py — abstract BottleBackend.provision_ca joins the four existing provisioners; default no-op.
tests/integration/ — two new tests as described above.
tests/unit/ — addon-verdict tests, mitmproxy-config builder tests, dry-run-plan test updated for the new egress.mitm block.
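The launch ordering and ExitStack teardown described for launch.py can be sketched with stand-in objects. This is an assumption-laden outline rather than the real launch code: `RecordingSidecar` and `launch` are hypothetical, and treating the agent as one more ExitStack-managed component is a simplification.

```python
from contextlib import ExitStack
from typing import Callable, List


class RecordingSidecar:
    """Stand-in for a proxy/agent object with a start/stop lifecycle."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log

    def start(self) -> None:
        self.log.append(f"start {self.name}")

    def stop(self) -> None:
        self.log.append(f"stop {self.name}")


def launch(pipelock, mitm, agent,
           provision_ca: Callable[[], None],
           session: Callable[[], None]) -> None:
    """Start pipelock, then mitmproxy, then the agent; install the CA;
    run the session. Teardown runs in reverse order, even on failure."""
    with ExitStack() as stack:
        for component in (pipelock, mitm, agent):
            component.start()
            # Registered callbacks run last-in, first-out on exit.
            stack.callback(component.stop)
        provision_ca()
        session()
```

Because each stop callback is registered right after its start succeeds, a failure in `provision_ca` (for example, the CA file never appearing) still removes the sidecars that were already running.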

External dependencies

mitmproxy Docker image pinned by digest on the 12.x line. Bumped deliberately, mirroring the pipelock pin. Verified by spike to speak h2 on both halves.
No new host-side runtimes. mitmproxy generates the CA; fingerprint via the openssl already present on Debian / macOS / ubuntu-latest runners.

Open questions

(rewritten — most of the original v1 questions are now closed by the walkthrough spikes; what remains is addon-implementation specifics worth pinning during the first impl turn.)

Pipelock's 403-body fingerprint. The addon needs to distinguish a pipelock block (DLP / host) from a real-upstream 4xx that pipelock's forwarder relayed back. Most likely shape: pipelock's 403 response carries a JSON body with event / scanner fields, whereas a real-upstream 4xx carries whatever the upstream sent. Pin the exact fingerprint by inspecting pipelock's actual 403 body bytes at impl time. Long-term cleanup: file an upstream feature request for an X-Pipelock-Verdict: block response header so the addon can read a structured signal instead of pattern-matching the body.
Docker run env-var inheritance through docker exec. Plan assumes docker run -e VAR=value propagates to subsequent docker exec invocations. The Docker docs say so; not yet empirically pinned on this project's runner setup. Verify in the first impl turn. Trivial fallback: thread the three -e flags onto every DockerBottle.exec* call.
Addon synchronous-call latency. The addon makes a sync HTTP call to pipelock per outbound flow. Pipelock is on the same internal Docker network; expected per-call latency is well under 10ms. Confirm under the parallel-request load Claude Code generates (most likely a non-issue — Claude is single-stream request-wise).
Addon test fixtures. mitmproxy ships mitmproxy.test with flow fixtures; addons can be unit-tested without a running proxy. Confirm the import path and recommended fixture shape at impl time; structure the addon so the verdict-decision is a pure function that's trivially testable in isolation from any HTTP I/O.
Pipelock allowing the addon's forwarded request through. pipelock will see the addon's request as coming from the mitmproxy sidecar's IP on the internal network. Confirm pipelock has no client-IP allowlist that would reject these. Likely fine — pipelock's client_ip is informational in the scan event, not a gate.

References

docs/research/tls-mitm-for-pipelock.md — primary source. This PRD implements a variant of §Recommendation (Topology A) after the spike documented under "Open questions" §1 falsified the upstream mode assumption.
docs/research/pipelock-assessment.md §Scope gaps — names the TLS-inspection gap closed here.
docs/prds/0001-per-agent-egress-proxy-via-pipelock.md — egress-proxy baseline this PRD extends.
docs/prds/0003-bottle-backend-abstraction.md — backend ABC contract this PRD adds a provision_ca method to.
docs/prds/0004-split-out-provisioners.md — per-provisioner module pattern reused for the new CA provisioner.
mitmproxy: https://mitmproxy.org, https://github.com/mitmproxy/mitmproxy
mitmproxy modes: https://docs.mitmproxy.org/stable/concepts/modes/
mitmproxy CA cert installation: https://docs.mitmproxy.org/stable/concepts/certificates/
mitmproxy addon API: https://docs.mitmproxy.org/stable/addons-overview/
Node NODE_EXTRA_CA_CERTS: https://nodejs.org/api/cli.html#node_extra_ca_certsfile
