bot-bottle/docs/prds/0006-pipelock-tls-interception.md

# PRD 0006: pipelock native TLS interception

- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-12

## Summary

Turn on pipelock's built-in `tls_interception` so its DLP / URL /
header / MCP scanners fire on the plaintext of HTTPS requests
instead of only the outer `CONNECT` hostname. Pipelock generates a
per-bottle ephemeral CA at launch (`pipelock tls init`); the
public cert is installed into the agent container's trust store
and the private key dies with the sidecar on teardown. The
existing per-agent sidecar topology from PRD 0001 is otherwise
unchanged — one container, no addon, no second proxy.

This supersedes the closed PR #8 / branch `mitmproxy-tls-interception`,
which built a mitmproxy + addon chain on the (falsified) premise
that pipelock could not MITM. Empirical proof from the impl-time
spike: with `tls_interception: { enabled: true, ca_cert, ca_key }`
in the pipelock config, pipelock answered a credential POST over
HTTPS with `STATUS=403 / body: blocked: request body contains
secret: GitHub Token` and emitted both
`scanner:"tls_intercept"` and `scanner:"body_dlp"` events.

## Problem

PRD 0001 wired pipelock onto every bottle's egress, but pipelock
ran with its default `tls_interception.enabled: false`. The agent
container's only egress route is pipelock, but pipelock only saw
`CONNECT` hostnames and the encrypted bytes inside the tunnel.
Pipelock's headline scanners — request body DLP (48 credential
patterns), header DLP, URL DLP, subdomain entropy, MCP scanning,
response-body scanning — all need plaintext to fire. Against the
HTTPS-only hosts in `DEFAULT_ALLOWLIST` (`api.anthropic.com`,
`raw.githubusercontent.com`, etc.) they are effectively disabled.

The existing `tests/integration/test_pipelock_blocks_secret_post`
test only fires because it forces the agent to send plain HTTP
through pipelock's forward-proxy mode. Real Claude Code traffic
uses HTTPS via CONNECT and slips past the scanner.

## Goals / Success Criteria

The feature works when all of the following are observable:

- A Node / curl request from inside a launched bottle to a
  CONNECT-bumped HTTPS host (e.g. `https://api.anthropic.com/dlp-probe`)
  carrying a pipelock-recognized credential pattern in the body
  returns 403 from pipelock with the documented
  `blocked: request body contains secret: …` body. Pipelock's
  `body_dlp` event fires on the decrypted request.
- A clean HTTPS GET from inside the bottle to an allowlisted host
  (e.g. `https://raw.githubusercontent.com/...`) returns the real
  upstream response — TLS interception doesn't break legitimate
  traffic.
- The agent's TLS library trusts pipelock's bumped leaf certs
  (per the bottle's installed CA); no TLS-trust errors.
- Claude Code reaches `api.anthropic.com` end-to-end through the
  bottle and completes a chat round-trip.
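
For the integration tests, the block condition in the first criterion could be checked with a small predicate like the one below. This is only a sketch: `looks_like_dlp_block` is a hypothetical helper name, and the body prefix is taken from the spike output quoted in the Summary rather than from pipelock's documentation.

```python
# Prefix of the 403 body observed in the spike (see Summary).
BLOCK_PREFIX = "blocked: request body contains secret:"


def looks_like_dlp_block(status: int, body: str) -> bool:
    """True if a response matches pipelock's body-DLP block shape.

    Hypothetical helper: checks the HTTP status and the body prefix only;
    the name of the matched credential pattern is not inspected.
    """
    return status == 403 and body.strip().startswith(BLOCK_PREFIX)


# Example with the body text reported by the spike.
spike_body = "blocked: request body contains secret: GitHub Token"
blocked = looks_like_dlp_block(403, spike_body)
allowed = looks_like_dlp_block(200, "hello")
```

A clean HTTPS GET (second criterion) would be expected to fail this predicate, since it returns the real upstream response.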

The feature is **done** when all of the following ship:

- `pipelock_build_config` / `pipelock_render_yaml` emit a
  `tls_interception` block with `enabled: true` and the per-bottle
  CA cert/key paths. The defaults
  (`cert_ttl: 24h`, `cert_cache_size: 10000`,
  `passthrough_domains: []`) are kept; only `enabled` and the
  cert paths are populated.
- The prepare step generates a per-bottle CA via `pipelock tls init`
  in a one-shot container, writes `ca.pem` and `ca-key.pem` to
  `stage_dir`. Paths land on the `DockerBottlePlan`.
- `DockerPipelockProxy.start` copies the CA cert and key from the
  stage dir into the sidecar (`docker cp`, no bind-mount) so the
  running pipelock can read its CA.
- `BottleBackend.provision_ca` (new) copies the CA public cert
  into the agent at
  `/usr/local/share/ca-certificates/bot-bottle-mitm.crt` and runs
  `update-ca-certificates`. The `NODE_EXTRA_CA_CERTS` /
  `SSL_CERT_FILE` / `REQUESTS_CA_BUNDLE` env trio is set on the
  agent container at `docker run -e` time by the launch step.
  `provision_ca` is a default no-op on the abstract base so other
  backends aren't forced to implement it.
- The launch step prints a one-line stderr log with the SHA-256
  fingerprint of the public CA cert (computed via stdlib
  `ssl.PEM_cert_to_DER_cert` + `hashlib.sha256`).
- On bottle teardown the sidecar is removed and the CA private
  key is gone with it.
- Two new integration tests under `tests/integration/`:
  - HTTPS variant of the credential-post block test (proves the
    `tls_intercept` + `body_dlp` chain fires end-to-end).
  - Clean HTTPS GET test (proves the allow path doesn't break TLS
    trust and returns real upstream content).
- The dry-run preflight (`start --dry-run`) renders the new TLS
  layer. Text: one line under the egress summary. JSON: a
  reserved `egress.tls_interception: { enabled: true,
  ca_fingerprint: null }` block — fingerprint is null at dry-run
  because the CA only exists after launch.
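
A minimal sketch of how the config builder might add the block, assuming the config is a plain dict before YAML rendering. `with_tls_interception` is a hypothetical helper, not the actual `pipelock_build_config` signature, and the default paths are the in-container locations this PRD proposes for the sidecar.

```python
from typing import Any

# In-container paths proposed in this PRD for the sidecar's CA material.
SIDECAR_CA_CERT = "/etc/pipelock/ca.pem"
SIDECAR_CA_KEY = "/etc/pipelock/ca-key.pem"


def with_tls_interception(
    config: dict[str, Any],
    ca_cert: str = SIDECAR_CA_CERT,
    ca_key: str = SIDECAR_CA_KEY,
) -> dict[str, Any]:
    """Return a copy of `config` with `tls_interception` enabled.

    Only `enabled` and the two paths are set; cert_ttl, cert_cache_size
    and passthrough_domains are left to pipelock's defaults.
    """
    out = dict(config)  # shallow copy so the caller's dict is untouched
    out["tls_interception"] = {
        "enabled": True,
        "ca_cert": ca_cert,
        "ca_key": ca_key,
    }
    return out


# Placeholder base config; the key below is illustrative only.
base_config = {"placeholder_existing_key": "unchanged"}
merged_config = with_tls_interception(base_config)
```

Returning a copy keeps the builder free of side effects, which makes the unit test for the rendered YAML simpler to write.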

## Non-goals

- A second proxy in the chain. Pipelock does the bumping
  natively; the mitmproxy approach was based on a wrong premise
  (closed PR #8).
- Per-bottle override to disable interception. v1 always enables
  `tls_interception`. The pipelock-side `passthrough_domains`
  list is the right knob if a future allowlisted host turns out
  to pin certs — exposing it through the manifest is a follow-up.
- A long-lived / shared CA across bottles. Each bottle gets a
  fresh CA generated by `pipelock tls init` and destroyed with the
  sidecar.
- Tuning `cert_ttl`, `cert_cache_size`, `max_response_bytes`,
  `cross_request_detection`, or other pipelock advanced features.
  Defaults from `pipelock generate config --preset strict` are
  fine for v1.
- Trust-store paths for non-Debian agent images.
  `node:22-slim` is Debian; `update-ca-certificates` is the right
  command. A Red-Hat-family base would need `update-ca-trust`.
- HTTP/3 / QUIC. Pipelock's interception is HTTP/HTTPS-over-TLS;
  UDP/443 still needs an iptables layer (separate PRD).

## Scope

### In scope

- **`bot_bottle/pipelock.py`** changes:
  - Extend `pipelock_build_config` to include
    `tls_interception: { enabled: true, ca_cert: <path>, ca_key:
    <path> }`. Paths are populated from the plan; the function's
    signature grows a `cert_path` / `key_path` pair or reads them
    off `Bottle` once they're stored.
  - Extend `pipelock_render_yaml` to emit the new block.
- **`bot_bottle/backend/docker/pipelock.py`** changes:
  - New helper `pipelock_tls_init(stage_dir)` runs the upstream
    image as a one-shot:
    `docker run --rm -v <stage>:/h -e PIPELOCK_HOME=/h pipelock tls init`,
    leaving `ca.pem` and `ca-key.pem` under `stage_dir`. The host
    file owner is whatever the upstream image's user is; the
    sidecar gets its own copies via `docker cp` (below), so it
    does not depend on the host-side owner.
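  - `DockerPipelockProxy.start` `docker cp`s the CA cert + key
    into the sidecar at `/etc/pipelock/ca.pem` and
    `/etc/pipelock/ca-key.pem` between `docker create` and
    `docker start`, mirroring the existing pattern for the YAML
    config. If pipelock's image runs as non-root, the copied files
    must be readable by that user. `docker exec` only works on a
    running container, so a `docker exec -u 0 chown ...` cannot
    run between the `cp` and the `start`; ownership has to be set
    another way, for example by fixing it on the staged files and
    copying with `docker cp -a` (which preserves uid/gid).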
- **`bot_bottle/backend/__init__.py`**: new method
  `provision_ca(plan, target)` on the `BottleBackend` ABC with a
  default no-op body, so backends may override it but are not
  required to.
  `BottleBackend.provision` orchestrates `ca → prompt → skills →
  ssh → git`.
- **`bot_bottle/backend/docker/provision/ca.py`** (new):
  - Reads the cert from `stage_dir` (already written by prepare).
  - `docker cp` into the agent.
  - `docker exec -u 0 ... chmod 644 ...` + `update-ca-certificates`.
  - Computes the SHA-256 fingerprint with stdlib (`ssl` +
    `hashlib`), emits one stderr log line.
- **`bot_bottle/backend/docker/launch.py`**:
  - Three new `-e` flags on the agent's `docker run`:
    `NODE_EXTRA_CA_CERTS=/usr/local/share/ca-certificates/bot-bottle-mitm.crt`,
    `SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt`,
    `REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt`.
  - `HTTPS_PROXY` / `HTTP_PROXY` continue to point at pipelock
    (unchanged from PRD 0001 — the mitmproxy detour in PR #8 is
    abandoned).
- **`bot_bottle/backend/docker/bottle_plan.py`**:
  - One new `info(...)` line in `print()` noting TLS interception
    is on.
  - `to_dict()` gains an `egress.tls_interception: { enabled:
    true, ca_fingerprint: null }` block. Reserved for future
    population.
- **`bot_bottle/backend/docker/prepare.py`**: call
  `pipelock_tls_init(stage_dir)` and write the resolved cert/key
  paths onto the plan (either on the existing `proxy_plan` field
  or on the parent `DockerBottlePlan`).
- **Tests:**
  - `tests/integration/test_pipelock_blocks_secret_https_post.py`
    (new) — HTTPS variant of the existing block test.
  - `tests/integration/test_pipelock_allows_normal_https.py`
    (new) — clean HTTPS GET succeeds.
  - `tests/unit/test_pipelock_yaml.py` updated to assert the new
    `tls_interception` block in the rendered config.
  - `tests/integration/test_dry_run_plan.py` updated to assert
    the new `egress.tls_interception` JSON block.
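
The two Docker invocations described above can be sketched as argv builders. This is a sketch under assumptions: `tls_init_argv` and `agent_ca_env_flags` are hypothetical names, the image reference defaults to the PRD's shorthand `pipelock` (the real code would use the pinned image), and the image's entrypoint is assumed to be the `pipelock` binary so that `tls init` is passed as arguments.

```python
# Agent-side paths as listed for launch.py in this PRD.
AGENT_CA_PATH = "/usr/local/share/ca-certificates/bot-bottle-mitm.crt"
SYSTEM_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def tls_init_argv(stage_dir: str, image: str = "pipelock") -> list[str]:
    """argv for the one-shot CA generation container.

    Assumes the image entrypoint is the pipelock binary, so the trailing
    arguments are just the subcommand.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{stage_dir}:/h",      # stage dir becomes pipelock's home
        "-e", "PIPELOCK_HOME=/h",
        image, "tls", "init",
    ]


def agent_ca_env_flags() -> list[str]:
    """The three `-e` flags added to the agent's `docker run`."""
    env = {
        "NODE_EXTRA_CA_CERTS": AGENT_CA_PATH,
        "SSL_CERT_FILE": SYSTEM_BUNDLE,
        "REQUESTS_CA_BUNDLE": SYSTEM_BUNDLE,
    }
    flags: list[str] = []
    for name, value in env.items():
        flags += ["-e", f"{name}={value}"]
    return flags


init_cmd = tls_init_argv("/tmp/stage")
env_flags = agent_ca_env_flags()
```

Building the argv as a list (rather than a shell string) avoids quoting issues when `stage_dir` contains spaces; the list can be handed to `subprocess.run` directly.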

### Out of scope

- Modifying pipelock itself. We're using existing config knobs.
- A manifest field to disable / customize interception per bottle.
  Doable but premature.
- Wiring `passthrough_domains`. The default `[]` is correct for
  v1; add the manifest field when a pinning host shows up. The
  shape is pre-recorded so the follow-up is mechanical:
  `bottle.egress.tls_passthrough_domains: [host, ...]`,
  mirroring the existing `egress.allowlist`.
- `cross_request_detection`, `entropy_budget`,
  `fragment_reassembly`, `reverse_proxy`, `scan_api` — features
  pipelock exposes but we don't need for the body-DLP gap.

## Proposed Design

### Topology

```
agent --HTTPS_PROXY--> pipelock --[bumps TLS]--> internet
                       (sees plaintext: URL, headers, body)
```

Same single-sidecar shape as PRD 0001. The only addition is
`tls_interception` in pipelock's config plus the per-bottle CA
generated at prepare time.

### CA lifecycle

- **Generation.** Host-side, at prepare time, via a one-shot
  `docker run --rm -v <stage>:/h -e PIPELOCK_HOME=/h pipelock tls
  init`. Output: `<stage>/ca.pem` + `<stage>/ca-key.pem`, mode 600.
- **Sidecar install.** `DockerPipelockProxy.start` `docker cp`s
  the CA cert + key into the sidecar at `/etc/pipelock/ca.pem`
  and `/etc/pipelock/ca-key.pem` between `docker create` and
  `docker start`. Same pattern the proxy already uses for the
  YAML config — no bind-mount, no UID/permission concern from
  the one-shot generation step. The rendered YAML references
  the in-container paths.
- **Bottle install.** `provision_ca` (Docker impl) does
  `docker cp <stage>/ca.pem agent:/usr/local/share/ca-certificates/bot-bottle-mitm.crt`,
  then `update-ca-certificates`. The CA env trio is set at
  `docker run -e` time (Docker propagates run-time env into
  `docker exec`).
- **Per-bottle ephemerality.** Enforced by *regenerating per
  launch*, not by validity windows. Pipelock's defaults
  (`cert_ttl: 24h` for leaves, `--validity 87600h` for the CA)
  are fine — the CA lives only as long as the sidecar, which is
  the bottle's lifetime.
- **Teardown.** Sidecar removed via `ExitStack` callback, then
  the launch context manager's outer `finally` `shutil.rmtree`s
  `stage_dir`. The CA key dies with both: the sidecar's
  `docker cp`'d copy goes with the container and the staged copy
  goes with `stage_dir`, in that order.
- **Fingerprint.** Computed via stdlib in `provision_ca` and
  logged once to stderr (`bot-bottle: mitm ca fingerprint:
  sha256:<hex>…`). The private key never appears in any log.
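
The fingerprint step can be sketched with the two stdlib calls named above. `ca_fingerprint` is a hypothetical function name, and the sketch assumes `ca.pem` holds a single PEM certificate that begins at the start of the file; whether the logged hex is truncated is left to the implementation.

```python
import hashlib
import ssl
import sys


def ca_fingerprint(pem_text: str) -> str:
    """SHA-256 fingerprint of a PEM certificate, as 'sha256:<hex>'.

    The digest is taken over the DER encoding, which is what tools like
    `openssl x509 -fingerprint -sha256` hash as well.
    """
    der = ssl.PEM_cert_to_DER_cert(pem_text)
    return "sha256:" + hashlib.sha256(der).hexdigest()


# Demo input: PEM-wrap placeholder bytes. PEM_cert_to_DER_cert only strips
# the armor and base64-decodes, so this is enough to exercise the helper;
# a real run would read <stage>/ca.pem instead.
demo_der = b"placeholder bytes, not a real certificate"
demo_pem = ssl.DER_cert_to_PEM_cert(demo_der)
demo_fp = ca_fingerprint(demo_pem)

# One log line to stderr; only the public cert is ever hashed or printed.
print(f"bot-bottle: mitm ca fingerprint: {demo_fp}", file=sys.stderr)
```

Only the public certificate is read here, so nothing in this path can leak the private key into logs.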

### Data model changes

None to the manifest schema. The dry-run JSON contract grows a
reserved `egress.tls_interception` block; the fingerprint is
always null at dry-run because the CA doesn't exist yet.
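
As an illustration of the reserved block, the sketch below builds and serializes it. `tls_interception_preflight` is a hypothetical helper; the real `to_dict()` would embed the block in the existing plan dict, which is reduced here to the `egress` key only.

```python
import json
from typing import Optional


def tls_interception_preflight(ca_fingerprint: Optional[str] = None) -> dict:
    """Reserved dry-run block; the fingerprint stays None before launch."""
    return {"enabled": True, "ca_fingerprint": ca_fingerprint}


# Minimal stand-in for the plan's to_dict() output.
plan_dict = {"egress": {"tls_interception": tls_interception_preflight()}}
rendered_json = json.dumps(plan_dict, indent=2)
print(rendered_json)
```

Python's `None` serializes to JSON `null`, so the dry-run contract is met without a special case.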

### Existing code touched

Surgical, all on the existing pipelock path:

- `bot_bottle/pipelock.py` — config builder + YAML renderer.
- `bot_bottle/backend/__init__.py` — abstract `provision_ca`.
- `bot_bottle/backend/docker/pipelock.py` — `tls init` helper,
  `docker cp` of the CA cert + key into the sidecar.
- `bot_bottle/backend/docker/prepare.py` — CA paths on plan.
- `bot_bottle/backend/docker/launch.py` — CA env trio on agent.
- `bot_bottle/backend/docker/backend.py` — `provision_ca`
  dispatch; `self._proxy` is threaded through prepare/launch with
  its shape unchanged.
- `bot_bottle/backend/docker/bottle_plan.py` — preflight
  rendering.
- `bot_bottle/backend/docker/provision/ca.py` (new).

Net diff is meaningfully smaller than PR #8 because pipelock
already does the work — no addon, no second sidecar, no second
backend module.

### External dependencies

- **Pipelock image** — unchanged pin from PRD 0001
  (`ghcr.io/luckypipewrench/pipelock@sha256:3b1a3941…`,
  matching pipelock v2.3.0). No new image dependency.
- **No host-side crypto deps.** CA generation uses the pipelock
  image's own `tls init` command in a one-shot container.
  Fingerprint uses Python stdlib `ssl` + `hashlib`.

## References

- `docs/research/pipelock-assessment.md` (now corrected) —
  pipelock capability assessment including the
  `tls_interception` block.
- `docs/prds/0001-per-agent-egress-proxy-via-pipelock.md` —
  egress-proxy baseline this PRD extends.
- `docs/prds/0003-bottle-backend-abstraction.md` — backend ABC
  contract this PRD adds a `provision_ca` method to.
- `docs/prds/0004-split-out-provisioners.md` — per-provisioner
  module pattern reused for the new CA provisioner.
- Pipelock `tls` CLI (in-image help):
  `pipelock tls init / install-ca / show-ca`.
- Closed PR #8 — earlier mitmproxy-based design built on the
  falsified "pipelock can't MITM" premise; archived for context.