PRD 0017: Egress-proxy — universal MITM with path filtering + auth injection

  • Status: Draft
  • Author: didericis
  • Created: 2026-05-25
  • Supersedes: the cred-proxy sidecar (PRD 0010) — hard cutover.

Summary

Replace the per-bottle cred-proxy sidecar with a new egress-proxy sidecar built on mitmproxy. The egress-proxy is the agent's HTTP_PROXY / HTTPS_PROXY — every agent HTTP/HTTPS request flows through it before reaching pipelock. It owns three jobs that today are split between cred-proxy and pipelock:

  1. MITM the agent's HTTPS. Uses the per-bottle CA today held by pipelock; that key moves to the egress-proxy.
  2. Path-level allow/deny. Manifest-declared path_allowlist per route. Universal coverage — any HTTPS path the agent reaches for is inspected here, not just traffic that voluntarily dials the cred-proxy URL.
  3. Credential injection. Continues cred-proxy's existing role: match by hostname (or hostname + path), strip inbound Authorization, inject one based on the route's optional auth: { scheme, token_ref } block.

Pipelock's role narrows to hostname allowlist + DLP body scanning on the egress-proxy → upstream leg. Pipelock no longer holds the CA private key; no longer the agent's direct proxy.

Problem

PR #25's pipelock-block flow exposed an honest gap: pipelock's api_allowlist is hostname-only (verified by probing the binary's strict preset and the pipelock check --url output). Approving a proposed pipelock-block opens the entire host, not the URL's path. For shared platforms (github.com, gitlab.com, public registries) operators routinely want narrower-than-host granularity — allow github.com/didericis but block github.com/somebody-else.

Cred-proxy already does path-prefix routing for credentialed APIs, but it only sees the requests the agent voluntarily routes to it (via ANTHROPIC_BASE_URL, ~/.gitconfig insteadOf, npmrc registry=). A raw curl https://github.com/anyone from the agent goes to HTTPS_PROXY=pipelock directly and bypasses cred-proxy entirely. So extending cred-proxy with path_allowlist (the earlier PRD 0017 draft) buys opt-in path filtering, not enforcement.

For enforcement we need a layer that sits on the agent's HTTPS_PROXY path — universal coverage of agent egress.

Goals / Success Criteria

A bottle manifest declares an egress-proxy route with a path_allowlist. From inside the bottle, curl https://github.com/didericis/foo succeeds; curl https://github.com/somebody-else/secret gets a 403 from egress-proxy and never reaches pipelock or the real GitHub. The same holds for any tool inside the bottle that respects HTTPS_PROXY: claude-code, git over HTTPS, npm, raw curl, random Python requests. No tool-specific rewrite is required for path enforcement.

Existing cred-proxy responsibilities continue to work after the cutover. Anthropic OAuth injection for claude-code is delivered by proxy-side header injection rather than the dotfile rewrite. The git insteadOf routing into the proxy stays useful for hostname canonicalisation but is no longer load-bearing for credential delivery.

Non-goals

  • Replacing pipelock. Pipelock keeps doing hostname allowlist + DLP body scanning on the egress-proxy → upstream leg.
  • Building our own MITM stack. mitmproxy already does it; we ship addons.
  • Backward compatibility with bottle.cred_proxy.routes[]. Hard cutover (see Migration).
  • Path-level rules in pipelock. Upstream feature request is a separate track (file independently); this PRD doesn't depend on it.

Scope

In scope

  • A new egress-proxy sidecar replacing the cred-proxy sidecar. mitmproxy image, pinned by digest. Addons in Python.
  • Per-bottle CA generation moves from pipelock to egress-proxy. The agent's trust store is rebuilt against the egress-proxy CA (was pipelock's CA).
  • Manifest rename: bottle.cred_proxy.routes[] → bottle.egress_proxy.routes[]. The route shape gains optional path_allowlist: [<prefix>, ...] and a nested optional auth: { scheme, token_ref } block (presence or absence of auth is the authenticated vs unauthenticated signal; this replaces the old auth_scheme: "none" pattern).
  • Agent's HTTP_PROXY / HTTPS_PROXY env vars repointed at the egress-proxy (was pipelock).
  • Pipelock retains its sidecar slot and its own DLP + hostname scanner. The agent never dials it directly anymore; egress-proxy uses HTTPS_PROXY=pipelock for its outbound leg, matching the current cred-proxy → pipelock pattern.
  • Existing PRDs that depend on cred-proxy:
    • PRD 0014 (cred-proxy-block remediation) → renames + retargets apply path. SIGHUP reload semantics carry over to egress-proxy.
    • PRD 0013 (supervise plane) cred-proxy-block MCP tool stays; its proposed file format updates per the new route shape.
  • Removal of the old cred-proxy code: bot_bottle/cred_proxy.py, cred_proxy_server.py, backend/docker/cred_proxy.py, provision/cred_proxy.py, the Dockerfile.cred-proxy. Tests updated.

Out of scope

  • Pipelock CA path: pipelock keeps generating its own CA for any internal TLS termination it still does (e.g., on the egress-proxy → upstream leg if pipelock is the MITM there). Whether pipelock needs that CA at all post-cutover is an open question (probably no — egress-proxy already terminated; pipelock is now downstream of a plain-HTTP forward from egress-proxy).
  • Glob / regex matching in path_allowlist. v1 ships prefix matching; expressive forms are a follow-up.
  • An MCP tool for the agent to propose path_allowlist additions. Today the operator manages this via the manifest + the existing routes edit <bottle> TUI verb (renamed to egress-proxy edit <bottle>).

Proposed design

Topology

[Agent] --HTTP_PROXY=egress-proxy-->
           [egress-proxy (mitmproxy)]
              MITM with per-bottle CA
              path_allowlist enforcement
              Authorization header injection
            --HTTPS_PROXY=pipelock-->
                  [pipelock]
                    hostname allowlist
                    DLP body scan
                  --egress-->  Internet

Universal coverage: every HTTP/HTTPS request the agent makes hits egress-proxy first. cred-proxy's URL convention (http://cred-proxy:9099/...) goes away — there's no need for the agent to address the proxy by name because it's already on the default proxy path.

Manifest

egress_proxy:
  routes:
    # Authenticated route — `auth` block carries the injection
    # config. path_allowlist optional.
    - host: "api.github.com"
      auth:
        scheme: "Bearer"
        token_ref: "GH_PAT"
      path_allowlist:
        - "/repos/didericis/"
        - "/users/didericis"
    # Unauthenticated path-filtered route — `auth` omitted
    # entirely (presence/absence of the key is the auth signal).
    - host: "github.com"
      path_allowlist:
        - "/didericis/"
    # Bare-pass route: no auth, no path constraint. Useful when
    # you want a host to skip path filtering but still be
    # DLP-scanned by pipelock on the outbound leg.
    - host: "api.anthropic.com"

Route matching is on host (was path prefix). The hostname gates whether a route applies; path_allowlist (if present) constrains the URL path under that host. The optional auth block carries credential-injection config:

  • Omit auth → no Authorization header injected (replaces the earlier draft's auth_scheme: "none").
  • auth.scheme → one of Bearer, token (the values cred-proxy supports today; sidesteps the gitea-token quirk).
  • auth.token_ref → host env var holding the secret. Same semantics as cred-proxy's TokenRef field today.

Validation: auth (if present) must contain both scheme and token_ref. An empty auth: {} is an error rather than a synonym for "no auth" — that's what omission is for.
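The validation rule above can be sketched as a small loader. This is a minimal illustration, not the project's actual code: the names Route, Auth, RouteError and parse_route are hypothetical, and the requirement that each path_allowlist entry start with "/" is an added assumption that the PRD does not state.

```python
from dataclasses import dataclass
from typing import Any, Optional

# The two schemes the PRD lists for auth.scheme.
ALLOWED_SCHEMES = {"Bearer", "token"}


class RouteError(ValueError):
    """Hypothetical error type for an invalid route entry."""


@dataclass(frozen=True)
class Auth:
    scheme: str
    token_ref: str


@dataclass(frozen=True)
class Route:
    host: str
    auth: Optional[Auth] = None
    path_allowlist: Optional[tuple[str, ...]] = None


def parse_route(raw: dict[str, Any]) -> Route:
    """Validate one entry of egress_proxy.routes[] and return a Route."""
    host = raw.get("host")
    if not isinstance(host, str) or not host:
        raise RouteError("route needs a non-empty 'host'")

    auth = None
    if "auth" in raw:
        block = raw["auth"]
        # An empty or partial auth block is an error, not a synonym for
        # "no auth"; omitting the key is how a route opts out.
        if not isinstance(block, dict) or not {"scheme", "token_ref"} <= block.keys():
            raise RouteError(f"{host}: auth must contain both scheme and token_ref")
        if block["scheme"] not in ALLOWED_SCHEMES:
            raise RouteError(f"{host}: unsupported auth.scheme {block['scheme']!r}")
        auth = Auth(block["scheme"], block["token_ref"])

    allow = raw.get("path_allowlist")
    if allow is not None:
        # Assumption: prefixes are absolute URL paths.
        if not isinstance(allow, list) or not all(
            isinstance(p, str) and p.startswith("/") for p in allow
        ):
            raise RouteError(f"{host}: path_allowlist must be a list of '/'-prefixed strings")
        allow = tuple(allow)

    return Route(host=host, auth=auth, path_allowlist=allow)
```

With this shape, the bare-pass route {"host": "api.anthropic.com"} parses to a Route with auth and path_allowlist both None, while {"host": "x", "auth": {}} raises RouteError.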

mitmproxy addon shape

The egress-proxy ships a small Python addon that:

  • Loads the per-bottle routes from /etc/egress-proxy/routes.yaml (rendered by the prepare step, docker-cp'd in like cred-proxy's current routes.json).
  • On request hook: match flow.request.host → route. If no route matches → forward unchanged (pipelock will hostname-gate it). If route matches and has path_allowlist, check flow.request.path against the prefix list; 403 with a clear reason if no match.
  • On approved requests: strip inbound Authorization. If the route carries an auth block, inject Authorization: <auth.scheme> <token-from-env-named-by-auth.token_ref>. If the route omits auth, leave Authorization unset.
  • SIGHUP / file-mtime watch on routes.yaml for hot-reload (same cadence as today's cred-proxy SIGHUP path).
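The request-hook behaviour in the bullets above can be sketched as a pure function, kept free of mitmproxy imports so it can be tested on its own. The names path_allowed and decide, and the dict-based route shape, are hypothetical. Stripping the query string before matching is an assumption, made because mitmproxy's flow.request.path includes the query.

```python
import os
from typing import Mapping, Optional, Sequence


def path_allowed(path: str, allowlist: Optional[Sequence[str]]) -> bool:
    """Plain prefix match; a route without path_allowlist allows every path."""
    if allowlist is None:
        return True
    bare = path.split("?", 1)[0]  # drop the query string before matching
    return any(bare.startswith(prefix) for prefix in allowlist)


def decide(
    host: str,
    path: str,
    headers: Mapping[str, str],
    routes: Mapping[str, dict],
    env: Mapping[str, str] = os.environ,
) -> tuple[Optional[int], dict[str, str]]:
    """Return (status, headers): status 403 means block, None means forward."""
    route = routes.get(host.lower())
    if route is None:
        # No route for this host: forward unchanged, pipelock hostname-gates it.
        return None, dict(headers)

    if not path_allowed(path, route.get("path_allowlist")):
        return 403, dict(headers)

    # Approved request on a matching route: strip the inbound Authorization.
    out = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    auth = route.get("auth")
    if auth is not None:
        # A KeyError here means the env var named by token_ref is unset.
        out["Authorization"] = f"{auth['scheme']} {env[auth['token_ref']]}"
    return None, out
```

In an addon, the request(flow) hook would presumably call decide(flow.request.host, flow.request.path, ...) and, on 403, set flow.response to a short explanatory response. Two properties of plain prefix matching are worth noting: a prefix without a trailing slash such as "/users/didericis" also admits "/users/didericis-other", and an unnormalised path like "/didericis/../somebody-else" passes the "/didericis/" prefix, so path normalisation before matching is probably needed.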

mitmproxy's standard CA generation handles per-host leaf certs at SNI time. The per-bottle CA is generated at bottle launch (was pipelock's tls-init step; now egress-proxy's). Agent's trust store gets the egress-proxy CA installed in place of pipelock's.

Trust-domain concentration

The egress-proxy now holds:

  • Every credential the bottle declared in egress_proxy.routes[] (OAuth tokens, PATs, npm tokens).
  • The per-bottle MITM CA private key.

This is a deliberate concentration. With the previous split:

  • cred-proxy held tokens.
  • pipelock held the CA.

A memory disclosure in cred-proxy exposed tokens; in pipelock, the CA. Both were bad; neither exposed everything.

A single disclosure in the new egress-proxy exposes both. Mitigations:

  • mitmproxy runs as an unprivileged user inside the container.
  • Tokens live in the container's environ (same as cred-proxy today). The CA private key is mounted from the host's stage_dir (mode 600).
  • Pipelock stays as a separate sidecar, so a compromise of egress-proxy doesn't disable pipelock's hostname check + DLP on the outbound leg — the attacker can forge certs to the agent but can't easily exfil from inside the agent without pipelock noticing.

The user (per PR #25 discussion) accepted this concentration in exchange for the one-sidecar consolidation. The PRD records it explicitly.

Migration — hard cutover

No backward-compat alias for bottle.cred_proxy.routes[]. At manifest load:

  • cred_proxy: block → die() with a clear pointer at this PRD and a migration recipe (rename to egress_proxy:, rename path → host, drop the agent-side URL prefix).
  • cred_proxy_routes field on existing dataclasses removed.
  • Dockerfile.cred-proxy deleted.
  • bot_bottle/cred_proxy*.py deleted.
  • bot_bottle/backend/docker/cred_proxy*.py consolidated into egress_proxy*.py.
  • Provisioner files renamed.
  • PRDs 0010 (cred-proxy), 0014 (cred-proxy-block remediation) retroactively annotated as "superseded by 0017" — old text preserved, header updated.
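The hard-fail at manifest load might look roughly like the sketch below. The die() helper here is a stand-in written for the example (the project's real die() may behave differently), and reject_legacy_keys and the exact message text are hypothetical.

```python
from typing import Any

# Hypothetical wording; the PRD only requires a pointer to this PRD plus
# the three-step migration recipe.
MIGRATION_HINT = (
    "bottle.cred_proxy was removed by PRD 0017 (egress-proxy). To migrate: "
    "rename cred_proxy to egress_proxy, rename each route's path to host, "
    "and drop the agent-side URL prefix."
)


def die(message: str) -> None:
    """Stand-in for the project's die(): abort with a message."""
    raise SystemExit(f"bot-bottle: {message}")


def reject_legacy_keys(bottle: dict[str, Any]) -> None:
    """Refuse to load a manifest that still carries a cred_proxy block."""
    if "cred_proxy" in bottle:
        die(MIGRATION_HINT)
```

Checking for the key itself, rather than for a non-empty routes list, means that even an empty legacy block fails loudly instead of being silently ignored.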

Implementation chunks

Plausibly three implementation PRs after this PRD lands:

  1. egress-proxy sidecar core. Dockerfile + mitmproxy addon + routes.yaml schema + lifecycle (prepare / start / stop / SIGHUP).
  2. Manifest + provisioner migration. Rename cred-proxy throughout the codebase, hard-fail on legacy manifests, update agent CA trust to point at egress-proxy.
  3. PRD 0014 retargeting. cred-proxy-block remediation's apply path repointed at egress-proxy (SIGHUP, audit log, etc.). Supervise tool description updated.
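For the SIGHUP / file-mtime reload that chunk 1 covers, one possible shape is sketched below. RoutesFile is a hypothetical name, and the parser is injected so the sketch stays stdlib-only; a real addon would presumably pass a YAML loader for routes.yaml.

```python
import os
from typing import Any, Callable


class RoutesFile:
    """Holds the parsed routes and reloads them when the file changes."""

    def __init__(self, path: str, parse: Callable[[str], Any]) -> None:
        self._path = path
        self._parse = parse
        self._mtime_ns = -1
        self.routes: Any = None
        self.reload()

    def reload(self) -> None:
        """Unconditional reload; usable as the body of a SIGHUP handler."""
        stat = os.stat(self._path)
        with open(self._path, encoding="utf-8") as fh:
            parsed = self._parse(fh.read())
        # Assign only after a successful parse, so a broken file raises
        # and leaves the previous routes in place.
        self.routes = parsed
        self._mtime_ns = stat.st_mtime_ns

    def maybe_reload(self) -> bool:
        """Reload if the file's mtime changed; return True when reloaded."""
        if os.stat(self._path).st_mtime_ns == self._mtime_ns:
            return False
        self.reload()
        return True
```

A SIGHUP handler could be wired with signal.signal(signal.SIGHUP, lambda *_: routes_file.reload()) from the main thread, while maybe_reload() could be called on a timer or at the start of each request. Which of the two the egress-proxy should rely on is left to the implementation PR.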

Open questions

  • mitmproxy addon distribution. Mount the addon Python file from stage_dir, or bake it into the image? Mounting is easier to hot-reload; baking in is more reproducible. Recommend bake-in, with routes.yaml as the only mounted state.
  • Path match semantics. Prefix-only for v1 (matches PRD 0017 v1 spirit). Globs / regex are a follow-up if operators ask.
  • Mode for the Authorization strip on inbound. Pipelock has a similar strip in sensitive_headers. Confirm that the two strips don't combine to drop a header the agent legitimately set. Probably want egress-proxy to be the only stripper for routes that match.
  • Pipelock's TLS interception post-cutover. Today pipelock MITMs the cred-proxy → upstream leg using its own CA. After the cutover, that leg starts as a CONNECT tunnel from egress-proxy (egress-proxy treats pipelock as a plain HTTPS forward proxy). Does pipelock still need to MITM? Probably no — egress-proxy already terminated, body content is already inspected upstream by egress-proxy's addons (or could be). But that means moving DLP from pipelock to egress-proxy, which expands egress-proxy's trust-domain further. Punted to the implementation PR to decide.
  • Performance. Two MITM hops in the worst case (agent ↔ egress-proxy and pipelock ↔ upstream if pipelock keeps its interception). Measure under realistic load; if it's a problem, the answer is probably to disable pipelock's TLS interception and let it run as a hostname-only filter.
  • Agent's existing dotfile rewrites. Today cred-proxy provisions ~/.npmrc with registry=http://cred-proxy:9099/npm/, ~/.gitconfig with insteadOf rules, etc. After the cutover none of those rewrites are strictly necessary for routing (HTTPS_PROXY catches everything), but they may still be useful for canonicalisation (so the agent's npm install doesn't surprise itself by talking to a different registry). Decide per dotfile in the migration PR.

References

  • PRD 0010 — cred-proxy (superseded by this PRD).
  • PRD 0014 — cred-proxy-block remediation (retargeted).
  • PRD 0013 — supervise plane (tool descriptions updated).
  • PR #25 — the supervise loop, whose _apply_pipelock_url docstring flagged the original "path filtering belongs somewhere" follow-up.
  • mitmproxy — https://mitmproxy.org/ — chosen as the egress-proxy engine because it's the canonical scriptable MITM forward proxy.