docs(prd-0017): pivot to mitmproxy-based egress-proxy

Significant rewrite of PRD 0017 based on PR #25 design discussion.

Original draft proposed adding `path_allowlist` to the existing
cred-proxy. That bought opt-in path filtering for tools that
voluntarily routed through cred-proxy (Claude Code, git, npm) —
but raw `curl https://github.com/foo` from the agent goes to
HTTPS_PROXY=pipelock and bypasses cred-proxy entirely, so any
universal enforcement claim was a lie.

New design: replace cred-proxy with a mitmproxy-based egress-proxy
that becomes the agent's HTTP_PROXY/HTTPS_PROXY. Every agent
HTTP/HTTPS request flows through it before reaching pipelock.
Path-level allow/deny enforcement is universal because the proxy
is on every leg. The proxy also absorbs cred-proxy's credential
injection role (mitmproxy addon hooks request → strip + inject
Authorization).

Net sidecar count: unchanged. cred-proxy is replaced 1:1 by
egress-proxy. Pipelock stays as hostname allow + DLP downstream
of egress-proxy.

Decisions baked in per PR-#25 discussion:
- Tool: mitmproxy (designed for this; Python addons; well-maintained).
- CA custody: egress-proxy holds the per-bottle MITM CA key
  (concentration accepted; documented in trust-domain section).
- Migration: hard cutover. Existing `bottle.cred_proxy.routes[]`
  manifests fail-fast at load time with a pointer at this PRD.

Open questions retained for the implementation PRs: addon
distribution (bake vs mount), prefix-vs-glob match, double-strip
of Authorization between egress-proxy and pipelock, whether
pipelock keeps TLS interception or stays hostname-only post-cutover,
performance under two-MITM-hops.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:28:53 -04:00
parent 5b925a6699
commit b0d9802469
2 changed files with 309 additions and 195 deletions
@@ -0,0 +1,309 @@
# PRD 0017: Egress-proxy — universal MITM with path filtering + auth injection
- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-25
- **Supersedes:** the cred-proxy sidecar (PRD 0010) — hard cutover.
## Summary
Replace the per-bottle cred-proxy sidecar with a new `egress-proxy`
sidecar built on mitmproxy. The egress-proxy is the agent's
`HTTP_PROXY` / `HTTPS_PROXY` — every agent HTTP/HTTPS request flows
through it before reaching pipelock. It owns three jobs that today
are split between cred-proxy and pipelock:
1. **MITM the agent's HTTPS.** Uses the per-bottle CA today held by
pipelock; that key moves to the egress-proxy.
2. **Path-level allow/deny.** Manifest-declared `path_allowlist`
per route. Universal coverage — any HTTPS path the agent reaches
for is inspected here, not just traffic that voluntarily dials
the cred-proxy URL.
3. **Credential injection.** Continues cred-proxy's existing role:
match by hostname (or hostname + path), strip inbound
Authorization, inject one based on `auth_scheme` + `token_ref`.
Pipelock's role narrows to hostname allowlist + DLP body scanning
on the egress-proxy → upstream leg. Pipelock no longer holds the
CA private key and is no longer the agent's direct proxy.
## Problem
PR #25's pipelock-block flow exposed an honest gap: pipelock's
`api_allowlist` is hostname-only (verified by probing the binary's
strict preset and the `pipelock check --url` output). Approving a
proposed `pipelock-block` opens the entire host, not the URL's
path. For shared platforms (github.com, gitlab.com, public
registries) operators routinely want narrower-than-host granularity
— allow github.com/didericis but block github.com/somebody-else.
Cred-proxy already does path-prefix routing for credentialed APIs,
but it only sees the requests the agent voluntarily routes to it
(via `ANTHROPIC_BASE_URL`, `~/.gitconfig` insteadOf, npmrc
`registry=`). A raw `curl https://github.com/anyone` from the agent
goes to `HTTPS_PROXY=pipelock` directly and bypasses cred-proxy
entirely. So extending cred-proxy with `path_allowlist` (the earlier
PRD 0017 draft) buys *opt-in* path filtering, not enforcement.
For enforcement we need a layer that sits on the agent's
`HTTPS_PROXY` path — universal coverage of agent egress.
## Goals / Success Criteria
A bottle manifest declares an egress-proxy route with a
`path_allowlist`. From inside the bottle, `curl
https://github.com/didericis/foo` succeeds; `curl
https://github.com/somebody-else/secret` gets a 403 from
egress-proxy and never reaches pipelock or the real GitHub. The same
holds for any tool inside the bottle that respects
`HTTPS_PROXY` — claude-code, git over HTTPS, npm, raw curl, random
Python `requests`. No tool-specific rewrite is required for path
enforcement.
Existing cred-proxy responsibilities continue to work after the
cutover. Anthropic OAuth injection for claude-code moves to
proxy-side header injection instead of the dotfile rewrite. Git
`insteadOf` routing into the proxy stays useful for hostname
canonicalisation but is no longer load-bearing for credential
delivery.
## Non-goals
- Replacing pipelock. Pipelock keeps doing hostname allowlist +
DLP body scanning on the egress-proxy → upstream leg.
- Building our own MITM stack. mitmproxy already does it; we ship
addons.
- Backward compatibility with `bottle.cred_proxy.routes[]`. Hard
cutover (see Migration).
- Path-level rules in pipelock. Upstream feature request is a
separate track (file independently); this PRD doesn't depend on
it.
## Scope
### In scope
- A new `egress-proxy` sidecar replacing the cred-proxy sidecar.
mitmproxy image, pinned by digest. Addons in Python.
- Per-bottle CA generation **moves from pipelock to egress-proxy**.
The agent's trust store is rebuilt against the egress-proxy CA
(was pipelock's CA).
- Manifest rename: `bottle.cred_proxy.routes[]` →
`bottle.egress_proxy.routes[]`. The route shape gains an optional
`path_allowlist: [<prefix>, ...]` and supports `auth_scheme:
"none"`.
- Agent's `HTTP_PROXY` / `HTTPS_PROXY` env vars repointed at the
egress-proxy (was pipelock).
- Pipelock retains its sidecar slot and its own DLP + hostname
scanner. The agent never dials it directly anymore; egress-proxy
uses `HTTPS_PROXY=pipelock` for its outbound leg, matching the
current cred-proxy → pipelock pattern.
- Existing PRDs that depend on cred-proxy:
  - PRD 0014 (cred-proxy-block remediation): renamed, with its apply
    path retargeted at egress-proxy. SIGHUP reload semantics carry
    over to egress-proxy.
  - PRD 0013 (supervise plane): the `cred-proxy-block` MCP tool stays;
    its proposal file format is updated to the new route shape.
- Removal of the old cred-proxy code: `claude_bottle/cred_proxy.py`,
`cred_proxy_server.py`, `backend/docker/cred_proxy.py`,
`provision/cred_proxy.py`, the `Dockerfile.cred-proxy`. Tests
updated.
### Out of scope
- Pipelock CA path: pipelock keeps generating its *own* CA for
any TLS termination it still does (e.g., on the egress-proxy →
upstream leg if pipelock is the MITM there). Whether pipelock
needs that CA at all post-cutover is an open question (probably
not: egress-proxy has already terminated the agent's TLS, and
pipelock now receives a CONNECT tunnel from egress-proxy).
- Glob / regex matching in `path_allowlist`. v1 ships prefix
matching; expressive forms are a follow-up.
- An MCP tool for the agent to propose `path_allowlist`
additions. Today the operator manages this via the manifest +
the existing `routes edit <bottle>` TUI verb (renamed to
`egress-proxy edit <bottle>`).
## Proposed design
### Topology
```
[Agent] --HTTP_PROXY=egress-proxy-->
    [egress-proxy (mitmproxy)]
        MITM with per-bottle CA
        path_allowlist enforcement
        Authorization header injection
    --HTTPS_PROXY=pipelock-->
    [pipelock]
        hostname allowlist
        DLP body scan
    --egress--> Internet
```
Universal coverage: every HTTP/HTTPS request the agent makes hits
egress-proxy first. cred-proxy's URL convention
(`http://cred-proxy:9099/...`) goes away — there's no need for the
agent to address the proxy by name because it's already on the
default proxy path.
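A minimal sketch of that wiring, assuming the sidecar runs `mitmdump` (mitmproxy's non-interactive binary) with the addon loaded as a script. It chains to pipelock explicitly through mitmproxy's upstream mode (`--mode upstream:<url>`), under which HTTPS leaves egress-proxy as a CONNECT through pipelock; whether the implementation uses that flag or keeps the `HTTPS_PROXY=pipelock` convention from the Scope section is left to the sidecar-core PR. The service names, pipelock's port 8888, and the addon and confdir paths are hypothetical placeholders; 8080 is mitmproxy's default listen port.

```python
import shlex

# Hypothetical placeholders: the PRD does not fix service names, ports, or paths.
EGRESS_PROXY_URL = "http://egress-proxy:8080"  # 8080 is mitmproxy's default port
PIPELOCK_URL = "http://pipelock:8888"          # hypothetical pipelock port


def egress_proxy_argv(addon_path: str, confdir: str) -> list:
    """Command line for the egress-proxy sidecar (mitmdump has no UI)."""
    return [
        "mitmdump",
        "--listen-port", "8080",
        # Forward every request to pipelock; HTTPS leaves as a CONNECT tunnel.
        "--mode", f"upstream:{PIPELOCK_URL}",
        # mitmproxy reads its CA (mitmproxy-ca.pem) from confdir.
        "--set", f"confdir={confdir}",
        # Load the path-filtering / auth-injection addon.
        "-s", addon_path,
    ]


def agent_proxy_env() -> dict:
    """Proxy variables for the agent container, all pointing at egress-proxy."""
    env = {}
    for name in ("HTTP_PROXY", "HTTPS_PROXY"):
        # Set both cases: tools differ, e.g. curl ignores uppercase HTTP_PROXY.
        env[name] = EGRESS_PROXY_URL
        env[name.lower()] = EGRESS_PROXY_URL
    return env


if __name__ == "__main__":
    print(shlex.join(egress_proxy_argv("/etc/egress-proxy/addon.py",
                                       "/etc/egress-proxy/confdir")))
    print(agent_proxy_env())
```

Trust in the egress-proxy CA is a separate step from these variables; tools only route through the proxy here, they don't yet accept its certificates.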
### Manifest
```yaml
egress_proxy:
  routes:
    # Authenticated route (today's cred-proxy shape, slightly
    # renamed). path_allowlist optional.
    - host: "api.github.com"
      auth_scheme: "Bearer"
      token_ref: "GH_PAT"
      path_allowlist:
        - "/repos/didericis/"
        - "/users/didericis"
    # Unauthenticated path-filtered route.
    - host: "github.com"
      auth_scheme: "none"
      path_allowlist:
        - "/didericis/"
    # Bare-pass route: no auth injection, no path enforcement.
    # Useful when you want a host to skip path filtering but
    # still be DLP-scanned by pipelock.
    - host: "api.anthropic.com"
      auth_scheme: "none"
      # no path_allowlist → all paths pass
```
Route matching is on `host` (was `path` prefix). The hostname
gates whether a route applies; `path_allowlist` (if present)
constrains the URL path under that host.
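A sketch of that lookup as plain functions the addon's `request` hook could call with `flow.request.host` and `flow.request.path` (the latter can carry a query string, so it is cut off before matching). `Route`, `find_route`, and `path_allowed` are hypothetical names; host matching is assumed to be exact and case-insensitive, and dot-segment or percent-encoding normalization is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Route:
    """One egress_proxy.routes[] entry (hypothetical in-memory shape)."""
    host: str
    auth_scheme: str = "none"
    token_ref: Optional[str] = None
    # None = no path_allowlist in the manifest: every path under host passes.
    path_allowlist: Optional[Tuple[str, ...]] = None


def find_route(routes: Sequence[Route], host: str) -> Optional[Route]:
    """Exact, case-insensitive host match; None means forward unchanged."""
    wanted = host.lower().rstrip(".")
    for route in routes:
        if route.host.lower() == wanted:
            return route
    return None


def path_allowed(route: Route, request_path: str) -> bool:
    """Prefix check of the request path (query string removed) against the route."""
    if route.path_allowlist is None:
        return True
    # Only the part before "?" is matched; "?x=/didericis/" must not count.
    path = request_path.partition("?")[0]
    # No dot-segment or %-decoding here; a real check would need both.
    return any(path.startswith(prefix) for prefix in route.path_allowlist)


if __name__ == "__main__":
    routes = [
        Route("api.github.com", "Bearer", "GH_PAT",
              ("/repos/didericis/", "/users/didericis")),
        Route("github.com", "none", None, ("/didericis/",)),
        Route("api.anthropic.com"),
    ]
    gh = find_route(routes, "github.com")
    print(path_allowed(gh, "/didericis/foo"))         # True
    print(path_allowed(gh, "/somebody-else/secret"))  # False -> 403
    print(find_route(routes, "example.com"))          # None -> pipelock decides
```

Plain string prefixes have one sharp edge visible in the manifest example: `/users/didericis` also admits `/users/didericis-other`. The earlier draft's rule that entries end in `/` avoids that for directory-style prefixes.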
### mitmproxy addon shape
The egress-proxy ships a small Python addon that:
- Loads the per-bottle routes from `/etc/egress-proxy/routes.yaml`
(rendered by the prepare step, docker-cp'd in like cred-proxy's
current routes.json).
- On `request` hook: match `flow.request.host` → route. If no route
matches → forward unchanged (pipelock will hostname-gate it). If
route matches and has `path_allowlist`, check `flow.request.path`
against the prefix list; 403 with a clear reason if no match.
- On approved requests: strip inbound Authorization, inject
`Authorization: <auth_scheme> <token-from-env>` if `auth_scheme
!= "none"`.
- SIGHUP / file-mtime watch on `routes.yaml` for hot-reload (same
semantics as today's cred-proxy SIGHUP path).
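The header step of the addon, sketched as a pure function over a plain dict so it runs without mitmproxy. It follows the bullet literally: inbound `Authorization` is stripped (case-insensitively) on every matched route, and a new header is injected only when `auth_scheme` is not `"none"`. `rewrite_authorization` is a hypothetical name, and raising on a missing token is an assumption; the PRD doesn't say whether the proxy should fail or forward without credentials. In a real addon the same logic would run inside a `request(self, flow)` method against `flow.request.headers`, and a path-allowlist miss would be answered with `flow.response = http.Response.make(403, ...)`.

```python
import os
from typing import Mapping, MutableMapping, Optional


def rewrite_authorization(
    headers: MutableMapping[str, str],
    auth_scheme: str,
    token_ref: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> None:
    """Strip the agent's Authorization header and inject the route's, in place."""
    # Header names are case-insensitive; collect first, then delete.
    for name in [k for k in headers if k.lower() == "authorization"]:
        del headers[name]
    if auth_scheme == "none":
        return  # matched route, but no credential to inject
    token = environ.get(token_ref or "")
    if not token:
        # Assumption: fail loudly rather than forward an unauthenticated request.
        raise LookupError(f"token_ref {token_ref!r} is not set in the proxy environment")
    headers["Authorization"] = f"{auth_scheme} {token}"


if __name__ == "__main__":
    h = {"authorization": "Bearer agent-supplied", "Accept": "*/*"}
    rewrite_authorization(h, "Bearer", "GH_PAT", {"GH_PAT": "example-token"})
    print(h)  # {'Accept': '*/*', 'Authorization': 'Bearer example-token'}
```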
mitmproxy mints per-host leaf certificates on the fly, signed by the
per-bottle CA, as each TLS connection arrives. The per-bottle CA is
generated at bottle launch (was pipelock's tls-init step; now
egress-proxy's). The agent's trust store gets the egress-proxy CA
installed in place of pipelock's.
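For the hot-reload bullet in the addon list, a polling sketch of the file-mtime half (SIGHUP handling is not shown). The loader is injected so the sketch runs on the standard library; for `routes.yaml` it would be PyYAML's `yaml.safe_load`, and the test uses JSON. `ReloadingRoutes` is a hypothetical name, and keeping the last good table when an edit fails to parse is an assumption, not something the PRD specifies.

```python
import json
import os
from typing import Any, Callable, Optional


class ReloadingRoutes:
    """Re-read a routes file when its mtime changes (polling; no SIGHUP)."""

    def __init__(self, path: str, loader: Callable[[str], Any] = json.loads):
        self.path = path
        self.loader = loader  # e.g. yaml.safe_load for routes.yaml
        self._mtime_ns: Optional[int] = None
        self._routes: Any = None

    def current(self) -> Any:
        """Return the route table, reloading first if the file changed."""
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            with open(self.path, encoding="utf-8") as fh:
                text = fh.read()
            try:
                routes = self.loader(text)
            except Exception:
                if self._mtime_ns is None:
                    raise  # first load: nothing to fall back to
                return self._routes  # keep the last good table; retry next call
            self._routes, self._mtime_ns = routes, mtime_ns
        return self._routes
```

Calling `current()` at the top of each `request` hook keeps the check cheap (one `stat`), at the cost of missing two rewrites that land within the filesystem's timestamp resolution.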
### Trust-domain concentration
The egress-proxy now holds:
- Every credential the bottle declared in `egress_proxy.routes[]`
(OAuth tokens, PATs, npm tokens).
- The per-bottle MITM CA private key.
This is a deliberate concentration. With the previous split:
- cred-proxy held tokens.
- pipelock held the CA.
A memory disclosure in cred-proxy exposed tokens; in pipelock,
the CA. Both were bad; neither exposed everything.
A single disclosure in the new egress-proxy exposes both. Mitigations:
- mitmproxy runs as an unprivileged user inside the container.
- Tokens live in the container's environ (same as cred-proxy today).
The CA private key is mounted from the host's stage_dir (mode 600).
- Pipelock stays as a separate sidecar, so a compromise of
egress-proxy doesn't disable pipelock's hostname check + DLP on
the outbound leg: an attacker can forge certs the agent trusts
but can't easily exfiltrate from inside the agent without
pipelock noticing.
This concentration was accepted in the PR #25 discussion in
exchange for consolidating these jobs into one sidecar. The PRD
records it explicitly.
### Migration — hard cutover
No backward-compat alias for `bottle.cred_proxy.routes[]`. At
manifest load:
- `cred_proxy:` block → `die()` with a clear pointer at this PRD
and a migration recipe (rename to `egress_proxy:`, replace each
route's `path` with a `host` taken from its `upstream`, drop the
agent-side URL prefix).
- `cred_proxy_routes` field on existing dataclasses removed.
- `Dockerfile.cred-proxy` deleted.
- `claude_bottle/cred_proxy*.py` deleted.
- `claude_bottle/backend/docker/cred_proxy*.py` consolidated into
`egress_proxy*.py`.
- Provisioner files renamed.
- PRDs 0010 (cred-proxy), 0014 (cred-proxy-block remediation)
retroactively annotated as "superseded by 0017" — old text
preserved, header updated.
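A sketch of the load-time check plus a helper that turns an old route into a suggested new one, for the migration recipe. `LegacyManifestError`, `reject_legacy`, and `suggest_egress_route` are hypothetical (the codebase's `die()` is not shown). The translation assumes the new `host` is the old `upstream`'s hostname and that old `path_allowlist` entries, which were relative to the upstream, get the upstream's base path prepended; the cred-proxy `path` is dropped because it was only the agent-side URL prefix.

```python
from urllib.parse import urlsplit

MIGRATION_HINT = (
    "bottle.cred_proxy is no longer supported (see PRD 0017): rename it to "
    "egress_proxy, replace each route's path/upstream with host, and drop "
    "the agent-side cred-proxy URL prefix."
)


class LegacyManifestError(Exception):
    """Raised instead of the codebase's die() in this sketch."""


def reject_legacy(bottle: dict) -> None:
    """Fail fast on a pre-cutover manifest."""
    if "cred_proxy" in bottle:
        raise LegacyManifestError(MIGRATION_HINT)


def suggest_egress_route(old_route: dict) -> dict:
    """Translate one cred_proxy route into the egress_proxy shape (a suggestion)."""
    upstream = urlsplit(old_route["upstream"])
    if not upstream.hostname:
        raise ValueError(f"cannot derive a host from {old_route['upstream']!r}")
    # The old "path" is not carried over: it was only the agent-side prefix.
    new = {"host": upstream.hostname}
    for key in ("auth_scheme", "token_ref"):
        if old_route.get(key):  # drops empty token_ref values
            new[key] = old_route[key]
    # Old allowlist entries were relative to the upstream; re-root them at the
    # upstream's base path now that matching is on the full request path.
    base = upstream.path.rstrip("/")
    if old_route.get("path_allowlist"):
        new["path_allowlist"] = [base + p for p in old_route["path_allowlist"]]
    return new
```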
### Implementation chunks
Plausibly three implementation PRs after this PRD lands:
1. **egress-proxy sidecar core.** Dockerfile + mitmproxy addon +
`routes.yaml` schema + lifecycle (prepare / start / stop / SIGHUP).
2. **Manifest + provisioner migration.** Rename cred-proxy
throughout the codebase, hard-fail on legacy manifests, update
agent CA trust to point at egress-proxy.
3. **PRD 0014 retargeting.** cred-proxy-block remediation's apply
path repointed at egress-proxy (SIGHUP, audit log, etc.).
Supervise tool description updated.
## Open questions
- **mitmproxy addon distribution.** Mount the addon Python file
from stage_dir, or bake it into the image. Mount is more
hot-reloadable; bake-in is more reproducible. Recommend bake-in,
with routes.yaml as the only mounted state.
- **Path match semantics.** Prefix-only for v1 (matches PRD 0017
v1 spirit). Globs / regex are a follow-up if operators ask.
- **Where the inbound `Authorization` strip happens.** Pipelock has a
similar strip in `sensitive_headers`. Confirm there's no
double-strip causing a real header the agent set to disappear
unexpectedly. Probably want egress-proxy to be the only stripper
for routes that match.
- **Pipelock's TLS interception post-cutover.** Today pipelock
MITMs the cred-proxy → upstream leg using its own CA. After the
cutover, that leg starts as a CONNECT tunnel from egress-proxy
(egress-proxy treats pipelock as a plain HTTPS forward proxy).
Does pipelock still need to MITM? Probably no — egress-proxy
already terminated, body content is already inspected upstream
by egress-proxy's addons (or could be). But that means moving
DLP from pipelock to egress-proxy, which expands egress-proxy's
trust-domain *further*. Punted to the implementation PR to
decide.
- **Performance.** Two MITM hops in the worst case (agent ↔
egress-proxy and pipelock ↔ upstream if pipelock keeps its
interception). Measure under realistic load; if it's a problem,
the answer is probably to disable pipelock's TLS interception
and let it operate at hostname-only.
- **Agent's existing dotfile rewrites.** Today cred-proxy
provisions ~/.npmrc with `registry=http://cred-proxy:9099/npm/`,
~/.gitconfig with `insteadOf` rules, etc. After the cutover
none of those rewrites are strictly necessary for routing
(HTTPS_PROXY catches everything), but they may still be useful
for canonicalisation (so the agent's `npm install` doesn't
surprise itself by talking to a different registry). Decide per
dotfile in the migration PR.
## References
- PRD 0010 — cred-proxy (superseded by this PRD).
- PRD 0014 — cred-proxy-block remediation (retargeted).
- PRD 0013 — supervise plane (tool descriptions updated).
- PR #25 — the supervise loop, whose `_apply_pipelock_url`
docstring flagged the original "path filtering belongs
somewhere" follow-up.
- mitmproxy — https://mitmproxy.org/ — chosen as the egress-proxy
engine because it's the canonical scriptable MITM forward proxy.
@@ -1,195 +0,0 @@
# PRD 0017: Path-aware egress filtering via cred-proxy
- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-25
## Summary
Pipelock's `api_allowlist` is hostname-only — once a host is on the
list, every URL path at that host is reachable. For agents working
on shared platforms (github.com, gitlab.com, public registries),
this means approving access to one user's content also opens
access to every other user's content. Cred-proxy already
path-prefix-routes authenticated traffic; this PRD extends it to
filter (not just route) paths, including for unauthenticated hosts.
Per-bottle egress then has two complementary layers: pipelock for
hostname allow + DLP + body scanning, cred-proxy for path-level
allow on declared hosts.
## Problem
PR #25's pipelock-block tool delivers an honest but coarse experience:
the agent reports "I tried hitting `https://github.com/didericis`,
pipelock 403'd it"; the operator approves and the agent now has
access to all of github.com. The path in the proposal is captured
as context but not enforced (PR #25 documents this in
`_apply_pipelock_url`'s docstring).
The intended posture for many shared platforms is narrower than
hostname-level. "Allow the agent to read github.com/didericis but
not github.com/somebody-else" is a normal ask. Today the egress
stack can't express that, even though cred-proxy already has 80%
of the machinery: it path-routes authenticated traffic with
longest-prefix matching, and the manifest's `cred_proxy.routes[]`
shape is already a list of `(path, upstream, ...)` rules.
## Goals / Success Criteria
A bottle manifest can declare a cred-proxy route with a
`path_allowlist` and `auth_scheme: none`. Agents dialing
`http://cred-proxy:<port>/<route>/<suffix>` hit a 403 from
cred-proxy when `<suffix>` doesn't match any allowlist entry, and
a normal forward (no auth header injected) when it does. For
existing authenticated routes the addition is opt-in: a route
without `path_allowlist` keeps its current permissive behaviour.
Demonstrable behavior: a bottle manifest declares
`{path: "/github/", upstream: "https://github.com", auth_scheme: "none",
path_allowlist: ["/didericis/"]}`; the agent reaches
`http://cred-proxy:9099/github/didericis/some-repo` successfully,
gets a 403 on `http://cred-proxy:9099/github/someone-else/whatever`.
## Non-goals
- Replacing pipelock. Pipelock still does the hostname allowlist,
DLP body scanning, MCP / WebSocket inspection. Path filtering is
additive, sitting in front of pipelock for routes that opt in.
- Auto-routing arbitrary outbound HTTP through cred-proxy. The
agent's `HTTP_PROXY` stays pointed at pipelock; cred-proxy is
reached by explicit URL (with a `git-insteadof`-style rewrite
for the few protocol-level helpers that need it).
- Reworking pipelock-block. The PR #25 tool stays hostname-only;
whether a new path-aware proposal tool (or a richer
pipelock-block) is wanted is an open question for a follow-on
PRD.
- Live mutation of the running container or cred-proxy beyond
what cred-proxy SIGHUP already supports (PRD 0014).
## Scope
### In scope
- A new optional `auth_scheme: "none"` mode on cred-proxy routes
that suppresses Authorization injection while keeping path
routing + (new) path filtering.
- A new optional `path_allowlist: [<prefix>, ...]` field per
cred-proxy route. When present, cred-proxy 403s requests whose
in-route suffix doesn't match at least one prefix.
- Manifest schema + validation for the two new fields.
- Cred-proxy server logic: enforcement on each request after the
longest-prefix route match.
- SIGHUP reload picks up `path_allowlist` changes (no new sidecar
primitives — the existing reload path already re-reads
`routes.json`).
### Out of scope
- A new MCP tool for the agent to propose `path_allowlist`
additions. Today the operator manages this via the manifest +
the existing `routes edit <bottle>` TUI verb.
- Glob / regex matching. v1 ships prefix matching only; the open
question lays out the trade-offs.
- Auto-migrating PR #25's pipelock-block proposals into cred-proxy
routes. Manual operator decision per host.
- Provisioner-side dotfile changes for HTTPS-to-cred-proxy rewrites
on bottles that opt unauth'd hosts onto cred-proxy. Out of scope
for the engine work; the manifest can already encode it.
## Proposed Design
### Manifest schema additions
`bottle.cred_proxy.routes[]` gains two optional fields:
```yaml
cred_proxy:
  routes:
    - path: "/github/"
      upstream: "https://github.com"
      auth_scheme: "none"    # new: no Authorization header
      token_ref: ""          # must be empty or absent when auth_scheme is "none"
      path_allowlist:        # new: prefix list; empty / absent = permissive
        - "/didericis/"
        - "/didericis-org/"
```
- `auth_scheme: "none"` joins the existing `Bearer` / `token` values.
When `none`, `token_ref` must be empty or absent and no
Authorization header is injected. The route still routes by path
prefix and forwards to upstream.
- `path_allowlist` is a list of suffix prefixes (matched after the
route's `path` is stripped). Empty / absent means permissive
(current behaviour). When non-empty, the suffix must start with
at least one of the allowlist entries.
### cred-proxy server changes
Per request:
1. Strip query string, longest-prefix-match against `routes`.
2. Compute the suffix = request_path[len(route.path):].
3. If `route.path_allowlist` is non-empty: require that
`"/" + suffix` (or just `suffix` — pick a consistent
normalization) starts with at least one allowlist entry. 403 if
not.
4. If `auth_scheme == "none"`: skip the `Authorization` header
step entirely; otherwise inject as today.
5. Forward upstream, stream response (unchanged).
The 403 body should name the route + the disallowed suffix so the
operator can diagnose. cred-proxy's existing log line at request
time picks up the new outcome too.
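A sketch of steps 1 to 5, with header injection and response streaming reduced to a flag and a target URL. It picks `"/" + suffix` as the normalization, since validation requires allowlist entries to start with `/`, and reuses the plain-text `no route for ...` wording for misses. `CredRoute`, `Outcome`, and `resolve` are hypothetical names, not the existing `cred_proxy_server.py` API.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class CredRoute:
    path: str                             # route prefix, e.g. "/github/"
    upstream: str                         # e.g. "https://github.com"
    auth_scheme: str = "Bearer"
    path_allowlist: Tuple[str, ...] = ()  # empty = permissive


@dataclass(frozen=True)
class Outcome:
    status: int               # 200 = forward; 403 / 404 = reject
    detail: str               # upstream URL, or plain-text reason
    inject_auth: bool = False


def resolve(routes: Sequence[CredRoute], request_path: str) -> Outcome:
    # 1. Set the query string aside and longest-prefix-match the path.
    path, sep, query = request_path.partition("?")
    matches = [r for r in routes if path.startswith(r.path)]
    if not matches:
        return Outcome(404, f"no route for {path}")
    route = max(matches, key=lambda r: len(r.path))
    # 2. Suffix after the route prefix, re-rooted at "/" (entries start with "/").
    suffix = "/" + path[len(route.path):]
    # 3. Non-empty allowlist: some entry must be a prefix of the suffix.
    if route.path_allowlist and not any(
        suffix.startswith(entry) for entry in route.path_allowlist
    ):
        return Outcome(403, f"route {route.path} does not allow {suffix}")
    # 4. "none" routes get no Authorization header; 5. forward (query restored).
    target = route.upstream.rstrip("/") + suffix + sep + query
    return Outcome(200, target, inject_auth=route.auth_scheme != "none")
```

With the demonstration route from the Goals section, `/github/didericis/some-repo` resolves to `https://github.com/didericis/some-repo` with no auth, and `/github/someone-else/whatever` yields a 403 naming both the route and the suffix.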
### Validation
At manifest load:
- `auth_scheme` must be one of `Bearer`, `token`, or `none`.
- When `auth_scheme == "none"`, a non-empty `token_ref` is an
error (clearer than silently ignoring it).
- `path_allowlist` entries must start with `/` and end with `/`
(matching the existing convention for `route.path`).
- Duplicate prefixes are deduplicated with a warning, not an
error.
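The validation rules above as a sketch. `validate_route` and `ManifestError` are hypothetical; treating a missing `auth_scheme` as invalid and normalizing an absent allowlist to an empty (permissive) list are assumptions.

```python
import warnings
from typing import Any, Dict

VALID_AUTH_SCHEMES = ("Bearer", "token", "none")


class ManifestError(ValueError):
    """Hypothetical manifest-load error type."""


def validate_route(route: Dict[str, Any]) -> Dict[str, Any]:
    """Check one cred_proxy route; return it with a deduplicated allowlist."""
    scheme = route.get("auth_scheme")
    if scheme not in VALID_AUTH_SCHEMES:
        raise ManifestError(
            f"auth_scheme must be one of {', '.join(VALID_AUTH_SCHEMES)}; got {scheme!r}"
        )
    if scheme == "none" and route.get("token_ref"):
        raise ManifestError('token_ref must be empty or absent when auth_scheme is "none"')
    cleaned = []
    for entry in route.get("path_allowlist") or []:
        # Same convention as route.path: leading and trailing slash.
        if not (isinstance(entry, str) and entry.startswith("/") and entry.endswith("/")):
            raise ManifestError(f"path_allowlist entry {entry!r} must start and end with '/'")
        if entry in cleaned:
            warnings.warn(f"duplicate path_allowlist entry {entry!r} ignored")
            continue
        cleaned.append(entry)
    return {**route, "path_allowlist": cleaned}
```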
### Migration / backward compatibility
- Routes without `path_allowlist` behave exactly as today.
- Routes with `auth_scheme: Bearer | token` behave exactly as today.
- No existing manifests need editing; the new fields are opt-in.
## Open questions
- **Match semantics: prefix vs glob vs regex.** Prefix is simple
and matches the existing `route.path` convention. Glob (`/users/*/repos/`)
adds power but is easy to get wrong (does `*` match a `/`?).
Regex is the most powerful and the most footguny. Recommend
prefix-only for v1, glob in a follow-up if operators ask for it.
- **403 body shape.** Plain text vs JSON. Cred-proxy's existing
errors use plain text (`send_error(404, "no route for ...")`).
Match that.
- **Auth-less routes and TLS interception.** A `none`-auth route
still routes outbound HTTPS through pipelock (cred-proxy's
`HTTPS_PROXY` env), so pipelock's CA + body scanner still apply.
Confirm that pipelock's allowlist needs the upstream host in
this case — there's no token to make the cred-proxy → upstream
leg special. Likely yes, same as today.
- **MCP tool / pipelock-block evolution.** Once path filtering
exists, the operator may want a way for the agent to propose
path additions (e.g. "I need /didericis-org/ added to the
github route"). Today that requires manifest edit + cli.py
rebuild, or `routes edit` via the dashboard. Whether a new MCP
tool (or a richer pipelock-block) is wanted is a follow-on PRD
open question.
- **Allowlist semantics for the entire route prefix.** Should an
empty `path_allowlist: []` be allowed? Equivalent to "block
everything at this upstream" — possibly useful as a tombstone,
more likely a typo. Recommend treating empty list the same as
absent (permissive) and flagging in the validation note.
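To make the glob question concrete: Python's `fnmatch`, a natural off-the-shelf choice, lets `*` match `/`, so `/users/*/repos/` also matches paths several segments deep. A segment-bounded alternative compiles `*` to `[^/]+`; `segment_glob` is a hypothetical helper, and its prefix semantics (via `re.match`, which anchors only at the start) mirror the v1 prefix rule.

```python
import re
from fnmatch import fnmatchcase


def segment_glob(pattern: str) -> "re.Pattern":
    """Compile a glob whose '*' matches one non-empty path segment (hypothetical)."""
    parts = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile("[^/]+".join(parts))


if __name__ == "__main__":
    pattern = "/users/*/repos/"
    # fnmatch: '*' crosses '/' freely, so deeper paths match too.
    print(fnmatchcase("/users/didericis/repos/", pattern))      # True
    print(fnmatchcase("/users/didericis/x/y/repos/", pattern))  # True
    # Segment-bounded: re.match anchors at the start only, i.e. prefix semantics.
    rx = segment_glob(pattern)
    print(bool(rx.match("/users/didericis/repos/tool")))  # True
    print(bool(rx.match("/users/didericis/x/y/repos/")))  # False
```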
## References
- PRD 0010 — cred-proxy (the engine being extended).
- PRD 0015 — pipelock block remediation (whose hostname-only
ceiling motivates this PRD).
- PR #25, whose `_apply_pipelock_url` docstring documents the
follow-up that this PRD formalises.