PRD 0017: Egress-proxy — universal MITM via mitmproxy (replaces cred-proxy) #27
@@ -0,0 +1,195 @@
# PRD 0017: Path-aware egress filtering via cred-proxy

- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-25

## Summary

Pipelock's `api_allowlist` is hostname-only: once a host is on the
list, every URL path at that host is reachable. For agents working
on shared platforms (github.com, gitlab.com, public registries),
this means approving access to one user's content also opens
access to every other user's content. Cred-proxy already routes
authenticated traffic by path prefix; this PRD extends it to
filter (not just route) paths, including for unauthenticated hosts.
Per-bottle egress then has two complementary layers: pipelock for
hostname allowlisting, DLP, and body scanning; cred-proxy for
path-level allowlisting on declared hosts.

## Problem

PR #25's pipelock-block tool delivers an honest but coarse experience:
the agent reports "I tried hitting `https://github.com/didericis`,
pipelock 403'd it"; the operator approves and the agent now has
access to all of github.com. The path in the proposal is captured
as context but not enforced (PR #25 documents this in
`_apply_pipelock_url`'s docstring).

The intended posture for many shared platforms is narrower than
hostname-level. "Allow the agent to read github.com/didericis but
not github.com/somebody-else" is a normal ask. Today the egress
stack can't express that, even though cred-proxy already has 80%
of the machinery: it path-routes authenticated traffic with
longest-prefix matching, and the manifest's `cred_proxy.routes[]`
shape is already a list of `(path, upstream, ...)` rules.

## Goals / Success Criteria

A bottle manifest can declare a cred-proxy route with a
`path_allowlist` and `auth_scheme: none`. Agents dialing
`http://cred-proxy:<port>/<route>/<suffix>` get a 403 from
cred-proxy when `<suffix>` doesn't match any allowlist entry, and
a normal forward (no auth header injected) when it does. For
existing authenticated routes the addition is opt-in: a route
without `path_allowlist` keeps its current permissive behaviour.

Demonstrable behavior: a bottle manifest declares
`{path: "/github/", upstream: "https://github.com", auth_scheme: "none",
path_allowlist: ["/didericis/"]}`; the agent reaches
`http://cred-proxy:9099/github/didericis/some-repo` successfully
and gets a 403 on `http://cred-proxy:9099/github/someone-else/whatever`.

## Non-goals

- Replacing pipelock. Pipelock still does the hostname allowlist,
  DLP body scanning, MCP / WebSocket inspection. Path filtering is
  additive, sitting in front of pipelock for routes that opt in.
- Auto-routing arbitrary outbound HTTP through cred-proxy. The
  agent's `HTTP_PROXY` stays pointed at pipelock; cred-proxy is
  reached by explicit URL (with a `git-insteadof`-style rewrite
  for the few protocol-level helpers that need it).
- Reworking pipelock-block. The PR #25 tool stays hostname-only;
  whether a new path-aware proposal tool (or a richer
  pipelock-block) is wanted is an open question for a follow-on
  PRD.
- Live mutation of the running container or cred-proxy beyond
  what cred-proxy SIGHUP already supports (PRD 0014).

## Scope

### In scope

- A new optional `auth_scheme: "none"` mode on cred-proxy routes
  that suppresses Authorization injection while keeping path
  routing + (new) path filtering.
- A new optional `path_allowlist: [<prefix>, ...]` field per
  cred-proxy route. When present, cred-proxy 403s requests whose
  in-route suffix doesn't match at least one prefix.
- Manifest schema + validation for the two new fields.
- Cred-proxy server logic: enforcement on each request after the
  longest-prefix route match.
- SIGHUP reload picks up `path_allowlist` changes (no new sidecar
  primitives; the existing reload path already re-reads
  `routes.json`).

### Out of scope

- A new MCP tool for the agent to propose `path_allowlist`
  additions. Today the operator manages this via the manifest +
  the existing `routes edit <bottle>` TUI verb.
- Glob / regex matching. v1 ships prefix matching only; the
  match-semantics open question lays out the trade-offs.
- Auto-migrating PR #25's pipelock-block proposals into cred-proxy
  routes. This stays a manual operator decision per host.
- Provisioner-side dotfile changes for HTTPS-to-cred-proxy rewrites
  on bottles that opt unauth'd hosts onto cred-proxy. Out of scope
  for the engine work; the manifest can already encode it.

## Proposed Design

### Manifest schema additions

`bottle.cred_proxy.routes[]` gains two optional fields:

```yaml
cred_proxy:
  routes:
    - path: "/github/"
      upstream: "https://github.com"
      auth_scheme: "none"   # new: no Authorization header
      token_ref: ""         # must be empty or absent when auth_scheme is "none"
      path_allowlist:       # new: prefix list; empty / absent = permissive
        - "/didericis/"
        - "/didericis-org/"
```

- `auth_scheme: "none"` joins the existing `Bearer` / `token` values.
  When `none`, `token_ref` must be empty or absent and no
  Authorization header is injected. The route still routes by path
  prefix and forwards to upstream.
- `path_allowlist` is a list of suffix prefixes (matched after the
  route's `path` is stripped). Empty / absent means permissive
  (current behaviour). When non-empty, the suffix must start with
  at least one of the allowlist entries.

### cred-proxy server changes

Per request:
1. Strip the query string, then longest-prefix-match the path
   against `routes`.
2. Compute `suffix = request_path[len(route.path):]`.
3. If `route.path_allowlist` is non-empty: require that the
   normalized suffix starts with at least one allowlist entry, and
   403 if not. Because `route.path` ends with `/` and allowlist
   entries start with `/` (see Validation), the comparison has to
   be made against `"/" + suffix` for the example manifest to
   match; whichever normalization is chosen must be applied
   consistently.
4. If `auth_scheme == "none"`: skip the `Authorization` header
   step entirely; otherwise inject as today.
5. Forward upstream, stream response (unchanged).

The 403 body should name the route + the disallowed suffix so the
operator can diagnose. cred-proxy's existing log line at request
time picks up the new outcome too.
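
The per-request steps above can be sketched as a pure decision function. This is a minimal illustration under stated assumptions, not cred-proxy's actual code: `Route`, `match_route`, and `decide` are hypothetical names, the suffix is normalized as `"/" + suffix`, and the message wording is invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Route:
    """Hypothetical in-memory form of one routes[] entry."""
    path: str                       # route prefix, e.g. "/github/"
    upstream: str                   # e.g. "https://github.com"
    auth_scheme: str = "Bearer"
    path_allowlist: list[str] = field(default_factory=list)


def match_route(routes: list[Route], request_path: str) -> Optional[Route]:
    """Longest-prefix match of the request path against route prefixes."""
    candidates = [r for r in routes if request_path.startswith(r.path)]
    return max(candidates, key=lambda r: len(r.path), default=None)


def decide(routes: list[Route], request_target: str) -> tuple[int, str]:
    """Return (status, detail): 404 no route, 403 filtered, 200 would forward."""
    path = request_target.split("?", 1)[0]        # step 1: strip the query string
    route = match_route(routes, path)
    if route is None:
        return 404, f"no route for {path}"
    suffix = "/" + path[len(route.path):]         # step 2, with a leading "/"
    allowlist = route.path_allowlist
    if allowlist and not any(suffix.startswith(p) for p in allowlist):
        # step 3: name the route and the disallowed suffix for the operator
        return 403, f"route {route.path}: {suffix} not in path_allowlist"
    inject_auth = route.auth_scheme != "none"     # step 4
    return 200, f"forward to {route.upstream}{suffix} (auth injected: {inject_auth})"
```

With the example manifest route (`/github/`, allowlist `["/didericis/"]`), `decide` yields 200 for `/github/didericis/some-repo` and 403 for `/github/someone-else/whatever`. The trailing `/` on the allowlist entry is what keeps a sibling such as `/github/didericis-evil/x` from matching.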

### Validation

At manifest load:
- `auth_scheme` must be one of `Bearer`, `token`, or `none`.
- When `auth_scheme == "none"`, a non-empty `token_ref` is
  forbidden (a clearer error than silently ignoring it).
- `path_allowlist` entries must start with `/` and end with `/`
  (matching the existing convention for `route.path`).
- Duplicate prefixes are deduplicated with a warning, not an
  error.
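
The validation rules above could look roughly like the following. This is a sketch only: `validate_route` is a hypothetical helper operating on a plain dict, the error wording is invented, and treating a missing `auth_scheme` as an error is an assumption (the existing loader may apply a default instead).

```python
from __future__ import annotations

import warnings

VALID_AUTH_SCHEMES = {"Bearer", "token", "none"}


def validate_route(route: dict) -> dict:
    """Validate one routes[] entry and return a normalized copy."""
    scheme = route.get("auth_scheme")  # assumption: the field is required
    if scheme not in VALID_AUTH_SCHEMES:
        raise ValueError(
            f"auth_scheme must be one of {sorted(VALID_AUTH_SCHEMES)}, got {scheme!r}"
        )
    # Empty or absent token_ref is fine; a non-empty one is rejected outright.
    if scheme == "none" and route.get("token_ref"):
        raise ValueError('token_ref is forbidden when auth_scheme is "none"')

    seen: set[str] = set()
    allowlist: list[str] = []
    for prefix in route.get("path_allowlist") or []:
        if not (isinstance(prefix, str) and prefix.startswith("/") and prefix.endswith("/")):
            raise ValueError(f"path_allowlist entry {prefix!r} must start and end with '/'")
        if prefix in seen:
            # Duplicates are dropped with a warning rather than failing the load.
            warnings.warn(f"duplicate path_allowlist entry {prefix!r} dropped")
            continue
        seen.add(prefix)
        allowlist.append(prefix)
    return {**route, "path_allowlist": allowlist}
```

Returning a normalized copy rather than mutating the input keeps the loader free to report every route's problems before committing any of them; that is a design preference of this sketch, not something the PRD requires.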

### Migration / backward compatibility

- Routes without `path_allowlist` behave exactly as today.
- Routes with `auth_scheme: Bearer | token` behave exactly as today.
- No existing manifests need editing; the new fields are opt-in.

## Open questions

- **Match semantics: prefix vs glob vs regex.** Prefix is simple
  and matches the existing `route.path` convention. Glob (`/users/*/repos/`)
  adds power but is easy to get wrong (does `*` match a `/`?).
  Regex is the most powerful and the most footgun-prone. Recommend
  prefix-only for v1, with glob in a follow-up if operators ask
  for it.
- **403 body shape.** Plain text vs JSON. Cred-proxy's existing
  errors use plain text (`send_error(404, "no route for ...")`).
  Recommend matching that.
- **Auth-less routes and TLS interception.** A `none`-auth route
  still routes outbound HTTPS through pipelock (cred-proxy's
  `HTTPS_PROXY` env), so pipelock's CA + body scanner still apply.
  Confirm that pipelock's allowlist needs the upstream host in
  this case; there's no token to make the cred-proxy → upstream
  leg special. Likely yes, same as today.
- **MCP tool / pipelock-block evolution.** Once path filtering
  exists, the operator may want a way for the agent to propose
  path additions (e.g. "I need /didericis-org/ added to the
  github route"). Today that requires manifest edit + cli.py
  rebuild, or `routes edit` via the dashboard. Whether a new MCP
  tool (or a richer pipelock-block) is wanted is an open question
  for a follow-on PRD.
- **Allowlist semantics for the entire route prefix.** Should an
  empty `path_allowlist: []` be allowed? Read literally, it is
  equivalent to "block everything at this upstream": possibly
  useful as a tombstone, more likely a typo. Recommend treating
  an empty list the same as absent (permissive) and flagging it
  in the validation note.
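
To make the "does `*` match a `/`?" concern in the match-semantics question concrete, the snippet below compares plain prefix matching with Python's standard-library `fnmatch`, whose `*` does match across `/`. It is only an illustration of the footgun; `prefix_allows` and `glob_allows` are hypothetical helpers, and nothing here implies cred-proxy would use `fnmatch` if glob support were added.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def prefix_allows(path: str, prefixes: list[str]) -> bool:
    """v1 semantics: the path must start with one of the prefixes."""
    return any(path.startswith(p) for p in prefixes)


def glob_allows(path: str, patterns: list[str]) -> bool:
    """Shell-style globbing; fnmatch's '*' is not segment-aware."""
    return any(fnmatchcase(path, pat) for pat in patterns)


# Prefix matching: exactly what it says.
print(prefix_allows("/didericis/some-repo", ["/didericis/"]))         # True

# Glob matching: the intended single-segment case...
print(glob_allows("/users/alice/repos/", ["/users/*/repos/"]))        # True
# ...and the surprising one: '*' also swallows "alice/extra".
print(glob_allows("/users/alice/extra/repos/", ["/users/*/repos/"]))  # True
```

A glob follow-up would need to pick and document segment-aware semantics explicitly rather than inherit whatever a general-purpose matcher does.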

## References

- PRD 0010: cred-proxy (the engine being extended).
- PRD 0015: pipelock block remediation (whose hostname-only
  ceiling motivates this PRD).
- PR #25: `_apply_pipelock_url`'s docstring documents the
  follow-up that this PRD formalises.