diff --git a/docs/prds/0017-path-aware-egress-via-cred-proxy.md b/docs/prds/0017-path-aware-egress-via-cred-proxy.md
new file mode 100644
index 0000000..81101c8
--- /dev/null
+++ b/docs/prds/0017-path-aware-egress-via-cred-proxy.md
@@ -0,0 +1,195 @@
+# PRD 0017: Path-aware egress filtering via cred-proxy
+
+- **Status:** Draft
+- **Author:** didericis
+- **Created:** 2026-05-25
+
+## Summary
+
+Pipelock's `api_allowlist` is hostname-only — once a host is on the
+list, every URL path at that host is reachable. For agents working
+on shared platforms (github.com, gitlab.com, public registries),
+this means approving access to one user's content also opens
+access to every other user's content. Cred-proxy already
+path-prefix-routes authenticated traffic; this PRD extends it to
+filter (not just route) paths, including for unauthenticated hosts.
+Per-bottle egress then has two complementary layers: pipelock for
+hostname allow + DLP + body scanning, cred-proxy for path-level
+allow on declared hosts.
+
+## Problem
+
+PR #25's pipelock-block tool delivers an honest but coarse experience:
+the agent reports "I tried hitting `https://github.com/didericis`,
+pipelock 403'd it"; the operator approves and the agent now has
+access to all of github.com. The path in the proposal is captured
+as context but not enforced (PR #25 documents this in
+`_apply_pipelock_url`'s docstring).
+
+The intended posture for many shared platforms is narrower than
+hostname-level. "Allow the agent to read github.com/didericis but
+not github.com/somebody-else" is a normal ask. Today the egress
+stack can't express that, even though cred-proxy already has 80%
+of the machinery: it path-routes authenticated traffic with
+longest-prefix matching, and the manifest's `cred_proxy.routes[]`
+shape is already a list of `(path, upstream, ...)` rules.
+
+## Goals / Success Criteria
+
+A bottle manifest can declare a cred-proxy route with a
+`path_allowlist` and `auth_scheme: none`. Agents dialing
+`http://cred-proxy://` hit a 403 from
+cred-proxy when `` doesn't match any allowlist entry, and
+a normal forward (no auth header injected) when it does. For
+existing authenticated routes the addition is opt-in: a route
+without `path_allowlist` keeps its current permissive behaviour.
+
+Demonstrable behaviour: a bottle manifest declares
+`{path: "/github/", upstream: "https://github.com", auth_scheme: "none",
+path_allowlist: ["/didericis/"]}`; the agent reaches
+`http://cred-proxy:9099/github/didericis/some-repo` successfully
+and gets a 403 on `http://cred-proxy:9099/github/someone-else/whatever`.
+
+## Non-goals
+
+- Replacing pipelock. Pipelock still does the hostname allowlist,
+  DLP body scanning, MCP / WebSocket inspection. Path filtering is
+  additive, sitting in front of pipelock for routes that opt in.
+- Auto-routing arbitrary outbound HTTP through cred-proxy. The
+  agent's `HTTP_PROXY` stays pointed at pipelock; cred-proxy is
+  reached by explicit URL (with a `git-insteadof`-style rewrite
+  for the few protocol-level helpers that need it).
+- Reworking pipelock-block. The PR #25 tool stays hostname-only;
+  whether a new path-aware proposal tool (or a richer
+  pipelock-block) is wanted is an open question for a follow-on
+  PRD.
+- Live mutation of the running container or cred-proxy beyond
+  what cred-proxy SIGHUP already supports (PRD 0014).
+
+## Scope
+
+### In scope
+
+- A new optional `auth_scheme: "none"` mode on cred-proxy routes
+  that suppresses Authorization injection while keeping path
+  routing + (new) path filtering.
+- A new optional `path_allowlist: [, ...]` field per
+  cred-proxy route. When present, cred-proxy 403s requests whose
+  in-route suffix doesn't match at least one prefix.
+- Manifest schema + validation for the two new fields.
+- Cred-proxy server logic: enforcement on each request after the
+  longest-prefix route match.
+- SIGHUP reload picks up `path_allowlist` changes (no new sidecar
+  primitives — the existing reload path already re-reads
+  `routes.json`).
+
+### Out of scope
+
+- A new MCP tool for the agent to propose `path_allowlist`
+  additions. Today the operator manages this via the manifest +
+  the existing `routes edit ` TUI verb.
+- Glob / regex matching. v1 ships prefix matching only; the
+  match-semantics open question lays out the trade-offs.
+- Auto-migrating PR #25's pipelock-block proposals into cred-proxy
+  routes. Manual operator decision per host.
+- Provisioner-side dotfile changes for HTTPS-to-cred-proxy rewrites
+  on bottles that opt unauth'd hosts onto cred-proxy. Out of scope
+  for the engine work; the manifest can already encode it.
+
+## Proposed Design
+
+### Manifest schema additions
+
+`bottle.cred_proxy.routes[]` gains two optional fields:
+
+```yaml
+cred_proxy:
+  routes:
+    - path: "/github/"
+      upstream: "https://github.com"
+      auth_scheme: "none"        # new — no Authorization header
+      token_ref: ""              # must be empty or absent when auth_scheme is "none"
+      path_allowlist:            # new — prefix list; empty / absent = permissive
+        - "/didericis/"
+        - "/didericis-org/"
+```
+
+- `auth_scheme: "none"` joins the existing `Bearer` / `token` values.
+  When `none`, `token_ref` must be empty or absent and no
+  Authorization header is injected. The route still routes by path
+  prefix and forwards to upstream.
+- `path_allowlist` is a list of suffix prefixes (matched after the
+  route's `path` is stripped). Empty / absent means permissive
+  (current behaviour). When non-empty, the suffix must start with
+  at least one of the allowlist entries.
+
+### cred-proxy server changes
+
+Per request:
+1. Strip query string, longest-prefix-match against `routes`.
+2. Compute the suffix = `request_path[len(route.path):]`.
+3. If `route.path_allowlist` is non-empty: require that
+   `"/" + suffix` (or just `suffix` — pick a consistent
+   normalization) starts with at least one allowlist entry. 403 if
+   not.
+4. If `auth_scheme == "none"`: skip the `Authorization` header
+   step entirely; otherwise inject as today.
+5. Forward upstream, stream response (unchanged).
+
+The 403 body should name the route + the disallowed suffix so the
+operator can diagnose. cred-proxy's existing log line at request
+time picks up the new outcome too.
+
+### Validation
+
+At manifest load:
+- `auth_scheme` must be one of `Bearer`, `token`, or `none`.
+- When `auth_scheme == "none"`, a non-empty `token_ref` is rejected
+  (clearer error than silently ignoring it).
+- `path_allowlist` entries must start with `/` and end with `/`
+  (matching the existing convention for `route.path`).
+- Duplicate prefixes are deduplicated with a warning, not an
+  error.
+
+### Migration / backward compatibility
+
+- Routes without `path_allowlist` behave exactly as today.
+- Routes with `auth_scheme: Bearer | token` behave exactly as today.
+- No existing manifests need editing; the new fields are opt-in.
+
+## Open questions
+
+- **Match semantics: prefix vs glob vs regex.** Prefix is simple
+  and matches the existing `route.path` convention. Glob (`/users/*/repos/`)
+  adds power but is easy to get wrong (does `*` match a `/`?).
+  Regex is the most powerful and the most footguny. Recommend
+  prefix-only for v1, glob in a follow-up if operators ask for it.
+- **403 body shape.** Plain text vs JSON. Cred-proxy's existing
+  errors use plain text (`send_error(404, "no route for ...")`).
+  Match that.
+- **Auth-less routes and TLS interception.** A `none`-auth route
+  still routes outbound HTTPS through pipelock (cred-proxy's
+  `HTTPS_PROXY` env), so pipelock's CA + body scanner still apply.
+  Confirm that pipelock's allowlist needs the upstream host in
+  this case — there's no token to make the cred-proxy → upstream
+  leg special. Likely yes, same as today.
+- **MCP tool / pipelock-block evolution.** Once path filtering
+  exists, the operator may want a way for the agent to propose
+  path additions (e.g. "I need /didericis-org/ added to the
+  github route"). Today that requires manifest edit + cli.py
+  rebuild, or `routes edit` via the dashboard. Whether a new MCP
+  tool (or a richer pipelock-block) is wanted is a follow-on PRD
+  open question.
+- **Allowlist semantics for the entire route prefix.** Should an
+  empty `path_allowlist: []` be allowed? Equivalent to "block
+  everything at this upstream" — possibly useful as a tombstone,
+  more likely a typo. Recommend treating empty list the same as
+  absent (permissive) and flagging in the validation note.
+
+## References
+
+- PRD 0010 — cred-proxy (the engine being extended).
+- PRD 0015 — pipelock block remediation (whose hostname-only
+  ceiling motivates this PRD).
+- PR #25 — `_apply_pipelock_url`'s docstring documents the
+  follow-up that this PRD formalises.