# PRD 0017: Path-aware egress filtering via cred-proxy

- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-25

## Summary

Pipelock's `api_allowlist` is hostname-only: once a host is on the list, every URL path at that host is reachable. For agents working on shared platforms (github.com, gitlab.com, public registries), this means approving access to one user's content also opens access to every other user's content. Cred-proxy already path-prefix-routes authenticated traffic; this PRD extends it to filter (not just route) paths, including for unauthenticated hosts. Per-bottle egress then has two complementary layers: pipelock for hostname allow + DLP + body scanning, cred-proxy for path-level allow on declared hosts.

## Problem

PR #25's pipelock-block tool delivers an honest but coarse experience: the agent reports "I tried hitting `https://github.com/didericis`, pipelock 403'd it"; the operator approves and the agent now has access to all of github.com. The path in the proposal is captured as context but not enforced (PR #25 documents this in `_apply_pipelock_url`'s docstring).

The intended posture for many shared platforms is narrower than hostname-level. "Allow the agent to read github.com/didericis but not github.com/somebody-else" is a normal ask. Today the egress stack can't express that, even though cred-proxy already has 80% of the machinery: it path-routes authenticated traffic with longest-prefix matching, and the manifest's `cred_proxy.routes[]` shape is already a list of `(path, upstream, ...)` rules.

## Goals / Success Criteria

A bottle manifest can declare a cred-proxy route with a `path_allowlist` and `auth_scheme: none`. Agents dialing `http://cred-proxy://` hit a 403 from cred-proxy when `` doesn't match any allowlist entry, and a normal forward (no auth header injected) when it does. For existing authenticated routes the addition is opt-in: a route without `path_allowlist` keeps its current permissive behaviour.
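
As a rough illustration of this success criterion, the sketch below models the allow/deny decision for a single request path. The `Route` dataclass, the `decide` function, the tuple it returns, and the `"/" + suffix` normalization are assumptions made for illustration, not cred-proxy's actual code; the route values mirror the github example this PRD uses.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """Hypothetical in-memory form of one cred_proxy.routes[] entry."""
    path: str                      # route prefix, e.g. "/github/"
    upstream: str                  # e.g. "https://github.com"
    auth_scheme: str = "Bearer"    # "Bearer", "token", or "none"
    path_allowlist: list = field(default_factory=list)


def decide(routes, request_path):
    """Return (status, detail, inject_auth) for one request path.

    Status 200 only means "would be forwarded"; detail is then the
    upstream URL. The query string is dropped here for brevity.
    """
    path = request_path.split("?", 1)[0]              # strip the query string
    matches = [r for r in routes if path.startswith(r.path)]
    if not matches:
        return 404, f"no route for {path}", False
    route = max(matches, key=lambda r: len(r.path))   # longest prefix wins
    suffix = "/" + path[len(route.path):]             # assumed normalization
    allow = route.path_allowlist
    # An empty or absent allowlist is permissive (current behaviour).
    if allow and not any(suffix.startswith(p) for p in allow):
        return 403, f"route {route.path}: {suffix} not in path_allowlist", False
    # No Authorization header is injected when auth_scheme is "none".
    return 200, route.upstream.rstrip("/") + suffix, route.auth_scheme != "none"


routes = [Route("/github/", "https://github.com", "none", ["/didericis/"])]
print(decide(routes, "/github/didericis/some-repo"))
print(decide(routes, "/github/someone-else/whatever"))
```

Returning a plain tuple keeps the decision separate from the HTTP handler, so the same function could back both the 403 body and the request-time log line.
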
Demonstrable behavior: a bottle manifest declares `{path: "/github/", upstream: "https://github.com", auth_scheme: "none", path_allowlist: ["/didericis/"]}`; the agent reaches `http://cred-proxy:9099/github/didericis/some-repo` successfully and gets a 403 on `http://cred-proxy:9099/github/someone-else/whatever`.

## Non-goals

- Replacing pipelock. Pipelock still does the hostname allowlist, DLP body scanning, MCP / WebSocket inspection. Path filtering is additive, sitting in front of pipelock for routes that opt in.
- Auto-routing arbitrary outbound HTTP through cred-proxy. The agent's `HTTP_PROXY` stays pointed at pipelock; cred-proxy is reached by explicit URL (with a `git-insteadof`-style rewrite for the few protocol-level helpers that need it).
- Reworking pipelock-block. The PR #25 tool stays hostname-only; whether a new path-aware proposal tool (or a richer pipelock-block) is wanted is an open question for a follow-on PRD.
- Live mutation of the running container or cred-proxy beyond what cred-proxy SIGHUP already supports (PRD 0014).

## Scope

### In scope

- A new optional `auth_scheme: "none"` mode on cred-proxy routes that suppresses Authorization injection while keeping path routing + (new) path filtering.
- A new optional `path_allowlist: [, ...]` field per cred-proxy route. When present, cred-proxy 403s requests whose in-route suffix doesn't match at least one prefix.
- Manifest schema + validation for the two new fields.
- Cred-proxy server logic: enforcement on each request after the longest-prefix route match.
- SIGHUP reload picks up `path_allowlist` changes (no new sidecar primitives; the existing reload path already re-reads `routes.json`).

### Out of scope

- A new MCP tool for the agent to propose `path_allowlist` additions. Today the operator manages this via the manifest + the existing `routes edit ` TUI verb.
- Glob / regex matching. v1 ships prefix matching only; the open question lays out the trade-offs.
- Auto-migrating PR #25's pipelock-block proposals into cred-proxy routes. Manual operator decision per host.
- Provisioner-side dotfile changes for HTTPS-to-cred-proxy rewrites on bottles that opt unauth'd hosts onto cred-proxy. Out of scope for the engine work; the manifest can already encode it.

## Proposed Design

### Manifest schema additions

`bottle.cred_proxy.routes[]` gains two optional fields:

```yaml
cred_proxy:
  routes:
    - path: "/github/"
      upstream: "https://github.com"
      auth_scheme: "none"   # new: no Authorization header
      token_ref: ""         # must be empty or absent when auth_scheme is "none"
      path_allowlist:       # new: prefix list; empty / absent = permissive
        - "/didericis/"
        - "/didericis-org/"
```

- `auth_scheme: "none"` joins the existing `Bearer` / `token` values. When `none`, `token_ref` must be empty or absent and no Authorization header is injected. The route still routes by path prefix and forwards to upstream.
- `path_allowlist` is a list of suffix prefixes (matched after the route's `path` is stripped). Empty / absent means permissive (current behaviour). When non-empty, the suffix must start with at least one of the allowlist entries.

### cred-proxy server changes

Per request:

1. Strip the query string, then longest-prefix-match the path against `routes`.
2. Compute `suffix = request_path[len(route.path):]`.
3. If `route.path_allowlist` is non-empty: require that `"/" + suffix` (or just `suffix`; pick a consistent normalization) starts with at least one allowlist entry. 403 if not. Because `route.path` ends with `/` and allowlist entries must start with `/`, the `"/" + suffix` form is the one that lines up with the manifest example above.
4. If `auth_scheme == "none"`: skip the `Authorization` header step entirely; otherwise inject as today.
5. Forward upstream, stream response (unchanged).

The 403 body should name the route + the disallowed suffix so the operator can diagnose. cred-proxy's existing log line at request time picks up the new outcome too.

### Validation

At manifest load:

- `auth_scheme` must be one of `Bearer`, `token`, or `none`.
- When `auth_scheme == "none"`, `token_ref` is forbidden (clearer error than silently ignoring).
- `path_allowlist` entries must start with `/` and end with `/` (matching the existing convention for `route.path`).
- Duplicate prefixes are deduplicated with a warning, not an error.

### Migration / backward compatibility

- Routes without `path_allowlist` behave exactly as today.
- Routes with `auth_scheme: Bearer | token` behave exactly as today.
- No existing manifests need editing; the new fields are opt-in.

## Open questions

- **Match semantics: prefix vs glob vs regex.** Prefix is simple and matches the existing `route.path` convention. Glob (`/users/*/repos/`) adds power but is easy to get wrong (does `*` match a `/`?). Regex is the most powerful and the most footguny. Recommend prefix-only for v1, glob in a follow-up if operators ask for it.
- **403 body shape.** Plain text vs JSON. Cred-proxy's existing errors use plain text (`send_error(404, "no route for ...")`). Match that.
- **Auth-less routes and TLS interception.** A `none`-auth route still routes outbound HTTPS through pipelock (cred-proxy's `HTTPS_PROXY` env), so pipelock's CA + body scanner still apply. Confirm that pipelock's allowlist needs the upstream host in this case: there's no token to make the cred-proxy → upstream leg special. Likely yes, same as today.
- **MCP tool / pipelock-block evolution.** Once path filtering exists, the operator may want a way for the agent to propose path additions (e.g. "I need /didericis-org/ added to the github route"). Today that requires manifest edit + cli.py rebuild, or `routes edit` via the dashboard. Whether a new MCP tool (or a richer pipelock-block) is wanted is a follow-on PRD open question.
- **Allowlist semantics for the entire route prefix.** Should an empty `path_allowlist: []` be allowed? It is equivalent to "block everything at this upstream": possibly useful as a tombstone, more likely a typo. Recommend treating empty list the same as absent (permissive) and flagging in the validation note.
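
The manifest-load rules under Validation, combined with the recommendation above to treat an empty list as absent, might be checked along these lines. This is a minimal sketch: `validate_route`, the dict-shaped route, treating a missing `auth_scheme` as an error, and accepting an empty-string `token_ref` as absent are all assumptions, not the existing loader.

```python
import warnings

ALLOWED_AUTH_SCHEMES = ("Bearer", "token", "none")


def validate_route(route: dict) -> dict:
    """Return a normalized copy of one route dict; raise ValueError on bad input."""
    scheme = route.get("auth_scheme")
    if scheme not in ALLOWED_AUTH_SCHEMES:
        raise ValueError(
            f"auth_scheme must be one of {', '.join(ALLOWED_AUTH_SCHEMES)}; got {scheme!r}"
        )
    # Assumption: an empty-string token_ref counts as absent.
    if scheme == "none" and route.get("token_ref"):
        raise ValueError("token_ref is forbidden when auth_scheme is 'none'")

    deduped = []
    # An empty or missing list is treated the same way: permissive.
    for prefix in route.get("path_allowlist") or []:
        if not (prefix.startswith("/") and prefix.endswith("/")):
            raise ValueError(
                f"path_allowlist entry {prefix!r} must start and end with '/'"
            )
        if prefix in deduped:
            # Duplicates are dropped with a warning, not an error.
            warnings.warn(f"duplicate path_allowlist entry {prefix!r} dropped")
            continue
        deduped.append(prefix)
    return {**route, "path_allowlist": deduped}
```

Raising on a non-empty `token_ref` rather than ignoring it follows the PRD's preference for a clear error; whether an empty list should instead be flagged is left to the open question above.
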
## References

- PRD 0010: cred-proxy (the engine being extended).
- PRD 0015: pipelock block remediation (whose hostname-only ceiling motivates this PRD).
- PR #25: `_apply_pipelock_url`'s docstring documents the follow-up that this PRD formalises.