docs(prd-0017): path-aware egress filtering via cred-proxy

Extends cred-proxy to filter (not just route) paths, including for
unauthenticated upstreams via a new `auth_scheme: "none"` mode and
`path_allowlist` field per route. Pipelock keeps its hostname
allowlist + DLP role; cred-proxy adds path-level enforcement for
routes that opt in.

Motivated by PR #25's follow-up note in _apply_pipelock_url:
pipelock 2.3.0's api_allowlist is hostname-only, so approving
pipelock-block opens the entire host. For shared platforms
(github.com, gitlab.com, public registries) operators usually want
narrower-than-host granularity.

Draft status; open questions on match semantics,
allow-route-with-empty-allowlist edge case, and the eventual MCP
tool shape for agent-proposed path additions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 08:33:01 -04:00
parent 0668c7bb45
commit 5b925a6699
1 changed files with 195 additions and 0 deletions
@@ -0,0 +1,195 @@
+# PRD 0017: Path-aware egress filtering via cred-proxy
+
+- **Status:** Draft
+- **Author:** didericis
+- **Created:** 2026-05-25
+
+## Summary
+
+Pipelock's `api_allowlist` is hostname-only — once a host is on the
+list, every URL path at that host is reachable. For agents working
+on shared platforms (github.com, gitlab.com, public registries),
+this means approving access to one user's content also opens
+access to every other user's content. Cred-proxy already
+path-prefix-routes authenticated traffic; this PRD extends it to
+filter (not just route) paths, including for unauthenticated hosts.
+Per-bottle egress then has two complementary layers: pipelock for
+hostname allow + DLP + body scanning, cred-proxy for path-level
+allow on declared hosts.
+
+## Problem
+
+PR #25's pipelock-block tool delivers an honest but coarse experience:
+the agent reports "I tried hitting `https://github.com/didericis`,
+pipelock 403'd it"; the operator approves and the agent now has
+access to all of github.com. The path in the proposal is captured
+as context but not enforced (PR #25 documents this in
+`_apply_pipelock_url`'s docstring).
+
+The intended posture for many shared platforms is narrower than
+hostname-level. "Allow the agent to read github.com/didericis but
+not github.com/somebody-else" is a normal ask. Today the egress
+stack can't express that, even though cred-proxy already has most
+of the machinery: it path-routes authenticated traffic with
+longest-prefix matching, and the manifest's `cred_proxy.routes[]`
+shape is already a list of `(path, upstream, ...)` rules.
+
+## Goals / Success Criteria
+
+A bottle manifest can declare a cred-proxy route with a
+`path_allowlist` and `auth_scheme: none`. Agents dialing
+`http://cred-proxy:<port>/<route>/<suffix>` hit a 403 from
+cred-proxy when `<suffix>` doesn't match any allowlist entry, and
+a normal forward (no auth header injected) when it does. For
+existing authenticated routes the addition is opt-in: a route
+without `path_allowlist` keeps its current permissive behaviour.
+
+Demonstrable behavior: a bottle manifest declares
+`{path: "/github/", upstream: "https://github.com", auth_scheme: "none",
+path_allowlist: ["/didericis/"]}`; the agent reaches
+`http://cred-proxy:9099/github/didericis/some-repo` successfully
+but gets a 403 on `http://cred-proxy:9099/github/someone-else/whatever`.
+
+## Non-goals
+
+- Replacing pipelock. Pipelock still does the hostname allowlist,
+  DLP body scanning, and MCP / WebSocket inspection. Path filtering
+  is additive, sitting in front of pipelock for routes that opt in.
+- Auto-routing arbitrary outbound HTTP through cred-proxy. The
+  agent's `HTTP_PROXY` stays pointed at pipelock; cred-proxy is
+  reached by explicit URL (with a `git-insteadof`-style rewrite
+  for the few protocol-level helpers that need it).
+- Reworking pipelock-block. The PR #25 tool stays hostname-only;
+  whether a new path-aware proposal tool (or a richer
+  pipelock-block) is wanted is an open question for a follow-on
+  PRD.
+- Live mutation of the running container or cred-proxy beyond
+  what cred-proxy SIGHUP already supports (PRD 0014).
+
+## Scope
+
+### In scope
+
+- A new optional `auth_scheme: "none"` mode on cred-proxy routes
+  that suppresses Authorization injection while keeping path
+  routing + (new) path filtering.
+- A new optional `path_allowlist: [<prefix>, ...]` field per
+  cred-proxy route. When present, cred-proxy 403s requests whose
+  in-route suffix doesn't match at least one prefix.
+- Manifest schema + validation for the two new fields.
+- Cred-proxy server logic: enforcement on each request after the
+  longest-prefix route match.
+- SIGHUP reload picks up `path_allowlist` changes (no new sidecar
+  primitives — the existing reload path already re-reads
+  `routes.json`).
+
+### Out of scope
+
+- A new MCP tool for the agent to propose `path_allowlist`
+  additions. Today the operator manages this via the manifest +
+  the existing `routes edit <bottle>` TUI verb.
+- Glob / regex matching. v1 ships prefix matching only; the open
+  question lays out the trade-offs.
+- Auto-migrating PR #25's pipelock-block proposals into cred-proxy
+  routes. Manual operator decision per host.
+- Provisioner-side dotfile changes for HTTPS-to-cred-proxy rewrites
+  on bottles that opt unauth'd hosts onto cred-proxy. Out of scope
+  for the engine work; the manifest can already encode it.
+
+## Proposed Design
+
+### Manifest schema additions
+
+`bottle.cred_proxy.routes[]` gains two optional fields:
+
+```yaml
+cred_proxy:
+  routes:
+    - path: "/github/"
+      upstream: "https://github.com"
+      auth_scheme: "none"         # new — no Authorization header
+      token_ref: ""               # must be empty or absent when auth_scheme is "none"
+      path_allowlist:             # new — prefix list; empty / absent = permissive
+        - "/didericis/"
+        - "/didericis-org/"
+```
+
+- `auth_scheme: "none"` joins the existing `Bearer` / `token` values.
+  When `none`, `token_ref` must be empty or absent and no
+  Authorization header is injected. The route still routes by path
+  prefix and forwards to upstream.
+- `path_allowlist` is a list of prefixes matched against the
+  in-route suffix (the request path after the route's `path` is
+  stripped). Empty / absent means permissive (current behaviour).
+  When non-empty, the suffix must start with at least one of the
+  allowlist entries.
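
As an illustration of the two new fields, here is a minimal Python sketch of how a `routes[]` entry might be loaded into memory. The `Route` class, `route_from_manifest`, and the `Bearer` fallback for a missing `auth_scheme` are assumptions made for this example; the PRD does not specify cred-proxy's internal types or defaults.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Route:
    """One cred_proxy.routes[] entry (hypothetical in-memory shape)."""

    path: str                       # route prefix, e.g. "/github/"
    upstream: str                   # e.g. "https://github.com"
    auth_scheme: str = "Bearer"     # assumed default, not stated in the PRD
    token_ref: str = ""
    path_allowlist: Tuple[str, ...] = ()

    @property
    def injects_auth(self) -> bool:
        # "none" suppresses Authorization injection.
        return self.auth_scheme != "none"

    @property
    def is_permissive(self) -> bool:
        # Empty or absent allowlist keeps the current behaviour.
        return not self.path_allowlist


def route_from_manifest(entry: dict) -> Route:
    """Build a Route from one parsed manifest entry, applying defaults."""
    return Route(
        path=entry["path"],
        upstream=entry["upstream"],
        auth_scheme=entry.get("auth_scheme", "Bearer"),
        token_ref=entry.get("token_ref") or "",
        path_allowlist=tuple(entry.get("path_allowlist") or ()),
    )
```

A route that omits both new fields loads as permissive and auth-injecting, which mirrors the opt-in intent described above.
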
+
+### cred-proxy server changes
+
+Per request:
+1. Strip the query string, then longest-prefix-match the path
+   against `routes`.
+2. Compute `suffix = request_path[len(route.path):]`.
+3. If `route.path_allowlist` is non-empty: require that
+   `"/" + suffix` starts with at least one allowlist entry, and 403
+   if not. The leading `/` is re-added because `route.path` ends
+   with `/` while allowlist entries start with `/` (see
+   Validation), so the bare `suffix` would never match.
+4. If `auth_scheme == "none"`: skip the `Authorization` header
+   step entirely; otherwise inject as today.
+5. Forward upstream, stream response (unchanged).
+
+The 403 body should name the route + the disallowed suffix so the
+operator can diagnose. cred-proxy's existing log line at request
+time picks up the new outcome too.
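
The per-request steps above can be sketched as a pure function. This is a hedged illustration, not cred-proxy's actual handler: `Route`, `Decision`, `decide`, and the wording of the 403 message are hypothetical, and the sketch assumes the `"/" + suffix` normalization so that the suffix is comparable with entries such as `/didericis/`.

```python
from typing import NamedTuple, Sequence, Tuple
from urllib.parse import urlsplit


class Route(NamedTuple):
    path: str
    upstream: str
    auth_scheme: str = "Bearer"
    path_allowlist: Tuple[str, ...] = ()


class Decision(NamedTuple):
    status: int         # 200 here means "forward upstream"
    message: str        # plain-text error body, empty when forwarding
    inject_auth: bool   # whether an Authorization header would be added


def decide(request_target: str, routes: Sequence[Route]) -> Decision:
    # Step 1: drop the query string, then take the longest matching prefix.
    path = urlsplit(request_target).path
    candidates = [r for r in routes if path.startswith(r.path)]
    if not candidates:
        return Decision(404, f"no route for {path}", False)
    route = max(candidates, key=lambda r: len(r.path))

    # Step 2: in-route suffix, re-anchored with a leading "/".
    suffix = "/" + path[len(route.path):]

    # Step 3: enforce the allowlist only when it is non-empty.
    allowed = route.path_allowlist
    if allowed and not any(suffix.startswith(p) for p in allowed):
        return Decision(
            403, f"path {suffix} not allowed on route {route.path}", False
        )

    # Step 4: "none" routes skip Authorization injection.
    return Decision(200, "", route.auth_scheme != "none")
```

With a `/github/` route that allowlists `/didericis/`, `decide("/github/didericis/some-repo", routes)` forwards without auth, while `/github/someone-else/whatever` yields a 403. The sketch compares raw paths; a real implementation would also need to decide how to treat `..` segments and percent-encoding before matching.
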
+
+### Validation
+
+At manifest load:
+- `auth_scheme` must be one of `Bearer`, `token`, or `none`.
+- When `auth_scheme == "none"`, `token_ref` must be empty or
+  absent; a non-empty value is rejected (clearer error than
+  silently ignoring it).
+- `path_allowlist` entries must start with `/` and end with `/`
+  (matching the existing convention for `route.path`).
+- Duplicate prefixes are deduplicated with a warning, not an
+  error.
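
The validation rules above could look roughly like this in Python. `validate_route` and its `(errors, warnings)` return shape are hypothetical; the sketch also assumes that a missing `auth_scheme` is left to whatever default cred-proxy applies today, which the PRD does not state.

```python
from typing import List, Tuple

ALLOWED_SCHEMES = ("Bearer", "token", "none")


def validate_route(entry: dict) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for one routes[] entry."""
    errors: List[str] = []
    warnings: List[str] = []

    scheme = entry.get("auth_scheme")
    # Assumption: an absent auth_scheme is not an error in this sketch.
    if "auth_scheme" in entry and scheme not in ALLOWED_SCHEMES:
        errors.append(
            f"auth_scheme must be one of {ALLOWED_SCHEMES}, got {scheme!r}"
        )
    if scheme == "none" and entry.get("token_ref"):
        errors.append('token_ref is forbidden when auth_scheme is "none"')

    seen = set()
    for prefix in entry.get("path_allowlist") or []:
        well_formed = (
            isinstance(prefix, str)
            and prefix.startswith("/")
            and prefix.endswith("/")
        )
        if not well_formed:
            errors.append(
                f"path_allowlist entry {prefix!r} must start and end with '/'"
            )
        elif prefix in seen:
            # Duplicates are dropped with a warning, not rejected.
            warnings.append(f"duplicate path_allowlist entry {prefix!r}")
        else:
            seen.add(prefix)
    return errors, warnings
```

Returning warnings separately from errors lets the loader keep going on duplicates while still refusing a manifest with a malformed prefix or a stray `token_ref`.
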
+
+### Migration / backward compatibility
+
+- Routes without `path_allowlist` behave exactly as today.
+- Routes with `auth_scheme: Bearer | token` behave exactly as today.
+- No existing manifests need editing; the new fields are opt-in.
+
+## Open questions
+
+- **Match semantics: prefix vs glob vs regex.** Prefix is simple
+  and matches the existing `route.path` convention. Glob (`/users/*/repos/`)
+  adds power but is easy to get wrong (does `*` match a `/`?).
+  Regex is the most powerful and the most footguny. Recommend
+  prefix-only for v1, glob in a follow-up if operators ask for it.
+- **403 body shape.** Plain text vs JSON. Cred-proxy's existing
+  errors use plain text (`send_error(404, "no route for ...")`).
+  Match that.
+- **Auth-less routes and TLS interception.** A `none`-auth route
+  still routes outbound HTTPS through pipelock (cred-proxy's
+  `HTTPS_PROXY` env), so pipelock's CA + body scanner still apply.
+  Confirm that pipelock's allowlist needs the upstream host in
+  this case — there's no token to make the cred-proxy → upstream
+  leg special. Likely yes, same as today.
+- **MCP tool / pipelock-block evolution.** Once path filtering
+  exists, the operator may want a way for the agent to propose
+  path additions (e.g. "I need /didericis-org/ added to the
+  github route"). Today that requires manifest edit + cli.py
+  rebuild, or `routes edit` via the dashboard. Whether a new MCP
+  tool (or a richer pipelock-block) is wanted is an open question
+  for a follow-on PRD.
+- **Empty `path_allowlist` semantics.** Should an explicit
+  `path_allowlist: []` be allowed? Read literally it means "block
+  everything at this upstream", which is possibly useful as a
+  tombstone but more likely a typo. Recommend treating an empty
+  list the same as absent (permissive) and flagging it in the
+  validation note.
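
To make the glob trade-off from the first open question concrete: Python's standard `fnmatch` is one glob implementation in which `*` does match `/`, so `/users/*/repos/` also admits deeper paths. The sketch below shows that behaviour next to a hypothetical `segment_glob` helper that confines `*` to a single path segment. Neither is proposed for v1, which stays prefix-only.

```python
import fnmatch
import re

pattern = "/users/*/repos/"

# fnmatch's "*" is not path-aware, so it spans "/" as well.
one_segment = fnmatch.fnmatchcase("/users/alice/repos/", pattern)
two_segments = fnmatch.fnmatchcase("/users/alice/extra/repos/", pattern)


def segment_glob(glob: str) -> "re.Pattern[str]":
    """Compile a glob in which "*" matches within one path segment only."""
    literal_parts = (re.escape(part) for part in glob.split("*"))
    # "[^/]+" stands in for each "*": one or more non-slash characters.
    return re.compile("[^/]+".join(literal_parts))


strict = segment_glob(pattern)
strict_one = strict.fullmatch("/users/alice/repos/") is not None
strict_two = strict.fullmatch("/users/alice/extra/repos/") is not None
```

Here `one_segment` and `two_segments` are both true under `fnmatch`, whereas the segment-confined variant accepts only the single-segment path. That gap is the ambiguity an operator would have to reason about if glob matching were added later.
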
+
+## References
+
+- PRD 0010 — cred-proxy (the engine being extended).
+- PRD 0015 — pipelock block remediation (whose hostname-only
+  ceiling motivates this PRD).
+- PR #25 — `_apply_pipelock_url`'s docstring documents the
+  follow-up that this PRD formalises.