docs(prd-0017): path-aware egress filtering via cred-proxy

Extends cred-proxy to filter (not just route) paths, including for
unauthenticated upstreams via a new `auth_scheme: "none"` mode and
`path_allowlist` field per route. Pipelock keeps its hostname
allowlist + DLP role; cred-proxy adds path-level enforcement for
routes that opt in.

Motivated by PR #25's follow-up note in _apply_pipelock_url: pipelock
2.3.0's api_allowlist is hostname-only, so approving pipelock-block
opens the entire host. For shared platforms (github.com, gitlab.com,
public registries) operators usually want narrower-than-host
granularity.

Draft status; open questions on match semantics, allow-route-with-
empty-allowlist edge case, and the eventual MCP tool shape for
agent-proposed path additions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 08:33:01 -04:00
parent 0668c7bb45
commit 5b925a6699
@@ -0,0 +1,195 @@
# PRD 0017: Path-aware egress filtering via cred-proxy
- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-25
## Summary
Pipelock's `api_allowlist` is hostname-only — once a host is on the
list, every URL path at that host is reachable. For agents working
on shared platforms (github.com, gitlab.com, public registries),
this means approving access to one user's content also opens
access to every other user's content. Cred-proxy already
path-prefix-routes authenticated traffic; this PRD extends it to
filter (not just route) paths, including for unauthenticated hosts.
Per-bottle egress then has two complementary layers: pipelock for
hostname allow + DLP + body scanning, cred-proxy for path-level
allow on declared hosts.
## Problem
PR #25's pipelock-block tool delivers an honest but coarse experience:
the agent reports "I tried hitting `https://github.com/didericis`,
pipelock 403'd it"; the operator approves and the agent now has
access to all of github.com. The path in the proposal is captured
as context but not enforced (PR #25 documents this in
`_apply_pipelock_url`'s docstring).
The intended posture for many shared platforms is narrower than
hostname-level. "Allow the agent to read github.com/didericis but
not github.com/somebody-else" is a normal ask. Today the egress
stack can't express that, even though cred-proxy already has 80%
of the machinery: it path-routes authenticated traffic with
longest-prefix matching, and the manifest's `cred_proxy.routes[]`
shape is already a list of `(path, upstream, ...)` rules.
## Goals / Success Criteria
A bottle manifest can declare a cred-proxy route with a
`path_allowlist` and `auth_scheme: none`. Agents dialing
`http://cred-proxy:<port>/<route>/<suffix>` hit a 403 from
cred-proxy when `<suffix>` doesn't match any allowlist entry, and
a normal forward (no auth header injected) when it does. For
existing authenticated routes the addition is opt-in: a route
without `path_allowlist` keeps its current permissive behaviour.
Demonstrable behaviour: a bottle manifest declares
`{path: "/github/", upstream: "https://github.com", auth_scheme: "none",
path_allowlist: ["/didericis/"]}`; the agent reaches
`http://cred-proxy:9099/github/didericis/some-repo` successfully but
gets a 403 on `http://cred-proxy:9099/github/someone-else/whatever`.
## Non-goals
- Replacing pipelock. Pipelock still does the hostname allowlist,
DLP body scanning, MCP / WebSocket inspection. Path filtering is
additive, sitting in front of pipelock for routes that opt in.
- Auto-routing arbitrary outbound HTTP through cred-proxy. The
agent's `HTTP_PROXY` stays pointed at pipelock; cred-proxy is
reached by explicit URL (with a `git-insteadof`-style rewrite
for the few protocol-level helpers that need it).
- Reworking pipelock-block. The PR #25 tool stays hostname-only;
whether a new path-aware proposal tool (or a richer
pipelock-block) is wanted is an open question for a follow-on
PRD.
- Live mutation of the running container or cred-proxy beyond
what cred-proxy SIGHUP already supports (PRD 0014).
## Scope
### In scope
- A new optional `auth_scheme: "none"` mode on cred-proxy routes
that suppresses Authorization injection while keeping path
routing + (new) path filtering.
- A new optional `path_allowlist: [<prefix>, ...]` field per
cred-proxy route. When present, cred-proxy 403s requests whose
in-route suffix doesn't match at least one prefix.
- Manifest schema + validation for the two new fields.
- Cred-proxy server logic: enforcement on each request after the
longest-prefix route match.
- SIGHUP reload picks up `path_allowlist` changes (no new sidecar
primitives — the existing reload path already re-reads
`routes.json`).
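A minimal sketch of why the reload needs no new primitives, assuming `routes.json` is a JSON list of route objects and that the allowlist lives inside each route record. The function names and the module-level table are hypothetical, not cred-proxy's actual code:

```python
import json
import signal

# Hypothetical in-memory route table; rebinding the name swaps the whole
# table in one step, so a request sees either the old list or the new one.
_routes = []


def load_routes(routes_file):
    """Re-read the route table; path_allowlist rides along inside each route."""
    with open(routes_file, encoding="utf-8") as fh:
        return json.load(fh)


def reload_routes(routes_file):
    """Parse first, then swap, so a malformed file leaves the old table in place."""
    global _routes
    new_routes = load_routes(routes_file)
    _routes = new_routes


def current_routes():
    """Return the table that request handling should consult."""
    return _routes


def install_sighup_handler(routes_file):
    """Unix only: reload the route table whenever the process receives SIGHUP."""
    signal.signal(signal.SIGHUP, lambda signum, frame: reload_routes(routes_file))
```

Because the allowlist is just another key on the route, a changed `path_allowlist` takes effect on the first request after the swap.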
### Out of scope
- A new MCP tool for the agent to propose `path_allowlist`
additions. Today the operator manages this via the manifest +
the existing `routes edit <bottle>` TUI verb.
- Glob / regex matching. v1 ships prefix matching only; the open
question lays out the trade-offs.
- Auto-migrating PR #25's pipelock-block proposals into cred-proxy
routes. Manual operator decision per host.
- Provisioner-side dotfile changes for HTTPS-to-cred-proxy rewrites
on bottles that opt unauth'd hosts onto cred-proxy. Out of scope
for the engine work; the manifest can already encode it.
## Proposed Design
### Manifest schema additions
`bottle.cred_proxy.routes[]` gains two optional fields:
```yaml
cred_proxy:
  routes:
    - path: "/github/"
      upstream: "https://github.com"
      auth_scheme: "none"       # new: no Authorization header injected
      token_ref: ""             # must be empty or absent when auth_scheme is "none"
      path_allowlist:           # new: prefix list; empty / absent = permissive
        - "/didericis/"
        - "/didericis-org/"
```
- `auth_scheme: "none"` joins the existing `Bearer` / `token` values.
When `none`, `token_ref` must be empty or absent and no
Authorization header is injected. The route still routes by path
prefix and forwards to upstream.
- `path_allowlist` is a list of suffix prefixes (matched after the
route's `path` is stripped). Empty / absent means permissive
(current behaviour). When non-empty, the suffix must start with
at least one of the allowlist entries.
### cred-proxy server changes
Per request:
1. Strip the query string, then longest-prefix-match the path against `routes`.
2. Compute the suffix: `request_path[len(route.path):]`.
3. If `route.path_allowlist` is non-empty: require that
`"/" + suffix` starts with at least one allowlist entry; 403 if
not. The leading `/` is needed because `route.path` ends with `/`,
so the raw suffix has none, while allowlist entries always start with `/`.
4. If `auth_scheme == "none"`: skip the `Authorization` header
step entirely; otherwise inject as today.
5. Forward upstream, stream response (unchanged).
The 403 body should name the route and the disallowed suffix so the
operator can diagnose the rejection. Cred-proxy's existing log line at
request time picks up the new outcome too.
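The per-request steps above can be sketched as a pure decision function. This is an illustration under stated assumptions, not cred-proxy's actual code: `Route`, `match_route`, `injects_auth`, and `check_request` are hypothetical names, the 403 wording is invented, and forwarding and streaming (step 5) are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    path: str                    # route prefix, e.g. "/github/"
    upstream: str                # e.g. "https://github.com"
    auth_scheme: str = "Bearer"  # "Bearer", "token", or "none"
    path_allowlist: list = field(default_factory=list)


def match_route(routes, request_target):
    """Step 1: drop the query string, then pick the longest matching prefix."""
    path = request_target.split("?", 1)[0]
    candidates = [r for r in routes if path.startswith(r.path)]
    return max(candidates, key=lambda r: len(r.path), default=None)


def injects_auth(route):
    """Step 4: routes with auth_scheme "none" skip the Authorization header."""
    return route.auth_scheme != "none"


def check_request(routes, request_target):
    """Return (status, detail): 404 for no route, 403 for a filtered path,
    otherwise 200 with the normalized suffix to forward upstream."""
    path = request_target.split("?", 1)[0]
    route = match_route(routes, request_target)
    if route is None:
        return 404, f"no route for {path}"
    # Step 2: route.path ends with "/", so re-add the slash it consumed.
    suffix = "/" + path[len(route.path):]
    # Step 3: an empty or absent allowlist is permissive.
    if route.path_allowlist and not any(
        suffix.startswith(prefix) for prefix in route.path_allowlist
    ):
        return 403, f"route {route.path}: path {suffix} not in path_allowlist"
    return 200, suffix
```

With the `/github/` route from the Goals section, `/github/didericis/some-repo` passes and `/github/someone-else/whatever` is rejected. In this sketch, a request for `/github/didericis` with no trailing slash is also rejected, because the allowlist entry `/didericis/` ends with `/`. Whether that is the desired behaviour is worth deciding alongside the match-semantics question.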
### Validation
At manifest load:
- `auth_scheme` must be one of `Bearer`, `token`, or `none`.
- When `auth_scheme == "none"`, `token_ref` is forbidden (clearer
error than silently ignoring).
- `path_allowlist` entries must start with `/` and end with `/`
(matching the existing convention for `route.path`).
- Duplicate prefixes are deduplicated with a warning, not an
error.
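The rules above could be checked roughly as follows. This is a sketch, assuming each route arrives as a plain dict after YAML parsing. `validate_route` and the error wording are hypothetical, and a missing `auth_scheme` is deliberately left to whatever default handling exists today.

```python
ALLOWED_AUTH_SCHEMES = {"Bearer", "token", "none"}


def validate_route(route):
    """Return (normalized_route, warnings); raise ValueError on a hard error."""
    warnings = []
    label = route.get("path", "<unknown>")

    scheme = route.get("auth_scheme")
    # Assumption: an absent auth_scheme is handled elsewhere, so only
    # explicit values are checked here.
    if scheme is not None and scheme not in ALLOWED_AUTH_SCHEMES:
        raise ValueError(f"route {label}: unknown auth_scheme {scheme!r}")

    # A non-empty token_ref on a "none" route is an error, not silently ignored.
    if scheme == "none" and route.get("token_ref"):
        raise ValueError(f"route {label}: token_ref is forbidden with auth_scheme none")

    seen = set()
    deduped = []
    for entry in route.get("path_allowlist") or []:
        if not isinstance(entry, str) or not (
            entry.startswith("/") and entry.endswith("/")
        ):
            raise ValueError(
                f"route {label}: path_allowlist entry {entry!r} must start and end with '/'"
            )
        if entry in seen:
            # Duplicates are dropped with a warning rather than rejected.
            warnings.append(f"route {label}: duplicate path_allowlist entry {entry!r}")
            continue
        seen.add(entry)
        deduped.append(entry)

    return {**route, "path_allowlist": deduped}, warnings
```

Returning warnings separately from raising keeps the duplicate-prefix case non-fatal, as the last rule requires.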
### Migration / backward compatibility
- Routes without `path_allowlist` behave exactly as today.
- Routes with `auth_scheme: Bearer | token` behave exactly as today.
- No existing manifests need editing; the new fields are opt-in.
## Open questions
- **Match semantics: prefix vs glob vs regex.** Prefix is simple
and matches the existing `route.path` convention. Glob (`/users/*/repos/`)
adds power but is easy to get wrong (does `*` match a `/`?).
Regex is the most powerful and the most footguny. Recommend
prefix-only for v1, glob in a follow-up if operators ask for it.
- **403 body shape.** Plain text vs JSON. Cred-proxy's existing
errors use plain text (`send_error(404, "no route for ...")`).
Match that.
- **Auth-less routes and TLS interception.** A `none`-auth route
still routes outbound HTTPS through pipelock (cred-proxy's
`HTTPS_PROXY` env), so pipelock's CA + body scanner still apply.
Confirm that pipelock's allowlist needs the upstream host in
this case — there's no token to make the cred-proxy → upstream
leg special. Likely yes, same as today.
- **MCP tool / pipelock-block evolution.** Once path filtering
exists, the operator may want a way for the agent to propose
path additions (e.g. "I need /didericis-org/ added to the
github route"). Today that requires manifest edit + cli.py
rebuild, or `routes edit` via the dashboard. Whether a new MCP
tool (or a richer pipelock-block) is wanted is a follow-on PRD
open question.
- **Allowlist semantics for the entire route prefix.** Should an
empty `path_allowlist: []` be allowed? Equivalent to "block
everything at this upstream" — possibly useful as a tombstone,
more likely a typo. Recommend treating empty list the same as
absent (permissive) and flagging in the validation note.
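The "does `*` match a `/`?" concern in the match-semantics question is easy to reproduce with Python's standard `fnmatch` module, whose `*` matches any run of characters, slashes included. The sketch below contrasts it with a stricter, hypothetical `segment_glob` helper in which `*` stays within one path segment. It only illustrates the trade-off and does not propose a v1 design.

```python
import fnmatch
import re

pattern = "/users/*/repos/"
one_segment = "/users/alice/repos/"
two_segments = "/users/alice/evil/repos/"

# fnmatch's "*" crosses "/" boundaries, so both paths match.
fnmatch_one = fnmatch.fnmatchcase(one_segment, pattern)
fnmatch_two = fnmatch.fnmatchcase(two_segments, pattern)


def segment_glob(glob_pattern):
    """Compile a glob where "*" matches within a single path segment only."""
    literal_parts = [re.escape(part) for part in glob_pattern.split("*")]
    return re.compile("[^/]*".join(literal_parts))


strict = segment_glob(pattern)
strict_one = strict.fullmatch(one_segment) is not None
strict_two = strict.fullmatch(two_segments) is not None

print(fnmatch_one, fnmatch_two)  # True True
print(strict_one, strict_two)    # True False
```

An operator who writes `/users/*/repos/` expecting single-segment matching would silently allow the nested path under `fnmatch` semantics, which is the kind of mistake prefix-only matching avoids.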
## References
- PRD 0010 — cred-proxy (the engine being extended).
- PRD 0015 — pipelock block remediation (whose hostname-only
ceiling motivates this PRD).
- PR #25, whose `_apply_pipelock_url` docstring documents the
follow-up that this PRD formalises.