docs(prd-0017): path-aware egress filtering via cred-proxy

Extends cred-proxy to filter (not just route) paths, including for unauthenticated upstreams via a new `auth_scheme: "none"` mode and `path_allowlist` field per route. Pipelock keeps its hostname allowlist + DLP role; cred-proxy adds path-level enforcement for routes that opt in.

Motivated by PR #25's follow-up note in _apply_pipelock_url: pipelock 2.3.0's api_allowlist is hostname-only, so approving pipelock-block opens the entire host. For shared platforms (github.com, gitlab.com, public registries) operators usually want narrower-than-host granularity.

Draft status; open questions on match semantics, the allow-route-with-empty-allowlist edge case, and the eventual MCP tool shape for agent-proposed path additions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -0,0 +1,195 @@
# PRD 0017: Path-aware egress filtering via cred-proxy

- **Status:** Draft
- **Author:** didericis
- **Created:** 2026-05-25

## Summary

Pipelock's `api_allowlist` is hostname-only: once a host is on the
list, every URL path at that host is reachable. For agents working
on shared platforms (github.com, gitlab.com, public registries),
this means approving access to one user's content also opens
access to every other user's content. Cred-proxy already
path-prefix-routes authenticated traffic; this PRD extends it to
filter (not just route) paths, including for unauthenticated hosts.
Per-bottle egress then has two complementary layers: pipelock for
hostname allow + DLP + body scanning, cred-proxy for path-level
allow on declared hosts.

## Problem

PR #25's pipelock-block tool delivers an honest but coarse experience:
the agent reports "I tried hitting `https://github.com/didericis`,
pipelock 403'd it"; the operator approves and the agent now has
access to all of github.com. The path in the proposal is captured
as context but not enforced (PR #25 documents this in
`_apply_pipelock_url`'s docstring).

The intended posture for many shared platforms is narrower than
hostname-level. "Allow the agent to read github.com/didericis but
not github.com/somebody-else" is a normal ask. Today the egress
stack can't express that, even though cred-proxy already has most
of the machinery: it path-routes authenticated traffic with
longest-prefix matching, and the manifest's `cred_proxy.routes[]`
shape is already a list of `(path, upstream, ...)` rules.

## Goals / Success Criteria

A bottle manifest can declare a cred-proxy route with a
`path_allowlist` and `auth_scheme: "none"`. Agents dialing
`http://cred-proxy:<port>/<route>/<suffix>` hit a 403 from
cred-proxy when `<suffix>` doesn't match any allowlist entry, and
get a normal forward (no auth header injected) when it does. For
existing authenticated routes the addition is opt-in: a route
without `path_allowlist` keeps its current permissive behaviour.

Demonstrable behaviour: a bottle manifest declares
`{path: "/github/", upstream: "https://github.com", auth_scheme: "none",
path_allowlist: ["/didericis/"]}`; the agent reaches
`http://cred-proxy:9099/github/didericis/some-repo` successfully
and gets a 403 on `http://cred-proxy:9099/github/someone-else/whatever`.

## Non-goals

- Replacing pipelock. Pipelock still does the hostname allowlist,
  DLP body scanning, MCP / WebSocket inspection. Path filtering is
  additive, sitting in front of pipelock for routes that opt in.
- Auto-routing arbitrary outbound HTTP through cred-proxy. The
  agent's `HTTP_PROXY` stays pointed at pipelock; cred-proxy is
  reached by explicit URL (with a `git-insteadof`-style rewrite
  for the few protocol-level helpers that need it).
- Reworking pipelock-block. The PR #25 tool stays hostname-only;
  whether a new path-aware proposal tool (or a richer
  pipelock-block) is wanted is an open question for a follow-on
  PRD.
- Live mutation of the running container or cred-proxy beyond
  what cred-proxy SIGHUP already supports (PRD 0014).

## Scope

### In scope

- A new optional `auth_scheme: "none"` mode on cred-proxy routes
  that suppresses Authorization injection while keeping path
  routing + (new) path filtering.
- A new optional `path_allowlist: [<prefix>, ...]` field per
  cred-proxy route. When present, cred-proxy 403s requests whose
  in-route suffix doesn't match at least one prefix.
- Manifest schema + validation for the two new fields.
- Cred-proxy server logic: enforcement on each request after the
  longest-prefix route match.
- SIGHUP reload picks up `path_allowlist` changes (no new sidecar
  primitives; the existing reload path already re-reads
  `routes.json`).

### Out of scope

- A new MCP tool for the agent to propose `path_allowlist`
  additions. Today the operator manages this via the manifest +
  the existing `routes edit <bottle>` TUI verb.
- Glob / regex matching. v1 ships prefix matching only; the
  match-semantics open question lays out the trade-offs.
- Auto-migrating PR #25's pipelock-block proposals into cred-proxy
  routes. Manual operator decision per host.
- Provisioner-side dotfile changes for HTTPS-to-cred-proxy rewrites
  on bottles that opt unauth'd hosts onto cred-proxy. Out of scope
  for the engine work; the manifest can already encode it.

## Proposed Design

### Manifest schema additions

`bottle.cred_proxy.routes[]` gains two optional fields:

```yaml
cred_proxy:
  routes:
    - path: "/github/"
      upstream: "https://github.com"
      auth_scheme: "none"       # new: no Authorization header
      token_ref: ""             # must be empty or absent when auth_scheme is "none"
      path_allowlist:           # new: prefix list; empty / absent = permissive
        - "/didericis/"
        - "/didericis-org/"
```

- `auth_scheme: "none"` joins the existing `Bearer` / `token` values.
  When `none`, `token_ref` must be empty or absent and no
  Authorization header is injected. The route still routes by path
  prefix and forwards to upstream.
- `path_allowlist` is a list of path prefixes matched against the
  in-route suffix (the request path after the route's `path` is
  stripped). Empty / absent means permissive (current behaviour).
  When non-empty, the suffix must start with at least one of the
  allowlist entries.

### cred-proxy server changes

Per request:

1. Strip query string, longest-prefix-match against `routes`.
2. Compute the suffix: `suffix = request_path[len(route.path):]`.
3. If `route.path_allowlist` is non-empty: require that the
   normalized suffix starts with at least one allowlist entry; 403
   if not. Given that `route.path` ends with `/` and allowlist
   entries start with `/`, `"/" + suffix` is the form that lines up
   with the validation rules; whichever normalization is chosen
   must be applied consistently.
4. If `auth_scheme == "none"`: skip the `Authorization` header
   step entirely; otherwise inject as today.
5. Forward upstream, stream response (unchanged).

The 403 body should name the route + the disallowed suffix so the
operator can diagnose. cred-proxy's existing log line at request
time picks up the new outcome too.
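
The five per-request steps can be sketched as a pure decision function, which keeps the filtering logic testable without a socket. This is a minimal sketch under assumptions, not cred-proxy's actual code: `Route`, `Decision`, `decide`, the `token` field (standing in for a secret already resolved from `token_ref`), the way the upstream URL is assembled, and the error message wording are all hypothetical names and choices made for illustration. It uses the `"/" + suffix` normalization.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Route:
    path: str                       # route prefix, e.g. "/github/"
    upstream: str                   # e.g. "https://github.com"
    auth_scheme: str                # "Bearer", "token", or "none"
    token: str = ""                 # hypothetical: secret already resolved from token_ref
    path_allowlist: List[str] = field(default_factory=list)


@dataclass
class Decision:
    status: Optional[int]           # None = forward; otherwise the error status to send
    detail: str                     # error text, or the upstream URL to forward to
    headers: Dict[str, str] = field(default_factory=dict)


def decide(routes: List[Route], raw_path: str) -> Decision:
    # Step 1: strip the query string, then longest-prefix match.
    path, sep, query = raw_path.partition("?")
    candidates = [r for r in routes if path.startswith(r.path)]
    if not candidates:
        return Decision(404, f"no route for {path}")
    route = max(candidates, key=lambda r: len(r.path))

    # Step 2: in-route suffix, normalised with a leading "/".
    suffix = "/" + path[len(route.path):]

    # Step 3: enforce the allowlist only when it is non-empty.
    if route.path_allowlist and not any(
        suffix.startswith(prefix) for prefix in route.path_allowlist
    ):
        return Decision(403, f"route {route.path}: path {suffix} not in path_allowlist")

    # Step 4: inject Authorization unless the route is auth-less.
    headers: Dict[str, str] = {}
    if route.auth_scheme != "none":
        headers["Authorization"] = f"{route.auth_scheme} {route.token}"

    # Step 5: the caller forwards to this URL and streams the response.
    return Decision(None, route.upstream.rstrip("/") + suffix + sep + query, headers)


if __name__ == "__main__":
    routes = [Route("/github/", "https://github.com", "none",
                    path_allowlist=["/didericis/"])]
    print(decide(routes, "/github/didericis/some-repo"))
    print(decide(routes, "/github/someone-else/whatever"))
```

One consequence of strict prefix matching worth noting: with the allowlist entry `/didericis/`, a request for `/github/didericis` (no trailing slash) does not match and is rejected in this sketch.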

### Validation

At manifest load:

- `auth_scheme` must be one of `Bearer`, `token`, or `none`.
- When `auth_scheme == "none"`, a non-empty `token_ref` is rejected
  (clearer error than silently ignoring it).
- `path_allowlist` entries must start with `/` and end with `/`
  (matching the existing convention for `route.path`).
- Duplicate prefixes are deduplicated with a warning, not an
  error.
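
The validation rules above could look roughly like the following. This is a sketch under assumptions: `validate_route` and its return shape are hypothetical, the route is taken to be a plain dict as loaded from the manifest, and `auth_scheme` is assumed to be always present (the PRD does not say whether it has a default).

```python
from typing import Any, Dict, List, Tuple

VALID_AUTH_SCHEMES = ("Bearer", "token", "none")


def validate_route(route: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (cleaned_route, warnings); raise ValueError on a hard error."""
    warnings: List[str] = []

    scheme = route.get("auth_scheme")
    if scheme not in VALID_AUTH_SCHEMES:
        raise ValueError(f"auth_scheme must be one of {VALID_AUTH_SCHEMES}, got {scheme!r}")

    # A non-empty token_ref on an auth-less route is an error, not silently ignored.
    if scheme == "none" and route.get("token_ref"):
        raise ValueError('token_ref must be empty or absent when auth_scheme is "none"')

    deduped: List[str] = []
    for entry in route.get("path_allowlist") or []:
        if not (isinstance(entry, str) and entry.startswith("/") and entry.endswith("/")):
            raise ValueError(f"path_allowlist entry {entry!r} must start and end with '/'")
        if entry in deduped:
            # Duplicates are dropped with a warning rather than rejected.
            warnings.append(f"duplicate path_allowlist entry {entry!r} ignored")
            continue
        deduped.append(entry)

    cleaned = dict(route)
    if "path_allowlist" in route:
        cleaned["path_allowlist"] = deduped
    return cleaned, warnings
```

Returning warnings alongside the cleaned route, instead of logging inside the function, leaves the caller free to decide how to surface them.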

### Migration / backward compatibility

- Routes without `path_allowlist` behave exactly as today.
- Routes with `auth_scheme: Bearer | token` behave exactly as today.
- No existing manifests need editing; the new fields are opt-in.

## Open questions

- **Match semantics: prefix vs glob vs regex.** Prefix is simple
  and matches the existing `route.path` convention. Glob (`/users/*/repos/`)
  adds power but is easy to get wrong (does `*` match a `/`?).
  Regex is the most powerful and the most footguny. Recommend
  prefix-only for v1, glob in a follow-up if operators ask for it.
- **403 body shape.** Plain text vs JSON. Cred-proxy's existing
  errors use plain text (`send_error(404, "no route for ...")`).
  Match that.
- **Auth-less routes and TLS interception.** A `none`-auth route
  still routes outbound HTTPS through pipelock (cred-proxy's
  `HTTPS_PROXY` env), so pipelock's CA + body scanner still apply.
  Confirm that pipelock's allowlist needs the upstream host in
  this case, since there's no token to make the cred-proxy → upstream
  leg special. Likely yes, same as today.
- **MCP tool / pipelock-block evolution.** Once path filtering
  exists, the operator may want a way for the agent to propose
  path additions (e.g. "I need /didericis-org/ added to the
  github route"). Today that requires manifest edit + cli.py
  rebuild, or `routes edit` via the dashboard. Whether a new MCP
  tool (or a richer pipelock-block) is wanted is a question for a
  follow-on PRD.
- **Allowlist semantics for the entire route prefix.** Should an
  empty `path_allowlist: []` be allowed? Read literally it would
  mean "block everything at this upstream": possibly useful as a
  tombstone, more likely a typo. Recommend treating an empty list
  the same as absent (permissive) and flagging it in the validation
  note.
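
To make the glob footgun from the match-semantics question concrete, the snippet below shows how Python's standard `fnmatch` treats `*`: it is not segment-aware, so it happily spans `/`. The second helper, `segment_glob`, is a hypothetical alternative written only for illustration (it is not part of cred-proxy or of this PRD's proposal); it compiles `*` to "one path segment" and keeps prefix semantics.

```python
import re
from fnmatch import fnmatchcase

PATTERN = "/users/*/repos/"

# fnmatch's "*" matches any run of characters, including "/".
print(fnmatchcase("/users/alice/repos/", PATTERN))        # True
print(fnmatchcase("/users/alice/extra/repos/", PATTERN))  # True: "*" crossed a "/"


def segment_glob(pattern: str):
    """Hypothetical: compile a glob where "*" matches exactly one path segment."""
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("[^/]+".join(literal_parts))


matcher = segment_glob(PATTERN)
# re.match anchors at the start only, so this keeps prefix semantics.
print(bool(matcher.match("/users/alice/repos/some-repo")))   # True
print(bool(matcher.match("/users/alice/extra/repos/")))      # False
```

Either behaviour is defensible; the point is that the choice has to be made explicitly and documented, which is part of why prefix-only is the simpler v1.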

## References

- PRD 0010: cred-proxy (the engine being extended).
- PRD 0015: pipelock block remediation (whose hostname-only
  ceiling motivates this PRD).
- PR #25: `_apply_pipelock_url`'s docstring documents the
  follow-up that this PRD formalises.