bot-bottle/docs/prds/0017-path-aware-egress-via-cred-proxy.md
didericis 5b925a6699
docs(prd-0017): path-aware egress filtering via cred-proxy
Extends cred-proxy to filter (not just route) paths, including for
unauthenticated upstreams via a new `auth_scheme: "none"` mode and
`path_allowlist` field per route. Pipelock keeps its hostname
allowlist + DLP role; cred-proxy adds path-level enforcement for
routes that opt in.

Motivated by PR #25's follow-up note in _apply_pipelock_url: pipelock
2.3.0's api_allowlist is hostname-only, so approving pipelock-block
opens the entire host. For shared platforms (github.com, gitlab.com,
public registries) operators usually want narrower-than-host
granularity.

Draft status; open questions on match semantics, allow-route-with-
empty-allowlist edge case, and the eventual MCP tool shape for
agent-proposed path additions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 08:33:01 -04:00


PRD 0017: Path-aware egress filtering via cred-proxy

  • Status: Draft
  • Author: didericis
  • Created: 2026-05-25

Summary

Pipelock's api_allowlist is hostname-only — once a host is on the list, every URL path at that host is reachable. For agents working on shared platforms (github.com, gitlab.com, public registries), this means approving access to one user's content also opens access to every other user's content. Cred-proxy already path-prefix-routes authenticated traffic; this PRD extends it to filter (not just route) paths, including for unauthenticated hosts. Per-bottle egress then has two complementary layers: pipelock for hostname allow + DLP + body scanning, cred-proxy for path-level allow on declared hosts.

Problem

PR #25's pipelock-block tool delivers an honest but coarse experience: the agent reports "I tried hitting https://github.com/didericis, pipelock 403'd it"; the operator approves and the agent now has access to all of github.com. The path in the proposal is captured as context but not enforced (PR #25 documents this in _apply_pipelock_url's docstring).

The intended posture for many shared platforms is narrower than hostname-level. "Allow the agent to read github.com/didericis but not github.com/somebody-else" is a normal ask. Today the egress stack can't express that, even though cred-proxy already has most of the machinery: it path-routes authenticated traffic with longest-prefix matching, and the manifest's cred_proxy.routes[] shape is already a list of (path, upstream, ...) rules.

Goals / Success Criteria

A bottle manifest can declare a cred-proxy route with a path_allowlist and auth_scheme: none. Agents dialing http://cred-proxy:<port>/<route>/<suffix> hit a 403 from cred-proxy when <suffix> doesn't match any allowlist entry, and a normal forward (no auth header injected) when it does. For existing authenticated routes the addition is opt-in: a route without path_allowlist keeps its current permissive behaviour.

Demonstrable behavior: a bottle manifest declares {path: "/github/", upstream: "https://github.com", auth_scheme: "none", path_allowlist: ["/didericis/"]}; the agent reaches http://cred-proxy:9099/github/didericis/some-repo successfully and gets a 403 on http://cred-proxy:9099/github/someone-else/whatever.

Non-goals

  • Replacing pipelock. Pipelock still does the hostname allowlist, DLP body scanning, and MCP / WebSocket inspection. Path filtering is additive, sitting in front of pipelock for routes that opt in.
  • Auto-routing arbitrary outbound HTTP through cred-proxy. The agent's HTTP_PROXY stays pointed at pipelock; cred-proxy is reached by explicit URL (with a git-insteadof-style rewrite for the few protocol-level helpers that need it).
  • Reworking pipelock-block. The PR #25 tool stays hostname-only; whether a new path-aware proposal tool (or a richer pipelock-block) is wanted is an open question for a follow-on PRD.
  • Live mutation of the running container or cred-proxy beyond what cred-proxy SIGHUP already supports (PRD 0014).

Scope

In scope

  • A new optional auth_scheme: "none" mode on cred-proxy routes that suppresses Authorization injection while keeping path routing + (new) path filtering.
  • A new optional path_allowlist: [<prefix>, ...] field per cred-proxy route. When present, cred-proxy 403s requests whose in-route suffix doesn't match at least one prefix.
  • Manifest schema + validation for the two new fields.
  • Cred-proxy server logic: enforcement on each request after the longest-prefix route match.
  • SIGHUP reload picks up path_allowlist changes (no new sidecar primitives — the existing reload path already re-reads routes.json).
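
A minimal sketch of why no new sidecar primitive is needed for the reload item above: if the SIGHUP handler re-reads the whole routes file, new path_allowlist values arrive with everything else. The RouteTable class, the install_sighup_reload helper and the {"routes": [...]} file layout are assumptions made for illustration; the actual reload path is the one described in PRD 0014.

```python
import json
import signal


class RouteTable:
    """Holds the current routes and re-reads them from disk on demand."""

    def __init__(self, routes_file: str) -> None:
        self.routes_file = routes_file
        self.routes: list = []
        self.reload()

    def reload(self, signum=None, frame=None) -> None:
        # Parse fully before assigning, so a failed read does not clobber
        # the table that is currently being served.
        with open(self.routes_file, encoding="utf-8") as fh:
            loaded = json.load(fh)
        # Assumed layout: {"routes": [{"path": ..., "path_allowlist": [...]}, ...]}
        self.routes = loaded["routes"]


def install_sighup_reload(table: RouteTable) -> None:
    # SIGHUP is not available on every platform, so guard the registration.
    if hasattr(signal, "SIGHUP"):
        signal.signal(signal.SIGHUP, table.reload)
```

Because the whole file is re-read, an allowlist edit takes effect on the next request after the signal, with no per-field reload logic.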

Out of scope

  • A new MCP tool for the agent to propose path_allowlist additions. Today the operator manages this via the manifest + the existing routes edit <bottle> TUI verb.
  • Glob / regex matching. v1 ships prefix matching only; the open question lays out the trade-offs.
  • Auto-migrating PR #25's pipelock-block proposals into cred-proxy routes. Manual operator decision per host.
  • Provisioner-side dotfile changes for HTTPS-to-cred-proxy rewrites on bottles that opt unauth'd hosts onto cred-proxy. Out of scope for the engine work; the manifest can already encode it.

Proposed Design

Manifest schema additions

bottle.cred_proxy.routes[] gains two optional fields:

cred_proxy:
  routes:
    - path: "/github/"
      upstream: "https://github.com"
      auth_scheme: "none"         # new — no Authorization header
      token_ref: ""               # must be empty or absent when auth_scheme is "none"
      path_allowlist:             # new — prefix list; empty / absent = permissive
        - "/didericis/"
        - "/didericis-org/"
  • auth_scheme: "none" joins the existing Bearer / token values. When auth_scheme is none, token_ref must be empty or absent and no Authorization header is injected. The route still matches by path prefix and forwards to upstream.
  • path_allowlist is a list of suffix prefixes (matched after the route's path is stripped). Empty / absent means permissive (current behaviour). When non-empty, the suffix must start with at least one of the allowlist entries.

cred-proxy server changes

Per request:

  1. Strip query string, longest-prefix-match against routes.
  2. Compute the suffix = request_path[len(route.path):].
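  3. If route.path_allowlist is non-empty: require that the normalized suffix starts with at least one allowlist entry, and return 403 if not. Because route.path ends with / and allowlist entries start with /, the suffix from step 2 has no leading slash; comparing "/" + suffix is the normalization that fits both conventions, and it must be applied consistently.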
  4. If auth_scheme == "none": skip the Authorization header step entirely; otherwise inject as today.
  5. Forward upstream, stream response (unchanged).
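
The steps above can be sketched as a pure decision function, separate from the forwarding code. This is an illustration under assumptions, not cred-proxy's implementation: routes are plain dicts shaped like the manifest entries, and the Decision type, match_route and decide are hypothetical names.

```python
from typing import NamedTuple, Optional, Sequence


class Decision(NamedTuple):
    action: str                # "forward", "forbidden" or "no_route"
    route_path: Optional[str]  # matched route prefix, if any
    suffix: str                # request path with the route prefix stripped
    inject_auth: bool          # whether an Authorization header is added


def match_route(path: str, routes: Sequence[dict]) -> Optional[dict]:
    """Longest-prefix match of the request path against route paths."""
    candidates = [r for r in routes if path.startswith(r["path"])]
    return max(candidates, key=lambda r: len(r["path"]), default=None)


def decide(request_target: str, routes: Sequence[dict]) -> Decision:
    path = request_target.split("?", 1)[0]          # step 1: strip query string
    route = match_route(path, routes)
    if route is None:
        return Decision("no_route", None, "", False)
    suffix = path[len(route["path"]):]              # step 2
    allowlist = route.get("path_allowlist") or []   # absent or empty = permissive
    normalized = "/" + suffix                       # step 3: one normalization
    if allowlist and not any(normalized.startswith(p) for p in allowlist):
        return Decision("forbidden", route["path"], suffix, False)
    inject_auth = route.get("auth_scheme") != "none"  # step 4
    return Decision("forward", route["path"], suffix, inject_auth)
```

With the route from the demonstrable-behaviour example, /github/didericis/some-repo forwards without auth and /github/someone-else/whatever is forbidden. The trailing slash on allowlist entries matters: /didericis/ does not admit /github/didericis-org/x. Under this sketch it also does not admit the bare /github/didericis with no trailing slash, which may or may not be the intended behaviour.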

The 403 body should name the route + the disallowed suffix so the operator can diagnose. cred-proxy's existing log line at request time picks up the new outcome too.
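
One possible shape for that message, as a sketch: the wording and the forbidden_message name are hypothetical, chosen only to show a plain-text line that carries both the route and the rejected suffix. It assumes the handler exposes the same send_error call the existing 404 path uses.

```python
def forbidden_message(route_path: str, suffix: str) -> str:
    """Build a one-line diagnostic naming the route and the rejected suffix."""
    # The leading "/" mirrors the normalized form the allowlist is compared against.
    return f"path not allowed for route {route_path}: /{suffix}"


# Inside the request handler this would be used roughly as:
#     self.send_error(403, forbidden_message(route["path"], suffix))
```

Reusing the same string in the request-time log line keeps the log and the response body consistent when an operator is diagnosing a block.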

Validation

At manifest load:

  • auth_scheme must be one of Bearer, token, or none.
  • When auth_scheme == "none", a non-empty token_ref is rejected with an error (clearer than silently ignoring it).
  • path_allowlist entries must start with / and end with / (matching the existing convention for route.path).
  • Duplicate prefixes are deduplicated with a warning, not an error.
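
The rules above could be checked with something like the following sketch. The validate_route name, the return shape and the message wording are hypothetical. It also assumes that a route with no auth_scheme key is left to the existing validation, since the source does not say whether the field is required today.

```python
from typing import List, Tuple

VALID_AUTH_SCHEMES = ("Bearer", "token", "none")


def validate_route(entry: dict) -> Tuple[List[str], List[str], List[str]]:
    """Return (errors, warnings, deduplicated path_allowlist) for one route."""
    errors: List[str] = []
    warnings: List[str] = []
    where = f"route {entry.get('path', '<missing path>')!r}"

    scheme = entry.get("auth_scheme")
    # Assumption: an absent auth_scheme is handled by pre-existing checks.
    if scheme is not None and scheme not in VALID_AUTH_SCHEMES:
        allowed = ", ".join(VALID_AUTH_SCHEMES)
        errors.append(f"{where}: auth_scheme must be one of {allowed}")
    if scheme == "none" and entry.get("token_ref"):
        errors.append(f"{where}: token_ref is not allowed when auth_scheme is none")

    deduped: List[str] = []
    saw_duplicate = False
    for prefix in entry.get("path_allowlist") or []:
        ok = isinstance(prefix, str) and prefix.startswith("/") and prefix.endswith("/")
        if not ok:
            errors.append(f"{where}: path_allowlist entry {prefix!r} must start and end with /")
        if prefix in deduped:
            saw_duplicate = True   # duplicates warn, they do not fail the load
            continue
        deduped.append(prefix)
    if saw_duplicate:
        warnings.append(f"{where}: duplicate path_allowlist entries were removed")
    return errors, warnings, deduped
```

Returning errors and warnings separately lets the manifest loader fail on the first list while only printing the second.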

Migration / backward compatibility

  • Routes without path_allowlist behave exactly as today.
  • Routes with auth_scheme: Bearer | token behave exactly as today.
  • No existing manifests need editing; the new fields are opt-in.

Open questions

  • Match semantics: prefix vs glob vs regex. Prefix is simple and matches the existing route.path convention. Glob (/users/*/repos/) adds power but is easy to get wrong (does * match a /?). Regex is the most powerful and the most footguny. Recommend prefix-only for v1, glob in a follow-up if operators ask for it.
  • 403 body shape. Plain text vs JSON. Cred-proxy's existing errors use plain text (send_error(404, "no route for ...")). Match that.
  • Auth-less routes and TLS interception. A none-auth route still routes outbound HTTPS through pipelock (cred-proxy's HTTPS_PROXY env), so pipelock's CA + body scanner still apply. Confirm that pipelock's allowlist needs the upstream host in this case — there's no token to make the cred-proxy → upstream leg special. Likely yes, same as today.
  • MCP tool / pipelock-block evolution. Once path filtering exists, the operator may want a way for the agent to propose path additions (e.g. "I need /didericis-org/ added to the github route"). Today that requires a manifest edit + cli.py rebuild, or routes edit via the dashboard. Whether a new MCP tool (or a richer pipelock-block) is wanted is left to a follow-on PRD.
  • Allowlist semantics for the entire route prefix. Should an empty path_allowlist: [] be allowed? Equivalent to "block everything at this upstream" — possibly useful as a tombstone, more likely a typo. Recommend treating empty list the same as absent (permissive) and flagging in the validation note.
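
The "does * match a /?" concern in the match-semantics question is easy to demonstrate with Python's standard fnmatch module, which does not treat the path separator specially. The segment_glob helper below is a hypothetical alternative shown only for contrast; neither variant is proposed for v1.

```python
import re
from fnmatch import fnmatchcase

PATTERN = "/users/*/repos/"


def segment_glob(pattern: str) -> re.Pattern:
    """Compile a glob in which * stays inside a single path segment."""
    pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile("[^/]*".join(pieces))


one_segment = "/users/alice/repos/"
two_segments = "/users/alice/extra/repos/"

# fnmatch-style: * is free to cross "/" boundaries.
shell_style = [fnmatchcase(p, PATTERN) for p in (one_segment, two_segments)]

# Segment-aware: * stops at the next "/".
compiled = segment_glob(PATTERN)
segment_style = [compiled.fullmatch(p) is not None for p in (one_segment, two_segments)]
```

Here shell_style admits both paths while segment_style admits only the first, so the same allowlist entry can mean two different things depending on which glob dialect an operator has in mind. Prefix matching has no such ambiguity.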

References

  • PRD 0010 — cred-proxy (the engine being extended).
  • PRD 0015 — pipelock block remediation (whose hostname-only ceiling motivates this PRD).
  • PR #25 — _apply_pipelock_url's docstring documents the follow-up that this PRD formalises.