From 5b925a66996790cd05287d63f214eadf675a2847 Mon Sep 17 00:00:00 2001 From: didericis Date: Mon, 25 May 2026 08:33:01 -0400 Subject: [PATCH 1/3] docs(prd-0017): path-aware egress filtering via cred-proxy Extends cred-proxy to filter (not just route) paths, including for unauthenticated upstreams via a new `auth_scheme: "none"` mode and `path_allowlist` field per route. Pipelock keeps its hostname allowlist + DLP role; cred-proxy adds path-level enforcement for routes that opt in. Motivated by PR #25's follow-up note in _apply_pipelock_url: pipelock 2.3.0's api_allowlist is hostname-only, so approving pipelock-block opens the entire host. For shared platforms (github.com, gitlab.com, public registries) operators usually want narrower-than-host granularity. Draft status; open questions on match semantics, allow-route-with- empty-allowlist edge case, and the eventual MCP tool shape for agent-proposed path additions. Co-Authored-By: Claude Opus 4.7 --- .../0017-path-aware-egress-via-cred-proxy.md | 195 ++++++++++++++++++ 1 file changed, 195 insertions(+) create mode 100644 docs/prds/0017-path-aware-egress-via-cred-proxy.md diff --git a/docs/prds/0017-path-aware-egress-via-cred-proxy.md b/docs/prds/0017-path-aware-egress-via-cred-proxy.md new file mode 100644 index 0000000..81101c8 --- /dev/null +++ b/docs/prds/0017-path-aware-egress-via-cred-proxy.md @@ -0,0 +1,195 @@ +# PRD 0017: Path-aware egress filtering via cred-proxy + +- **Status:** Draft +- **Author:** didericis +- **Created:** 2026-05-25 + +## Summary + +Pipelock's `api_allowlist` is hostname-only — once a host is on the +list, every URL path at that host is reachable. For agents working +on shared platforms (github.com, gitlab.com, public registries), +this means approving access to one user's content also opens +access to every other user's content. Cred-proxy already +path-prefix-routes authenticated traffic; this PRD extends it to +filter (not just route) paths, including for unauthenticated hosts. 
+Per-bottle egress then has two complementary layers: pipelock for +hostname allow + DLP + body scanning, cred-proxy for path-level +allow on declared hosts. + +## Problem + +PR #25's pipelock-block tool delivers an honest but coarse experience: +the agent reports "I tried hitting `https://github.com/didericis`, +pipelock 403'd it"; the operator approves and the agent now has +access to all of github.com. The path in the proposal is captured +as context but not enforced (PR #25 documents this in +`_apply_pipelock_url`'s docstring). + +The intended posture for many shared platforms is narrower than +hostname-level. "Allow the agent to read github.com/didericis but +not github.com/somebody-else" is a normal ask. Today the egress +stack can't express that, even though cred-proxy already has 80% +of the machinery: it path-routes authenticated traffic with +longest-prefix matching, and the manifest's `cred_proxy.routes[]` +shape is already a list of `(path, upstream, ...)` rules. + +## Goals / Success Criteria + +A bottle manifest can declare a cred-proxy route with a +`path_allowlist` and `auth_scheme: none`. Agents dialing +`http://cred-proxy://` hit a 403 from +cred-proxy when `` doesn't match any allowlist entry, and +a normal forward (no auth header injected) when it does. For +existing authenticated routes the addition is opt-in: a route +without `path_allowlist` keeps its current permissive behaviour. + +Demonstrable behavior: a bottle manifest declares +`{path: "/github/", upstream: "https://github.com", auth_scheme: "none", +path_allowlist: ["/didericis/"]}`; the agent reaches +`http://cred-proxy:9099/github/didericis/some-repo` successfully, +gets a 403 on `http://cred-proxy:9099/github/someone-else/whatever`. + +## Non-goals + +- Replacing pipelock. Pipelock still does the hostname allowlist, + DLP body scanning, MCP / WebSocket inspection. Path filtering is + additive, sitting in front of pipelock for routes that opt in. 
+- Auto-routing arbitrary outbound HTTP through cred-proxy. The + agent's `HTTP_PROXY` stays pointed at pipelock; cred-proxy is + reached by explicit URL (with a `git-insteadof`-style rewrite + for the few protocol-level helpers that need it). +- Reworking pipelock-block. The PR #25 tool stays hostname-only; + whether a new path-aware proposal tool (or a richer + pipelock-block) is wanted is an open question for a follow-on + PRD. +- Live mutation of the running container or cred-proxy beyond + what cred-proxy SIGHUP already supports (PRD 0014). + +## Scope + +### In scope + +- A new optional `auth_scheme: "none"` mode on cred-proxy routes + that suppresses Authorization injection while keeping path + routing + (new) path filtering. +- A new optional `path_allowlist: [, ...]` field per + cred-proxy route. When present, cred-proxy 403s requests whose + in-route suffix doesn't match at least one prefix. +- Manifest schema + validation for the two new fields. +- Cred-proxy server logic: enforcement on each request after the + longest-prefix route match. +- SIGHUP reload picks up `path_allowlist` changes (no new sidecar + primitives — the existing reload path already re-reads + `routes.json`). + +### Out of scope + +- A new MCP tool for the agent to propose `path_allowlist` + additions. Today the operator manages this via the manifest + + the existing `routes edit ` TUI verb. +- Glob / regex matching. v1 ships prefix matching only; the open + question lays out the trade-offs. +- Auto-migrating PR #25's pipelock-block proposals into cred-proxy + routes. Manual operator decision per host. +- Provisioner-side dotfile changes for HTTPS-to-cred-proxy rewrites + on bottles that opt unauth'd hosts onto cred-proxy. Out of scope + for the engine work; the manifest can already encode it. 
+ +## Proposed Design + +### Manifest schema additions + +`bottle.cred_proxy.routes[]` gains two optional fields: + +```yaml +cred_proxy: + routes: + - path: "/github/" + upstream: "https://github.com" + auth_scheme: "none" # new — no Authorization header + token_ref: "" # ignored when auth_scheme is "none" + path_allowlist: # new — prefix list; empty / absent = permissive + - "/didericis/" + - "/didericis-org/" +``` + +- `auth_scheme: "none"` joins the existing `Bearer` / `token` values. + When `none`, `token_ref` must be empty or absent and no + Authorization header is injected. The route still routes by path + prefix and forwards to upstream. +- `path_allowlist` is a list of suffix prefixes (matched after the + route's `path` is stripped). Empty / absent means permissive + (current behaviour). When non-empty, the suffix must start with + at least one of the allowlist entries. + +### cred-proxy server changes + +Per request: +1. Strip query string, longest-prefix-match against `routes`. +2. Compute the suffix = request_path[len(route.path):]. +3. If `route.path_allowlist` is non-empty: require that + `"/" + suffix` (or just `suffix` — pick a consistent + normalization) starts with at least one allowlist entry. 403 if + not. +4. If `auth_scheme == "none"`: skip the `Authorization` header + step entirely; otherwise inject as today. +5. Forward upstream, stream response (unchanged). + +The 403 body should name the route + the disallowed suffix so the +operator can diagnose. cred-proxy's existing log line at request +time picks up the new outcome too. + +### Validation + +At manifest load: +- `auth_scheme` must be one of `Bearer`, `token`, or `none`. +- When `auth_scheme == "none"`, `token_ref` is forbidden (clearer + error than silently ignoring). +- `path_allowlist` entries must start with `/` and end with `/` + (matching the existing convention for `route.path`). +- Duplicate prefixes are deduplicated with a warning, not an + error. 
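A minimal Python sketch of the draft's per-request steps 1 to 3 plus the load-time validation rules, assuming routes have already been parsed from `routes.json`. `Route`, `validate_route`, `match_route` and `decide` are hypothetical names, not cred-proxy's real code, and the sketch settles step 3's open normalization question on `"/" + suffix` (route paths end in `/`, allowlist entries start with one).

```python
import warnings
from dataclasses import dataclass, field
from typing import Optional

VALID_SCHEMES = {"Bearer", "token", "none"}


@dataclass
class Route:
    path: str               # route prefix, e.g. "/github/"
    upstream: str           # e.g. "https://github.com"
    auth_scheme: str        # "Bearer", "token" or "none"
    token_ref: str = ""     # must be empty when auth_scheme == "none"
    path_allowlist: list[str] = field(default_factory=list)


def validate_route(route: Route) -> Route:
    """Load-time checks mirroring the Validation bullets."""
    if route.auth_scheme not in VALID_SCHEMES:
        raise ValueError(f"{route.path}: unknown auth_scheme {route.auth_scheme!r}")
    if route.auth_scheme == "none" and route.token_ref:
        raise ValueError(f"{route.path}: token_ref is forbidden when auth_scheme is 'none'")
    deduped: list[str] = []
    for entry in route.path_allowlist:
        if not (entry.startswith("/") and entry.endswith("/")):
            raise ValueError(f"{route.path}: allowlist entry {entry!r} must start and end with '/'")
        if entry in deduped:
            # Duplicates are a warning, not an error.
            warnings.warn(f"{route.path}: duplicate allowlist entry {entry!r} dropped")
            continue
        deduped.append(entry)
    route.path_allowlist = deduped
    return route


def match_route(routes: list[Route], path: str) -> Optional[Route]:
    """Longest-prefix match of the (query-stripped) request path."""
    candidates = [r for r in routes if path.startswith(r.path)]
    return max(candidates, key=lambda r: len(r.path), default=None)


def decide(routes: list[Route], raw_path: str) -> Optional[tuple[int, str]]:
    """Return (status, plain-text body) for a local error reply, or None to forward."""
    path = raw_path.split("?", 1)[0]            # step 1: strip the query string
    route = match_route(routes, path)
    if route is None:
        return 404, f"no route for {path}"
    suffix = "/" + path[len(route.path):]        # steps 2-3: "/" + suffix normalization
    if route.path_allowlist and not any(suffix.startswith(p) for p in route.path_allowlist):
        return 403, f"route {route.path}: suffix {suffix} not in path_allowlist"
    return None                                  # steps 4-5 (auth + forward) unchanged
```

Returning the error body as plain text follows the open question's recommendation to match cred-proxy's existing `send_error` style. Steps 4 and 5 are omitted because the draft leaves them unchanged.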
+ +### Migration / backward compatibility + +- Routes without `path_allowlist` behave exactly as today. +- Routes with `auth_scheme: Bearer | token` behave exactly as today. +- No existing manifests need editing; the new fields are opt-in. + +## Open questions + +- **Match semantics: prefix vs glob vs regex.** Prefix is simple + and matches the existing `route.path` convention. Glob (`/users/*/repos/`) + adds power but is easy to get wrong (does `*` match a `/`?). + Regex is the most powerful and the most footguny. Recommend + prefix-only for v1, glob in a follow-up if operators ask for it. +- **403 body shape.** Plain text vs JSON. Cred-proxy's existing + errors use plain text (`send_error(404, "no route for ...")`). + Match that. +- **Auth-less routes and TLS interception.** A `none`-auth route + still routes outbound HTTPS through pipelock (cred-proxy's + `HTTPS_PROXY` env), so pipelock's CA + body scanner still apply. + Confirm that pipelock's allowlist needs the upstream host in + this case — there's no token to make the cred-proxy → upstream + leg special. Likely yes, same as today. +- **MCP tool / pipelock-block evolution.** Once path filtering + exists, the operator may want a way for the agent to propose + path additions (e.g. "I need /didericis-org/ added to the + github route"). Today that requires manifest edit + cli.py + rebuild, or `routes edit` via the dashboard. Whether a new MCP + tool (or a richer pipelock-block) is wanted is a follow-on PRD + open question. +- **Allowlist semantics for the entire route prefix.** Should an + empty `path_allowlist: []` be allowed? Equivalent to "block + everything at this upstream" — possibly useful as a tombstone, + more likely a typo. Recommend treating empty list the same as + absent (permissive) and flagging in the validation note. + +## References + +- PRD 0010 — cred-proxy (the engine being extended). +- PRD 0015 — pipelock block remediation (whose hostname-only + ceiling motivates this PRD). 
+- PR #25 — `_apply_pipelock_url`'s docstring documents the + follow-up that this PRD formalises. From b0d98024692caeea7f142afd5bf11c263da79da7 Mon Sep 17 00:00:00 2001 From: didericis Date: Mon, 25 May 2026 13:28:53 -0400 Subject: [PATCH 2/3] docs(prd-0017): pivot to mitmproxy-based egress-proxy MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Significant rewrite of PRD 0017 based on PR #25 design discussion. Original draft proposed adding `path_allowlist` to the existing cred-proxy. That bought opt-in path filtering for tools that voluntarily routed through cred-proxy (Claude Code, git, npm) — but raw `curl https://github.com/foo` from the agent goes to HTTPS_PROXY=pipelock and bypasses cred-proxy entirely, so any universal enforcement claim was a lie. New design: replace cred-proxy with a mitmproxy-based egress-proxy that becomes the agent's HTTP_PROXY/HTTPS_PROXY. Every agent HTTP/HTTPS request flows through it before reaching pipelock. Path-level allow/deny enforcement is universal because the proxy is on every leg. The proxy also absorbs cred-proxy's credential injection role (mitmproxy addon hooks request → strip + inject Authorization). Net sidecar count: unchanged. cred-proxy is replaced 1:1 by egress-proxy. Pipelock stays as hostname allow + DLP downstream of egress-proxy. Decisions baked in per PR-#25 discussion: - Tool: mitmproxy (designed for this; Python addons; well-maintained). - CA custody: egress-proxy holds the per-bottle MITM CA key (concentration accepted; documented in trust-domain section). - Migration: hard cutover. Existing `bottle.cred_proxy.routes[]` manifests fail-fast at load time with a pointer at this PRD. Open questions retained for the implementation PRs: addon distribution (bake vs mount), prefix-vs-glob match, double-strip of Authorization between egress-proxy and pipelock, whether pipelock keeps TLS interception or stays hostname-only post-cutover, performance under two-MITM-hops. 
Co-Authored-By: Claude Opus 4.7 --- docs/prds/0017-egress-proxy-via-mitmproxy.md | 309 ++++++++++++++++++ .../0017-path-aware-egress-via-cred-proxy.md | 195 ----------- 2 files changed, 309 insertions(+), 195 deletions(-) create mode 100644 docs/prds/0017-egress-proxy-via-mitmproxy.md delete mode 100644 docs/prds/0017-path-aware-egress-via-cred-proxy.md diff --git a/docs/prds/0017-egress-proxy-via-mitmproxy.md b/docs/prds/0017-egress-proxy-via-mitmproxy.md new file mode 100644 index 0000000..031306b --- /dev/null +++ b/docs/prds/0017-egress-proxy-via-mitmproxy.md @@ -0,0 +1,309 @@ +# PRD 0017: Egress-proxy — universal MITM with path filtering + auth injection + +- **Status:** Draft +- **Author:** didericis +- **Created:** 2026-05-25 +- **Supersedes:** the cred-proxy sidecar (PRD 0010) — hard cutover. + +## Summary + +Replace the per-bottle cred-proxy sidecar with a new `egress-proxy` +sidecar built on mitmproxy. The egress-proxy is the agent's +`HTTP_PROXY` / `HTTPS_PROXY` — every agent HTTP/HTTPS request flows +through it before reaching pipelock. It owns three jobs that today +are split between cred-proxy and pipelock: + +1. **MITM the agent's HTTPS.** Uses the per-bottle CA today held by + pipelock; that key moves to the egress-proxy. +2. **Path-level allow/deny.** Manifest-declared `path_allowlist` + per route. Universal coverage — any HTTPS path the agent reaches + for is inspected here, not just traffic that voluntarily dials + the cred-proxy URL. +3. **Credential injection.** Continues cred-proxy's existing role: + match by hostname (or hostname + path), strip inbound + Authorization, inject one based on `auth_scheme` + `token_ref`. + +Pipelock's role narrows to hostname allowlist + DLP body scanning +on the egress-proxy → upstream leg. Pipelock no longer holds the +CA private key; no longer the agent's direct proxy. 
+ +## Problem + +PR #25's pipelock-block flow exposed an honest gap: pipelock's +`api_allowlist` is hostname-only (verified by probing the binary's +strict preset and the `pipelock check --url` output). Approving a +proposed `pipelock-block` opens the entire host, not the URL's +path. For shared platforms (github.com, gitlab.com, public +registries) operators routinely want narrower-than-host granularity +— allow github.com/didericis but block github.com/somebody-else. + +Cred-proxy already does path-prefix routing for credentialed APIs, +but it only sees the requests the agent voluntarily routes to it +(via `ANTHROPIC_BASE_URL`, `~/.gitconfig` insteadOf, npmrc +`registry=`). A raw `curl https://github.com/anyone` from the agent +goes to `HTTPS_PROXY=pipelock` directly and bypasses cred-proxy +entirely. So extending cred-proxy with `path_allowlist` (the earlier +PRD 0017 draft) buys *opt-in* path filtering, not enforcement. + +For enforcement we need a layer that sits on the agent's +`HTTPS_PROXY` path — universal coverage of agent egress. + +## Goals / Success Criteria + +A bottle manifest declares an egress-proxy route with a +`path_allowlist`. From inside the bottle, `curl +https://github.com/didericis/foo` succeeds; `curl +https://github.com/somebody-else/secret` gets a 403 from +egress-proxy, never reaches pipelock or the real github. The same +holds for any tool inside the bottle that respects +`HTTPS_PROXY` — claude-code, git over HTTPS, npm, raw curl, random +Python `requests`. No tool-specific rewrite is required for path +enforcement. + +Existing cred-proxy responsibilities continue to work after the +cutover: Anthropic OAuth injection for claude-code (via the +proxy-side header injection rather than the dotfile rewrite), +git-insteadof routing into the proxy stays useful for hostname +canonicalisation but is no longer load-bearing for credential +delivery. + +## Non-goals + +- Replacing pipelock. 
Pipelock keeps doing hostname allowlist + + DLP body scanning on the egress-proxy → upstream leg. +- Building our own MITM stack. mitmproxy already does it; we ship + addons. +- Backward compatibility with `bottle.cred_proxy.routes[]`. Hard + cutover (see Migration). +- Path-level rules in pipelock. Upstream feature request is a + separate track (file independently); this PRD doesn't depend on + it. + +## Scope + +### In scope + +- A new `egress-proxy` sidecar replacing the cred-proxy sidecar. + mitmproxy image, pinned by digest. Addons in Python. +- Per-bottle CA generation **moves from pipelock to egress-proxy**. + The agent's trust store is rebuilt against the egress-proxy CA + (was pipelock's CA). +- Manifest rename: `bottle.cred_proxy.routes[]` → + `bottle.egress_proxy.routes[]`. The route shape gains optional + `path_allowlist: [, ...]` and supports `auth_scheme: + "none"`. +- Agent's `HTTP_PROXY` / `HTTPS_PROXY` env vars repointed at the + egress-proxy (was pipelock). +- Pipelock retains its sidecar slot and its own DLP + hostname + scanner. The agent never dials it directly anymore; egress-proxy + uses `HTTPS_PROXY=pipelock` for its outbound leg, matching the + current cred-proxy → pipelock pattern. +- Existing PRDs that depend on cred-proxy: + - PRD 0014 (cred-proxy-block remediation) → renames + retargets + apply path. SIGHUP reload semantics carry over to egress-proxy. + - PRD 0013 (supervise plane) `cred-proxy-block` MCP tool stays; + its proposed file format updates per the new route shape. +- Removal of the old cred-proxy code: `claude_bottle/cred_proxy.py`, + `cred_proxy_server.py`, `backend/docker/cred_proxy.py`, + `provision/cred_proxy.py`, the `Dockerfile.cred-proxy`. Tests + updated. + +### Out of scope + +- Pipelock CA path: pipelock keeps generating its *own* CA for + any internal TLS termination it still does (e.g., on the + egress-proxy → upstream leg if pipelock is the MITM there). 
+ Whether pipelock needs that CA at all post-cutover is an open + question (probably no: egress-proxy already terminated the agent's + TLS. Its outbound leg still reaches pipelock as a CONNECT tunnel, + though, so dropping pipelock's CA also means moving DLP body + scanning into egress-proxy; see the open question on pipelock's + TLS interception). +- Glob / regex matching in `path_allowlist`. v1 ships prefix + matching; expressive forms are a follow-up. +- An MCP tool for the agent to propose `path_allowlist` + additions. Today the operator manages this via the manifest + + the existing `routes edit ` TUI verb (renamed to + `egress-proxy edit `). + +## Proposed design + +### Topology + +``` +[Agent] --HTTP_PROXY=egress-proxy--> + [egress-proxy (mitmproxy)] + MITM with per-bottle CA + path_allowlist enforcement + Authorization header injection + --HTTPS_PROXY=pipelock--> + [pipelock] + hostname allowlist + DLP body scan + --egress--> Internet +``` + +Universal coverage: every HTTP/HTTPS request the agent makes hits +egress-proxy first. cred-proxy's URL convention +(`http://cred-proxy:9099/...`) goes away — there's no need for the +agent to address the proxy by name because it's already on the +default proxy path. + +### Manifest + +```yaml +egress_proxy: + routes: + # Authenticated route (today's cred-proxy shape, slightly + # renamed). path_allowlist optional. + - host: "api.github.com" + auth_scheme: "Bearer" + token_ref: "GH_PAT" + path_allowlist: + - "/repos/didericis/" + - "/users/didericis" + # Unauthenticated path-filtered route. + - host: "github.com" + auth_scheme: "none" + path_allowlist: + - "/didericis/" + # Bare-pass route: no auth injection, no path enforcement. + # Useful when you want a host to skip path filtering but + # still be DLP-scanned by pipelock. + - host: "api.anthropic.com" + auth_scheme: "none" + # no path_allowlist → all paths pass +``` + +Route matching is on `host` (was `path` prefix). The hostname +gates whether a route applies; `path_allowlist` (if present) +constrains the URL path under that host.
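The host-then-path decision can be sketched in plain Python without importing mitmproxy. The sketch assumes the addon passes in `flow.request.host` and `flow.request.path`; mitmproxy's `path` includes the query string, so it is stripped first. `EgressRoute` and `deny_reason` are hypothetical names. Two behaviours go beyond the PRD and are assumptions: rejecting `.`/`..` segments (including percent-encoded ones), since a raw prefix check would pass `/didericis/../somebody-else/secret` to an upstream that normalizes it, and an optional segment-aware mode so the example's `/users/didericis` entry does not also admit `/users/didericis-evil`.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import unquote


@dataclass
class EgressRoute:
    host: str
    path_allowlist: list[str] = field(default_factory=list)  # empty: every path passes


def find_route(routes: list[EgressRoute], host: str) -> Optional[EgressRoute]:
    """Exact, case-insensitive hostname match."""
    wanted = host.lower().rstrip(".")
    return next((r for r in routes if r.host.lower() == wanted), None)


def prefix_ok(path: str, entry: str, segment_aware: bool) -> bool:
    if not segment_aware or entry.endswith("/"):
        return path.startswith(entry)
    # "/users/didericis" matches itself and anything below it,
    # but not a longer sibling such as "/users/didericis-evil".
    return path == entry or path.startswith(entry + "/")


def deny_reason(routes: list[EgressRoute], host: str, raw_path: str,
                segment_aware: bool = True) -> Optional[str]:
    """Return a 403 reason, or None if the request may continue toward pipelock."""
    route = find_route(routes, host)
    if route is None or not route.path_allowlist:
        return None  # no route or bare-pass route: pipelock still hostname-gates it
    path = raw_path.split("?", 1)[0]  # strip the query string before matching
    if any(unquote(seg) in (".", "..") for seg in path.split("/")):
        return f"{host}: dot-segments not allowed in {path}"
    if any(prefix_ok(path, e, segment_aware) for e in route.path_allowlist):
        return None
    return f"{host}: {path} not in path_allowlist"
```

With `segment_aware=False` the check is the plain prefix match the PRD proposes for v1; the two modes differ only in how entries without a trailing `/`, like `/users/didericis`, treat longer sibling names.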
+ +### mitmproxy addon shape + +The egress-proxy ships a small Python addon that: + +- Loads the per-bottle routes from `/etc/egress-proxy/routes.yaml` + (rendered by the prepare step, docker-cp'd in like cred-proxy's + current routes.json). +- On `request` hook: match `flow.request.host` → route. If no route + matches → forward unchanged (pipelock will hostname-gate it). If + route matches and has `path_allowlist`, check `flow.request.path` + against the prefix list; 403 with a clear reason if no match. +- On approved requests: strip inbound Authorization, inject + `Authorization: ` if `auth_scheme + != "none"`. +- SIGHUP / file-mtime watch on `routes.yaml` for hot-reload (same + cadence as today's cred-proxy SIGHUP path). + +mitmproxy's standard CA generation handles per-host leaf certs at +SNI time. The per-bottle CA is generated at bottle launch (was +pipelock's tls-init step; now egress-proxy's). Agent's trust store +gets the egress-proxy CA installed in place of pipelock's. + +### Trust-domain concentration + +The egress-proxy now holds: + +- Every credential the bottle declared in `egress_proxy.routes[]` + (OAuth tokens, PATs, npm tokens). +- The per-bottle MITM CA private key. + +This is a deliberate concentration. With the previous split: + +- cred-proxy held tokens. +- pipelock held the CA. + +A memory disclosure in cred-proxy exposed tokens; in pipelock, +the CA. Both were bad; neither exposed everything. + +The new egress-proxy in one disclosure exposes both. Mitigations: + +- mitmproxy runs as an unprivileged user inside the container. +- Tokens live in the container's environ (same as cred-proxy today). + The CA private key is mounted from the host's stage_dir (mode 600). +- Pipelock stays as a separate sidecar, so a compromise of + egress-proxy doesn't disable pipelock's hostname check + DLP on + the outbound leg — the attacker can forge certs to the agent but + can't easily exfil from inside the agent without pipelock + noticing. 
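The addon's header step (strip inbound `Authorization`, then inject the route's own unless `auth_scheme` is `none`) might look like the plain-Python sketch below, with a dict standing in for mitmproxy's headers. `rewrite_headers` is a hypothetical helper. It reads the token from the proxy's environment, as the trust-domain section describes, and assumes the injected value has the usual `<scheme> <token>` shape, since the PRD's header example is elided. Failing closed on a missing token is a design choice the PRD does not specify.

```python
from typing import Mapping, Optional


def rewrite_headers(headers: Mapping[str, str], auth_scheme: str,
                    token_ref: Optional[str], environ: Mapping[str, str]) -> dict[str, str]:
    """Strip any inbound Authorization (case-insensitively) and inject the route's own."""
    out = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    if auth_scheme != "none":
        token = environ.get(token_ref or "")
        if not token:
            # Fail closed rather than forwarding without the expected credential.
            raise KeyError(f"token_ref {token_ref!r} not set in the proxy's environment")
        out["Authorization"] = f"{auth_scheme} {token}"  # assumed "<scheme> <token>" shape
    return out
```

mitmproxy's own header object is case-insensitive, so inside a real `request` hook the strip is a single deletion; the explicit `k.lower()` comparison is only needed because a plain dict is used here.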
+ +The user (per PR #25 discussion) accepted this concentration in +exchange for the one-sidecar consolidation. The PRD records it +explicitly. + +### Migration — hard cutover + +No backward-compat alias for `bottle.cred_proxy.routes[]`. At +manifest load: + +- `cred_proxy:` block → `die()` with a clear pointer at this PRD + and a migration recipe (rename to `egress_proxy:`, rename + `path` → `host`, drop the agent-side URL prefix). +- `cred_proxy_routes` field on existing dataclasses removed. +- `Dockerfile.cred-proxy` deleted. +- `claude_bottle/cred_proxy*.py` deleted. +- `claude_bottle/backend/docker/cred_proxy*.py` consolidated into + `egress_proxy*.py`. +- Provisioner files renamed. +- PRDs 0010 (cred-proxy), 0014 (cred-proxy-block remediation) + retroactively annotated as "superseded by 0017" — old text + preserved, header updated. + +### Implementation chunks + +Plausibly three implementation PRs after this PRD lands: + +1. **egress-proxy sidecar core.** Dockerfile + mitmproxy addon + + `routes.yaml` schema + lifecycle (prepare / start / stop / SIGHUP). +2. **Manifest + provisioner migration.** Rename cred-proxy + throughout the codebase, hard-fail on legacy manifests, update + agent CA trust to point at egress-proxy. +3. **PRD 0014 retargeting.** cred-proxy-block remediation's apply + path repointed at egress-proxy (SIGHUP, audit log, etc.). + Supervise tool description updated. + +## Open questions + +- **mitmproxy addon distribution.** Mount the addon Python file + from stage_dir, or bake it into the image. Mount is more + hot-reloadable; bake-in is more reproducible. Recommend bake-in, + with routes.yaml as the only mounted state. +- **Path match semantics.** Prefix-only for v1 (matches PRD 0017 + v1 spirit). Globs / regex are a follow-up if operators ask. +- **Mode for the `Authorization` strip on inbound.** Pipelock has a + similar strip in `sensitive_headers`. 
Confirm there's no + double-strip causing a real header the agent set to disappear + unexpectedly. Probably want egress-proxy to be the only stripper + for routes that match. +- **Pipelock's TLS interception post-cutover.** Today pipelock + MITMs the cred-proxy → upstream leg using its own CA. After the + cutover, that leg starts as a CONNECT tunnel from egress-proxy + (egress-proxy treats pipelock as a plain HTTPS forward proxy). + Does pipelock still need to MITM? Probably no — egress-proxy + already terminated, body content is already inspected upstream + by egress-proxy's addons (or could be). But that means moving + DLP from pipelock to egress-proxy, which expands egress-proxy's + trust-domain *further*. Punted to the implementation PR to + decide. +- **Performance.** Two MITM hops in the worst case (agent ↔ + egress-proxy and pipelock ↔ upstream if pipelock keeps its + interception). Measure under realistic load; if it's a problem, + the answer is probably to disable pipelock's TLS interception + and let it operate at hostname-only. +- **Agent's existing dotfile rewrites.** Today cred-proxy + provisions ~/.npmrc with `registry=http://cred-proxy:9099/npm/`, + ~/.gitconfig with `insteadOf` rules, etc. After the cutover + none of those rewrites are strictly necessary for routing + (HTTPS_PROXY catches everything), but they may still be useful + for canonicalisation (so the agent's `npm install` doesn't + surprise itself by talking to a different registry). Decide per + dotfile in the migration PR. + +## References + +- PRD 0010 — cred-proxy (superseded by this PRD). +- PRD 0014 — cred-proxy-block remediation (retargeted). +- PRD 0013 — supervise plane (tool descriptions updated). +- PR #25 — the supervise loop, whose `_apply_pipelock_url` + docstring flagged the original "path filtering belongs + somewhere" follow-up. +- mitmproxy — https://mitmproxy.org/ — chosen as the egress-proxy + engine because it's the canonical scriptable MITM forward proxy. 
diff --git a/docs/prds/0017-path-aware-egress-via-cred-proxy.md b/docs/prds/0017-path-aware-egress-via-cred-proxy.md deleted file mode 100644 index 81101c8..0000000 --- a/docs/prds/0017-path-aware-egress-via-cred-proxy.md +++ /dev/null @@ -1,195 +0,0 @@ -# PRD 0017: Path-aware egress filtering via cred-proxy - -- **Status:** Draft -- **Author:** didericis -- **Created:** 2026-05-25 - -## Summary - -Pipelock's `api_allowlist` is hostname-only — once a host is on the -list, every URL path at that host is reachable. For agents working -on shared platforms (github.com, gitlab.com, public registries), -this means approving access to one user's content also opens -access to every other user's content. Cred-proxy already -path-prefix-routes authenticated traffic; this PRD extends it to -filter (not just route) paths, including for unauthenticated hosts. -Per-bottle egress then has two complementary layers: pipelock for -hostname allow + DLP + body scanning, cred-proxy for path-level -allow on declared hosts. - -## Problem - -PR #25's pipelock-block tool delivers an honest but coarse experience: -the agent reports "I tried hitting `https://github.com/didericis`, -pipelock 403'd it"; the operator approves and the agent now has -access to all of github.com. The path in the proposal is captured -as context but not enforced (PR #25 documents this in -`_apply_pipelock_url`'s docstring). - -The intended posture for many shared platforms is narrower than -hostname-level. "Allow the agent to read github.com/didericis but -not github.com/somebody-else" is a normal ask. Today the egress -stack can't express that, even though cred-proxy already has 80% -of the machinery: it path-routes authenticated traffic with -longest-prefix matching, and the manifest's `cred_proxy.routes[]` -shape is already a list of `(path, upstream, ...)` rules. - -## Goals / Success Criteria - -A bottle manifest can declare a cred-proxy route with a -`path_allowlist` and `auth_scheme: none`. 
Agents dialing -`http://cred-proxy://` hit a 403 from -cred-proxy when `` doesn't match any allowlist entry, and -a normal forward (no auth header injected) when it does. For -existing authenticated routes the addition is opt-in: a route -without `path_allowlist` keeps its current permissive behaviour. - -Demonstrable behavior: a bottle manifest declares -`{path: "/github/", upstream: "https://github.com", auth_scheme: "none", -path_allowlist: ["/didericis/"]}`; the agent reaches -`http://cred-proxy:9099/github/didericis/some-repo` successfully, -gets a 403 on `http://cred-proxy:9099/github/someone-else/whatever`. - -## Non-goals - -- Replacing pipelock. Pipelock still does the hostname allowlist, - DLP body scanning, MCP / WebSocket inspection. Path filtering is - additive, sitting in front of pipelock for routes that opt in. -- Auto-routing arbitrary outbound HTTP through cred-proxy. The - agent's `HTTP_PROXY` stays pointed at pipelock; cred-proxy is - reached by explicit URL (with a `git-insteadof`-style rewrite - for the few protocol-level helpers that need it). -- Reworking pipelock-block. The PR #25 tool stays hostname-only; - whether a new path-aware proposal tool (or a richer - pipelock-block) is wanted is an open question for a follow-on - PRD. -- Live mutation of the running container or cred-proxy beyond - what cred-proxy SIGHUP already supports (PRD 0014). - -## Scope - -### In scope - -- A new optional `auth_scheme: "none"` mode on cred-proxy routes - that suppresses Authorization injection while keeping path - routing + (new) path filtering. -- A new optional `path_allowlist: [, ...]` field per - cred-proxy route. When present, cred-proxy 403s requests whose - in-route suffix doesn't match at least one prefix. -- Manifest schema + validation for the two new fields. -- Cred-proxy server logic: enforcement on each request after the - longest-prefix route match. 
-- SIGHUP reload picks up `path_allowlist` changes (no new sidecar
-  primitives — the existing reload path already re-reads
-  `routes.json`).
-
-### Out of scope
-
-- A new MCP tool for the agent to propose `path_allowlist`
-  additions. Today the operator manages this via the manifest +
-  the existing `routes edit ` TUI verb.
-- Glob / regex matching. v1 ships prefix matching only; the open
-  question lays out the trade-offs.
-- Auto-migrating PR #25's pipelock-block proposals into cred-proxy
-  routes. Manual operator decision per host.
-- Provisioner-side dotfile changes for HTTPS-to-cred-proxy rewrites
-  on bottles that opt unauth'd hosts onto cred-proxy. Out of scope
-  for the engine work; the manifest can already encode it.
-
-## Proposed Design
-
-### Manifest schema additions
-
-`bottle.cred_proxy.routes[]` gains two optional fields:
-
-```yaml
-cred_proxy:
-  routes:
-    - path: "/github/"
-      upstream: "https://github.com"
-      auth_scheme: "none"        # new — no Authorization header
-      token_ref: ""              # ignored when auth_scheme is "none"
-      path_allowlist:            # new — prefix list; empty / absent = permissive
-        - "/didericis/"
-        - "/didericis-org/"
-```
-
-- `auth_scheme: "none"` joins the existing `Bearer` / `token` values.
-  When `none`, `token_ref` must be empty or absent and no
-  Authorization header is injected. The route still routes by path
-  prefix and forwards to upstream.
-- `path_allowlist` is a list of suffix prefixes (matched after the
-  route's `path` is stripped). Empty / absent means permissive
-  (current behaviour). When non-empty, the suffix must start with
-  at least one of the allowlist entries.
-
-### cred-proxy server changes
-
-Per request:
-1. Strip query string, longest-prefix-match against `routes`.
-2. Compute the suffix = request_path[len(route.path):].
-3. If `route.path_allowlist` is non-empty: require that
-   `"/" + suffix` (or just `suffix` — pick a consistent
-   normalization) starts with at least one allowlist entry. 403 if
-   not.
-4. If `auth_scheme == "none"`: skip the `Authorization` header
-   step entirely; otherwise inject as today.
-5. Forward upstream, stream response (unchanged).
-
-The 403 body should name the route + the disallowed suffix so the
-operator can diagnose. cred-proxy's existing log line at request
-time picks up the new outcome too.
-
-### Validation
-
-At manifest load:
-- `auth_scheme` must be one of `Bearer`, `token`, or `none`.
-- When `auth_scheme == "none"`, `token_ref` is forbidden (clearer
-  error than silently ignoring).
-- `path_allowlist` entries must start with `/` and end with `/`
-  (matching the existing convention for `route.path`).
-- Duplicate prefixes are deduplicated with a warning, not an
-  error.
-
-### Migration / backward compatibility
-
-- Routes without `path_allowlist` behave exactly as today.
-- Routes with `auth_scheme: Bearer | token` behave exactly as today.
-- No existing manifests need editing; the new fields are opt-in.
-
-## Open questions
-
-- **Match semantics: prefix vs glob vs regex.** Prefix is simple
-  and matches the existing `route.path` convention. Glob (`/users/*/repos/`)
-  adds power but is easy to get wrong (does `*` match a `/`?).
-  Regex is the most powerful and the most footguny. Recommend
-  prefix-only for v1, glob in a follow-up if operators ask for it.
-- **403 body shape.** Plain text vs JSON. Cred-proxy's existing
-  errors use plain text (`send_error(404, "no route for ...")`).
-  Match that.
-- **Auth-less routes and TLS interception.** A `none`-auth route
-  still routes outbound HTTPS through pipelock (cred-proxy's
-  `HTTPS_PROXY` env), so pipelock's CA + body scanner still apply.
-  Confirm that pipelock's allowlist needs the upstream host in
-  this case — there's no token to make the cred-proxy → upstream
-  leg special. Likely yes, same as today.
-- **MCP tool / pipelock-block evolution.** Once path filtering
-  exists, the operator may want a way for the agent to propose
-  path additions (e.g. "I need /didericis-org/ added to the
-  github route"). Today that requires manifest edit + cli.py
-  rebuild, or `routes edit` via the dashboard. Whether a new MCP
-  tool (or a richer pipelock-block) is wanted is a follow-on PRD
-  open question.
-- **Allowlist semantics for the entire route prefix.** Should an
-  empty `path_allowlist: []` be allowed? Equivalent to "block
-  everything at this upstream" — possibly useful as a tombstone,
-  more likely a typo. Recommend treating empty list the same as
-  absent (permissive) and flagging in the validation note.
-
-## References
-
-- PRD 0010 — cred-proxy (the engine being extended).
-- PRD 0015 — pipelock block remediation (whose hostname-only
-  ceiling motivates this PRD).
-- PR #25 — `_apply_pipelock_url`'s docstring documents the
-  follow-up that this PRD formalises.

From a79b2b7be0186768038d2a2fea28a4b30ed9da0a Mon Sep 17 00:00:00 2001
From: didericis
Date: Mon, 25 May 2026 13:35:47 -0400
Subject: [PATCH 3/3] docs(prd-0017): nest auth.scheme + auth.token_ref under optional `auth`
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Earlier draft had `auth_scheme: "none"` as the unauthenticated
signal — awkward sentinel. Nest the two credential-injection
fields under an optional `auth` key instead. Presence of the key =
authenticated; absence = unauthenticated. Empty `auth: {}` is an
error (omission is what means "no auth").

Touches: scope bullet, manifest example, mitmproxy addon
description's auth-handling step. Two trailing
`auth_scheme: "none"` references kept as historical context for
what the new shape replaces.
Co-Authored-By: Claude Opus 4.7
---
 docs/prds/0017-egress-proxy-via-mitmproxy.md | 51 +++++++++++++-------
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/docs/prds/0017-egress-proxy-via-mitmproxy.md b/docs/prds/0017-egress-proxy-via-mitmproxy.md
index 031306b..ef723cf 100644
--- a/docs/prds/0017-egress-proxy-via-mitmproxy.md
+++ b/docs/prds/0017-egress-proxy-via-mitmproxy.md
@@ -21,7 +21,8 @@ are split between cred-proxy and pipelock:
    the cred-proxy URL.
 3. **Credential injection.** Continues cred-proxy's existing role:
    match by hostname (or hostname + path), strip inbound
-   Authorization, inject one based on `auth_scheme` + `token_ref`.
+   Authorization, inject one based on the route's optional `auth:
+   { scheme, token_ref }` block.
 
 Pipelock's role narrows to hostname allowlist + DLP body scanning
 on the egress-proxy → upstream leg. Pipelock no longer holds the
@@ -90,8 +91,10 @@ delivery.
   (was pipelock's CA).
 - Manifest rename: `bottle.cred_proxy.routes[]` →
   `bottle.egress_proxy.routes[]`. The route shape gains optional
-  `path_allowlist: [, ...]` and supports `auth_scheme:
-  "none"`.
+  `path_allowlist: [, ...]` and a nested optional `auth:
+  { scheme, token_ref }` block (presence/absence of `auth` is the
+  authenticated vs unauthenticated signal — replaces the old
+  `auth_scheme: "none"` pattern).
 - Agent's `HTTP_PROXY` / `HTTPS_PROXY` env vars repointed at the
   egress-proxy (was pipelock).
 - Pipelock retains its sidecar slot and its own DLP + hostname
@@ -151,30 +154,41 @@ default proxy path.
 ```yaml
 egress_proxy:
   routes:
-    # Authenticated route (today's cred-proxy shape, slightly
-    # renamed). path_allowlist optional.
+    # Authenticated route — `auth` block carries the injection
+    # config. path_allowlist optional.
     - host: "api.github.com"
-      auth_scheme: "Bearer"
-      token_ref: "GH_PAT"
+      auth:
+        scheme: "Bearer"
+        token_ref: "GH_PAT"
       path_allowlist:
         - "/repos/didericis/"
         - "/users/didericis"
-    # Unauthenticated path-filtered route.
+    # Unauthenticated path-filtered route — `auth` omitted
+    # entirely (presence/absence of the key is the auth signal).
     - host: "github.com"
-      auth_scheme: "none"
       path_allowlist:
         - "/didericis/"
-    # Bare-pass route: no auth injection, no path enforcement.
-    # Useful when you want a host to skip path filtering but
-    # still be DLP-scanned by pipelock.
+    # Bare-pass route: no auth, no path constraint. Useful when
+    # you want a host to skip path filtering but still be
+    # DLP-scanned by pipelock on the outbound leg.
     - host: "api.anthropic.com"
-      auth_scheme: "none"
-      # no path_allowlist → all paths pass
 ```
 
 Route matching is on `host` (was `path` prefix). The hostname
 gates whether a route applies; `path_allowlist` (if present)
-constrains the URL path under that host.
+constrains the URL path under that host. The optional `auth`
+block carries credential-injection config:
+
+- Omit `auth` → no Authorization header injected (replaces the
+  earlier draft's `auth_scheme: "none"`).
+- `auth.scheme` → one of `Bearer`, `token` (the values
+  cred-proxy supports today; sidesteps the gitea-token quirk).
+- `auth.token_ref` → host env var holding the secret. Same
+  semantics as cred-proxy's `TokenRef` field today.
+
+Validation: `auth` (if present) must contain both `scheme` and
+`token_ref`. An empty `auth: {}` is an error rather than a
+synonym for "no auth" — that's what omission is for.
 
 ### mitmproxy addon shape
 
@@ -187,9 +201,10 @@ The egress-proxy ships a small Python addon that:
   matches → forward unchanged (pipelock will hostname-gate it).
   If route matches and has `path_allowlist`, check
   `flow.request.path` against the prefix list; 403 with a clear
   reason if no match.
-- On approved requests: strip inbound Authorization, inject
-  `Authorization: ` if `auth_scheme
-  != "none"`.
+- On approved requests: strip inbound Authorization. If the route
+  carries an `auth` block, inject `Authorization:
+  `. If the route omits
+  `auth`, leave Authorization unset.
 - SIGHUP / file-mtime watch on `routes.yaml` for hot-reload (same
   cadence as today's cred-proxy SIGHUP path).
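The routing, path-filtering and auth-injection rules in this revision can be sketched in plain Python, kept apart from mitmproxy so the logic runs standalone. This is a minimal sketch under assumptions, not the addon itself: `Route`, `Auth`, `Decision`, `parse_route` and `decide` are hypothetical names, each manifest entry is assumed to arrive as an already-parsed dict, and secrets are read from an `env` mapping standing in for the host environment. Query strings are dropped before matching, following step 1 of the earlier cred-proxy design.

```python
# Hypothetical sketch: Route, Auth, Decision, parse_route and decide are
# illustrative names, not the PRD's or mitmproxy's API.
from dataclasses import dataclass, field
from typing import Mapping, Optional, Sequence

# The `auth.scheme` values the PRD allows (what cred-proxy supports today).
ALLOWED_SCHEMES = {"Bearer", "token"}


@dataclass(frozen=True)
class Auth:
    scheme: str     # "Bearer" or "token"
    token_ref: str  # name of the host env var holding the secret


@dataclass(frozen=True)
class Route:
    host: str
    path_allowlist: tuple = ()   # empty: every path under the host passes
    auth: Optional[Auth] = None  # None: no Authorization header is injected


@dataclass
class Decision:
    allowed: bool
    headers: dict = field(default_factory=dict)
    reason: str = ""


def parse_route(raw: Mapping) -> Route:
    """Build a Route from one already-parsed manifest entry."""
    host = raw["host"].lower()
    auth = None
    if "auth" in raw:
        block = raw["auth"] or {}
        # `auth` needs both fields; `auth: {}` is an error because
        # omitting the key is how a route says "no auth".
        if not block.get("scheme") or not block.get("token_ref"):
            raise ValueError(f"route {host!r}: auth needs scheme and token_ref")
        if block["scheme"] not in ALLOWED_SCHEMES:
            raise ValueError(f"route {host!r}: unsupported scheme {block['scheme']!r}")
        auth = Auth(block["scheme"], block["token_ref"])
    return Route(host, tuple(raw.get("path_allowlist") or ()), auth)


def decide(routes: Sequence[Route], host: str, path: str,
           headers: Mapping[str, str], env: Mapping[str, str]) -> Decision:
    """Route by host, filter by path prefix, then strip/inject Authorization."""
    route = next((r for r in routes if r.host == host.lower()), None)
    if route is None:
        # No route for this host: forward unchanged; pipelock hostname-gates it.
        return Decision(True, dict(headers))

    bare_path = path.split("?", 1)[0]  # compare the path only, not the query
    if route.path_allowlist and not any(
        bare_path.startswith(prefix) for prefix in route.path_allowlist
    ):
        return Decision(False, reason=f"{route.host}: {bare_path!r} not in path_allowlist")

    # Approved: drop any client-supplied Authorization header, in any case.
    out = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    if route.auth is not None:
        out["Authorization"] = f"{route.auth.scheme} {env[route.auth.token_ref]}"
    return Decision(True, out)
```

Matching here is plain string prefix, the v1 semantics both drafts settle on, so an entry without a trailing `/` such as `/users/didericis` also admits `/users/didericis-other`. The earlier draft's rule that allowlist entries end in `/` avoids that for directory-style prefixes, at the cost of not being able to name a single endpoint like `/users/didericis`.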