PRD 0029: provision egress routes via AgentProvisionPlan #115

Merged
didericis merged 9 commits from prd-0029-egress-routes-via-agent-provision-plan into codex/prd-codex-host-credentials 2026-06-01 22:04:33 -04:00
Collaborator

Addresses review comments on #110.

PRD: docs/prds/0029-codex-host-credentials.md

Summary

  • Remove template == "codex" branching from egress.py and pipelock.py — the same provider-specific pattern the rest of #110 moved out of the backends
  • Add EgressRoute.tls_passthrough: bool; egress_manifest_routes lifts the manifest pipelock.tls_passthrough flag so all route origins share the same field
  • Add AgentProvisionPlan.egress_routes; agent_provision_plan populates it for Codex + forward_host_credentials with tls_passthrough=True
  • Replace Codex-specific egress_routes_for_bottle logic with a generic _merge_provider_route helper (conflict detection preserved)
  • Rewrite pipelock_effective_tls_passthrough to read route.tls_passthrough from the merged route set — no provider check
  • Both backends call agent_provision_plan first, then pass plan.egress_routes to Egress.prepare and PipelockProxy.prepare

Changes (1 commit)

  • b79b490 refactor: provision egress routes via AgentProvisionPlan
didericis-claude added 1 commit 2026-06-01 19:31:29 -04:00
refactor: provision egress routes via AgentProvisionPlan
test / unit (pull_request) Failing after 31s
test / integration (pull_request) Failing after 18s
b79b49090f
Remove provider-specific branching from egress.py and pipelock.py.
Previously, `egress_routes_for_bottle` and `pipelock_effective_tls_passthrough`
both contained `template == "codex"` checks — the same pattern the rest
of the PR moved out of the backends.

Root cause: `EgressRoute` had no `tls_passthrough` field, so pipelock
couldn't learn from the synthesised Codex routes that they needed
passthrough. Fix:

- Add `EgressRoute.tls_passthrough: bool`. `egress_manifest_routes` lifts
  the existing `pipelock.tls_passthrough` manifest flag here; provider
  routes set it directly.
- Add `AgentProvisionPlan.egress_routes`. `agent_provision_plan` populates
  it for Codex + `forward_host_credentials`, including `tls_passthrough=True`.
- Replace Codex-specific `egress_routes_for_bottle` logic with a generic
  `_merge_provider_route` helper. Backends call `egress_routes_for_bottle(bottle,
  plan.egress_routes)`; no provider type checks inside egress or pipelock.
- Rewrite `pipelock_effective_tls_passthrough` to read `route.tls_passthrough`
  from the merged route set instead of re-implementing the provider check.
- Both backends now call `agent_provision_plan` before `Egress.prepare` and
  `PipelockProxy.prepare`, threading `plan.egress_routes` to both. `has_provider_auth`
  is derived from `egress_manifest_routes` (manifest routes only — provider
  routes carry no auth roles, so the result is identical).
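
For readers following along, the merge and passthrough logic described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in `egress.py` or `pipelock.py`: the `EgressRoute` shape is cut down to a few fields, `egress_routes_for_bottle` takes a list of manifest routes here rather than a `bottle`, the conflict rule (same host, different route) is an assumption, and returning a sorted host list from `pipelock_effective_tls_passthrough` is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class EgressRoute:
    # Hypothetical minimal shape; the real class carries more fields.
    host: str
    token_ref: Optional[str] = None
    tls_passthrough: bool = False


def _merge_provider_route(routes: List[EgressRoute], provider_route: EgressRoute) -> None:
    """Add a provider route, refusing to clobber a different route for the same host."""
    for existing in routes:
        if existing.host == provider_route.host:
            if existing != provider_route:
                # Assumed conflict rule: same host, different settings.
                raise ValueError(f"egress route conflict for host {existing.host!r}")
            return  # identical route already present, nothing to add
    routes.append(provider_route)


def egress_routes_for_bottle(
    manifest_routes: Iterable[EgressRoute],
    provider_routes: Iterable[EgressRoute] = (),
) -> List[EgressRoute]:
    """Merge manifest routes with provider routes; no provider type checks."""
    merged = list(manifest_routes)
    for route in provider_routes:
        _merge_provider_route(merged, route)
    return merged


def pipelock_effective_tls_passthrough(routes: Iterable[EgressRoute]) -> List[str]:
    """Read the flag off each merged route instead of asking which provider is in use."""
    return sorted({route.host for route in routes if route.tls_passthrough})
```

The point of the design is that both consumers only see `EgressRoute` values, so where a route came from (manifest or provisioner) no longer matters to them.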

Assisted-by: Claude Code
didericis added 1 commit 2026-06-01 20:29:19 -04:00
fix(egress): break circular import with manifest via TYPE_CHECKING
test / unit (pull_request) Successful in 44s
test / integration (pull_request) Successful in 55s
0233b481b1
manifest → agent_provider → egress → manifest created a cycle that
caused ImportError on any module import. With from __future__ import
annotations already present, Bottle is only needed at type-check time
(annotations are lazy strings under PEP 563).
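
A rough sketch of the pattern, for context. The function name `route_hosts` and the `egress_routes` attribute are hypothetical. The quoted annotation below stands in for what `from __future__ import annotations` does to every annotation in a module; either way the name `Bottle` is never looked up at import time.

```python
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:
    # Evaluated only by static type checkers, so this import cannot
    # re-enter the manifest -> agent_provider -> egress cycle at runtime.
    from manifest import Bottle


def route_hosts(bottle: "Bottle") -> List[str]:
    # The annotation is just a string at runtime; only the attribute
    # access below touches the object that was passed in.
    return [route["host"] for route in bottle.egress_routes]
```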

Assisted-by: Claude Code
didericis reviewed 2026-06-01 20:37:38 -04:00
@@ -93,0 +117,4 @@
self.assertEqual(CODEX_HOST_CREDENTIAL_TOKEN_REF, r.token_ref)
self.assertTrue(r.tls_passthrough)
def test_codex_without_forward_host_credentials_has_no_egress_routes(self):
Owner

When we don't forward host credentials there should still be egress routes, just not egress routes with an auto-injected token. We should also set passthrough to true so that the tokens the user sets after logging in don't get stripped out.

didericis marked this conversation as resolved
didericis added 1 commit 2026-06-01 20:39:43 -04:00
fix(codex): emit passthrough egress routes when not forwarding host credentials
test / unit (pull_request) Successful in 47s
test / integration (pull_request) Successful in 49s
86cfd94b72
When forward_host_credentials is false, Codex bottles should still get
tls_passthrough routes for the OpenAI/ChatGPT hosts so that tokens a
user sets via `codex login` after launch aren't stripped by pipelock's
header DLP. Previously no routes were emitted, which would have blocked
those requests entirely once pipelock enforcement tightens.
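
The behavior after this fix could look roughly like the sketch below. The host tuple, the value of `CODEX_HOST_CREDENTIAL_TOKEN_REF`, the `Bearer` scheme, and the helper name `codex_egress_routes` are all placeholders chosen for illustration; only `api.openai.com`, the constant's name, and the `tls_passthrough` field come from this PR.

```python
from dataclasses import dataclass
from typing import List, Optional

# Placeholder values; the real host list and token ref live in the repository.
CODEX_HOSTS = ("api.openai.com", "chatgpt.com")
CODEX_HOST_CREDENTIAL_TOKEN_REF = "CODEX_HOST_CREDENTIAL"


@dataclass(frozen=True)
class EgressRoute:
    host: str
    auth_scheme: Optional[str] = None
    token_ref: Optional[str] = None
    tls_passthrough: bool = False


def codex_egress_routes(forward_host_credentials: bool) -> List[EgressRoute]:
    """Always emit passthrough routes; attach auth only when forwarding."""
    routes = []
    for host in CODEX_HOSTS:
        if forward_host_credentials:
            # Assumed scheme: the proxy injects the forwarded host credential.
            routes.append(
                EgressRoute(host, "Bearer", CODEX_HOST_CREDENTIAL_TOKEN_REF, True)
            )
        else:
            # No injected token, but passthrough keeps user-set tokens intact.
            routes.append(EgressRoute(host, tls_passthrough=True))
    return routes
```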

Rename the test to reflect the new expected behavior.

Assisted-by: Claude Code
didericis reviewed 2026-06-01 20:50:56 -04:00
@@ -170,0 +174,4 @@
# route marker that enables a non-secret placeholder env. Codex is
# intentionally absent here: it should use its device/ChatGPT login
# state, and an OPENAI_API_KEY placeholder would force API-key auth.
has_provider_auth = any(
Owner

lines 177-183 should also be moved into the agent provision step: it's the responsibility of the agent provisioner to determine whether or not there should be something like "egress-placeholder" in the env and which env var it should go into. This should also remove the need for placeholder_env in the provider runtime.

didericis marked this conversation as resolved
Author
Collaborator

Agreed — the has_provider_auth detection and placeholder injection is provider logic that should live in agent_provision_plan, not in each backend. Moving it there will also let us drop placeholder_env from AgentProviderRuntime entirely since the backends will no longer need to know which env var to fill. Will fix.

didericis added 1 commit 2026-06-01 20:57:49 -04:00
refactor(agent): move placeholder env injection into agent_provision_plan
test / unit (pull_request) Successful in 29s
test / integration (pull_request) Successful in 42s
2eb6e02ee1
The has_provider_auth check and egress-placeholder injection were
duplicated in both backends. Move them into agent_provision_plan so
the provisioner owns that decision entirely:

- Replace has_provider_auth: bool param with manifest_egress_routes,
  compute has_provider_auth internally from the route roles.
- Inject CLAUDE_CODE_OAUTH_TOKEN=egress-placeholder inside the plan
  when has_provider_auth, alongside the existing nonessential-traffic
  vars. Backends no longer touch the placeholder env.
- Remove placeholder_env from AgentProviderRuntime; expose
  placeholder_env_for() for print_util's hide-from-summary logic.
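
As a rough picture of what moved into the plan at this commit: the helper name `claude_env` and the flat list of roles are simplifications made up for this sketch, and `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC` is an assumed example of a nonessential-traffic variable, since the commit does not name the ones it sets.

```python
from typing import Dict, Iterable, Optional

PLACEHOLDER_VALUE = "egress-placeholder"
CLAUDE_AUTH_ROLE = "claude_code_oauth"


def claude_env(manifest_route_roles: Iterable[Optional[str]]) -> Dict[str, str]:
    """Derive has_provider_auth from route roles and inject the placeholder."""
    has_provider_auth = any(role == CLAUDE_AUTH_ROLE for role in manifest_route_roles)
    env: Dict[str, str] = {}
    if has_provider_auth:
        # Non-secret marker; the egress proxy holds the real token.
        env["CLAUDE_CODE_OAUTH_TOKEN"] = PLACEHOLDER_VALUE
        # Assumed variable name, standing in for the nonessential-traffic vars.
        env["CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC"] = "1"
    return env
```

With this shape the backends pass route data in and get a finished env back, which is why they no longer need to know the placeholder variable's name.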

Assisted-by: Claude Code
didericis reviewed 2026-06-01 21:03:19 -04:00
@@ -88,7 +99,6 @@ _RUNTIMES = {
image="bot-bottle-claude:latest",
dockerfile=str(_REPO_ROOT / "Dockerfile.claude"),
auth_role="claude_code_oauth",
Owner

I'm not sure auth_role needs to be declared in the agent runtime anymore either. In the original plan, we were going to have the user manually specify egress routes. But given all of the provider-specific logic we already need in the agent provider, and the fact that we've already started adding egress routes in there, we should probably be doing that for Claude as well, NOT looking for this specific auth role on user-defined egress routes.

didericis reviewed 2026-06-01 21:04:48 -04:00
@@ -113,6 +122,13 @@ def runtime_for(template: str) -> AgentProviderRuntime:
return _RUNTIMES[template]
def placeholder_env_for(template: str) -> str:
Owner

This doesn't need to be a separate function, and the organization is wrong: we want the code for a specific provider to make the decision about placeholder envs. A function that switches on different provider templates is the wrong abstraction.

Author
Collaborator

Re: placeholder_env_for (#issuecomment-941) — agree, wrong abstraction. Will remove it. One option to fix print_util without template-switching: add a hidden_env_names: frozenset[str] field to AgentProvisionPlan and have the Claude branch populate it with {"CLAUDE_CODE_OAUTH_TOKEN"} when injecting the placeholder — print_util then reads from the plan rather than dispatching on template. Does that direction work, or would you rather handle it another way?

Re: auth_role (#issuecomment-939) — understood on the direction (Claude injects its own routes rather than searching user routes for claude_code_oauth). Question: does "doing that for claude as well" mean injecting a provider route for a specific known host (like Codex does for api.openai.com), or should Claude just always inject the placeholder + disable nonessential-traffic flags unconditionally? The claude_code_oauth docs say "host doesn't matter to the placeholder logic" so I'm not sure whether a route is even the right gate.

Owner

RE #115 (comment)

Yes, this is what the user defined egress routes are currently:

egress:
  routes:
    - host: api.anthropic.com
      role: claude_code_oauth
      auth:
        scheme: Bearer
        token_ref: BOT_BOTTLE_CLAUDE_OAUTH_TOKEN
      pipelock:
        tls_passthrough: true

We also need to allow agent provisioners to declare required tokens in the agent provider template for this to work so we can pipe BOT_BOTTLE_CLAUDE_OAUTH_TOKEN where it needs to go. EX:

agent_provider:
  template: claude
  auth_token: BOT_BOTTLE_CLAUDE_OAUTH_TOKEN # should take BOT_BOTTLE_CLAUDE_OAUTH_TOKEN from the host env and pipe it down into the egress proxy for api.anthropic.com
Author
Collaborator

Got it — so the end state is: agent_provider.auth_token: BOT_BOTTLE_CLAUDE_OAUTH_TOKEN in the manifest drives the provisioner to inject a provider-owned api.anthropic.com route (Bearer, tls_passthrough, token_ref from that field), the placeholder + nonessential-traffic flags follow from the provisioner having a route to inject, and auth_role + claude_code_oauth go away.

One flag though: PRD 0029 has "Supporting host credential forwarding for non-Codex providers" as an explicit non-goal. The auth_token field + Claude route injection is new scope beyond what the PRD covers. Do you want me to expand this branch to include it, or land what's here and open a follow-up PRD for the Claude provider auth work?

Owner

RE #115 (comment)

Yeah, we should include it and rescope the other PR to be about injecting provider auth credentials and other settings more generally. We need to do both, and it makes sense to do them together to make sure we get the agent provider abstraction right.

RE hidden_env_names on the other issue, sounds good/do that.

Author
Collaborator

Will expand the branch to include it. I'll update the PRD to reflect the broader scope, implement agent_provider.auth_token, have the Claude provisioner inject the api.anthropic.com route from it, drop auth_role from the runtime, remove placeholder_env_for, and add hidden_env_names to AgentProvisionPlan for print_util. Will also clean up claude_code_oauth role handling once the provider route replaces it.

didericis added 1 commit 2026-06-01 21:24:28 -04:00
feat(manifest): add agent_provider.auth_token for Claude OAuth via egress
test / unit (pull_request) Successful in 39s
test / integration (pull_request) Successful in 1m0s
200a113cce
Operators can now declare:

  agent_provider:
    template: claude
    auth_token: BOT_BOTTLE_CLAUDE_OAUTH_TOKEN

and the provisioner injects a provider-owned api.anthropic.com egress
route (Bearer, tls_passthrough) rather than requiring a manually
declared route with the former claude_code_oauth role.

Changes:
- Add auth_token field to AgentProvider; validate claude-only.
- Remove claude_code_oauth from EGRESS_ROLES / PROVIDER_EGRESS_ROLES.
  Manifests that declare the role now fail at parse time with "unknown
  role" — the provisioner owns the route.
- agent_provision_plan: replace manifest_egress_routes/has_provider_auth
  with auth_token; Claude branch injects the api.anthropic.com route,
  placeholder env, and nonessential-traffic flags when auth_token is set.
- Add hidden_env_names: frozenset[str] to AgentProvisionPlan; Claude
  branch populates it with CLAUDE_CODE_OAUTH_TOKEN.
- Remove auth_role from AgentProviderRuntime and placeholder_env_for().
- print_util.visible_agent_env_names: accept hidden_env_names from the
  plan instead of dispatching on agent_provider_template.
- Both backends: drop manifest_egress_routes call, pass auth_token.
- PRD 0029 rescoped to cover both Codex and Claude provider auth.
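
Two of the changes above, sketched under assumptions: the claude-only validation of `auth_token` and the plan-driven hiding in `print_util`. The `from_dict` classmethod on `AgentProvider`, the error text, and the exact signature of `visible_agent_env_names` are guesses made for illustration.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, Iterable, List, Mapping, Optional


@dataclass(frozen=True)
class AgentProvider:
    template: str
    auth_token: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: Mapping[str, Any]) -> "AgentProvider":
        template = raw["template"]
        auth_token = raw.get("auth_token")
        # Assumed rule and wording: only the claude template accepts auth_token.
        if auth_token is not None and template != "claude":
            raise ValueError(
                "agent_provider.auth_token is only supported for template 'claude'"
            )
        return cls(template=template, auth_token=auth_token)


def visible_agent_env_names(
    env_names: Iterable[str], hidden_env_names: FrozenSet[str]
) -> List[str]:
    """Filter using names supplied by the plan; no dispatch on template."""
    return sorted(name for name in env_names if name not in hidden_env_names)
```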

Assisted-by: Claude Code
didericis reviewed 2026-06-01 21:32:10 -04:00
@@ -174,2 +195,3 @@
 )))
-if template == PROVIDER_CLAUDE and has_provider_auth:
+if template == PROVIDER_CLAUDE and auth_token:
+egress_routes.append(EgressRoute(
Owner

Similarly to Codex, we should always include these in egress routes (whether or not auth_token is present), but only have the egress add the auth token when it's present.

didericis marked this conversation as resolved
didericis added 1 commit 2026-06-01 21:34:31 -04:00
fix(agent): always emit passthrough egress route for api.anthropic.com
test / unit (pull_request) Successful in 45s
test / integration (pull_request) Successful in 56s
962f8144fc
Mirrors the Codex pattern: Claude always gets a tls_passthrough route
for api.anthropic.com so user-set tokens aren't stripped by pipelock,
whether or not auth_token is declared. Auth injection (scheme + token_ref)
and the placeholder env only apply when auth_token is set.
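
Sketched in code, the Claude branch after this fix might read something like the following. The helper name `claude_provision_plan` and the field names `auth_scheme` and `env` are assumptions; `AgentProvisionPlan`, `egress_routes`, `hidden_env_names`, `tls_passthrough`, and `token_ref` are the names used in this PR.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class EgressRoute:
    host: str
    auth_scheme: Optional[str] = None
    token_ref: Optional[str] = None
    tls_passthrough: bool = False


@dataclass
class AgentProvisionPlan:
    env: Dict[str, str]
    egress_routes: List[EgressRoute]
    hidden_env_names: FrozenSet[str]


def claude_provision_plan(auth_token: Optional[str]) -> AgentProvisionPlan:
    # The passthrough route is unconditional, mirroring the Codex pattern.
    route = EgressRoute(host="api.anthropic.com", tls_passthrough=True)
    env: Dict[str, str] = {}
    hidden: FrozenSet[str] = frozenset()
    if auth_token:
        # Auth injection and the placeholder env apply only with auth_token.
        route = replace(route, auth_scheme="Bearer", token_ref=auth_token)
        env["CLAUDE_CODE_OAUTH_TOKEN"] = "egress-placeholder"
        hidden = frozenset({"CLAUDE_CODE_OAUTH_TOKEN"})
    return AgentProvisionPlan(env=env, egress_routes=[route], hidden_env_names=hidden)
```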

Assisted-by: Claude Code
Author
Collaborator

Done — api.anthropic.com is now always emitted as a tls_passthrough route for Claude (same pattern as Codex + the OpenAI hosts). Auth scheme and token_ref only populated when auth_token is set; placeholder env and hidden_env_names likewise.

didericis added 1 commit 2026-06-01 21:41:44 -04:00
fix(agent): move default claude env vars to the right location
test / unit (pull_request) Successful in 36s
test / integration (pull_request) Successful in 55s
3b96de95ab
didericis reviewed 2026-06-01 21:48:53 -04:00
@@ -194,3 +182,1 @@
-# have a stable role name, but it no longer triggers an
-# OPENAI_API_KEY placeholder. Codex bottles should prefer
-# device/ChatGPT login state today.
+# codex_auth: placeholder marker for Codex egress-held auth flows.
Owner

We should remove this; it is no longer necessary.

didericis added 1 commit 2026-06-01 21:50:47 -04:00
refactor(manifest): remove codex_auth egress role
test / unit (pull_request) Successful in 30s
test / integration (pull_request) Successful in 42s
8a038dcceb
Both provider-owned roles are now gone. Provider auth routes are
provisioner-owned (claude: auth_token, codex: forward_host_credentials);
the role field and validation plumbing stay for future use but EGRESS_ROLES
is empty. Any manifest declaring a role now fails at parse time.

Assisted-by: Claude Code
didericis added 1 commit 2026-06-01 21:58:03 -04:00
refactor(manifest): remove empty EGRESS_ROLES and related plumbing
test / unit (pull_request) Successful in 30s
test / integration (pull_request) Successful in 45s
650f3aa93e
EGRESS_ROLES, EGRESS_SINGLETON_ROLES, and PROVIDER_EGRESS_ROLES were
all empty frozensets after the codex_auth and claude_code_oauth roles
were removed. Delete the constants and all validation code that iterated
over them (the singleton-role loop and provider-role check in
_validate_egress_routes, the EGRESS_ROLES membership test in
EgressRoute.from_dict). EgressRoute.from_dict now rejects any role
string unconditionally; _validate_egress_routes loses its
agent_provider_template parameter entirely.
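
A minimal sketch of what the simplified parser could look like once no roles remain. The error wording and the reduced field set are assumptions; the manifest keys (`host`, `role`, `pipelock.tls_passthrough`) match the YAML quoted earlier in this thread.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class EgressRoute:
    host: str
    tls_passthrough: bool = False

    @classmethod
    def from_dict(cls, raw: Mapping[str, Any]) -> "EgressRoute":
        # With EGRESS_ROLES gone there is nothing to check membership
        # against, so any declared role is rejected outright.
        if raw.get("role") is not None:
            raise ValueError(f"unknown role {raw['role']!r}")
        # Lift the manifest's pipelock.tls_passthrough flag onto the route.
        pipelock = raw.get("pipelock") or {}
        return cls(
            host=raw["host"],
            tls_passthrough=bool(pipelock.get("tls_passthrough", False)),
        )
```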

Assisted-by: Claude Code
didericis approved these changes 2026-06-01 22:04:17 -04:00
didericis merged commit 650f3aa93e into codex/prd-codex-host-credentials 2026-06-01 22:04:33 -04:00
didericis deleted branch prd-0029-egress-routes-via-agent-provision-plan 2026-06-01 22:04:33 -04:00