Replace the bare `except BaseException: pass` in the `teardown` closure
with a `warn()` call that includes the container name and operation type
("compose-down"), so cleanup failures are visible in the log rather than
silently discarded. Non-blocking: the exception is consumed and teardown
continues, preserving the original error-propagation contract.
Add test_docker_launch_teardown.py to lock the new behaviour: it injects
a RuntimeError via a mocked `compose_down` callback and asserts the
WARNING message contains the container name and operation label.
Previously when the access-hook returned non-zero, git-http would pipe
the hook's stderr into the 403 body sent back to the agent's git
client but never log it locally, so docker logs just showed
`"GET ... 403 -"` with no explanation. Operators had to shell into
the sidecar and re-run the hook by hand to find out why a clone was
being refused (e.g. upstream SSH unreachable, missing credentials).
Route the hook's stderr/stdout through the existing log_message
channel before sending the 403, one log line per output line so the
default request-log format stays readable. When the hook exits
non-zero with no output, log the exit code so the line is still
informative.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Shared fixtures build DockerBottlePlan and SmolmachinesBottlePlan from
identical git_gate_plan and egress_plan inputs and assert that both
backends render the same git gate lines (name → host:port) and egress
lines (host [auth:scheme] when authenticated, host alone otherwise).
Closes#140. In restart_daemon, the old process's stdout pipe was never
explicitly closed after p.wait() returned, leaking the fd until the
supervisor object was GC'd. Similarly, when the watch loop converged
(all children dead), no pipe was closed. Both paths now call
p.stdout.close() immediately after the process is confirmed exited.
Tests enforce this with warnings.simplefilter("error", ResourceWarning)
in TestSupervisor.setUp.
Closes#139. Adds tests/unit/test_backend_parity.py which verifies that
DockerBottle and SmolmachinesBottle expose identical observable contracts
for agent_argv shape, env injection, exec user-switching, ExecResult
fields, and close() idempotency. All assertions use mock subprocess
layers — no live Docker daemon or VM required.
Before this change, int() on a non-numeric Content-Length raised an
unhandled ValueError, crashing the request handler. There was also no
upper bound on how much memory a POST body could consume.
After this change:
- Non-numeric or missing Content-Length returns HTTP 400.
- Negative Content-Length returns HTTP 400.
- Bodies declared larger than 1 MiB (_MAX_BODY_BYTES) return HTTP 413,
matching the cap already in supervise_server.py.
Closes#138
BottleMetadata gains a backend field (default ""). Docker prepare writes
"docker"; smolmachines prepare writes "smolmachines". read_metadata
deserialises it with "" as the backward-compatible default.
resume now passes metadata.backend to _launch_bottle so a preserved
smolmachines bottle is resumed on the right backend without requiring
BOT_BOTTLE_BACKEND to be set manually.
_bottle_for_slug now reads metadata.backend and constructs a
SmolmachinesBottle for smolmachines slugs instead of always defaulting
to DockerBottle. No-metadata slugs still fall back to Docker.
Closes#137
apply_capability_change is Docker-only teardown/apply code. Before this
change it was called regardless of backend, so approving a capability-block
proposal from a smolmachines agent would run Docker commands against a
slug that has no Docker container.
After this change approve() reads the bottle's metadata: if compose_project
is empty (the smolmachines indicator) it raises CapabilityApplyError with
a clear operator message before any teardown runs. Docker bottles (non-empty
compose_project) and unknown bottles (no metadata) fall through to the
existing Docker path unchanged.
Closes#136
Before this change smolmachines prepare.py spliced bottle.env directly
into guest_env, so ?prompt and ${HOST_VAR} entries reached the VM as
raw sentinels rather than being prompted or interpolated.
After this change prepare.py calls resolve_env(), matching the Docker
backend's contract. Forwarded (secret/interpolated) values still flow
through smolvm -e K=V argv — the known exposure gap documented in PRD
0038's open question.
Closes#135
die() raises Die(SystemExit), which implies a process exit. A timeout in
wait_exec_ready is a bringup failure — raising SmolvmError lets the caller
decide whether it's fatal, consistent with how machine_start failures propagate.
Decompose the 207-line launch() into six named helpers: _allocate_resources,
_mint_certs, _start_bundle, _discover_urls, _launch_vm, _init_vm. Each has
explicit inputs/outputs and is independently testable.
Replace time.sleep(1.5) with smolvm.wait_exec_ready(), which polls
`machine exec true` with exponential backoff. Exits as soon as the exec
channel is ready; dies loudly with a timeout message instead of silently
leaving the VM in an unknown state.
File-lock loopback_alias.allocate() with fcntl.flock(LOCK_EX) so concurrent
bottle launches can't race on docker state and claim the same alias.
Replace _merge_provider_route's five-case nested conditional with a flat
provisioned-wins merge: provider routes claim their hosts outright, manifest
routes for unclaimed hosts append unchanged. Token slot assignment moves to a
single _assign_token_slots pass over the merged list.
Add _route_to_yaml_fields as the single authoritative EgressRoute→YAML mapping,
eliminating the risk of EgressRoute and egress_addon_core.Route silently
drifting apart when new fields are added.
egress_manifest_routes is now a pure lifter with no slot assignment.
_merge_provider_route and _find_or_alloc_token_env are removed.
Tests updated: conflict-die case removed, upgrade-bare replaced with
provider-wins semantics, slot-assignment tests moved to TestSlotAssignment.
Add `provisioned_env: dict[str, str]` to `AgentProvisionPlan`. When
`forward_host_credentials=True`, `agent_provision_plan` reads the host
Codex access token at prepare time and stores it under
`CODEX_HOST_CREDENTIAL_TOKEN_REF`. Both backends merge `provisioned_env`
over `os.environ` before calling `egress_resolve_token_values`, so the
token slot resolves like any other manifest-declared token ref.
Removes `egress_resolve_token_values_with_provider` and the sentinel
`continue` skip from `egress_resolve_token_values`. The function is now
fully generic — it neither knows nor cares about provider identity.
Extract egress_resolve_token_values_with_provider into bot_bottle/egress.py.
Both docker and smolmachines launch paths now call the shared function
instead of duplicating the forward_host_credentials / CODEX_HOST_CREDENTIAL_TOKEN_REF
resolution block.
Also fixes the host_env: object annotation on smolmachines._resolve_token_env
to the correct dict[str, str].
Closes#118.
Both provider-owned roles are now gone. Provider auth routes are
provisioner-owned (claude: auth_token, codex: forward_host_credentials);
the role field and validation plumbing stay for future use but EGRESS_ROLES
is empty. Any manifest declaring a role now fails at parse time.
Assisted-by: Claude Code
Mirrors the Codex pattern: Claude always gets a tls_passthrough route
for api.anthropic.com so user-set tokens aren't stripped by pipelock,
whether or not auth_token is declared. Auth injection (scheme + token_ref)
and the placeholder env only apply when auth_token is set.
Assisted-by: Claude Code
Operators can now declare:
agent_provider:
template: claude
auth_token: BOT_BOTTLE_CLAUDE_OAUTH_TOKEN
and the provisioner injects a provider-owned api.anthropic.com egress
route (Bearer, tls_passthrough) rather than requiring a manually
declared route with the former claude_code_oauth role.
Changes:
- Add auth_token field to AgentProvider; validate claude-only.
- Remove claude_code_oauth from EGRESS_ROLES / PROVIDER_EGRESS_ROLES.
Manifests that declare the role now fail at parse time with "unknown
role" — the provisioner owns the route.
- agent_provision_plan: replace manifest_egress_routes/has_provider_auth
with auth_token; Claude branch injects the api.anthropic.com route,
placeholder env, and nonessential-traffic flags when auth_token is set.
- Add hidden_env_names: frozenset[str] to AgentProvisionPlan; Claude
branch populates it with CLAUDE_CODE_OAUTH_TOKEN.
- Remove auth_role from AgentProviderRuntime and placeholder_env_for().
- print_util.visible_agent_env_names: accept hidden_env_names from the
plan instead of dispatching on agent_provider_template.
- Both backends: drop manifest_egress_routes call, pass auth_token.
- PRD 0029 rescoped to cover both Codex and Claude provider auth.
Assisted-by: Claude Code
The has_provider_auth check and egress-placeholder injection were
duplicated in both backends. Move them into agent_provision_plan so
the provisioner owns that decision entirely:
- Replace has_provider_auth: bool param with manifest_egress_routes,
compute has_provider_auth internally from the route roles.
- Inject CLAUDE_CODE_OAUTH_TOKEN=egress-placeholder inside the plan
when has_provider_auth, alongside the existing nonessential-traffic
vars. Backends no longer touch the placeholder env.
- Remove placeholder_env from AgentProviderRuntime; expose
placeholder_env_for() for print_util's hide-from-summary logic.
Assisted-by: Claude Code
When forward_host_credentials is false, Codex bottles should still get
tls_passthrough routes for the OpenAI/ChatGPT hosts so that tokens a
user sets via `codex login` after launch aren't stripped by pipelock's
header DLP. Previously no routes were emitted, which would have blocked
those requests entirely once pipelock enforcement tightens.
Rename the test to reflect the new expected behavior.
Assisted-by: Claude Code
Remove provider-specific branching from egress.py and pipelock.py.
Previously, `egress_routes_for_bottle` and `pipelock_effective_tls_passthrough`
both contained `template == "codex"` checks — the same pattern the rest
of the PR moved out of the backends.
Root cause: `EgressRoute` had no `tls_passthrough` field, so pipelock
couldn't learn from the synthesised Codex routes that they needed
passthrough. Fix:
- Add `EgressRoute.tls_passthrough: bool`. `egress_manifest_routes` lifts
the existing `pipelock.tls_passthrough` manifest flag here; provider
routes set it directly.
- Add `AgentProvisionPlan.egress_routes`. `agent_provision_plan` populates
it for Codex + `forward_host_credentials`, including `tls_passthrough=True`.
- Replace Codex-specific `egress_routes_for_bottle` logic with a generic
`_merge_provider_route` helper. Backends call `egress_routes_for_bottle(bottle,
plan.egress_routes)`; no provider type checks inside egress or pipelock.
- Rewrite `pipelock_effective_tls_passthrough` to read `route.tls_passthrough`
from the merged route set instead of re-implementing the provider check.
- Both backends now call `agent_provision_plan` before `Egress.prepare` and
`PipelockProxy.prepare`, threading `plan.egress_routes` to both. `has_provider_auth`
is derived from `egress_manifest_routes` (manifest routes only — provider
routes carry no auth roles, so the result is identical).
Assisted-by: Claude Code
Debugging a live codex smolmachines bottle surfaced three independent
failures past the sign-in screen; fix each so forward_host_credentials
works end to end:
- codex_auth: dummy access/id tokens now inherit the *real* host token's
exp instead of now+1h. Codex (0.135) refreshes when its local token's
JWT exp lapses; with a placeholder refresh_token that refresh fails and
drops to the sign-in screen. Aligning exp tracks the real token's life.
- prepare: set CODEX_CA_CERTIFICATE to the agent CA bundle for codex
bottles. Codex is rustls and ignores the system store / NODE_EXTRA_CA_
CERTS; it reads CODEX_CA_CERTIFICATE (fallback SSL_CERT_FILE) for custom
roots across HTTPS + wss, so it must be pointed at the egress MITM CA or
injection can't work without tls_passthrough.
- pipelock: auto tls_passthrough the Codex API hosts when
forward_host_credentials is on. Egress injects the bearer before
pipelock, whose header DLP then flags the JWT ("request header contains
secret") and the retry storm trips its 429. passthrough host-gates the
CONNECT but skips decrypt+rescan of egress-owned auth. The auto-added
routes aren't in bottle.egress.routes, so the hosts are added explicitly.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>