Commit Graph

39 Commits

Author SHA1 Message Date
didericis-claude 9282bceaf8 fix: emit WARNING when Docker teardown ExitStack raises (issue #156)
test / unit (pull_request) Successful in 37s
test / integration (pull_request) Successful in 40s
test / unit (push) Successful in 32s
test / integration (push) Successful in 43s
Replace the bare `except BaseException: pass` in the `teardown` closure
with a `warn()` call that includes the container name and operation type
("compose-down"), so cleanup failures are visible in the log rather than
silently discarded.  Non-blocking: the exception is consumed and teardown
continues, preserving the original error-propagation contract.

Add test_docker_launch_teardown.py to lock the new behaviour: it injects
a RuntimeError via a mocked `compose_down` callback and asserts the
WARNING message contains the container name and operation label.
2026-06-03 04:13:53 +00:00
didericis-codex 941f316462 feat(git-gate): remove git remote host override plumbing 2026-06-02 18:17:24 +00:00
didericis 3885e2f5ad fix(workspace): include hidden cwd files in docker layer
test / unit (pull_request) Successful in 32s
test / integration (pull_request) Successful in 42s
test / unit (push) Successful in 33s
test / integration (push) Successful in 40s
2026-06-02 13:12:40 -04:00
didericis-codex d5fcbe53ef feat(workspace): port cwd across backends 2026-06-02 17:01:19 +00:00
didericis-codex 6150497b47 feat(workspace): trust resolved project path 2026-06-02 16:57:52 +00:00
didericis-codex 5308d53288 feat(workspace): add shared workspace plan 2026-06-02 16:56:57 +00:00
didericis 44273be9eb fix(dashboard): stop agents in dashboard from moving during selection
test / unit (push) Successful in 38s
test / integration (push) Successful in 58s
2026-06-02 12:47:00 -04:00
didericis-claude 53219a55e1 refactor: hoist plan fields and print to BottlePlan base class (PRD 0044)
Move git_gate_plan, egress_plan, supervise_plan, and agent_provision
from DockerBottlePlan and SmolmachinesBottlePlan into BottlePlan.
Replace the abstract print method with a single concrete implementation
that renders git gate entries as "name → upstream_host:upstream_port"
and egress routes with conditional "[auth:scheme]" annotations.
2026-06-02 12:12:08 -04:00
didericis-claude a3d9ac9605 feat: persist backend in BottleMetadata; use it in resume and dashboard reattach (PRD 0040)
BottleMetadata gains a backend field (default ""). Docker prepare writes
"docker"; smolmachines prepare writes "smolmachines". read_metadata
deserialises it with "" as the backward-compatible default.

resume now passes metadata.backend to _launch_bottle so a preserved
smolmachines bottle is resumed on the right backend without requiring
BOT_BOTTLE_BACKEND to be set manually.

_bottle_for_slug now reads metadata.backend and constructs a
SmolmachinesBottle for smolmachines slugs instead of always defaulting
to DockerBottle. No-metadata slugs still fall back to Docker.

Closes #137
2026-06-02 11:16:17 -04:00
didericis-claude 8830306101 feat(smolmachines): resolve manifest env through resolve_env() (PRD 0038)
Before this change smolmachines prepare.py spliced bottle.env directly
into guest_env, so ?prompt and ${HOST_VAR} entries reached the VM as
raw sentinels rather than being prompted or interpolated.

After this change prepare.py calls resolve_env(), matching the Docker
backend's contract. Forwarded (secret/interpolated) values still flow
through smolvm -e K=V argv — the known exposure gap documented in PRD
0038's open question.

Closes #135
2026-06-02 14:38:36 +00:00
didericis-claude a81f0ffa49 fix(smolmachines): raise SmolvmError instead of die() on wait_exec_ready timeout
test / unit (pull_request) Successful in 39s
test / integration (pull_request) Successful in 58s
test / unit (push) Successful in 38s
test / integration (push) Successful in 55s
die() raises Die(SystemExit), which implies a process exit. A timeout in
wait_exec_ready is a bringup failure — raising SmolvmError lets the caller
decide whether it's fatal, consistent with how machine_start failures propagate.
2026-06-02 06:29:05 +00:00
didericis-claude 0d922371b0 refactor(smolmachines): decompose launch(), add wait_exec_ready, file-lock allocate() (PRD 0032)
Decompose the 207-line launch() into six named helpers: _allocate_resources,
_mint_certs, _start_bundle, _discover_urls, _launch_vm, _init_vm. Each has
explicit inputs/outputs and is independently testable.

Replace time.sleep(1.5) with smolvm.wait_exec_ready(), which polls
`machine exec true` with exponential backoff. Exits as soon as the exec
channel is ready; dies loudly with a timeout message instead of silently
leaving the VM in an unknown state.

File-lock loopback_alias.allocate() with fcntl.flock(LOCK_EX) so concurrent
bottle launches can't race on docker state and claim the same alias.
2026-06-02 06:23:39 +00:00
didericis-claude 0e29bcc829 refactor(egress): use provisioned_env instead of sentinel for Codex token (PRD 0030)
test / unit (pull_request) Successful in 39s
test / integration (pull_request) Successful in 45s
Add `provisioned_env: dict[str, str]` to `AgentProvisionPlan`. When
`forward_host_credentials=True`, `agent_provision_plan` reads the host
Codex access token at prepare time and stores it under
`CODEX_HOST_CREDENTIAL_TOKEN_REF`. Both backends merge `provisioned_env`
over `os.environ` before calling `egress_resolve_token_values`, so the
token slot resolves like any other manifest-declared token ref.

Removes `egress_resolve_token_values_with_provider` and the sentinel
`continue` skip from `egress_resolve_token_values`. The function is now
fully generic — it neither knows nor cares about provider identity.
2026-06-02 04:53:23 +00:00
didericis-claude 75f0f9d907 refactor(egress): deduplicate token resolution across backends (PRD 0030)
Extract egress_resolve_token_values_with_provider into bot_bottle/egress.py.
Both docker and smolmachines launch paths now call the shared function
instead of duplicating the forward_host_credentials / CODEX_HOST_CREDENTIAL_TOKEN_REF
resolution block.

Also fixes the host_env: object annotation on smolmachines._resolve_token_env
to the correct dict[str, str].

Closes #118.
2026-06-02 04:22:43 +00:00
didericis 2dd8113f7c fix(smolmachines): retry CA install after exec SIGKILL
test / unit (push) Successful in 38s
test / integration (push) Successful in 54s
2026-06-01 23:28:33 -04:00
didericis-claude de9bd7eb83 feat(manifest): add agent_provider.auth_token for Claude OAuth via egress
Operators can now declare:

  agent_provider:
    template: claude
    auth_token: BOT_BOTTLE_CLAUDE_OAUTH_TOKEN

and the provisioner injects a provider-owned api.anthropic.com egress
route (Bearer, tls_passthrough) rather than requiring a manually
declared route with the former claude_code_oauth role.

Changes:
- Add auth_token field to AgentProvider; validate claude-only.
- Remove claude_code_oauth from EGRESS_ROLES / PROVIDER_EGRESS_ROLES.
  Manifests that declare the role now fail at parse time with "unknown
  role" — the provisioner owns the route.
- agent_provision_plan: replace manifest_egress_routes/has_provider_auth
  with auth_token; Claude branch injects the api.anthropic.com route,
  placeholder env, and nonessential-traffic flags when auth_token is set.
- Add hidden_env_names: frozenset[str] to AgentProvisionPlan; Claude
  branch populates it with CLAUDE_CODE_OAUTH_TOKEN.
- Remove auth_role from AgentProviderRuntime and placeholder_env_for().
- print_util.visible_agent_env_names: accept hidden_env_names from the
  plan instead of dispatching on agent_provider_template.
- Both backends: drop manifest_egress_routes call, pass auth_token.
- PRD 0029 rescoped to cover both Codex and Claude provider auth.

Assisted-by: Claude Code
2026-06-01 22:24:17 -04:00
didericis-claude 952dcd7eec refactor(agent): move placeholder env injection into agent_provision_plan
The has_provider_auth check and egress-placeholder injection were
duplicated in both backends. Move them into agent_provision_plan so
the provisioner owns that decision entirely:

- Replace has_provider_auth: bool param with manifest_egress_routes,
  compute has_provider_auth internally from the route roles.
- Inject CLAUDE_CODE_OAUTH_TOKEN=egress-placeholder inside the plan
  when has_provider_auth, alongside the existing nonessential-traffic
  vars. Backends no longer touch the placeholder env.
- Remove placeholder_env from AgentProviderRuntime; expose
  placeholder_env_for() for print_util's hide-from-summary logic.

Assisted-by: Claude Code
2026-06-01 22:24:17 -04:00
didericis-claude 884cedc160 refactor: provision egress routes via AgentProvisionPlan
Remove provider-specific branching from egress.py and pipelock.py.
Previously, `egress_routes_for_bottle` and `pipelock_effective_tls_passthrough`
both contained `template == "codex"` checks — the same pattern the rest
of the PR moved out of the backends.

Root cause: `EgressRoute` had no `tls_passthrough` field, so pipelock
couldn't learn from the synthesised Codex routes that they needed
passthrough. Fix:

- Add `EgressRoute.tls_passthrough: bool`. `egress_manifest_routes` lifts
  the existing `pipelock.tls_passthrough` manifest flag here; provider
  routes set it directly.
- Add `AgentProvisionPlan.egress_routes`. `agent_provision_plan` populates
  it for Codex + `forward_host_credentials`, including `tls_passthrough=True`.
- Replace Codex-specific `egress_routes_for_bottle` logic with a generic
  `_merge_provider_route` helper. Backends call `egress_routes_for_bottle(bottle,
  plan.egress_routes)`; no provider type checks inside egress or pipelock.
- Rewrite `pipelock_effective_tls_passthrough` to read `route.tls_passthrough`
  from the merged route set instead of re-implementing the provider check.
- Both backends now call `agent_provision_plan` before `Egress.prepare` and
  `PipelockProxy.prepare`, threading `plan.egress_routes` to both. `has_provider_auth`
  is derived from `egress_manifest_routes` (manifest routes only — provider
  routes carry no auth roles, so the result is identical).

Assisted-by: Claude Code
2026-06-01 22:24:17 -04:00
didericis-codex 76a7921ae6 refactor(agent): move claude env defaults into plan 2026-06-01 22:24:17 -04:00
didericis-codex c8ab0c67a8 refactor(agent): surface provider env defaults 2026-06-01 22:24:17 -04:00
didericis-codex e808e81b87 refactor(agent): group provider provisioning into plan 2026-06-01 22:24:17 -04:00
didericis-codex 36ce7aed4f refactor(codex): derive trusted paths from guest home 2026-06-01 22:24:17 -04:00
didericis-codex a5d83bdcdc fix(codex): trust launch home directory 2026-06-01 22:24:17 -04:00
didericis-codex 8e6583fcb7 fix(codex): trust bottle workspace on launch 2026-06-01 22:24:17 -04:00
didericis-codex ac1aa197d4 fix(smolmachines): reset codex runtime db before auth check 2026-06-01 22:24:17 -04:00
didericis-codex 68e5097534 fix(codex): make host-credential bottles actually authenticate
Debugging a live codex smolmachines bottle surfaced three independent
failures past the sign-in screen; fix each so forward_host_credentials
works end to end:

- codex_auth: dummy access/id tokens now inherit the *real* host token's
  exp instead of now+1h. Codex (0.135) refreshes when its local token's
  JWT exp lapses; with a placeholder refresh_token that refresh fails and
  drops to the sign-in screen. Aligning exp tracks the real token's life.

- prepare: set CODEX_CA_CERTIFICATE to the agent CA bundle for codex
  bottles. Codex is rustls and ignores the system store / NODE_EXTRA_CA_
  CERTS; it reads CODEX_CA_CERTIFICATE (fallback SSL_CERT_FILE) for custom
  roots across HTTPS + wss, so it must be pointed at the egress MITM CA or
  injection can't work without tls_passthrough.

- pipelock: auto tls_passthrough the Codex API hosts when
  forward_host_credentials is on. Egress injects the bearer before
  pipelock, whose header DLP then flags the JWT ("request header contains
  secret") and the retry storm trips its 429. passthrough host-gates the
  CONNECT but skips decrypt+rescan of egress-owned auth. The auto-added
  routes aren't in bottle.egress.routes, so the hosts are added explicitly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 22:24:17 -04:00
didericis-codex a6332b9535 fix(codex): provision dummy user auth state 2026-06-01 22:24:17 -04:00
didericis-codex 711cb9c194 feat(codex): inject host credentials via egress 2026-06-01 22:24:17 -04:00
didericis-codex 6ea19a8d53 fix(git-gate): use smart http for smolmachines pushes
test / unit (pull_request) Successful in 40s
test / integration (pull_request) Successful in 54s
test / unit (push) Successful in 37s
test / integration (push) Successful in 44s
2026-05-29 23:21:50 -04:00
didericis-codex cea832b21d fix(codex): stop injecting api key placeholder
test / unit (pull_request) Successful in 27s
test / integration (pull_request) Successful in 41s
2026-05-29 02:39:37 -04:00
didericis 0708e99e4e feat(manifest): lift git.user to the agent layer
Agents may declare git.user (name/email); it overlays the referenced
bottle's git.user per-field at Manifest.bottle_for (agent wins on
non-empty), mirroring the extends: merge. git.remotes is rejected on
agents — it carries credentials and host trust and stays bottle-only.

The overlay lives at bottle_for, the single chokepoint both backends
use, so the docker/smolmachines git provisioners are unchanged. Adds
Manifest.git_identity_summary with per-field (agent)/(bottle)
provenance, printed in both preflights and `info`.

Refs #94

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 21:10:47 -04:00
didericis-claude e641bacf2d refactor(backend): move AGENT_CA path/bundle constants to shared util
test / unit (pull_request) Successful in 34s
test / integration (pull_request) Successful in 58s
test / unit (push) Successful in 27s
test / integration (push) Successful in 43s
The two Debian-family CA-layout constants lived in
docker/provision/ca.py, which forced the smolmachines backend to
import them cross-backend (smolmachines -> docker). Move them into
the shared backend/util.py next to select_ca_cert; docker, compose,
and smolmachines now all import from there. No behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 00:14:21 +00:00
didericis-claude c9b18ea17e refactor(backend): lift shared CA cert select + fingerprint helpers
Both backends' provision_ca duplicated _select_ca_cert and the
SHA-256 fingerprint computation verbatim. Lift them into the shared
backend/util.py as select_ca_cert + log_ca_fingerprint; docker and
smolmachines now call the shared helpers. No behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 00:13:59 +00:00
didericis-codex c854db87c6 fix(git): mount git-gate known hosts
test / unit (push) Successful in 36s
test / integration (push) Successful in 57s
test / unit (pull_request) Successful in 32s
test / integration (pull_request) Successful in 59s
2026-05-28 19:59:37 -04:00
didericis-codex 9399626ba6 fix(agent): hide auth placeholder env in preflight
test / unit (pull_request) Successful in 31s
test / integration (pull_request) Successful in 55s
2026-05-28 19:00:39 -04:00
didericis-codex 43cd83d77b fix(smolmachines): build sidecar image before launch
test / unit (pull_request) Successful in 26s
test / integration (pull_request) Successful in 39s
2026-05-28 18:49:28 -04:00
didericis-codex 7f3998e79e fix(dashboard): quiet docker polling errors
test / unit (pull_request) Successful in 29s
test / integration (pull_request) Successful in 41s
2026-05-28 18:33:13 -04:00
didericis-codex 1cbedc91c0 refactor(agent): use agent-neutral runtime names
Assisted-by: Codex
2026-05-28 17:59:24 -04:00
didericis-codex c08b09dc9f refactor!: rename project to bot-bottle
Assisted-by: Codex
2026-05-28 17:56:14 -04:00