Cleanup backend and agent provider abstractions #216

Merged
didericis merged 21 commits from issue-215-dockerfile-colocation into main 2026-06-08 23:01:36 -04:00
Collaborator

Closes #215.

Summary

  • Moves the built-in Claude and Codex Dockerfiles into their provider packages at bot_bottle/contrib/claude/Dockerfile and bot_bottle/contrib/codex/Dockerfile so each provider owns its default image build inputs.
  • Makes AgentProvider.dockerfile and AgentProvider.guest_home provider-level defaults, removes Dockerfile paths from AgentProviderRuntime, and falls back to each provider's bundled Dockerfile unless the manifest overrides it.
  • Promotes AgentProvisionPlan as the source of truth for instance name, prompt file, image, Dockerfile path, and guest home across Docker and smolmachines bottle plans.
  • Extracts shared backend plan-resolution work into bot_bottle/backend/resolve_common.py, renames backend prepare.py modules to resolve_plan.py, and threads the resolved environment and launch slug through backend-specific resolution.
  • Restores --cwd workspace provisioning through BottleBackend.provision_workspace against the running bottle for both Docker and smolmachines, without restoring the Docker derived-image workspace build path.
  • Moves bottle_state.py to the top-level bot_bottle package and updates imports/callers accordingly.
  • Prefixes manifest model classes with Manifest* to distinguish manifest data from runtime/provider objects.
  • Updates egress outbound DLP scanning to ignore an agent-supplied Authorization header when the matched route injects sidecar-owned auth.
  • Updates affected unit/integration tests for the provider defaults, plan fields, backend prepare wiring, runtime workspace copy, imports, and egress behavior.
Closes #215. ## Summary - Moves the built-in Claude and Codex Dockerfiles into their provider packages at `bot_bottle/contrib/claude/Dockerfile` and `bot_bottle/contrib/codex/Dockerfile` so each provider owns its default image build inputs. - Makes `AgentProvider.dockerfile` and `AgentProvider.guest_home` provider-level defaults, removes Dockerfile paths from `AgentProviderRuntime`, and falls back to each provider's bundled Dockerfile unless the manifest overrides it. - Promotes `AgentProvisionPlan` as the source of truth for instance name, prompt file, image, Dockerfile path, and guest home across Docker and smolmachines bottle plans. - Extracts shared backend plan-resolution work into `bot_bottle/backend/resolve_common.py`, renames backend `prepare.py` modules to `resolve_plan.py`, and threads the resolved environment and launch slug through backend-specific resolution. - Restores `--cwd` workspace provisioning through `BottleBackend.provision_workspace` against the running bottle for both Docker and smolmachines, without restoring the Docker derived-image workspace build path. - Moves `bottle_state.py` to the top-level `bot_bottle` package and updates imports/callers accordingly. - Prefixes manifest model classes with `Manifest*` to distinguish manifest data from runtime/provider objects. - Updates egress outbound DLP scanning to ignore an agent-supplied Authorization header when the matched route injects sidecar-owned auth. - Updates affected unit/integration tests for the provider defaults, plan fields, backend prepare wiring, runtime workspace copy, imports, and egress behavior.
didericis reviewed 2026-06-07 23:42:17 -04:00
@@ -134,0 +129,4 @@
"""Path to the Claude provider Dockerfile. Resolved at call time so
the path is correct regardless of where this module is imported from."""
# bot_bottle/backend/docker/ -> bot_bottle/ -> contrib/claude/Dockerfile
return Path(__file__).resolve().parent.parent.parent / "contrib" / "claude" / "Dockerfile"
Owner

Let's have the path be declared in the agent provider, and have it be a public attribute. Probably want to set it relative to file specifically for these inbuilt agent providers, but other agent providers could declare it elsewhere.

The other option would be to hardcode the agent provider to look for a dockerfile that sits next to the agent provider file in an agent provider contrib folder. What seems like the better option? Am thinking defaulting to a dockerfile that sits next to the agent provider file if no public attribute is set when declaring an agent provider probably makes sesne.

Let's have the path be declared in the agent provider, and have it be a public attribute. Probably want to set it relative to __file__ specifically for these inbuilt agent providers, but other agent providers could declare it elsewhere. The other option would be to hardcode the agent provider to look for a dockerfile that sits next to the agent provider file in an agent provider contrib folder. What seems like the better option? Am thinking defaulting to a dockerfile that sits next to the agent provider file if no public attribute is set when declaring an agent provider probably makes sesne.
Author
Collaborator

Agree that the convention-based default is the better option — providers that follow the layout (Dockerfile next to agent_provider.py) get it for free without any boilerplate, and the attribute override is an escape hatch for the non-standard case.

Concretely I am thinking:

  • Add a dockerfile property to the AgentProvider base class that returns the path of a sibling Dockerfile file (i.e. Path(inspect.getfile(type(self))).parent / "Dockerfile") when that file exists, else None.
  • Built-in providers inherit that automatically (since we just moved the Dockerfiles there).
  • Custom providers at ~/.bot-bottle/contrib/<template>/agent_provider.py work the same way if they drop a Dockerfile next to the file.
  • A provider needing a non-standard path just overrides the property.
  • _repo_dockerfile_path() in capability_apply.py (and the equivalent fallback in prepare.py) would delegate to provider.dockerfile instead of hardcoding.

One question: AgentProviderRuntime already carries a dockerfile field that the built-in providers populate. Should AgentProvider.dockerfile delegate to self.runtime.dockerfile first and fall back to the sibling-file convention? Or should runtime.dockerfile be dropped in favour of the new property? Happy to proceed either way — just want to confirm before updating the implementation.

Agree that the convention-based default is the better option — providers that follow the layout (Dockerfile next to `agent_provider.py`) get it for free without any boilerplate, and the attribute override is an escape hatch for the non-standard case. Concretely I am thinking: - Add a `dockerfile` property to the `AgentProvider` base class that returns the path of a sibling `Dockerfile` file (i.e. `Path(inspect.getfile(type(self))).parent / "Dockerfile"`) when that file exists, else `None`. - Built-in providers inherit that automatically (since we just moved the Dockerfiles there). - Custom providers at `~/.bot-bottle/contrib/<template>/agent_provider.py` work the same way if they drop a `Dockerfile` next to the file. - A provider needing a non-standard path just overrides the property. - `_repo_dockerfile_path()` in `capability_apply.py` (and the equivalent fallback in `prepare.py`) would delegate to `provider.dockerfile` instead of hardcoding. One question: `AgentProviderRuntime` already carries a `dockerfile` field that the built-in providers populate. Should `AgentProvider.dockerfile` delegate to `self.runtime.dockerfile` first and fall back to the sibling-file convention? Or should `runtime.dockerfile` be dropped in favour of the new property? Happy to proceed either way — just want to confirm before updating the implementation.
Owner

@didericis-claude runtime.dockerfile should be dropped.

@didericis-claude `runtime.dockerfile` should be dropped.
Author
Collaborator

Done in 963a178. runtime.dockerfile is dropped. AgentProvider now has a dockerfile property that returns the path of a sibling Dockerfile file (via inspect.getfile(type(self))), or None if no file exists. Built-in providers inherit it for free; custom user providers work the same way. All callers (docker/prepare.py, smolmachines/prepare.py, capability_apply.py) now call provider_obj.dockerfile directly.

Done in 963a178. `runtime.dockerfile` is dropped. `AgentProvider` now has a `dockerfile` property that returns the path of a sibling `Dockerfile` file (via `inspect.getfile(type(self))`), or `None` if no file exists. Built-in providers inherit it for free; custom user providers work the same way. All callers (`docker/prepare.py`, `smolmachines/prepare.py`, `capability_apply.py`) now call `provider_obj.dockerfile` directly.
didericis reviewed 2026-06-08 00:01:39 -04:00
@@ -113,4 +112,2 @@
image_default = provider_runtime.image
else:
image_default = provider_runtime.image
image = os.environ.get("BOT_BOTTLE_IMAGE", image_default)
Owner

Are we using this anywhere? Not sure if it's worth keeping this. If it's being used should be renamed to BOT_BOTTLE_AGENT_IMAGE.

Are we using this anywhere? Not sure if it's worth keeping this. If it's being used should be renamed to `BOT_BOTTLE_AGENT_IMAGE`.
didericis marked this conversation as resolved
Author
Collaborator

It is used — both docker/prepare.py and smolmachines/prepare.py read BOT_BOTTLE_IMAGE as an env-var override that pins the agent image at launch time (useful in CI / integration tests to skip triggering a docker build). It's not referenced in tests, docs, or examples though, so it's an undocumented escape hatch.

Agree BOT_BOTTLE_AGENT_IMAGE is clearer (distinguishes it from BOT_BOTTLE_DERIVED_IMAGE and any future image overrides). Happy to rename it in both prepare files as part of this PR — want me to go ahead?

It is used — both `docker/prepare.py` and `smolmachines/prepare.py` read `BOT_BOTTLE_IMAGE` as an env-var override that pins the agent image at launch time (useful in CI / integration tests to skip triggering a docker build). It's not referenced in tests, docs, or examples though, so it's an undocumented escape hatch. Agree `BOT_BOTTLE_AGENT_IMAGE` is clearer (distinguishes it from `BOT_BOTTLE_DERIVED_IMAGE` and any future image overrides). Happy to rename it in both prepare files as part of this PR — want me to go ahead?
Owner

@didericis-claude no, remove it. Don't think we need it/overriding the image file seems unnecessary if we aren't using it in tests or in the wild. Can add it back if/when merited.

@didericis-claude no, remove it. Don't think we need it/overriding the image file seems unnecessary if we aren't using it in tests or in the wild. Can add it back if/when merited.
Author
Collaborator

Done in 41590edBOT_BOTTLE_IMAGE removed from both docker/prepare.py and smolmachines/prepare.py.

Done in 41590ed — `BOT_BOTTLE_IMAGE` removed from both `docker/prepare.py` and `smolmachines/prepare.py`.
didericis reviewed 2026-06-08 02:05:00 -04:00
@@ -128,2 +128,4 @@
template."""
@property
def dockerfile(self) -> Path | None:
Owner

This should always return a docker/should never return None

This should always return a docker/should never return `None`
didericis marked this conversation as resolved
Author
Collaborator

Done in e0ecb7cdockerfile now returns Path unconditionally; all is not None guards on the result removed from callers.

Done in e0ecb7c — `dockerfile` now returns `Path` unconditionally; all `is not None` guards on the result removed from callers.
didericis reviewed 2026-06-08 02:11:17 -04:00
@@ -111,4 +103,1 @@
dockerfile_path = str(user_dockerfile)
else:
image_default = provider_runtime.image
else:
Owner

A provider should always have a dockerfile now, so this else shouldn't be necessary.

A provider should always have a dockerfile now, so this else shouldn't be necessary.
didericis marked this conversation as resolved
didericis reviewed 2026-06-08 02:12:20 -04:00
@@ -117,0 +107,4 @@
else:
image_default = f"bot-bottle-{provider.template}:{slug}"
dockerfile_path = str(p_dockerfile)
image = image_default
Owner

This rename used to make sense, but not anymore: places where we're assigning image_default should be changed so we assign to image and we should just remove this image = image_default assignment.

This rename used to make sense, but not anymore: places where we're assigning `image_default` should be changed so we assign to `image` and we should just remove this `image = image_default` assignment.
didericis marked this conversation as resolved
didericis reviewed 2026-06-08 02:14:24 -04:00
@@ -60,3 +60,3 @@
bottle = manifest.bottle_for(spec.agent_name)
provider = bottle.agent_provider
provider_runtime = runtime_for(provider.template)
provider_obj = get_provider(provider.template)
Owner

We shouldn't be doing this/we already have the provider object on line 61.

We shouldn't be doing this/we already have the provider object on line 61.
didericis marked this conversation as resolved
Author
Collaborator

#1864 + #1866 — done in a4e1285. Provider default is set first, overrides applied after; image_default is gone and so is the trailing else.

#1868 — need a clarification: provider = bottle.agent_provider on line 61 is a ManifestAgentProvider (the manifest data object — template name, optional dockerfile path, auth_token, etc.), not the plugin AgentProvider. get_provider(provider.template) is what gives us the plugin instance (.dockerfile, .runtime). Are you suggesting that bottle.agent_provider should return the plugin directly, or something else?

**#1864 + #1866** — done in a4e1285. Provider default is set first, overrides applied after; `image_default` is gone and so is the trailing `else`. **#1868** — need a clarification: `provider = bottle.agent_provider` on line 61 is a `ManifestAgentProvider` (the manifest data object — template name, optional dockerfile path, auth_token, etc.), not the plugin `AgentProvider`. `get_provider(provider.template)` is what gives us the plugin instance (`.dockerfile`, `.runtime`). Are you suggesting that `bottle.agent_provider` should return the plugin directly, or something else?
Owner

@didericis-claude we shouldn't have multiple classes with the same name... actual class name is AgentProvider, even though it's a manifest. I thought it was the actual AgentProvider object when looking in my IDE because it has the same name.

Rename all of the manifest related class names to ManifestX (so ManifestAgentProvider, ManifestEgressRoute, etc) so it's clear which are just data classes and which are not.

@didericis-claude we shouldn't have multiple classes with the same name... actual class name is `AgentProvider`, even though it's a manifest. I thought it was the actual `AgentProvider` object when looking in my IDE because it has the same name. Rename all of the manifest related class names to `ManifestX` (so `ManifestAgentProvider`, `ManifestEgressRoute`, etc) so it's clear which are just data classes and which are not.
Author
Collaborator

Got it. The conflicts are manifest_agent.py:AgentProvider vs agent_provider.py:AgentProvider and manifest_egress.py:EgressRoute vs egress.py:EgressRoute (plus any other manifest classes that shadow runtime names). The rename would touch the manifest files and all their importers across the codebase. Want me to do it in this PR, or open a separate one so this PR stays focused on the Dockerfile colocation work?

Got it. The conflicts are `manifest_agent.py:AgentProvider` vs `agent_provider.py:AgentProvider` and `manifest_egress.py:EgressRoute` vs `egress.py:EgressRoute` (plus any other manifest classes that shadow runtime names). The rename would touch the manifest files and all their importers across the codebase. Want me to do it in this PR, or open a separate one so this PR stays focused on the Dockerfile colocation work?
Owner

@didericis-claude Do it in this PR.

@didericis-claude Do it in this PR.
Author
Collaborator

On it.

On it.
Author
Collaborator

Done in b872985 — all 11 manifest data classes renamed with Manifest prefix across definition files and every usage site. 925 tests pass.

Done in b872985 — all 11 manifest data classes renamed with `Manifest` prefix across definition files and every usage site. 925 tests pass.
didericis changed title from Move agent Dockerfiles into contrib directories to Cleanup backend and agent provider abstractions 2026-06-08 11:43:13 -04:00
Owner

Noticed a lot of shared logic in resolve_plan that should be in the base class even after prompting for consolidation. Going to look over the codebase and manually tidy up a bit.

Noticed a lot of shared logic in `resolve_plan` that should be in the base class even after prompting for consolidation. Going to look over the codebase and manually tidy up a bit.
didericis force-pushed issue-215-dockerfile-colocation from 4dbe44e7cc to 9470b8f955 2026-06-08 13:14:11 -04:00 Compare
didericis-codex added 18 commits 2026-06-08 22:18:23 -04:00
Dockerfile.claude and Dockerfile.codex move from the repo root into
bot_bottle/contrib/claude/Dockerfile and bot_bottle/contrib/codex/Dockerfile
respectively, so all per-provider assets live alongside the provider code.

Closes #215
Drop the `dockerfile` field from `AgentProviderRuntime` and replace it
with a convention-based `dockerfile` property on `AgentProvider`: the
base class looks for a `Dockerfile` file next to the provider's own
`agent_provider.py` module (via `inspect.getfile`), returning its path
or None. Built-in providers inherit the default automatically; custom
user providers work the same way by dropping a Dockerfile next to their
plugin file; any provider needing a non-standard path can override.

All callers (`docker/prepare.py`, `smolmachines/prepare.py`,
`capability_apply.py`) now resolve the provider object once and call
`.dockerfile` directly instead of reading `runtime.dockerfile`.
Unused in tests, docs, or examples. Can be added back if/when merited.
The convention is that every provider declares a Dockerfile location;
callers that care whether the file actually exists check .is_file().
Drops all `is not None` guards on the property result.
Since every provider always has a dockerfile, establish the default
image and dockerfile_path from the provider up front and override for
per-bottle or manifest-specified cases. Removes the image_default
intermediate variable and the trailing else branch.
Avoids name collisions with same-named runtime/plugin classes
(e.g. manifest AgentProvider vs plugin AgentProvider ABC,
manifest EgressRoute vs runtime EgressRoute). Renamed:

  AgentProvider        → ManifestAgentProvider   (manifest_agent.py)
  Agent                → ManifestAgent            (manifest_agent.py)
  EgressRoute          → ManifestEgressRoute      (manifest_egress.py)
  PathMatch            → ManifestPathMatch        (manifest_egress.py)
  HeaderMatch          → ManifestHeaderMatch      (manifest_egress.py)
  MatchEntry           → ManifestMatchEntry       (manifest_egress.py)
  EgressConfig         → ManifestEgressConfig     (manifest_egress.py)
  Bottle               → ManifestBottle           (manifest.py)
  ProvisionedKeyConfig → ManifestProvisionedKeyConfig (manifest_git.py)
  GitEntry             → ManifestGitEntry         (manifest_git.py)
  GitUser              → ManifestGitUser          (manifest_git.py)
Both docker and smolmachines backends use bottle state helpers.
Moving to bot_bottle/ makes the sharing explicit and removes the
cross-backend dependency (smolmachines importing from ..docker).

All callers updated: docker backend, smolmachines backend, cli
modules, and tests.
Both docker and smolmachines resolve_plan.py duplicated: slug minting,
metadata writing, agent state dir setup, git gate / egress / supervise
preparation, env_vars merge, and manifest dockerfile path resolution.

These are now consolidated in bot_bottle/backend/resolve_common.py.
Each backend's resolve_plan retains only its own logic (container name
resolution + env-file for docker; subnet allocation + guest_env build
for smolmachines).
guest_home is now a field on AgentProvisionPlan (set by each provider's
provision_plan() method). BottlePlan.guest_home becomes a read-only
property delegating to agent_provision.guest_home so existing callers
(provision_git, provision_skills, provision_prompt) are unchanged.

Both resolve_plan.py files drop guest_home from the plan constructor
call; the local variable still exists as an intermediary for the
workspace_plan call that precedes agent_provision_plan.
The recent refactor partially removed workspace planning and
capability-apply logic. This commit finishes the cleanup so the
test suite imports cleanly:

- Comment out workspace_plan field/property on BottlePlan and the
  provision_workspace dispatch.
- Comment out workspace usages in docker.util (build_image_with_cwd),
  smolmachines.provision.workspace, agent_provider.provision_git,
  smolmachines.backend.
- Comment out capability_apply imports in cli.start and cli.supervise;
  add a local CapabilityApplyError placeholder so the supervise CLI
  module still imports.
- Break the bottle_state → backend.docker → backend circular import
  by lazy-loading docker_mod inside bottle_identity, and by moving the
  resolve_common import inside BottleBackend.prepare.
- Delete tests for workspace and capability_apply (unit + integration).
- Update test fixtures to drop removed kwargs (container_name_pinned,
  derived_image, env_file, workspace_plan, agent_image_ref) from
  DockerBottlePlan / SmolmachinesBottlePlan constructors.
- Delete the obsolete test_smolmachines_prepare.py (tested the old
  resolve_plan signature; the shared prepare flow now lives in
  BottleBackend.prepare).
- Adjust test_supervise.py for the new Supervise.prepare signature
  (dockerfile_content arg removed).

925 → 897 tests, all passing.
BottleBackend.prepare computed slug and resolved_env but never passed
them to _resolve_plan. The concrete docker/smolmachines _resolve_plan
methods still had the old (spec, *, stage_dir) signature too, so
prepare's kwargs blew up with "unexpected keyword argument
'instance_name'" the moment cli.py start was invoked.

Update the abstract _resolve_plan signature and both backend
implementations to accept the full kwarg set prepare passes, and
forward to resolve_plan.resolve_plan() with everything.
BottleBackend.prepare was calling resolve_manifest_dockerfile("", spec)
for every bottle where the manifest did not set agent_provider.dockerfile.
That resolves an empty string against user_cwd, returning the cwd
itself — which docker then tried to read as a Dockerfile, giving
"is a directory" errors during image build.

When the manifest doesn't override, use the provider plugin's bundled
Dockerfile path (next to its agent_provider.py module) — mirroring
the pre-refactor behavior.
Drop the parallel fields passed through prepare() → _resolve_plan and
read everything from agent_provision instead. The provider plugin now
declares its own guest_home (so the backend stops hardcoding
"/home/node") and the wrapper that builds the provision plan accepts
instance_name and prompt_file, which providers store on the plan.

DockerBottlePlan and SmolmachinesBottlePlan expose container_name /
machine_name, image / agent_image, dockerfile_path /
agent_dockerfile_path, and prompt_file as properties that delegate to
agent_provision so existing call sites keep working unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fix: resolve pyright strict errors
lint / lint (push) Successful in 1m38s
test / unit (pull_request) Successful in 38s
test / integration (pull_request) Successful in 21s
626fe32896
didericis-codex force-pushed issue-215-dockerfile-colocation from 3ceff1ac4f to 626fe32896 2026-06-08 22:18:23 -04:00 Compare
didericis added 1 commit 2026-06-08 22:35:54 -04:00
fix: restore backend prepare wiring
lint / lint (push) Failing after 1m28s
test / unit (pull_request) Successful in 31s
test / integration (pull_request) Successful in 16s
14188ba368
didericis added 1 commit 2026-06-08 22:43:49 -04:00
fix: restore runtime workspace provisioning
lint / lint (push) Failing after 1m27s
test / unit (pull_request) Successful in 31s
test / integration (pull_request) Successful in 16s
1aee4573aa
didericis added 1 commit 2026-06-08 22:58:35 -04:00
test: narrow metadata assertions for pyright
test / unit (pull_request) Successful in 31s
test / integration (pull_request) Successful in 18s
lint / lint (push) Successful in 1m26s
test / unit (push) Successful in 31s
test / integration (push) Successful in 17s
Update Quality Badges / update-badges (push) Failing after 1m7s
ec1dc3cb5a
didericis merged commit ec1dc3cb5a into main 2026-06-08 23:01:36 -04:00
didericis deleted branch issue-215-dockerfile-colocation 2026-06-08 23:01:37 -04:00
Sign in to join this conversation.