Spike on removing docker as a dependency for the sidecar #220

Closed
opened 2026-06-09 01:44:42 -04:00 by didericis · 7 comments
Owner

Currently docker is required to build OCI images for smolmachines to consume. It'd be ideal if we could remove that dependency, and then remove the need to have two VM related dependencies on a machine. Then we could convert the sidecar to run on a smolmachines backend.

didericis added the Kind/Testing and Kind/Enhancement labels 2026-06-09 01:44:58 -04:00
Collaborator

Spike findings

What Docker is actually used for

Docker touches five distinct things in the smolmachines path:

| Usage | Location |
|---|---|
| docker build agent image | smolmachines/launch.py → docker/util.py:build_image |
| docker build sidecar image | smolmachines/sidecar_bundle.py:ensure_bundle_image |
| docker save + ephemeral registry + crane container (push tarball so smolvm can pull) | smolmachines/launch.py:_ensure_smolmachine |
| docker image inspect for cache-key (image ID) | docker/util.py:image_id |
| docker network create/rm, docker run/rm/port for the sidecar bundle | smolmachines/sidecar_bundle.py |

The last item is the big one: even after the agent moves to smolvm, the sidecar still runs as a Docker container on a per-bottle bridge network. The smolmachine agent reaches it via host loopback alias port-forwards (127.0.0.x:<random port>).


Two separable problems

Problem A — image building: replace docker build / docker save / docker image inspect / the registry+crane pipeline with a Docker-free equivalent.

Problem B — sidecar runtime: stop running the sidecar bundle as a Docker container.

They're independent and can be sequenced. Problem B is the higher-value change (it's the actual runtime dependency). Problem A is required to fully eliminate Docker from the host.


Problem A: Docker-free image building

The pipeline today:

docker build  →  docker save (tar)  →  docker run registry:2.8.3
                                     →  docker run crane push
                                     →  smolvm pack create --image localhost:<port>/...

The registry+crane containers exist because Docker Desktop's daemon VM isn't on localhost from macOS's perspective, so smolvm can't pull directly from the daemon.

Replacement candidates:

  • buildah — builds OCI images from Dockerfiles without a daemon. buildah push can target a local registry directly. Mature project (Red Hat), ships as a static Linux binary. On macOS, needs a Linux VM to build Linux images. Can run inside a smolmachine (bootstrap VM with buildah pre-installed).
  • podman + podman machine — Docker-compatible CLI backed by a QEMU/libkrun VM on macOS. Replaces docker build/save/inspect. Adds another VM dependency, arguably worse than just keeping Docker.
  • nerdctl + containerd — same concern; still a daemon.
  • smolvm bootstrap VM — run buildah inside a dedicated long-lived smolmachine that has buildah+git installed. bot-bottle SSHes/execs into it, runs buildah build, buildah pushes to a native zot registry on the host. smolvm then pulls from zot. This is the most elegant: the only VM runtime needed is smolvm itself.

Replacing the ephemeral registry + crane containers:

  • zot (zotregistry.dev) — a lightweight native OCI registry binary. Runs as a host process on a random port. Eliminates both docker run registry:2.8.3 and docker run crane. buildah push understands OCI directly. crane can also run as a native binary (it's a single Go binary, not container-only).
  • skopeo — can copy between OCI directories and registries without Docker.

Simplest credible path for A:

  1. Ship a buildah-vm.smolmachine (pre-built, committed to the repo or fetched on first run) — a minimal Debian image with buildah.
  2. Replace docker/util.py:build_image with buildah build executed via smolvm machine exec into the buildah VM.
  3. Replace the ephemeral-registry dance with a persistent zot process managed by bot-bottle (start on first use, stop on exit, unix socket or localhost).
  4. Replace docker save + crane with buildah push localhost:<zot-port>/....
  5. Replace docker image inspect cache-keying with buildah inspect --format {{.FromImageDigest}}.
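The five steps above could be sketched as plain argv builders, which keeps the command shapes reviewable before anything is executed. This is a rough sketch under stated assumptions: the machine name `buildah-vm`, the `smolvm machine exec <name> -- <cmd>` invocation shape, and the registry host default are all hypothetical, and `--tls-verify=false` assumes the local zot instance serves plain HTTP. The `--format` string is taken from step 5 as written.

```python
BUILD_VM = "buildah-vm"  # hypothetical machine name, not an existing artifact


def in_build_vm(argv):
    """Wrap a command so it runs inside the build VM.

    ASSUMPTION: the exact `smolvm machine exec` argument shape is a guess;
    adjust to whatever the real CLI accepts.
    """
    return ["smolvm", "machine", "exec", BUILD_VM, "--", *argv]


def build_cmd(tag, context_dir, dockerfile="Dockerfile"):
    """Step 2: `buildah build` in place of `docker build`."""
    return in_build_vm(
        ["buildah", "build", "-f", dockerfile, "-t", tag, context_dir]
    )


def push_cmd(tag, zot_port, registry_host="localhost"):
    """Step 4: push straight to the registry, no `docker save` tarball.

    ASSUMPTION: `registry_host` must be whatever name reaches the host's
    zot process from inside the guest; "localhost" mirrors the text above.
    """
    dest = f"docker://{registry_host}:{zot_port}/{tag}"
    return in_build_vm(["buildah", "push", "--tls-verify=false", tag, dest])


def cache_key_cmd(tag):
    """Step 5: the inspect call proposed as the cache key source."""
    return in_build_vm(
        ["buildah", "inspect", "--format", "{{.FromImageDigest}}", tag]
    )
```

Each function returns a list suitable for `subprocess.run(..., check=True)`. Keeping the builders pure means the bootstrap question (how the build VM itself gets created) stays separate from the command wiring.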

Risk: bootstrapping — we need something to build the buildah VM's image initially. Options: pre-build and commit the .smolmachine artifact, or build it with Docker once and then never again.


Problem B: sidecar as smolmachine

Current flow:

docker build Dockerfile.sidecars
docker network create bot-bottle-bundle-<slug>   (dedicated bridge)
docker run --network <bridge> --ip <pinned> -p 127.0.0.x::<port> ...

The agent VM talks to the sidecar via loopback alias port-forwards. The Docker bridge + pinned IP + TSI allowlist (<bundle-ip>/32) enforce per-bottle isolation.
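Because `-p 127.0.0.x::<port>` leaves the host port empty, Docker chooses it and the launcher has to ask afterwards which port it got. A minimal sketch of that discovery step, assuming the usual `host:port` line format printed by `docker port <container> <port>`; the function name is hypothetical and not taken from sidecar_bundle.py:

```python
def parse_docker_port(output):
    """Turn `docker port <container> <port>` output into (host_ip, host_port) pairs.

    Each non-empty line is expected to look like "127.0.0.16:49153" or,
    for IPv6 bindings, "[::]:49153".
    """
    bindings = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so IPv6 addresses stay intact.
        host, _, port = line.rpartition(":")
        bindings.append((host.strip("[]"), int(port)))
    return bindings
```

The agent environment would then be populated from the pair whose host IP matches the bottle's loopback alias.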

If sidecar runs as a smolmachine:

The smolmachine agent already reaches services via host loopback — no VM-to-VM networking is needed. A sidecar smolmachine would:

  1. Start its daemons (egress/mitmproxy, supervise, git-gate) inside the VM.
  2. smolvm machine exec has no -p equivalent for publishing ports. We'd need smolvm to expose the sidecar VM's ports on host loopback, which means either:
    a. ssh -L tunnels from the sidecar VM (if smolvm allocates an SSH port), or
    b. The sidecar daemons bind on the VM's host-facing interface (10.0.2.x/vhost-user) and smolvm's TSI forwards that to a host port — need to verify smolvm supports this.

Alternative: run sidecar services as host processes (simplest)

All three sidecar daemons are Python or have native binaries:

  • egress: mitmdump has macOS/Linux binaries, runs as a plain process
  • supervise — pure Python MCP server, already in the repo
  • git-gate — Python HTTP daemon + git daemon, already in the repo

Running them directly as host subprocesses (launched and tracked via subprocess.Popen, killed on teardown) eliminates Docker for the sidecar with zero new VM infrastructure. The loopback alias port assignment stays the same. Main downside: slightly less isolation vs. containerization, and adds process-management complexity to launch.py.
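For a sense of the process-management cost mentioned above, here is a minimal sketch of a Popen-based tracker. The class name and the stand-in command are hypothetical; a real launcher would start mitmdump, supervise, and git-gate with their actual arguments.

```python
import subprocess
import sys

# Placeholder command that just sleeps; stands in for a real daemon.
STAND_IN = [sys.executable, "-c", "import time; time.sleep(30)"]


class HostDaemons:
    """Track sidecar daemons started as host subprocesses."""

    def __init__(self):
        self._procs = {}

    def start(self, name, argv):
        """Launch one daemon and remember it under `name`."""
        proc = subprocess.Popen(argv)
        self._procs[name] = proc
        return proc.pid

    def alive(self):
        """Names of daemons that have not exited yet."""
        return sorted(n for n, p in self._procs.items() if p.poll() is None)

    def stop_all(self, timeout=5.0):
        """Terminate everything; escalate to kill if a daemon ignores SIGTERM."""
        for proc in self._procs.values():
            if proc.poll() is None:
                proc.terminate()
        for proc in self._procs.values():
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
        self._procs.clear()
```

Even this small version has to handle crashes, ordering, and teardown for three daemons, which is the lifecycle surface the isolation trade-off above comes with.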


Unknowns / things to verify

  1. Does smolvm expose guest ports on the host? The current smolmachines backend uses Docker to host-forward sidecar ports; there's no equivalent for a second smolvm VM. Need to check if smolvm's --smolfile / TSI config supports outbound port publishing or if SSH tunnel approach is needed.
  2. buildah on macOS: buildah's Linux binary can run in a smolmachine, but we'd need a small bootstrap story for the initial smolmachine build.
  3. zot persistence: a persistent zot process vs. per-build ephemeral registry — need to decide lifecycle (per-session vs. per-build vs. long-lived daemon).
  4. Sidecar smolmachine image size: the current sidecar is based on mitmproxy:11.1.3 (~300MB compressed). A VM image would be larger. Caching strategy matters.

Recommended sequencing

Given the unknowns around smolvm port-forwarding, the lowest-risk path is:

  1. Start with sidecar-as-host-processes (Problem B, no new infra needed). Replace docker run/network/port/rm in sidecar_bundle.py with Popen-managed host daemons. Removes the most Docker surface area with the least risk.
  2. Then tackle image building (Problem A) with buildah-in-smolmachine + zot. This removes the last docker build/save dependency but has the bootstrap question to answer first.

The alternative order (A then B) is also valid if we want to keep the sidecar containerized but just stop using Docker for it (e.g. switch sidecar runtime to podman or nerdctl). But that still requires a container runtime on the host.

Author
Owner

Re: the recommended sequencing, I do NOT want to start with sidecar-as-host-processes; it requires too many components.

Author
Owner

Suppose we could even run the same docker commands within smolvm: bootstrapping the initial smolvm image could be done by pulling from an image registry. For extra paranoid users we could allow them to build the bootstrap from source via docker.

Collaborator

Agreed, sidecar-as-host-processes should be dropped from the recommendation. It removes Docker, but it turns one managed bundle into several host-level processes with more lifecycle and environment surface area than we want.

The better sequencing is probably:

  1. Prototype a bootstrap smolmachine that contains the Docker-compatible build/runtime stack.
  2. Pull that bootstrap image from a registry for normal users.
  3. Optionally document a paranoid/source-build path that uses Docker once to build the bootstrap artifact.
  4. Move the existing Docker build/run flow into that smolmachine with the fewest semantic changes first.
  5. Then tighten the implementation by swapping pieces for buildkit/buildah/zot only where they reduce complexity.

That keeps the host dependency goal intact: no Docker daemon or Docker Desktop on the host. It also avoids exploding the sidecar into host subprocesses. Running the same Docker commands inside smolvm is a good migration bridge because most of the current bundle assumptions stay valid while the runtime dependency moves behind smolvm.

The main thing to verify next is sidecar port exposure from the smolvm guest back to host loopback. If smolvm can publish or tunnel the bundle ports cleanly, then the sidecar can stay as one VM-contained bundle. If it cannot, the spike should focus there before spending time on replacing the image builder.

Collaborator

Verified the next question behaviorally on the current host.

Short version: smolvm can publish a guest port back to the host, and another smolvm guest can reach that published service through the existing per-bottle loopback alias path. That means the sidecar-as-smolmachine shape is viable from a basic connectivity/TSI perspective; we do not need to abandon it for lack of host port exposure.

What I ran:

  • smolvm --help documents --port/-p HOST:GUEST and Smolfile dev.ports.
  • Created a disposable sidecar-like smolvm machine with --port <random-host-port>:18080.
  • Started a tiny HTTP listener inside that guest on 18080.
  • Confirmed the host could fetch it through 127.0.0.16:<random-host-port>.
  • Created a second bare smolvm client machine with --allow-cidr 127.0.0.16/32.
  • Confirmed that client VM could fetch http://127.0.0.16:<random-host-port>/ and received the sidecar response.
  • Deleted the temporary machines afterward.
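The listener and fetch halves of that probe can be reproduced with the standard library alone. This is an illustrative sketch, not the exact commands that were run: the handler name and response body are made up, and the fetch bypasses any configured HTTP proxy so the result reflects direct reachability.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Hello(BaseHTTPRequestHandler):
    """Answer every GET with a fixed body so the caller can recognise it."""

    def do_GET(self):
        body = b"sidecar-ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the probe quiet


def serve(host, port):
    """Start the listener in a background thread (guest side, e.g. port 18080)."""
    server = HTTPServer((host, port), Hello)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(host, port, timeout=3.0):
    """Fetch / from host:port without going through an HTTP proxy."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://{host}:{port}/", timeout=timeout) as resp:
        return resp.read().decode()
```

Inside the sidecar-like guest this would run as `serve("0.0.0.0", 18080)`; from the host or the client VM, `fetch("127.0.0.16", <random-host-port>)` should return the body.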

The important caveat is that smolvm does not appear to support Docker-style per-IP bind syntax. It accepts --port HOST:GUEST, but rejected --port 127.0.0.1:HOST:GUEST with invalid host port: 127.0.0.1. lsof showed the smolvm process listening as *:HOST_PORT, not specifically on 127.0.0.16.
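The lsof check could be automated so a regression test fails whenever a published port is bound to anything other than the bottle's alias. A small sketch, assuming output from something like `lsof -nP -iTCP -sTCP:LISTEN`, where the last two columns look like `*:18080 (LISTEN)`; both function names are hypothetical.

```python
def listen_addresses(lsof_output, port):
    """Return the bind addresses of all TCP listeners on `port`.

    "*" means the wildcard address, i.e. every host interface.
    """
    addrs = []
    for line in lsof_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[-1] != "(LISTEN)":
            continue  # header line or a non-listening socket
        host, _, found_port = fields[-2].rpartition(":")
        if found_port == str(port):
            addrs.append(host)
    return addrs


def exposed_beyond_alias(addrs, alias):
    """True if any listener is bound to something other than the alias."""
    return any(addr != alias for addr in addrs)
```

With the behaviour described above, `listen_addresses` would report `*` for the smolvm listener and `exposed_beyond_alias(..., "127.0.0.16")` would be true.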

Implications:

  1. The VM-to-VM path works. An agent VM restricted to 127.0.0.16/32 can reach a sidecar VM published on a host port, so the existing TSI-per-loopback-alias model still works for the agent side.
  2. The host exposure model is weaker than Docker today. Docker currently gives us -p 127.0.0.16::PORT, which binds the daemon surface only to the per-bottle loopback alias. smolvm publishing appears to expose the chosen host port on all host interfaces, or at least not to a caller-selected loopback address. That is a material difference for sidecars because egress/git-gate/supervise are privileged-ish services and should not be accidentally reachable outside the intended bottle path.
  3. Random high ports are not a security boundary. They reduce accidental collisions and casual discovery, but they do not replace loopback-only binding. Before moving the sidecar bundle to smolvm, we need a deliberate answer for non-agent callers reaching the published port from the host or LAN.
  4. Implementation would need preselected host ports. Docker currently supports random host-port assignment plus docker port discovery. smolvm wants HOST:GUEST, so bot-bottle would likely allocate free host ports up front, pass them into smolvm machine create --port, and then use those known ports in agent env.
  5. Best next spike is exposure mitigation, not basic forwarding. Options include asking/patching smolvm for host-IP bind support, adding a macOS pf anchor around the published ports, adding authentication/mTLS on sidecar endpoints, or using a tunnel that can bind to the chosen loopback alias. The cleanest product shape is still upstream smolvm support for HOST_IP:HOST_PORT:GUEST_PORT or equivalent Smolfile syntax.
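Point 4 above (preselected host ports) might look roughly like this. It is a sketch with hypothetical function names; the only smolvm detail it relies on is the `--port HOST:GUEST` form observed in the probe. Note the inherent race: a port is only reserved while the socket is open, so another process could claim it between allocation and VM start.

```python
import socket


def allocate_host_ports(guest_ports, bind_host="127.0.0.1"):
    """Pick one free host port per guest port.

    All probe sockets stay open until every port is chosen, which
    guarantees the chosen ports are distinct from each other.
    """
    held = []
    mapping = {}
    try:
        for guest in guest_ports:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            held.append(sock)
            sock.bind((bind_host, 0))  # port 0 lets the OS choose
            mapping[guest] = sock.getsockname()[1]
    finally:
        for sock in held:
            sock.close()
    return mapping


def port_flags(mapping):
    """Render the mapping as repeated `--port HOST:GUEST` arguments."""
    flags = []
    for guest, host in sorted(mapping.items()):
        flags += ["--port", f"{host}:{guest}"]
    return flags
```

The same mapping would then feed the agent's environment, replacing the `docker port` lookup.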

So the updated conclusion is: port exposure is sufficient to continue the sidecar-smolmachine spike, but the lack of per-IP bind syntax is the main security/design gap to resolve before replacing the Docker sidecar runtime.

Collaborator

Closing this spike with the current conclusion: do not remove Docker from the sidecar runtime path yet.

The reason is security, not implementation convenience. Docker is currently providing the sidecar boundary we need: a per-bottle network plus precise host-loopback bind semantics (127.0.0.X::<port>). The smolvm probes showed that sidecar ports can be published and reached by an agent VM through the existing loopback-alias/TSI path, but smolvm port publishing does not currently appear to support Docker-style per-IP bind syntax. Its published listener is not caller-bound to the per-bottle alias.

That means moving the sidecar bundle to smolvm today would either:

  1. expose privileged sidecar services like egress, git-gate, and supervise more broadly than they are exposed now, or
  2. require compensating machinery such as host firewall rules, SSH tunnel processes, bearer-token auth, or other bespoke networking layers.

Those alternatives are either weaker than the current model, add lifecycle/security complexity, or both. A private multi-VM network in smolvm would be the ideal replacement, but current smolvm docs/API/source do not show support for joining an agent VM and a sidecar VM to the same isolated private network.

So the pragmatic decision is: keep Docker in the loop for sidecar networking until smolvm can preserve the same security properties. A future revisit should focus on upstream smolvm support for either private multi-machine networks or host-IP-specific port binding. Until then, removing Docker would be a regression in the privacy/security model.

Author
Owner

Follow-up: Apple container as a native macOS backend

Revisiting this after digging into how Docker's per-bottle isolation actually works and whether we can reproduce it ourselves.

The original closing rationale still holds for the libkrun/TSI path. Docker buys us two things we depend on: per-IP host-loopback binding (127.0.0.X::<port>) and a private segment shared by two VMs. libkrun's TSI is socket-level impersonation, not L2, so it can't give us either — which is exactly why removing Docker regressed the isolation model and why this issue closed. Nothing about that has changed.

What has changed: Apple shipped container 1.0.0 (2026-06-09, Apple Silicon, macOS 26), and its model provides those primitives natively. It runs each container in its own lightweight VM and exposes a Docker-shaped networking CLI on top of vmnet. That turns "remove Docker" from a regression into a viable swap — but a swap of the whole VM backend, not a drop-in. Filing this as a separate follow-up rather than reopening, since it supersedes the host-process direction this issue explored.

Reasons to add a macOS container backend

  • Hypervisor isolation per agent, for free. VM-per-container matches our concurrent-untrusted-agent threat model. A kernel escape in one bottle doesn't land it next to another — the boundary is the hypervisor, not namespaces. (This is also why plain podman/container-on-shared-kernel was ruled out.)
  • Native per-bottle isolated networks. container network create --internal bottle-<slug> gives each bottle a host-only segment with no internet path. Replaces our docker network create bridge directly.
  • Forced-egress topology is supported and maintainer-confirmed. The exact pattern — agent on an --internal network, sidecar dual-homed onto --internal + an egress network, acting as the only path out — is confirmed working in [apple/container#1170](https://github.com/apple/container/discussions/1170). That's our cred-proxy/mitmproxy chokepoint, enforced by topology.
  • Per-IP loopback publish is back. Publish syntax is [host-ip:]host-port:container-port — the precise property whose absence closed this issue. Regained natively if we ever need host-side observability.
  • OCI-native build + run kills the whole image pipeline. container build (BuildKit) replaces docker build, and container run consumes the local OCI image directly, so the docker save → ephemeral registry → crane push → smolvm pull dance disappears. Closes the build half of this issue too.
  • DNS-by-name on the internal network lets the agent reach sidecar-<slug>:<port> directly, dropping the 127.0.0.x alias plumbing entirely.
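The forced-egress topology from the list above, written out as the commands a launcher might issue. This is a sketch only: `container network create --internal` comes from the text above, while attaching the sidecar to two networks by repeating `--network`, the `default` egress network name, and the container names are assumptions to verify against the `container` CLI.

```python
def bottle_commands(slug, agent_image, sidecar_image, egress_network="default"):
    """Build the argv lists for one bottle: network, sidecar, then agent."""
    internal = f"bottle-{slug}"
    sidecar = f"sidecar-{slug}"
    agent = f"agent-{slug}"
    return [
        # Host-only segment with no internet path, one per bottle.
        ["container", "network", "create", "--internal", internal],
        # ASSUMPTION: dual-homing via two --network flags.
        ["container", "run", "--detach", "--name", sidecar,
         "--network", internal, "--network", egress_network, sidecar_image],
        # The agent only ever sees the internal segment.
        ["container", "run", "--detach", "--name", agent,
         "--network", internal, agent_image],
    ]
```

The agent would then address the sidecar by name (`sidecar-<slug>:<port>`) instead of through a 127.0.0.x alias.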

The sidecar bundle stays a single unit (one OCI image, three daemons), so no host-process sprawl.

Caveats / scope

  • Hard floor: macOS 26 Tahoe + Apple Silicon. Container-to-container networking does not work on macOS 15, and there's no Intel path. This gates the backend to Tahoe users; smolvm stays the only option below that.
  • Backend swap, not a plugin. We'd re-home cred-proxy / git-gate / supervise as an OCI bundle and lose the TSI-integrated egress allowlist (enforcement moves into in-guest routing instead).
  • Isolation boundary is the network. Containers on a shared network can ARP-spoof each other with no intra-network block option, so isolation must be one --internal network per bottle (which we'd do anyway).
  • Transparent-proxy enforcement is still ours to build. Routing the agent's traffic through mitmproxy (default-route + REDIRECT, not just HTTPS_PROXY) has no off-the-shelf answer — same enforcement we do today via TSI, relocated into guest routing.

Verify before committing

  1. An --internal-only agent has provably zero host reachability (test, don't assume).
  2. Multi-homing attach order does the right thing for the sidecar's egress leg.
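Check 1 could start from a small TCP probe run inside the agent guest against a list of host-side targets. A minimal sketch with hypothetical function names; it covers TCP connects only, so an empty result is necessary but not sufficient (UDP, ICMP, and DNS paths need their own checks).

```python
import socket


def can_connect(host, port, timeout=1.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, and timed out all count as "not reachable".
        return False


def reachable(targets, timeout=1.0):
    """Filter (host, port) pairs down to the ones that accept connections."""
    return [(h, p) for h, p in targets if can_connect(h, p, timeout)]
```

The isolation test would pass only if `reachable(...)` comes back empty for every host address and port the agent should not see, while the sidecar's address still connects.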

Conclusion

Keep Docker as the backend for now; this issue stays resolved as-is. Add Apple container as a new, optional macOS backend — it's the first off-the-shelf stack that gives us VM-per-agent isolation and the private-network + per-IP-bind primitives Docker was standing in for, and it removes the Docker dependency for real on Tahoe. Tracking as a follow-up task.

Reference: didericis/bot-bottle#220