Spike on removing docker as a dependency for the sidecar #220
Currently Docker is required to build OCI images for smolmachines to consume. Ideally we would remove that dependency, so a machine no longer needs two VM-related dependencies. Then we could convert the sidecar to run on a smolmachines backend.
Spike findings
What Docker is actually used for
Docker touches five distinct things in the smolmachines path:
- `docker build` (agent image): `smolmachines/launch.py` → `docker/util.py:build_image`
- `docker build` (sidecar image): `smolmachines/sidecar_bundle.py:ensure_bundle_image`
- `docker save` + ephemeral registry + `crane` container (push tarball so smolvm can pull): `smolmachines/launch.py:_ensure_smolmachine`
- `docker image inspect` for cache-key (image ID): `docker/util.py:image_id`
- `docker network create/rm`, `docker run/rm/port` for the sidecar bundle: `smolmachines/sidecar_bundle.py`
The last item is the big one: even after the agent moves to smolvm, the sidecar still runs as a Docker container on a per-bottle bridge network. The smolmachine agent reaches it via host loopback alias port-forwards (`127.0.0.x:<random port>`).
Two separable problems
Problem A (image building): replace `docker build` / `docker save` / `docker image inspect` / the registry+crane pipeline with a Docker-free equivalent.
Problem B (sidecar runtime): stop running the sidecar bundle as a Docker container.
They're independent and can be sequenced. Problem B is the higher-value change (it's the actual runtime dependency). Problem A is required to fully eliminate Docker from the host.
Problem A: Docker-free image building
The pipeline today: `docker build` → `docker save` → ephemeral registry → `crane` push → smolvm pull.
The registry+crane containers exist because Docker Desktop's daemon VM isn't on `localhost` from macOS's perspective, so `smolvm` can't pull directly from the daemon.
Replacement candidates:
- `buildah push` can target a local registry directly. Mature project (Red Hat), ships as a static Linux binary. On macOS, needs a Linux VM to build Linux images. Can run inside a smolmachine (bootstrap VM with buildah pre-installed).
- … `docker build/save/inspect`. Adds another VM dependency, arguably worse than just keeping Docker.
- `smolvm` bootstrap VM: run buildah inside a dedicated long-lived smolmachine that has buildah+git installed. `bot-bottle` SSHes/execs into it, runs `buildah build`, and buildah pushes to a native zot registry on the host. smolvm then pulls from zot. This is the most elegant: the only VM runtime needed is smolvm itself.
Replacing the ephemeral registry + crane containers:
- zot (zotregistry.dev): a lightweight native OCI registry binary. Runs as a host process on a random port. Eliminates both `docker run registry:2.8.3` and `docker run crane`.
- … `buildah push` understands OCI directly.
- `crane` can also run as a native binary (it's a single Go binary, not container-only).
Simplest credible path for A:
1. `buildah-vm.smolmachine` (pre-built, committed to the repo or fetched on first run): a minimal Debian image with buildah.
2. Replace `docker/util.py:build_image` with `buildah build` executed via `smolvm machine exec` into the buildah VM.
3. `zot` process managed by bot-bottle (start on first use, stop on exit, unix socket or localhost).
4. Replace `docker save` + crane with `buildah push localhost:<zot-port>/...`.
5. Replace `docker image inspect` cache-keying with `buildah inspect --format {{.FromImageDigest}}`.
Risk: bootstrapping. We need something to build the buildah VM's image initially. Options: pre-build and commit the `.smolmachine` artifact, or build it with Docker once and then never again.
Problem B: sidecar as smolmachine
Current flow:
The agent VM talks to the sidecar via loopback alias port-forwards. The Docker bridge + pinned IP + TSI allowlist (`<bundle-ip>/32`) enforce per-bottle isolation.
If sidecar runs as a smolmachine:
The smolmachine agent already reaches services via host loopback, so no VM-to-VM networking is needed. A sidecar smolmachine would:
- … `smolvm machine exec` has no `-p` equivalent for publishing ports. We'd need smolvm to expose the sidecar VM's ports on host loopback, which means either:
  a. `ssh -L` tunnels from the sidecar VM (if smolvm allocates an SSH port), or
  b. The sidecar daemons bind on the VM's host-facing interface (10.0.2.x/vhost-user) and smolvm's TSI forwards that to a host port. Need to verify smolvm supports this.
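For option (a), a minimal sketch of what the tunnel invocation could look like, assuming the sidecar VM is reachable over SSH on some host port. The function name, the `root` user and the `127.0.0.1` SSH host are hypothetical; only the `ssh` flags themselves (`-N`, `-L`, `-p`, `ExitOnForwardFailure`) are standard OpenSSH.

```python
from typing import List


def ssh_forward_argv(alias_ip: str, host_port: int, guest_port: int,
                     ssh_port: int, ssh_user: str = "root",
                     ssh_host: str = "127.0.0.1") -> List[str]:
    """Build an `ssh -L` argv that listens on alias_ip:host_port on the host
    and forwards to guest_port inside the VM reached via ssh_port.

    ssh_user and ssh_host are assumptions for illustration only.
    """
    for name, port in (("host_port", host_port),
                       ("guest_port", guest_port),
                       ("ssh_port", ssh_port)):
        if not 0 < port < 65536:
            raise ValueError(f"{name} out of range: {port}")
    # -L [bind_address:]port:host:hostport
    forward = f"{alias_ip}:{host_port}:127.0.0.1:{guest_port}"
    return [
        "ssh",
        "-N",                              # forwarding only, no remote command
        "-o", "ExitOnForwardFailure=yes",  # exit if the local bind fails
        "-L", forward,
        "-p", str(ssh_port),
        f"{ssh_user}@{ssh_host}",
    ]
```

Because `-L` accepts a bind address, the listener can be pinned to the per-bottle loopback alias, which is the property the Docker `-p` form provides today.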
Alternative: run sidecar services as host processes (simplest)
All three sidecar daemons are Python or have native binaries:
- `egress`: `mitmdump` has macOS/Linux binaries, runs as a plain process
- `supervise`: pure Python MCP server, already in the repo
- `git-gate`: Python HTTP daemon + `git daemon`, already in the repo
Running them directly as host subprocesses (launched and tracked via `subprocess.Popen`, killed on teardown) eliminates Docker for the sidecar with zero new VM infrastructure. The loopback alias port assignment stays the same. Main downside: slightly less isolation vs. containerization, and adds process-management complexity to `launch.py`.
Unknowns / things to verify
- … `--smolfile` / TSI config supports outbound port publishing or if SSH tunnel approach is needed.
- … `mitmproxy:11.1.3` (~300MB compressed). A VM image would be larger. Caching strategy matters.
Recommended sequencing
Given the unknowns around smolvm port-forwarding, the lowest-risk path is:
1. Replace `docker run/network/port/rm` in `sidecar_bundle.py` with `Popen`-managed host daemons. Removes the most Docker surface area with the least risk.
2. … `docker build/save` dependency but has the bootstrap question to answer first.
The alternative order (A then B) is also valid if we want to keep the sidecar containerized but just stop using Docker for it (e.g. switch sidecar runtime to podman or nerdctl). But that still requires a container runtime on the host.
Re: recommended sequencing, we do NOT want to start with the sidecar as a host process; it requires too many components.
I suppose we could even run the same docker commands within smolvm; bootstrapping the initial smolvm image could be done by pulling from an image registry. For extra-paranoid users we could allow them to build the bootstrap from source via docker.
Agreed, sidecar-as-host-processes should be dropped from the recommendation. It removes Docker, but it turns one managed bundle into several host-level processes with more lifecycle and environment surface area than we want.
The better sequencing is probably:
That keeps the host dependency goal intact: no Docker daemon or Docker Desktop on the host. It also avoids exploding the sidecar into host subprocesses. Running the same Docker commands inside smolvm is a good migration bridge because most of the current bundle assumptions stay valid while the runtime dependency moves behind smolvm.
The main thing to verify next is sidecar port exposure from the smolvm guest back to host loopback. If smolvm can publish or tunnel the bundle ports cleanly, then the sidecar can stay as one VM-contained bundle. If it cannot, the spike should focus there before spending time on replacing the image builder.
Verified the next question behaviorally on the current host.
Short version: smolvm can publish a guest port back to the host, and another smolvm guest can reach that published service through the existing per-bottle loopback alias path. That means the sidecar-as-smolmachine shape is viable from a basic connectivity/TSI perspective; we do not need to abandon it for lack of host port exposure.
What I ran:
- `smolvm --help` documents `--port/-p HOST:GUEST` and Smolfile `dev.ports`.
- … `--port <random-host-port>:18080`.
- … `18080`.
- … `127.0.0.16:<random-host-port>`.
- … `--allow-cidr 127.0.0.16/32`.
- … `http://127.0.0.16:<random-host-port>/` and received the sidecar response.
The important caveat is that smolvm does not appear to support Docker-style per-IP bind syntax. It accepts `--port HOST:GUEST`, but rejected `--port 127.0.0.1:HOST:GUEST` with `invalid host port: 127.0.0.1`. `lsof` showed the smolvm process listening as `*:HOST_PORT`, not specifically on `127.0.0.16`.
Implications:
- … `127.0.0.16/32` can reach a sidecar VM published on a host port, so the existing TSI-per-loopback-alias model still works for the agent side.
- … `-p 127.0.0.16::PORT`, which binds the daemon surface only to the per-bottle loopback alias. smolvm publishing appears to expose the chosen host port on all host interfaces, or at least not to a caller-selected loopback address. That is a material difference for sidecars because egress/git-gate/supervise are privileged-ish services and should not be accidentally reachable outside the intended bottle path.
- … `docker port` discovery. smolvm wants `HOST:GUEST`, so bot-bottle would likely allocate free host ports up front, pass them into `smolvm machine create --port`, and then use those known ports in agent env.
- … `pf` anchor around the published ports, adding authentication/mTLS on sidecar endpoints, or using a tunnel that can bind to the chosen loopback alias. The cleanest product shape is still upstream smolvm support for `HOST_IP:HOST_PORT:GUEST_PORT` or equivalent Smolfile syntax.
So the updated conclusion is: port exposure is sufficient to continue the sidecar-smolmachine spike, but the lack of per-IP bind syntax is the main security/design gap to resolve before replacing the Docker sidecar runtime.
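The "allocate free host ports up front" step from the implications above could look roughly like this. It is a sketch only: the function names and the guest port numbers in the usage note are hypothetical, and the only smolvm syntax assumed is the `--port HOST:GUEST` form observed in the probe.

```python
import socket
from typing import Dict, List


def allocate_free_ports(names: List[str], bind_ip: str = "0.0.0.0") -> Dict[str, int]:
    """Ask the OS for one free TCP port per name.

    Every socket stays open until all ports are chosen, so the same port is
    never handed out twice. The ports are released before returning, which
    leaves a small window in which another process could grab one.
    """
    held = []
    ports: Dict[str, int] = {}
    try:
        for name in names:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.bind((bind_ip, 0))  # port 0 lets the OS pick a free port
            held.append(s)
            ports[name] = s.getsockname()[1]
    finally:
        for s in held:
            s.close()
    return ports


def port_flags(host_ports: Dict[str, int], guest_ports: Dict[str, int]) -> List[str]:
    """Turn the allocated ports into repeated `--port HOST:GUEST` arguments."""
    flags: List[str] = []
    for name, host_port in host_ports.items():
        flags += ["--port", f"{host_port}:{guest_ports[name]}"]
    return flags
```

With hypothetical guest ports such as `{"egress": 18080, "git-gate": 18081, "supervise": 18082}`, the returned mapping doubles as the source of truth for the agent env, replacing `docker port` discovery. The default bind address is the wildcard because the probe showed smolvm listening on `*:HOST_PORT`.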
Closing this spike with the current conclusion: do not remove Docker from the sidecar runtime path yet.
The reason is security, not implementation convenience. Docker is currently providing the sidecar boundary we need: a per-bottle network plus precise host-loopback bind semantics (`127.0.0.X::<port>`). The smolvm probes showed that sidecar ports can be published and reached by an agent VM through the existing loopback-alias/TSI path, but smolvm port publishing does not currently appear to support Docker-style per-IP bind syntax. Its published listener is not caller-bound to the per-bottle alias.
That means moving the sidecar bundle to smolvm today would either:
Those alternatives are either weaker than the current model, add lifecycle/security complexity, or both. A private multi-VM network in smolvm would be the ideal replacement, but current smolvm docs/API/source do not show support for joining an agent VM and a sidecar VM to the same isolated private network.
So the pragmatic decision is: keep Docker in the loop for sidecar networking until smolvm can preserve the same security properties. A future revisit should focus on upstream smolvm support for either private multi-machine networks or host-IP-specific port binding. Until then, removing Docker would be a regression in the privacy/security model.
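For the future revisit, the bind-scope property could be checked with a small probe instead of reading `lsof` output. This is a hedged sketch with hypothetical helper names; it only tests TCP reachability from the host, so it says nothing about what other VMs or remote machines can reach.

```python
import socket
from typing import List


def is_reachable(ip: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to ip:port succeeds from this host."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def exposed_outside_alias(port: int, alias_ip: str, other_ips: List[str]) -> List[str]:
    """List the addresses other than alias_ip on which the port answers.

    An empty list is what per-IP binding should produce; a wildcard
    listener answers on every local address that is probed.
    """
    return [ip for ip in other_ips
            if ip != alias_ip and is_reachable(ip, port)]
```

Run against a published sidecar port with `alias_ip` set to the bottle's loopback alias and `other_ips` set to `127.0.0.1` plus the host's LAN addresses, a non-empty result would indicate the regression described above.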
Follow-up: Apple `container` as a native macOS backend
Revisiting this after digging into how Docker's per-bottle isolation actually works and whether we can reproduce it ourselves.
The original closing rationale still holds for the libkrun/TSI path. Docker buys us two things we depend on: per-IP host-loopback binding (`127.0.0.X::<port>`) and a private segment shared by two VMs. libkrun's TSI is socket-level impersonation, not L2, so it can't give us either, which is exactly why removing Docker regressed the isolation model and why this issue closed. Nothing about that has changed.
What has changed: Apple shipped `container` 1.0.0 (2026-06-09, Apple Silicon, macOS 26), and its model provides those primitives natively. It runs each container in its own lightweight VM and exposes a Docker-shaped networking CLI on top of vmnet. That turns "remove Docker" from a regression into a viable swap, but a swap of the whole VM backend, not a drop-in. Filing this as a separate follow-up rather than reopening, since it supersedes the host-process direction this issue explored.
Reasons to add a macOS `container` backend
- `container network create --internal bottle-<slug>` gives each bottle a host-only segment with no internet path. Replaces our `docker network create` bridge directly.
- … `--internal` network, sidecar dual-homed onto `--internal` + an egress network, acting as the only path out, is confirmed working in [apple/container#1170](https://github.com/apple/container/discussions/1170). That's our cred-proxy/mitmproxy chokepoint, enforced by topology.
- … `[host-ip:]host-port:container-port`: the precise property whose absence closed this issue. Regained natively if we ever need host-side observability.
- `container build` (BuildKit) replaces `docker build`, and `container run` consumes the local OCI image directly, so the `docker save → ephemeral registry → crane push → smolvm pull` dance disappears. Closes the build half of this issue too.
- … `sidecar-<slug>:<port>` directly, dropping the `127.0.0.x` alias plumbing entirely.
The sidecar bundle stays a single unit (one OCI image, three daemons), so no host-process sprawl.
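To make the naming and publish shapes above concrete, here is a small sketch of the helpers a `container` backend might need. All function names are hypothetical, and it deliberately uses only the CLI pieces quoted in the list (`container network create --internal`, the `[host-ip:]host-port:container-port` form, and the `sidecar-<slug>:<port>` address); nothing here has been run against the real CLI.

```python
from typing import List, Optional


def network_create_argv(slug: str) -> List[str]:
    """argv for the per-bottle host-only network described above."""
    return ["container", "network", "create", "--internal", f"bottle-{slug}"]


def publish_spec(host_port: int, container_port: int,
                 host_ip: Optional[str] = None) -> str:
    """Format a publish spec in the [host-ip:]host-port:container-port form."""
    for name, port in (("host_port", host_port),
                       ("container_port", container_port)):
        if not 0 < port < 65536:
            raise ValueError(f"{name} out of range: {port}")
    spec = f"{host_port}:{container_port}"
    return f"{host_ip}:{spec}" if host_ip else spec


def sidecar_address(slug: str, port: int) -> str:
    """Address the agent would use on the private network (no loopback alias)."""
    return f"sidecar-{slug}:{port}"
```

Keeping these as pure string builders means the backend can be unit-tested without the `container` binary installed, which matters while it remains an optional macOS-only path.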
Caveats / scope
- … `--internal` network per bottle (which we'd do anyway).
- … `HTTPS_PROXY`) has no off-the-shelf answer: same enforcement we do today via TSI, relocated into guest routing.
Verify before committing
- … `--internal`-only agent has provably zero host reachability (test, don't assume).
Conclusion
Keep Docker as the backend for now; this issue stays resolved as-is. Add Apple `container` as a new, optional macOS backend: it's the first off-the-shelf stack that gives us VM-per-agent isolation and the private-network + per-IP-bind primitives Docker was standing in for, and it removes the Docker dependency for real on Tahoe. Tracking as a follow-up task.