diff --git a/docs/research/remote-docker-vm-isolation.md b/docs/research/remote-docker-vm-isolation.md
index fe291f6..b10d3bb 100644
--- a/docs/research/remote-docker-vm-isolation.md
+++ b/docs/research/remote-docker-vm-isolation.md
@@ -139,6 +139,112 @@ For the "VPN pivot" failure mode, see
 remote VM back to your LAN. If the agent needs LAN resources, expose those
 through a narrow API instead.

+## Case study: Fly Machines
+
+Fly.io's Machines product is a useful concrete worked example because
+it satisfies all the provider requirements (root, Firecracker-backed
+isolation, scriptable lifecycle, per-second billing) and surfaces the
+gotchas the abstract pattern leaves implicit.
+
+### Image strategy
+
+Build a custom OCI image `FROM docker:dind` that bakes in:
+
+- The claude-bottle repository checkout.
+- A pre-built `claude-bottle:latest` image, saved via `docker save` on
+  your laptop and loaded in at image-build time
+  (`RUN docker load < claude-bottle.tar`) or pushed as a layer into
+  the dind storage. Without this step, the first in-VM `docker build`
+  runs `apt-get` and a global `npm install -g
+  @anthropic-ai/claude-code`, which adds 30–90 s to every cold start.
+- A `flyctl secrets`-injected `CLAUDE_BOTTLE_OAUTH_TOKEN`, exposed to
+  the VM's PID 1 as an env var.
+- An entrypoint that starts dockerd, waits for it to be healthy, then
+  either drops into a shell or directly runs `cli.py start `.
+
+Deploy with `flyctl deploy` or `flyctl machine run --image …`.
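
The entrypoint's "wait for dockerd to be healthy" step can be sketched as a small polling loop. This is a minimal sketch, not claude-bottle's actual entrypoint: the names `docker_is_healthy` and `wait_until_healthy`, the 30 s timeout, and the choice of `docker info` as the health probe are all assumptions made for illustration.

```python
import subprocess
import time
from typing import Callable


def docker_is_healthy() -> bool:
    """Return True if `docker info` exits 0, i.e. the daemon answers."""
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        # docker CLI missing or daemon not responding: not healthy yet.
        return False
    return result.returncode == 0


def wait_until_healthy(
    probe: Callable[[], bool] = docker_is_healthy,
    timeout_s: float = 30.0,  # assumed budget, not taken from the source
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The probe and the sleep function are injectable so the loop can be exercised without a Docker daemon; a real entrypoint would call `wait_until_healthy()` after launching dockerd and abort the boot if it returns `False`.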
+
+### Boot-to-first-prompt timing
+
+Three scenarios, all assuming the custom image above (claude-bottle
+image baked in, token injected, no in-VM rebuild):
+
+| Phase | Cold (image not cached on Fly host) | Warm (image cached, `machine run` fresh) | Hot (`machine stop`ped, `machine start`) |
+| --- | --- | --- | --- |
+| Fly schedule + image fetch | 10–30 s | 2–3 s | ~1 s |
+| Firecracker kernel boot | ~1 s | ~1 s | ~1 s (resume) |
+| dockerd-in-VM startup | 2–4 s | 2–4 s | 0 s (already running) |
+| `cli.py start ` housekeeping (network creates, pipelock sidecar, agent container, skill copy) | 4–6 s | 4–6 s | 4–6 s |
+| Claude prints first prompt | 1–3 s | 1–3 s | 1–3 s |
+| **End-to-end** | **~20–45 s** | **~10–17 s** | **~7–11 s** |
+
+For interactive sessions the warm path is the realistic baseline once
+the custom image is registered. The hot path trims only a few extra
+seconds, so the question of whether to keep stopped Machines on standby
+is mostly about cost, not speed.
+
+### Cost of standby vs. create-per-session
+
+Stopped Fly Machines stop billing CPU/RAM but continue to bill for
+storage and any allocated IPv4. A reasonable claude-bottle Machine
+size (2 vCPU / 2 GB / ~3 GB rootfs) costs roughly:
+
+| Item | Billing while stopped | Approx. monthly cost |
+| --- | --- | --- |
+| CPU + RAM | not billed | $0 |
+| Rootfs storage | ~$0.15/GB-month | ~$0.45 |
+| Dedicated IPv4 (if allocated) | $2/month flat | $2.00 |
+| Dedicated IPv6 | free | $0 |
+| Bandwidth | usage-based | $0 |
+
+So **roughly $0.50–$2.50/month per standby Machine**, with the IPv4
+line dominating. Drop the dedicated v4 (use IPv6 or Fly's shared v4
+via WireGuard) and standby falls under $1/month.
+
+For comparison, running the same Machine 24/7 lands in the
+$15–$40/month range depending on size, and the create-and-destroy
+pattern (one Machine per session, destroyed on exit) is effectively
+$0 since you only pay for the seconds it ran.
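
The standby and timing arithmetic above can be reproduced in a few lines. This is a sketch using the approximate rates and phase ranges quoted in the two tables, which may be out of date; the names `standby_monthly_cost`, `end_to_end`, and `WARM_PHASES` are illustrative and not part of claude-bottle or flyctl.

```python
def standby_monthly_cost(
    rootfs_gb: float,
    storage_rate_per_gb: float = 0.15,  # ~$/GB-month, from the cost table
    dedicated_ipv4: bool = False,
    ipv4_flat: float = 2.00,  # $/month, from the cost table
) -> float:
    """Approximate monthly cost of one stopped Machine (CPU/RAM not billed)."""
    cost = rootfs_gb * storage_rate_per_gb
    if dedicated_ipv4:
        cost += ipv4_flat
    return round(cost, 2)


# Warm-path phase ranges in seconds (low, high), copied from the timing table.
WARM_PHASES = [(2, 3), (1, 1), (2, 4), (4, 6), (1, 3)]


def end_to_end(phases):
    """Sum per-phase (low, high) ranges into an end-to-end range."""
    return sum(lo for lo, _ in phases), sum(hi for _, hi in phases)


print(standby_monthly_cost(3))                       # storage only
print(standby_monthly_cost(3, dedicated_ipv4=True))  # storage + dedicated v4
print(end_to_end(WARM_PHASES))                       # warm-path range in seconds
```

For the 3 GB rootfs this gives about $0.45 without and $2.45 with a dedicated IPv4, and a 10 to 17 s warm path, consistent with the rounded figures in the text.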
+
+### Practical pattern
+
+Two reasonable workflows, plus one that's tempting but worse:
+
+1. **Pure ephemeral.** `flyctl machine run` at session start,
+   `flyctl machine destroy` on exit. ~20–45 s cold start, $0 idle.
+   Maximally isolated; nothing persists between sessions. Best fit
+   when sessions are infrequent or when state continuity across
+   sessions is itself a concern.
+2. **Standby pool.** A small fleet of pre-built Machines that get
+   `start`ed fresh and `destroy`ed (or wiped) per session. The
+   *Machine identity* is short-lived but the image is pre-cached on
+   Fly's hosts, keeping warm-path latency at ~10–17 s.
+   ~$0.50–$1/month per Machine in the pool without dedicated v4.
+3. **Permanently stopped Machine, just `start`/`stop`.** Saves a few
+   extra seconds (~7–11 s hot) but is the weakest of the three on
+   the isolation axis: the rootfs persists across sessions, so
+   anything a previous session wrote is still there. Avoid unless
+   the saved seconds matter more than the state-continuity concern.
+
+### Fly-specific caveats
+
+- **DinD requires kernel features.** Fly Machines historically had
+  some namespacing quirks for nested Docker; verify on a smoke-test
+  Machine before committing. The pattern is supported (Fly's own
+  Remote Builders use it), but kernel/runtime updates have shifted
+  the requirements over time.
+- **The launcher's interactive y/N preflight blocks automated remote
+  start.** `cli.py start` waits on `/dev/tty`. For an automated entry
+  point you need to pipe `y\n` into stdin, drive it from a pty, or
+  add a `--yes`/`--non-interactive` flag (a small patch). The
+  `--remote=user@host` ergonomics direction below would handle this
+  in passing.
+- **Pricing has been re-tariffed multiple times.** The structure
+  (per-second compute, GB-month storage, $2/v4) has been stable;
+  specific rates may have moved. Verify against
+  [fly.io/docs/about/pricing](https://fly.io/docs/about/pricing)
+  before committing numbers to any planning doc.
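
The "pipe `y\n` into stdin" workaround for the y/N preflight can be sketched with the standard library. This is a minimal sketch under stated assumptions: `run_with_auto_yes` is a hypothetical helper, and `FAKE_LAUNCHER` is a stand-in child process that reads its answer from stdin, not the real `cli.py`. If the launcher opens `/dev/tty` directly rather than reading stdin, this approach will not reach the prompt and a pty-based driver would be needed instead.

```python
import subprocess
import sys
from typing import Sequence


def run_with_auto_yes(
    argv: Sequence[str], timeout_s: float = 120.0
) -> subprocess.CompletedProcess:
    """Run `argv`, feeding a single "y" line to its stdin."""
    return subprocess.run(
        list(argv),
        input="y\n",
        text=True,
        capture_output=True,
        timeout=timeout_s,
    )


# Hypothetical stand-in for the launcher: asks a y/N question on stdin.
FAKE_LAUNCHER = [
    sys.executable,
    "-c",
    "import sys; a = sys.stdin.readline().strip(); "
    "print('started' if a == 'y' else 'aborted')",
]

result = run_with_auto_yes(FAKE_LAUNCHER)
print(result.stdout.strip())  # prints "started"
```

A dedicated `--yes`/`--non-interactive` flag remains the more robust option, since it does not depend on how the prompt reads its input.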
+
 ## Optional ergonomics direction

 A future addon, not architecturally necessary, just nicer: