docs: add Fly Machines case study to remote-docker-vm-isolation note

Concrete worked example covering image strategy (with the bake-the-
claude-bottle-image-in optimization that elides 30-90s of in-VM
build), cold/warm/hot boot-to-prompt timing, standby vs ephemeral
cost breakdown, three workflow patterns, and Fly-specific gotchas
(DinD kernel requirements, the y/N preflight blocking automated
launch, pricing-may-have-moved hedge).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 01:18:08 -04:00
parent 43453c66ea
commit ec6261cd77
@@ -139,6 +139,112 @@ For the "VPN pivot" failure mode, see
remote VM back to your LAN. If the agent needs LAN resources, expose
those through a narrow API instead.
## Case study: Fly Machines
Fly.io's Machines product is a useful concrete worked example because
it satisfies all the provider requirements (root, Firecracker-backed
isolation, scriptable lifecycle, per-second billing) and surfaces the
gotchas the abstract pattern leaves implicit.
### Image strategy
Build a custom OCI image `FROM docker:dind` that bakes in:
- The claude-bottle repository checkout.
- A pre-built `claude-bottle:latest` image, saved via `docker save` on
your laptop and loaded in at image-build time
(`RUN docker load < claude-bottle.tar`) or pushed as a layer into
the dind storage. Without this step, the first in-VM `docker build`
runs `apt-get` and a global `npm install -g
@anthropic-ai/claude-code`, which adds 30–90 s to every cold start.
- A `flyctl secrets`-injected `CLAUDE_BOTTLE_OAUTH_TOKEN`, exposed to
the VM's PID 1 as an env var.
- An entrypoint that starts dockerd, waits for it to be healthy, then
either drops into a shell or directly runs `cli.py start <agent>`.
Deploy with `flyctl deploy` or `flyctl machine run --image …`.
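
The "wait for dockerd to be healthy" step in the entrypoint can be sketched as a small polling loop. This is a minimal sketch, not the project's actual entrypoint: the function names `docker_is_up` and `wait_for_dockerd`, the 30 s timeout, and the 0.5 s poll interval are all assumptions. The only external call is `docker info`, which exits non-zero while the daemon is unreachable.

```python
import subprocess
import time
from typing import Callable


def docker_is_up() -> bool:
    """Return True when `docker info` can reach the daemon."""
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # The docker CLI is not installed in this environment.
        return False
    return result.returncode == 0


def wait_for_dockerd(
    probe: Callable[[], bool] = docker_is_up,
    timeout_s: float = 30.0,   # assumed budget, not from the note
    interval_s: float = 0.5,   # assumed poll interval
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

An entrypoint would start dockerd in the background, call `wait_for_dockerd()`, and exit non-zero if it returns `False` rather than launching the agent against a dead daemon. The probe and sleep are injectable so the loop can be exercised without Docker installed.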
### Boot-to-first-prompt timing
Three scenarios, all assuming the custom image above (claude-bottle
image baked in, token injected, no in-VM rebuild):
| Phase | Cold (image not cached on Fly host) | Warm (image cached, `machine run` fresh) | Hot (`machine stop`ped, `machine start`) |
| --- | --- | --- | --- |
| Fly schedule + image fetch | 10–30 s | 2–3 s | ~1 s |
| Firecracker kernel boot | ~1 s | ~1 s | ~1 s (resume) |
| dockerd-in-VM startup | 2–4 s | 2–4 s | 0 s (already running) |
| `cli.py start <agent>` housekeeping (network creates, pipelock sidecar, agent container, skill copy) | 4–6 s | 4–6 s | 4–6 s |
| Claude prints first prompt | 1–3 s | 1–3 s | 1–3 s |
| **End-to-end** | **~20–45 s** | **~10–17 s** | **~7–11 s** |
For interactive sessions the warm path is the realistic baseline once
the custom image is registered. The hot path trims only a few extra
seconds — the question of whether to keep stopped Machines on standby
is mostly about cost, not speed.
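
As a sanity check on the end-to-end row, the per-phase estimates can be summed as (low, high) ranges in seconds: 10 to 30, 2 to 3, and about 1 for schedule plus fetch; about 1 for kernel boot; 2 to 4 (or 0 when hot) for dockerd; 4 to 6 for housekeeping; 1 to 3 for the first prompt. The sketch below only restates those estimates; the dictionary layout and the `end_to_end` name are illustrative.

```python
# Per-phase estimates as (low, high) seconds, one entry per scenario.
PHASES = {
    "schedule + image fetch": {"cold": (10, 30), "warm": (2, 3), "hot": (1, 1)},
    "kernel boot":            {"cold": (1, 1),   "warm": (1, 1), "hot": (1, 1)},
    "dockerd startup":        {"cold": (2, 4),   "warm": (2, 4), "hot": (0, 0)},
    "cli.py housekeeping":    {"cold": (4, 6),   "warm": (4, 6), "hot": (4, 6)},
    "first prompt":           {"cold": (1, 3),   "warm": (1, 3), "hot": (1, 3)},
}


def end_to_end(scenario: str) -> tuple[int, int]:
    """Sum the low and high bounds of every phase for one scenario."""
    low = sum(phase[scenario][0] for phase in PHASES.values())
    high = sum(phase[scenario][1] for phase in PHASES.values())
    return low, high


for name in ("cold", "warm", "hot"):
    low, high = end_to_end(name)
    print(f"{name}: {low}-{high} s")
```

The cold path sums to 18 to 44 s, which the table rounds to roughly 20 to 45 s; the warm and hot sums match the table exactly.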
### Cost of standby vs. create-per-session
Stopped Fly Machines stop billing CPU/RAM but continue to bill for
storage and any allocated IPv4. A reasonable claude-bottle Machine
size (2 vCPU / 2 GB / ~3 GB rootfs) costs roughly:
| Item | While stopped | Monthly |
| --- | --- | --- |
| CPU + RAM | not billed | $0 |
| Rootfs storage | ~$0.15/GB-month | ~$0.45 |
| Dedicated IPv4 (if allocated) | $2/month flat | $2.00 |
| Dedicated IPv6 | free | $0 |
| Bandwidth | usage-based | $0 |
So **roughly $0.50–$2.50/month per standby Machine**, with the IPv4
line dominating. Drop the dedicated v4 (use IPv6 or Fly's shared v4
via WireGuard) and standby falls under $1/month.
For comparison, running the same Machine 24/7 lands in the
$15–$40/month range depending on size, and the create-and-destroy
pattern (one Machine per session, destroyed on exit) is effectively
$0 since you only pay for the seconds it ran.
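
The standby arithmetic is simple enough to keep as a helper so it can be re-run when rates change. This is a sketch using the rates quoted above ($0.15/GB-month storage, $2/month per dedicated IPv4) as defaults; the function and parameter names are illustrative, and the rates should be checked against Fly's current pricing before relying on the output.

```python
def standby_monthly_cost(
    rootfs_gb: float,
    storage_rate: float = 0.15,  # USD per GB-month, as quoted in this note
    dedicated_ipv4: bool = True,
    ipv4_fee: float = 2.00,      # USD per month, as quoted in this note
) -> float:
    """Monthly cost of a stopped Machine: storage plus optional IPv4."""
    cost = rootfs_gb * storage_rate
    if dedicated_ipv4:
        cost += ipv4_fee
    return round(cost, 2)


print(standby_monthly_cost(3))                        # with dedicated IPv4
print(standby_monthly_cost(3, dedicated_ipv4=False))  # IPv6 or shared v4 only
```

For the ~3 GB rootfs in the example, this gives about $2.45/month with a dedicated IPv4 and about $0.45/month without one.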
### Practical pattern
Two reasonable workflows, plus one that's tempting but worse:
1. **Pure ephemeral.** `flyctl machine run` at session start,
`flyctl machine destroy` on exit. ~20–45 s cold start, $0 idle.
Maximally isolated; nothing persists between sessions. Best fit
when sessions are infrequent or when state continuity across
sessions is itself a concern.
2. **Standby pool.** A small fleet of pre-built Machines that get
`start`ed fresh and `destroy`ed (or wiped) per session. The
*Machine identity* is short-lived but the image is pre-cached on
Fly's hosts, keeping warm-path latency at ~10–17 s.
~$0.50–$1/month per Machine in the pool without dedicated v4.
3. **Permanently stopped Machine, just `start`/`stop`.** Saves a few
extra seconds (~7–11 s hot) but is the weakest of the three on
the isolation axis — the rootfs persists across sessions, so
anything a previous session wrote is still there. Avoid unless
the saved seconds matter more than the state-continuity concern.
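
The pure-ephemeral pattern depends on the Machine being destroyed even when the session crashes. One way to guarantee that is a context manager around the create and destroy steps. This is a hedged sketch: `ephemeral_machine`, `create`, and `destroy` are hypothetical names, and the two callables stand in for whatever wraps `flyctl machine run` and `flyctl machine destroy` in your setup; no flyctl flags or output formats are assumed here.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ephemeral_machine(
    create: Callable[[], str],        # hypothetical: returns the new Machine's ID
    destroy: Callable[[str], None],   # hypothetical: destroys the Machine by ID
) -> Iterator[str]:
    """Create a Machine for one session and always destroy it afterwards."""
    machine_id = create()
    try:
        yield machine_id
    finally:
        # Runs on normal exit and on exceptions, so nothing is left on standby.
        destroy(machine_id)
```

A session would then run as `with ephemeral_machine(create, destroy) as machine_id: ...`, with the agent session inside the `with` body. The standby-pool pattern could reuse the same shape by swapping `create` for a function that starts a pre-built Machine.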
### Fly-specific caveats
- **DinD requires kernel features.** Fly Machines historically had
some namespacing quirks for nested Docker; verify on a smoke-test
Machine before committing. The pattern is supported (Fly's own
Remote Builders use it), but kernel/runtime updates have shifted
the requirements over time.
- **The launcher's interactive y/N preflight blocks automated remote
start.** `cli.py start` waits on `/dev/tty`. For an automated entry
point you need to pipe `y\n` into stdin, drive it from a pty, or
add a `--yes`/`--non-interactive` flag (a small patch). The
`--remote=user@host` ergonomics direction below would handle this
in passing.
- **Pricing has been re-tariffed multiple times.** The structure
(per-second compute, GB-month storage, $2/v4) has been stable;
specific rates may have moved. Verify against
[fly.io/docs/about/pricing](https://fly.io/docs/about/pricing)
before committing numbers to any planning doc.
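
For the y/N preflight caveat above, the simplest of the three workarounds is feeding `y\n` on stdin. The sketch below shows that approach with Python's `subprocess`; `run_with_auto_yes` is an illustrative name, and `STAND_IN` is a stand-in command that merely reads one line, not the real launcher. If the launcher opens `/dev/tty` directly rather than reading stdin, a pipe will not reach the prompt and a pty-based driver or a `--yes` flag is needed instead.

```python
import subprocess
import sys
from typing import Sequence


def run_with_auto_yes(cmd: Sequence[str], timeout_s: float = 60.0) -> int:
    """Run `cmd`, answer a single y/N prompt with 'y', return the exit code."""
    result = subprocess.run(
        list(cmd),
        input="y\n",        # sent to the child's stdin, then stdin is closed
        text=True,
        timeout=timeout_s,  # avoid hanging forever if the prompt never reads stdin
        check=False,
    )
    return result.returncode


# Stand-in for a launcher with a y/N prompt: exits 0 only if it reads "y".
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; sys.exit(0 if sys.stdin.readline().strip() == 'y' else 1)",
]

if __name__ == "__main__":
    print(run_with_auto_yes(STAND_IN))
```

The timeout matters in automation: a launcher that blocks on a terminal it cannot open would otherwise stall the whole remote start.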
## Optional ergonomics direction
A future addon — not architecturally necessary, just nicer: