docs: add Fly Machines case study to remote-docker-vm-isolation note

Concrete worked example covering image strategy (with the bake-the-
claude-bottle-image-in optimization that elides 30-90s of in-VM
build), cold/warm/hot boot-to-prompt timing, standby vs ephemeral
cost breakdown, three workflow patterns, and Fly-specific gotchas
(DinD kernel requirements, the y/N preflight blocking automated
launch, pricing-may-have-moved hedge).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 01:18:08 -04:00
parent 43453c66ea
commit ec6261cd77
+106
@@ -139,6 +139,112 @@ For the "VPN pivot" failure mode, see
remote VM back to your LAN. If the agent needs LAN resources, expose
those through a narrow API instead.
## Case study: Fly Machines
Fly.io's Machines product is a useful concrete worked example because
it satisfies all the provider requirements (root, Firecracker-backed
isolation, scriptable lifecycle, per-second billing) and surfaces the
gotchas the abstract pattern leaves implicit.
### Image strategy
Build a custom OCI image `FROM docker:dind` that bakes in:
- The claude-bottle repository checkout.
- A pre-built `claude-bottle:latest` image, saved via `docker save` on
your laptop and loaded in at image-build time
(`RUN docker load < claude-bottle.tar`) or pushed as a layer into
the dind storage. Without this step, the first in-VM `docker build`
runs `apt-get` and a global `npm install -g
@anthropic-ai/claude-code`, which adds 30–90 s to every cold start.
- A `flyctl secrets`-injected `CLAUDE_BOTTLE_OAUTH_TOKEN`, exposed to
the VM's PID 1 as an env var.
- An entrypoint that starts dockerd, waits for it to be healthy, then
either drops into a shell or directly runs `cli.py start <agent>`.
Deploy with `flyctl deploy` or `flyctl machine run --image …`.
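
The entrypoint described in the last bullet (start dockerd, wait for it to be healthy, then launch the agent) could look roughly like the sketch below. It is written in Python for illustration only and assumes a Python interpreter has been added to the image and that `cli.py` sits in the working directory. The helper names `docker_ready`, `wait_until`, and `main` are hypothetical, as are the timeout values.

```python
import subprocess
import time
from typing import Callable


def docker_ready() -> bool:
    """Return True once `docker info` succeeds, i.e. dockerd accepts connections."""
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        # docker binary missing, or the daemon is not answering yet.
        return False
    return result.returncode == 0


def wait_until(probe: Callable[[], bool], timeout_s: float = 30.0,
               interval_s: float = 0.5) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def main(agent: str) -> int:
    # Start the daemon in the background (assumption: `dockerd` is on PATH,
    # as it is in the docker:dind base image).
    daemon = subprocess.Popen(["dockerd"])
    if not wait_until(docker_ready):
        daemon.terminate()
        return 1
    # Assumption: cli.py is in the current directory and accepts `start <agent>`.
    return subprocess.call(["python3", "cli.py", "start", agent])
```

Polling `docker info` rather than sleeping for a fixed interval keeps the dockerd startup phase as short as the daemon allows, and a failed wait exits non-zero instead of launching the agent against a dead daemon.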
### Boot-to-first-prompt timing
Three scenarios, all assuming the custom image above (claude-bottle
image baked in, token injected, no in-VM rebuild):
| Phase | Cold (image not cached on Fly host) | Warm (image cached, `machine run` fresh) | Hot (`machine stop`ped, `machine start`) |
| --- | --- | --- | --- |
| Fly schedule + image fetch | 10–30 s | 2–3 s | ~1 s |
| Firecracker kernel boot | ~1 s | ~1 s | ~1 s (resume) |
| dockerd-in-VM startup | 2–4 s | 2–4 s | 0 s (already running) |
| `cli.py start <agent>` housekeeping (network creates, pipelock sidecar, agent container, skill copy) | 4–6 s | 4–6 s | 4–6 s |
| Claude prints first prompt | 1–3 s | 1–3 s | 1–3 s |
| **End-to-end** | **~20–45 s** | **~10–17 s** | **~7–11 s** |
For interactive sessions the warm path is the realistic baseline once
the custom image is registered. The hot path trims only a few extra
seconds — the question of whether to keep stopped Machines on standby
is mostly about cost, not speed.
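
As a quick arithmetic check, the end-to-end row is the sum of the per-phase ranges. The sketch below encodes each phase as a (low, high) pair in seconds, treating "~1 s" entries as exactly 1 s; the dictionary layout and function name are illustrative, not part of claude-bottle.

```python
# (low, high) seconds per phase, in table order:
# schedule + fetch, kernel boot, dockerd startup, cli.py housekeeping, first prompt.
PHASES = {
    "cold": [(10, 30), (1, 1), (2, 4), (4, 6), (1, 3)],
    "warm": [(2, 3), (1, 1), (2, 4), (4, 6), (1, 3)],
    "hot": [(1, 1), (1, 1), (0, 0), (4, 6), (1, 3)],
}


def end_to_end(phases):
    """Sum the low and high bounds of every phase."""
    low = sum(lo for lo, _ in phases)
    high = sum(hi for _, hi in phases)
    return low, high


for name, phases in PHASES.items():
    low, high = end_to_end(phases)
    print(f"{name}: {low}-{high} s")
# cold: 18-44 s
# warm: 10-17 s
# hot: 7-11 s
```

The cold sum of 18 to 44 s is what the table rounds to roughly 20 to 45 s. Almost all of the cold/warm gap comes from the image fetch, while the warm/hot gap is only the dockerd startup plus a second or two of scheduling.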
### Cost of standby vs. create-per-session
Stopped Fly Machines stop billing CPU/RAM but continue to bill for
storage and any allocated IPv4. A reasonable claude-bottle Machine
size (2 vCPU / 2 GB / ~3 GB rootfs) costs roughly:
| Item | While stopped | Monthly |
| --- | --- | --- |
| CPU + RAM | not billed | $0 |
| Rootfs storage | ~$0.15/GB-month | ~$0.45 |
| Dedicated IPv4 (if allocated) | $2/month flat | $2.00 |
| Dedicated IPv6 | free | $0 |
| Bandwidth | usage-based | $0 |
So **roughly $0.50–$2.50/month per standby Machine**, with the IPv4
line dominating. Drop the dedicated v4 (use IPv6 or Fly's shared v4
via WireGuard) and standby falls under $1/month.
For comparison, running the same Machine 24/7 lands in the
$15–$40/month range depending on size, and the create-and-destroy
pattern (one Machine per session, destroyed on exit) is effectively
$0 since you only pay for the seconds it ran.
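
The standby figure can be reproduced from the table's own rates. This is a minimal sketch: the default rates are the approximate numbers quoted above (which may be out of date), and the function name is hypothetical.

```python
def standby_monthly_cost(rootfs_gb: float,
                         storage_rate: float = 0.15,
                         dedicated_ipv4: bool = False,
                         ipv4_rate: float = 2.00) -> float:
    """Monthly cost in USD of a stopped Machine: storage plus optional dedicated IPv4."""
    storage = rootfs_gb * storage_rate
    ipv4 = ipv4_rate if dedicated_ipv4 else 0.0
    return round(storage + ipv4, 2)


print(standby_monthly_cost(3))                       # 0.45
print(standby_monthly_cost(3, dedicated_ipv4=True))  # 2.45
```

With a ~3 GB rootfs, storage alone is about $0.45/month, and the dedicated IPv4 adds a flat $2, which is where the $0.50 to $2.50 range comes from.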
### Practical pattern
Two reasonable workflows, plus one that's tempting but worse:
1. **Pure ephemeral.** `flyctl machine run` at session start,
`flyctl machine destroy` on exit. ~20–45 s cold start, $0 idle.
Maximally isolated; nothing persists between sessions. Best fit
when sessions are infrequent or when state continuity across
sessions is itself a concern.
2. **Standby pool.** A small fleet of pre-built Machines that get
`start`ed fresh and `destroy`ed (or wiped) per session. The
*Machine identity* is short-lived but the image is pre-cached on
Fly's hosts, keeping warm-path latency at ~10–17 s.
~$0.50–$1/month per Machine in the pool without dedicated v4.
3. **Permanently stopped Machine, just `start`/`stop`.** Saves a few
extra seconds (~7–11 s hot) but is the weakest of the three on
the isolation axis — the rootfs persists across sessions, so
anything a previous session wrote is still there. Avoid unless
the saved seconds matter more than the state-continuity concern.
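
For the pure ephemeral pattern, the property that matters is that the Machine is destroyed even when the session fails. One way to sketch that guarantee is a context manager. Here `create` and `destroy` are hypothetical callables that the caller would implement around `flyctl machine run` and `flyctl machine destroy`; how the Machine ID is obtained from flyctl is deliberately left out.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ephemeral_machine(create: Callable[[], str],
                      destroy: Callable[[str], None]) -> Iterator[str]:
    """Create a Machine for one session and always destroy it afterwards."""
    machine_id = create()
    try:
        yield machine_id
    finally:
        # Runs on normal exit and on exceptions, so nothing is left billing.
        destroy(machine_id)


# Usage with stand-in callables (no real flyctl calls are made here).
log = []
with ephemeral_machine(lambda: "machine-1", lambda mid: log.append(("destroy", mid))) as mid:
    log.append(("session", mid))
print(log)  # [('session', 'machine-1'), ('destroy', 'machine-1')]
```

The same wrapper would fit the standby-pool pattern if `create` picks a pre-built Machine from the pool and starts it instead of creating one from scratch.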
### Fly-specific caveats
- **DinD requires kernel features.** Fly Machines historically had
some namespacing quirks for nested Docker; verify on a smoke-test
Machine before committing. The pattern is supported (Fly's own
Remote Builders use it), but kernel/runtime updates have shifted
the requirements over time.
- **The launcher's interactive y/N preflight blocks automated remote
start.** `cli.py start` waits on `/dev/tty`. For an automated entry
point you need to pipe `y\n` into stdin, drive it from a pty, or
add a `--yes`/`--non-interactive` flag (a small patch). The
`--remote=user@host` ergonomics direction below would handle this
in passing.
- **Pricing has been re-tariffed multiple times.** The structure
(per-second compute, GB-month storage, $2/v4) has been stable;
specific rates may have moved. Verify against
[fly.io/docs/about/pricing](https://fly.io/docs/about/pricing)
before committing numbers to any planning doc.
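
The `--yes`/`--non-interactive` patch mentioned in the preflight caveat could be as small as the sketch below. This is not the launcher's actual code: `build_parser` and `confirm` are hypothetical names, and the sketch reads the answer through `input` for simplicity, whereas the real `cli.py start` is described above as waiting on `/dev/tty`.

```python
import argparse
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cli.py")
    parser.add_argument(
        "--yes", "--non-interactive",
        dest="assume_yes",
        action="store_true",
        help="skip the y/N preflight prompt (for automated remote start)",
    )
    return parser


def confirm(prompt: str, assume_yes: bool,
            ask: Callable[[str], str] = input) -> bool:
    """Return True if the user agrees, or immediately when assume_yes is set."""
    if assume_yes:
        # Never touch the terminal in non-interactive mode.
        return True
    answer = ask(f"{prompt} [y/N] ")
    return answer.strip().lower() in ("y", "yes")


args = build_parser().parse_args(["--yes"])
print(confirm("Start agent?", args.assume_yes))  # True, without prompting
```

Keeping the default at "no" preserves the interactive behaviour for local use, and the flag only short-circuits the prompt when it is passed explicitly.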
## Optional ergonomics direction
A future addon, not architecturally necessary, just nicer: