# Remote Docker VM as an isolation upgrade for claude-bottle

Note on the cheapest practical path to stronger isolation than local
Docker: run claude-bottle unchanged on a remote Linux VM that has
dockerd. Complements `stronger-isolation-alternatives.md` (which
surveys runtime swaps like gVisor, Kata, Firecracker, Apple Container)
and `local-vs-remote-agent-execution.md` (which surveys the
local-vs-remote decision broadly).

## Summary

If the goal is "stronger isolation than Docker-on-my-laptop without
rewriting the runtime," the cleanest answer is to keep claude-bottle
exactly as it is and run it on a remote Linux VM where you can install
dockerd. The v1 design (pipelock as a separate container on a
`--internal` network, ephemeral agent containers, OAuth-token
forwarding) works as-is. The only thing that changes is that the
"host" is now a disposable VM you provisioned for the session, not your
laptop.

This is structurally equivalent to a Firecracker rewrite (Rung 3 in
`stronger-isolation-alternatives.md`), but the cloud provider operates
the runtime for you. It is also strictly cheaper in engineering effort
than adopting a cloud sandbox SDK (Vercel Sandbox, E2B, Cloudflare
Sandbox SDK) because you keep the existing Docker-shaped abstractions
instead of swapping them for a vendor API.

## The argument

### What changes in the threat model

The agent's blast radius shrinks from "developer laptop + everything
on the LAN" to "this disposable VM." Concretely, what's no longer
reachable on container escape:

- `~/.ssh/`, `~/.aws/credentials`, `~/.config/gh`, the macOS Keychain
- Browser cookies and session state
- Other dev machines on the home/office LAN
- NAS, printers, smart-home devices, anything else on the local network

What replaces it on the remote side: only what the operator chose to
ship to the VM for the session. Typically that is the OAuth token,
optional SSH keys for the bottle, the manifest, and the workspace if
the agent needs one. None of it stays reachable once the VM is
destroyed.

### Why the boundary is equivalent to v1, not weaker

A natural objection (raised in the design discussion that produced
this note) is that running pipelock and the agent on the same VM
collapses a network boundary into a kernel-namespace boundary, which
sounds weaker. It is not, *if you reuse Docker for the inner topology.*

Docker on the remote VM gives the agent and pipelock their own network
namespaces by default, with the agent attached to a `--internal`
network and pipelock straddling it and an egress bridge. That is the
same v1 topology. Bypassing pipelock from the agent requires the same
class of attack as bypassing it on a laptop: a kernel-level netns
escape inside the VM. The only difference is that the kernel under
attack belongs to a disposable VM, not the developer's machine.

In other words: the "weaker because colocated" framing only applies if
you naively run agent and pipelock as two processes in the same
namespace. With Docker on the VM, you don't.

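A minimal sketch of that topology, written as the Docker CLI calls a launcher might issue. The network names, container names, and the pipelock image tag are hypothetical placeholders, not claude-bottle's actual identifiers; the only things relied on are the standard `docker network create --internal`, `docker run --network`, and `docker network connect` commands.

```python
def topology_commands(agent="agent", proxy="pipelock",
                      internal_net="bottle-internal", egress_net="bottle-egress",
                      agent_image="claude-bottle:latest",
                      proxy_image="pipelock:latest"):
    """Return docker CLI invocations (argv lists) for a v1-style topology.

    All names and image tags are illustrative assumptions.
    """
    return [
        # Internal network: containers on it get no route to the outside.
        ["docker", "network", "create", "--internal", internal_net],
        # Ordinary bridge network with outbound connectivity.
        ["docker", "network", "create", egress_net],
        # The proxy starts on the egress bridge ...
        ["docker", "run", "-d", "--name", proxy,
         "--network", egress_net, proxy_image],
        # ... and is then attached to the internal network too, so it straddles both.
        ["docker", "network", "connect", internal_net, proxy],
        # The agent is attached to the internal network only.
        ["docker", "run", "-d", "--name", agent,
         "--network", internal_net, agent_image],
    ]


if __name__ == "__main__":
    for argv in topology_commands():
        print(" ".join(argv))
```

The point of the sketch is that nothing in it depends on where dockerd runs: the same five calls produce the same namespaces on a laptop or on the remote VM.
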
### Why this is cheaper than the alternatives

| Path | Effort | Where the VM-grade boundary comes from |
| --- | --- | --- |
| gVisor (`runsc`) per bottle | ~1–2 days | Userspace syscall barrier; not a full VM |
| Kata Containers per bottle | ~1–2 days, Linux-only | Kata's microVM-per-container |
| Firecracker rewrite | 2–4 weeks | Self-operated Firecracker |
| Apple Container (macOS) | ~1 week spike + integration | Apple's Virtualization.framework, per-container |
| Cloud sandbox SDK (Vercel, E2B, …) | Days–weeks of API rewrite + lock-in | Provider-operated Firecracker / equivalent |
| **Remote Docker VM (this note)** | **0 lines of code** | **Cloud-provider hypervisor under the VM** |

The "stronger isolation alternatives" doc concludes that gVisor is the
right today-step and Apple Container is probably the right v2.
This note adds a third option that sits orthogonal to both: don't
change the runtime, change the host. Use it when the failure mode you
care about is "agent compromises my laptop" specifically, rather than
"agent escapes Docker into a kernel I share with other workloads."

## What the provider has to give you

Not every cloud sandbox is suitable. The minimum for this approach to
work:

- Root or rootless-Docker capability inside the VM. Rules out
  Fargate-style locked-down container hosts and most "function" tier
  FaaS. Verify before committing: Vercel Sandbox specifically may or
  may not allow installing dockerd depending on tier; Fly Machines,
  EC2, GCE, Hetzner, Linode, and self-hosted hypervisors give you full
  control.
- Enough disk + RAM to host the claude-bottle image, the agent
  container, and the pipelock sidecar. Headroom of ~2–4 GB RAM and
  ~5 GB disk is comfortable; less works for short sessions.
- An interactive reach path. SSH is fine. The launcher uses
  `docker exec -it`, so any TTY-capable session works.

## What you give up

- **Typing latency.** Interactive Claude sessions over SSH have visible
  per-keystroke latency; usually fine on wired/fiber, less fine on
  Wi-Fi-to-cloud. Mosh helps if it's bothersome.
- **Token shipping.** `CLAUDE_BOTTLE_OAUTH_TOKEN` has to live on the
  remote box for the launcher to forward it into containers. Use the
  provider's secret-injection path (cloud-init user-data,
  `flyctl secrets`, Tailscale-served local file, etc.). Never echo the
  token onto the SSH command line; it ends up in the local shell
  history, is visible in process listings, and possibly lands in logs
  on the server.
- **Idle cost.** Unless the VM is torn down between sessions, you pay
  for it sitting idle. Ephemeral provisioning (one VM per session,
  destroyed on exit) is the cheaper and more secure pattern; see
  `local-vs-remote-agent-execution.md` on why ephemeral is also
  recommended for credential-concentration reasons.
- **Source code goes to the VM.** Same as any remote-execution
  topology. If the project is under NDA, the VM provider matters.
- **Provider trust.** Multi-tenancy side channels, supply-chain
  compromise of the provider, insider risk. Generally smaller than
  laptop-kernel-CVE risk, but the failure mode (provider-wide breach)
  is correlated across all your sandboxes.

## Operational shape

The minimum-viable workflow, no claude-bottle code changes:

1. `terraform apply` / `flyctl machine run` /
   `gcloud compute instances create`: provision a fresh Linux VM.
2. Install dockerd via the provider's image or a one-liner
   (`curl -fsSL https://get.docker.com | sh`).
3. SSH in.
4. `git clone` claude-bottle on the VM, drop a manifest in place,
   inject `CLAUDE_BOTTLE_OAUTH_TOKEN` via the provider's secrets path.
5. `./cli.py start <agent>`: the existing launcher handles the rest.
6. On exit: destroy the VM. No host artifacts persist.

For the "VPN pivot" failure mode, see
`local-vs-remote-agent-execution.md`. Short version: never VPN the
remote VM back to your LAN. If the agent needs LAN resources, expose
those through a narrow API instead.

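The property that matters most in the six-step workflow above is that step 6 always runs, including when the session crashes or is interrupted. A minimal sketch of that guarantee follows. `provision` and `destroy` are hypothetical hooks that would wrap whichever provider tooling is in use; they are assumptions for illustration, not part of claude-bottle.

```python
from contextlib import contextmanager


@contextmanager
def disposable_vm(provision, destroy):
    """Provision a VM for one session and always destroy it afterwards.

    `provision()` returns an opaque VM handle and `destroy(handle)` tears
    it down. Both are hypothetical hooks (terraform, flyctl, gcloud, ...).
    """
    vm = provision()
    try:
        yield vm
    finally:
        # Runs on normal exit, on exceptions, and on Ctrl-C alike.
        destroy(vm)


if __name__ == "__main__":
    events = []
    with disposable_vm(lambda: "vm-1", lambda vm: events.append(("destroy", vm))) as vm:
        events.append(("session", vm))  # steps 2-5 would happen here
    print(events)
```

Keeping teardown in a `finally` block means an aborted session cannot leave a VM holding the token and workspace behind.
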
## Case study: Fly Machines

Fly.io's Machines product is a useful concrete worked example because
it satisfies all the provider requirements (root, Firecracker-backed
isolation, scriptable lifecycle, per-second billing) and surfaces the
gotchas the abstract pattern leaves implicit.

### Image strategy

Build a custom OCI image `FROM docker:dind` with the following pieces:

- The claude-bottle repository checkout.
- A pre-built `claude-bottle:latest` image, saved via `docker save` on
  your laptop and loaded in at image-build time
  (`RUN docker load < claude-bottle.tar`) or pushed as a layer into
  the dind storage. Note that `docker load` talks to a running daemon,
  so the `RUN` step has to start dockerd for the duration of the load.
  Without this step, the first in-VM `docker build` runs `apt-get` and
  a global `npm install -g @anthropic-ai/claude-code`, which adds
  30–90 s to every cold start.
- A `flyctl secrets`-injected `CLAUDE_BOTTLE_OAUTH_TOKEN`, exposed to
  the VM's PID 1 as an env var at boot rather than stored in the image.
- An entrypoint that starts dockerd, waits for it to be healthy, then
  either drops into a shell or directly runs `cli.py start <agent>`.

Deploy with `flyctl deploy` or `flyctl machine run --image …`.

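The "waits for it to be healthy" step of the entrypoint can be sketched as a bounded polling loop. This is an illustration under assumptions, not the project's entrypoint: it treats a zero exit status from `docker info` as "daemon is up", and the timeout and interval values are arbitrary.

```python
import subprocess
import time


def wait_until(probe, timeout=30.0, interval=0.5,
               sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def dockerd_is_up():
    """True when `docker info` can reach a daemon (exit status 0)."""
    try:
        result = subprocess.run(["docker", "info"],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        # docker CLI not installed in this image
        return False
    return result.returncode == 0
```

An entrypoint would start dockerd in the background, call `wait_until(dockerd_is_up)`, and exit with an error if it returns `False` instead of launching the agent against a daemon that never came up.
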
### Boot-to-first-prompt timing

Three scenarios, all assuming the custom image above (claude-bottle
image baked in, token injected, no in-VM rebuild):

| Phase | Cold (image not cached on Fly host) | Warm (image cached, `machine run` fresh) | Hot (`machine stop`ped, `machine start`) |
| --- | --- | --- | --- |
| Fly schedule + image fetch | 10–30 s | 2–3 s | ~1 s |
| Firecracker kernel boot | ~1 s | ~1 s | ~1 s (resume) |
| dockerd-in-VM startup | 2–4 s | 2–4 s | 0 s (already running) |
| `cli.py start <agent>` housekeeping (network creates, pipelock sidecar, agent container, skill copy) | 4–6 s | 4–6 s | 4–6 s |
| Claude prints first prompt | 1–3 s | 1–3 s | 1–3 s |
| **End-to-end** | **~20–45 s** | **~10–17 s** | **~7–11 s** |

For interactive sessions the warm path is the realistic baseline once
the custom image is registered. The hot path trims only a few extra
seconds, so the question of whether to keep stopped Machines on standby
is mostly about cost, not speed.

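The end-to-end row is the sum of the per-phase ranges, which is easy to check. The sketch below copies the phase estimates from the table and adds the low and high ends separately; the `~1 s` entries are treated as exactly 1 s, which is an assumption.

```python
# (low_s, high_s) per phase, in table order:
# schedule + fetch, kernel boot, dockerd startup, cli.py housekeeping, first prompt
PHASES = {
    "cold": [(10, 30), (1, 1), (2, 4), (4, 6), (1, 3)],
    "warm": [(2, 3), (1, 1), (2, 4), (4, 6), (1, 3)],
    "hot": [(1, 1), (1, 1), (0, 0), (4, 6), (1, 3)],
}


def end_to_end(ranges):
    """Sum the low ends and the high ends of a list of (low, high) ranges."""
    return sum(lo for lo, _ in ranges), sum(hi for _, hi in ranges)


if __name__ == "__main__":
    for name, ranges in PHASES.items():
        lo, hi = end_to_end(ranges)
        print(f"{name}: {lo}-{hi} s")
```

The warm and hot sums come out at 10–17 s and 7–11 s, matching the table. The cold sum is 18–44 s, which the table rounds to ~20–45 s.
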
### Cost of standby vs. create-per-session

Stopped Fly Machines stop billing CPU/RAM but continue to bill for
storage and any allocated IPv4. A reasonable claude-bottle Machine
size (2 vCPU / 2 GB / ~3 GB rootfs) costs roughly:

| Item | Rate while stopped | Monthly cost (3 GB rootfs) |
| --- | --- | --- |
| CPU + RAM | not billed | $0 |
| Rootfs storage | ~$0.15/GB-month | ~$0.45 |
| Dedicated IPv4 (if allocated) | $2/month flat | $2.00 |
| Dedicated IPv6 | free | $0 |
| Bandwidth | usage-based | $0 while stopped |

So **roughly $0.50–$2.50/month per standby Machine**, with the IPv4
line dominating. Drop the dedicated v4 (use IPv6 or Fly's shared v4
via WireGuard) and standby falls under $1/month.

For comparison, running the same Machine 24/7 lands in the
$15–$40/month range depending on size, and the create-and-destroy
pattern (one Machine per session, destroyed on exit) is effectively
$0 since you only pay for the seconds it ran.

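The arithmetic behind those figures, as a small sketch. The rates are the ones quoted in this section and may be out of date; the 730-hour month and the 2-hour session length are assumptions added for illustration, and the per-session estimate simply prorates the 24/7 figure.

```python
ROOTFS_GB = 3
STORAGE_PER_GB_MONTH = 0.15  # USD, rate quoted in this section
DEDICATED_IPV4_MONTH = 2.00  # USD, rate quoted in this section
HOURS_PER_MONTH = 730        # assumption: average month length


def standby_monthly(with_ipv4):
    """Monthly cost of a stopped Machine: storage plus optional dedicated v4."""
    storage = ROOTFS_GB * STORAGE_PER_GB_MONTH
    return storage + (DEDICATED_IPV4_MONTH if with_ipv4 else 0.0)


def session_cost(hours, always_on_monthly):
    """Prorate an always-on monthly figure down to a session of `hours`."""
    return always_on_monthly / HOURS_PER_MONTH * hours


if __name__ == "__main__":
    print(f"standby, no v4:   ${standby_monthly(False):.2f}/month")
    print(f"standby, with v4: ${standby_monthly(True):.2f}/month")
    for monthly in (15, 40):
        cost = session_cost(2, monthly)
        print(f"2 h session at the ${monthly}/month rate: ${cost:.3f}")
```

Standby works out to $0.45 without a dedicated v4 and $2.45 with one, and a 2-hour ephemeral session prorates to roughly 4 to 11 cents, which is why "effectively $0" is a fair summary of the create-and-destroy pattern.
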
### Practical pattern

Two reasonable workflows, plus one that's tempting but worse:

1. **Pure ephemeral.** `flyctl machine run` at session start,
   `flyctl machine destroy` on exit. ~20–45 s cold start, $0 idle.
   Maximally isolated; nothing persists between sessions. Best fit
   when sessions are infrequent or when state continuity across
   sessions is itself a concern.
2. **Standby pool.** A small fleet of pre-built Machines that get
   `start`ed fresh and `destroy`ed (or wiped) per session. The
   *Machine identity* is short-lived but the image is pre-cached on
   Fly's hosts, keeping warm-path latency at ~10–17 s.
   ~$0.50–$1/month per Machine in the pool without dedicated v4.
3. **Permanently stopped Machine, just `start`/`stop`.** Saves a few
   extra seconds (~7–11 s hot) but is the weakest of the three on
   the isolation axis: the rootfs persists across sessions, so
   anything a previous session wrote is still there. Avoid unless
   the saved seconds matter more than the state-continuity concern.

### Fly-specific caveats

- **DinD requires kernel features.** Fly Machines historically had
  some namespacing quirks for nested Docker; verify on a smoke-test
  Machine before committing. The pattern is supported (Fly's own
  Remote Builders use it), but kernel/runtime updates have shifted
  the requirements over time.
- **The launcher's interactive y/N preflight blocks automated remote
  start.** `cli.py start` waits on `/dev/tty`. For an automated entry
  point you need to pipe `y\n` into stdin, drive it from a pty, or
  add a `--yes`/`--non-interactive` flag (a small patch). The
  `--remote=user@host` ergonomics direction below would handle this
  in passing.
- **Pricing has been re-tariffed multiple times.** The structure
  (per-second compute, GB-month storage, $2/v4) has been stable;
  specific rates may have moved. Verify against
  [fly.io/docs/about/pricing](https://fly.io/docs/about/pricing)
  before committing numbers to any planning doc.

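The "drive it from a pty" option in the preflight caveat can be sketched with the standard-library `pty` module. The helper name is hypothetical, and the sketch assumes the program reads one line from `/dev/tty` and does not flush pending terminal input before prompting; whether that holds for `cli.py start` has not been verified here. It relies on POSIX behaviour and Python 3.9+.

```python
import os
import pty


def run_with_tty_answer(argv, answer=b"y\n"):
    """Run argv with a fresh pty as its controlling terminal and send `answer`.

    Returns (exit_code, captured_output). The answer is written up front
    and sits in the terminal's input queue until the child reads it.
    """
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: the pty slave is now stdin/stdout/stderr and /dev/tty.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    os.write(master_fd, answer)
    chunks = []
    while True:
        try:
            data = os.read(master_fd, 1024)
        except OSError:
            break  # Linux reports EIO once the child side is closed
        if not data:
            break  # other platforms report EOF instead
        chunks.append(data)
    os.close(master_fd)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status), b"".join(chunks)


if __name__ == "__main__":
    # Stand-in for a prompt that reads its answer from /dev/tty.
    code, output = run_with_tty_answer(
        ["sh", "-c", "read ans < /dev/tty; echo got:$ans"])
    print(code, output)
```

A `--yes` flag remains the cleaner fix, since it removes the dependence on terminal behaviour; the pty route is mainly useful when the launcher cannot be patched.
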
## Optional ergonomics direction

A future addon, not architecturally necessary, just nicer:

- `cli.py start --remote=user@host <agent>` that:
  - rsyncs the manifest and (optionally) cwd to the remote
  - SSHes in with the OAuth token forwarded via `SendEnv` (which
    needs a matching `AcceptEnv` entry on the remote sshd)
  - runs `cli.py start <agent>` on the remote
  - forwards the TTY for the interactive session
  - on exit, optionally tears down the remote VM via a provider hook
    (`flyctl machine destroy`, `terraform destroy`, etc.)

This is roughly a day of work and would make the remote pattern feel
like a single launcher invocation. It is the only piece of remote
support that would benefit from being upstreamed; everything else is
operator workflow.

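A rough sketch of the two commands such a `--remote` mode might assemble. Everything here is hypothetical: the function name, the remote directory, and the rsync flags are illustrative choices, and no such subcommand exists in the launcher today. The property worth noticing is that only the variable *name* appears in the argv; the token value travels through the SSH environment.

```python
import shlex

TOKEN_VAR = "CLAUDE_BOTTLE_OAUTH_TOKEN"


def remote_start_commands(target, agent, manifest, remote_dir="claude-bottle"):
    """Return (rsync_argv, ssh_argv) for a hypothetical `--remote` start.

    `remote_dir` is an assumed checkout location on the remote host.
    """
    rsync = ["rsync", "-az", manifest, f"{target}:{remote_dir}/"]
    remote_cmd = (f"cd {shlex.quote(remote_dir)} && "
                  f"./cli.py start {shlex.quote(agent)}")
    ssh = [
        "ssh",
        "-t",                          # allocate a TTY for the interactive session
        "-o", f"SendEnv={TOKEN_VAR}",  # forward the token from the local environment
        target,
        remote_cmd,
    ]
    return rsync, ssh


if __name__ == "__main__":
    for argv in remote_start_commands("user@host", "my-agent", "manifest.yaml"):
        print(argv)
```

For `SendEnv` to take effect, the remote sshd has to list the variable under `AcceptEnv`; otherwise the server silently drops it and the launcher starts without a token.
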
## Recommendation

For users who want stronger isolation than local Docker without
rewriting the runtime, this is probably the right answer. It is cleaner
than gVisor (which only adds a syscall barrier on the same kernel),
cleaner than a Firecracker rewrite (which is weeks of work), and
cleaner than adopting a cloud-sandbox SDK (which trades the v1 design
for a vendor API). The pre-existing
`local-vs-remote-agent-execution.md` decision heuristics still apply
for *whether* this is worth the operational overhead in any given
setting.

If we wanted to land this as a real project direction:

1. Add a short "Running claude-bottle on a remote Docker VM" section
   to the README pointing at this doc.
2. Optionally: prototype the `--remote=user@host` launcher subcommand.
3. Update `stronger-isolation-alternatives.md` to mention the remote
   Docker VM as a further path, since the survey is otherwise
   incomplete.

## Caveats

- "Just install Docker" isn't free on every provider; some lock down
  what kernel modules and caps the VM has. Spike-test before committing.
- Multi-tenant cloud hypervisors (EC2, GCE, Vercel) have their own
  side-channel and supply-chain risk surfaces, separately bounded from
  the laptop-kernel risk this approach addresses.
- The remote-VM topology still does not protect source code or secrets
  from the cloud provider; it protects them from a kernel exploit
  reaching the developer's laptop. Different fear, different fix.
- Research conducted 2026-05-10.