didericis 43453c66ea
docs: add research note on remote Docker VM as an isolation upgrade
Argues that running claude-bottle unchanged on a remote Linux VM with
dockerd is the cheapest practical path to stronger isolation than
local Docker — preserves the v1 pipelock topology, requires zero code
changes, and shrinks the agent's blast radius from the developer
laptop to a disposable VM. Cross-references the existing
stronger-isolation-alternatives and local-vs-remote-agent-execution
notes so the research set composes cleanly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 01:07:17 -04:00


Remote Docker VM as an isolation upgrade for claude-bottle

Note on the cheapest practical path to stronger isolation than local Docker: run claude-bottle unchanged on a remote Linux VM that has dockerd. Complements stronger-isolation-alternatives.md (which surveys runtime swaps like gVisor, Kata, Firecracker, Apple Container) and local-vs-remote-agent-execution.md (which surveys the local-vs-remote decision broadly).

Summary

If the goal is "stronger isolation than Docker-on-my-laptop without rewriting the runtime," the cleanest answer is to keep claude-bottle exactly as it is and run it on a remote Linux VM where you can install dockerd. The v1 design — pipelock as a separate container on a --internal network, ephemeral agent containers, OAuth-token forwarding — works as-is. The only thing that changes is that the "host" is now a disposable VM you provisioned for the session, not your laptop.

This is structurally equivalent to a Firecracker rewrite (Rung 3 in stronger-isolation-alternatives.md), but the cloud provider operates the runtime for you. It is also strictly cheaper than adopting a cloud sandbox SDK (Vercel Sandbox, E2B, Cloudflare Sandbox SDK) because you keep the existing Docker-shaped abstractions instead of swapping them for a vendor API.

The argument

What changes in the threat model

The agent's blast radius shrinks from "developer laptop + everything on the LAN" to "this disposable VM." Concretely, what's no longer reachable on container escape:

  • ~/.ssh/, ~/.aws/credentials, ~/.config/gh, the macOS Keychain
  • Browser cookies and session state
  • Other dev machines on the home/office LAN
  • NAS, printers, smart-home devices, anything else on the local network

What replaces it on the remote side: only what the operator chose to ship to the VM for the session. Typically the OAuth token, optional SSH keys for the bottle, the manifest, and the workspace if the agent needs one. None of which are on the laptop after the VM is destroyed.

Why the boundary is equivalent to v1, not weaker

A natural objection — raised in the design discussion that produced this note — is that running pipelock and the agent on the same VM collapses a network boundary into a kernel-namespace boundary, which sounds weaker. It is not, if you reuse Docker for the inner topology.

Docker on the remote VM gives the agent and pipelock their own network namespaces by default, with the agent attached to a --internal network and pipelock straddling it and an egress bridge. That is the same v1 topology. Bypassing pipelock from the agent requires the same class of attack as bypassing it on a laptop: a kernel-level netns escape inside the VM. The only difference is that the kernel under attack belongs to a disposable VM, not the developer's machine.

In other words: the "weaker because colocated" framing only applies if you naively run agent and pipelock as two processes in the same namespace. With Docker on the VM, you don't.
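The topology described above can be sketched as a handful of docker CLI invocations. This is a minimal illustration, not the launcher's actual code: the network names, the container name, and the image arguments are all hypothetical placeholders, and the function only builds the argument lists so they can be inspected before anything runs.

```python
import shlex


def v1_topology_commands(agent_image: str, pipelock_image: str) -> list[list[str]]:
    """Build docker CLI invocations for the agent/pipelock network shape.

    Network and container names are placeholders chosen for this sketch.
    """
    return [
        # Agent-side network: --internal gives this bridge no outbound route.
        ["docker", "network", "create", "--internal", "bottle-internal"],
        # Egress-side network: an ordinary bridge with outbound access.
        ["docker", "network", "create", "bottle-egress"],
        # pipelock starts on the egress bridge...
        ["docker", "run", "-d", "--name", "pipelock",
         "--network", "bottle-egress", pipelock_image],
        # ...and is then attached to the internal network as well.
        ["docker", "network", "connect", "bottle-internal", "pipelock"],
        # The agent is attached to the internal network only.
        ["docker", "run", "--rm", "-it",
         "--network", "bottle-internal", agent_image],
    ]


if __name__ == "__main__":
    for argv in v1_topology_commands("agent:dev", "pipelock:dev"):
        print(shlex.join(argv))
```

Because the agent container joins only the internal network, its sole neighbor with a route out is pipelock, whether the Docker daemon runs on a laptop or on the remote VM.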

Why this is cheaper than the alternatives

| Path | Effort | Where the VM-grade boundary comes from |
| --- | --- | --- |
| gVisor (runsc) per bottle | ~1-2 days | Userspace syscall barrier; not a full VM |
| Kata Containers per bottle | ~1-2 days, Linux-only | Kata's microVM-per-container |
| Firecracker rewrite | 2-4 weeks | Self-operated Firecracker |
| Apple Container (macOS) | ~1 week spike + integration | Apple's Virtualization.framework, per-container |
| Cloud sandbox SDK (Vercel, E2B, …) | Days to weeks of API rewrite + lock-in | Provider-operated Firecracker / equivalent |
| Remote Docker VM (this note) | 0 lines of code | Cloud-provider hypervisor under the VM |

The "stronger isolation alternatives" doc concludes that gVisor is the right today-step and Apple Container is probably the right v2. This note adds a third option that sits orthogonal to both: don't change the runtime, change the host. Use it when the failure mode you care about is "agent compromises my laptop" specifically, rather than "agent escapes Docker into a kernel I share with other workloads."

What the provider has to give you

Not every cloud sandbox is suitable. The minimum for this approach to work:

  • Root or rootless-Docker capability inside the VM. Rules out Fargate-style locked-down container hosts and most "function" tier FaaS. Verify before committing — Vercel Sandbox specifically may or may not allow installing dockerd depending on tier; Fly Machines, EC2, GCE, Hetzner, Linode, and self-hosted hypervisors give you full control.
  • Enough disk + RAM to host the claude-bottle image, the agent container, and the pipelock sidecar. Headroom of ~2-4 GB RAM and ~5 GB disk is comfortable; less works for short sessions.
  • An interactive reach path. SSH is fine. The launcher uses docker exec -it, so any TTY-capable session works.
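A quick preflight run on a freshly provisioned VM can catch the first two requirements before a session starts. The sketch below assumes a Linux guest (it reads memory via `os.sysconf`); the function names are hypothetical and the thresholds are supplied by the caller rather than fixed here. It only checks that a `docker` binary is on `PATH`, which does not prove the daemon is allowed to start.

```python
import os
import shutil

GIB = 1024 ** 3


def headroom_problems(ram_bytes: int, disk_bytes: int,
                      min_ram_gib: float, min_disk_gib: float) -> list[str]:
    """Compare measured RAM and free disk against caller-chosen floors."""
    problems = []
    if ram_bytes < min_ram_gib * GIB:
        problems.append(f"RAM {ram_bytes / GIB:.1f} GiB < {min_ram_gib} GiB")
    if disk_bytes < min_disk_gib * GIB:
        problems.append(f"free disk {disk_bytes / GIB:.1f} GiB < {min_disk_gib} GiB")
    return problems


def preflight(min_ram_gib: float, min_disk_gib: float, path: str = "/") -> list[str]:
    """Return a list of human-readable problems; empty means 'looks usable'."""
    problems = []
    if shutil.which("docker") is None:
        problems.append("docker CLI not found on PATH")
    # Total physical memory on Linux: page size times number of physical pages.
    ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk = shutil.disk_usage(path).free
    return problems + headroom_problems(ram, disk, min_ram_gib, min_disk_gib)
```

Calling `preflight(...)` with the floors you settled on and refusing to continue on a non-empty result is usually enough for a spike test; a stricter check would also run `docker info` to confirm the daemon responds.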

What you give up

  • Typing latency. Interactive Claude sessions over SSH have visible per-keystroke latency; usually fine on wired/fiber, less fine on Wi-Fi-to-cloud. Mosh helps if it's bothersome.
  • Token shipping. CLAUDE_BOTTLE_OAUTH_TOKEN has to live on the remote box for the launcher to forward it into containers. Use the provider's secret-injection path (cloud-init user-data, flyctl secrets, Tailscale-served local file, etc.). Never echo the token onto the SSH command line; it ends up in the local shell history and possibly the SSH server's auth log.
  • Idle cost. Unless the VM is torn down between sessions, you pay for it sitting idle. Ephemeral provisioning (one VM per session, destroyed on exit) is the cheaper and more secure pattern; see local-vs-remote-agent-execution.md on why ephemeral is also recommended for credential-concentration reasons.
  • Source code goes to the VM. Same as any remote-execution topology. If the project is under NDA, the VM provider matters.
  • Provider trust. Multi-tenancy side channels, supply-chain compromise of the provider, insider risk. Generally smaller than laptop-kernel-CVE risk, but the failure mode (provider-wide breach) is correlated across all your sandboxes.
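For the token-shipping point, one way to keep the token off any command line is to read it from a file the provider's secrets mechanism wrote, and hand it to the launcher through the child process environment. This is a sketch under assumptions: the file path is whatever your provider produces, the helper names are hypothetical, and it presumes the launcher picks up `CLAUDE_BOTTLE_OAUTH_TOKEN` from its environment.

```python
import os
import stat

TOKEN_ENV = "CLAUDE_BOTTLE_OAUTH_TOKEN"


def load_token(path: str) -> str:
    """Read the token from a file that only its owner can access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        # Any group/other permission bit means the secret is over-exposed.
        raise PermissionError(f"{path} is accessible by group/other (mode {mode:o})")
    with open(path, encoding="utf-8") as fh:
        token = fh.read().strip()
    if not token:
        raise ValueError(f"{path} is empty")
    return token


def launcher_env(token_path: str) -> dict[str, str]:
    """Copy the current environment and add the token for a child process."""
    env = dict(os.environ)
    env[TOKEN_ENV] = load_token(token_path)
    return env
```

A caller would then start the launcher with something like `subprocess.run(["./cli.py", "start", agent], env=launcher_env(path))`, so the token never appears in argv or shell history.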

Operational shape

The minimum-viable workflow, no claude-bottle code changes:

  1. terraform apply / flyctl machine run / gcloud compute instances create — provision a fresh Linux VM.
  2. Install dockerd via the provider's image or a one-liner (curl -fsSL https://get.docker.com | sh).
  3. SSH in.
  4. git clone claude-bottle on the VM, drop a manifest in place, inject CLAUDE_BOTTLE_OAUTH_TOKEN via the provider's secrets path.
  5. ./cli.py start <agent> — the existing launcher handles the rest.
  6. On exit: destroy the VM. No host artifacts persist.
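The provision-then-destroy bracket in the steps above can be made hard to forget by wrapping it in a context manager. The sketch is provider-agnostic on purpose: `provision` and `destroy` are hypothetical callables that you would implement around whichever tool you use (terraform, flyctl, gcloud), and nothing here is part of claude-bottle.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ephemeral_vm(provision: Callable[[], str],
                 destroy: Callable[[str], None]) -> Iterator[str]:
    """Provision a VM, yield its identifier, and always destroy it on exit."""
    vm_id = provision()
    try:
        yield vm_id
    finally:
        # Runs on normal exit, on exceptions, and on Ctrl-C inside the block.
        destroy(vm_id)
```

Usage would look like `with ephemeral_vm(make_vm, kill_vm) as vm: run_session(vm)`, where all three names are placeholders. If `provision` itself fails, no VM identifier exists and `destroy` is not called, so partial provisioning still needs provider-side cleanup.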

For the "VPN pivot" failure mode, see local-vs-remote-agent-execution.md. Short version: never VPN the remote VM back to your LAN. If the agent needs LAN resources, expose those through a narrow API instead.

Optional ergonomics direction

A future addon — not architecturally necessary, just nicer:

  • cli.py start --remote=user@host <agent> that:
    • rsyncs the manifest and (optionally) cwd to the remote
    • SSHes in with the OAuth token forwarded via SendEnv
    • runs cli.py start <agent> on the remote
    • forwards the TTY for the interactive session
    • on exit, optionally tears down the remote VM via a provider hook (flyctl machine destroy, terraform destroy, etc.)

This is roughly a day of work and would make the remote pattern feel like a single launcher invocation. It is the only piece of remote support that would benefit from being upstreamed; everything else is operator workflow.
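A rough sketch of what the proposed `--remote` path might assemble is shown below. It is not an existing feature: the function names and the remote directory layout are assumptions made for illustration. The `ssh` and `rsync` flags used (`-t`, `-o SendEnv=...`, `-az`) are standard, but note that `SendEnv` only takes effect if the remote sshd lists the variable in `AcceptEnv`.

```python
import shlex

TOKEN_ENV = "CLAUDE_BOTTLE_OAUTH_TOKEN"


def rsync_command(manifest: str, target: str, remote_dir: str) -> list[str]:
    """Copy the manifest into the remote checkout (paths are placeholders)."""
    return ["rsync", "-az", manifest, f"{target}:{remote_dir}/"]


def ssh_command(target: str, remote_dir: str, agent: str) -> list[str]:
    """Build an ssh invocation that forwards the token by name, not by value."""
    remote = f"cd {shlex.quote(remote_dir)} && ./cli.py start {shlex.quote(agent)}"
    return [
        "ssh",
        "-t",                            # force a TTY for the interactive session
        "-o", f"SendEnv={TOKEN_ENV}",    # value is taken from the local environment
        target,
        remote,
    ]
```

Only the variable name appears in argv, so the token value stays out of the local process list and shell history; the value travels inside the SSH session when the server accepts it.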

Recommendation

For users who want stronger isolation than local Docker without rewriting the runtime, this is probably the right answer. It is cleaner than gVisor (which only adds a syscall barrier on the same kernel), a Firecracker rewrite (which is weeks of work), or a cloud-sandbox SDK (which trades the v1 design for a vendor API). The decision heuristics in local-vs-remote-agent-execution.md still apply when judging whether this is worth the operational overhead in a given setting.

If we wanted to land this as a real project direction:

  1. Add a short "Running claude-bottle on a remote Docker VM" section to the README pointing at this doc.
  2. Optionally: prototype the --remote=user@host launcher subcommand.
  3. Update stronger-isolation-alternatives.md to mention the remote Docker VM as a fourth path, since the survey is otherwise incomplete.

Caveats

  • "Just install Docker" isn't free on every provider; some lock down which kernel modules and capabilities the VM has. Spike-test before committing.
  • Multi-tenant cloud hypervisors (EC2, GCE, Vercel) have their own side-channel and supply-chain risk surfaces, separately bounded from the laptop-kernel risk this approach addresses.
  • The remote-VM topology still does not protect source code or secrets from the cloud provider — it protects them from a kernel exploit reaching the developer's laptop. Different fear, different fix.
  • Research conducted 2026-05-10.