diff --git a/docs/research/bash-vs-python-vs-go.md b/docs/research/bash-vs-python-vs-go.md
new file mode 100644
index 0000000..292e8d4
--- /dev/null
+++ b/docs/research/bash-vs-python-vs-go.md
@@ -0,0 +1,128 @@
+# Implementation language: bash vs. Python vs. Go
+
+Research into which runtime claude-bottle should be implemented in, given
+where the project is today (~1250 lines, Python, mostly orchestration of
+`docker` / `flyctl` / `ssh`). The project started in bash and was rewritten
+to Python; this note evaluates whether either of the other two options
+would be a better fit going forward.
+
+## Summary
+
+Stay on Python. Switch to Go if and when distribution friction becomes the
+dominant pain — i.e., when bug reports about Python interpreter / venv
+behavior start outweighing bug reports about claude-bottle itself. Bash is
+not the right tool at the project's current size; reverting would be a
+regression.
+
+The single thing worth doing *now* regardless of language is keeping the
+backend abstraction (local-docker / generic-remote / fly) clean enough that
+a future Go rewrite would be mechanical. If the abstraction is right, the
+language choice is reversible. If it isn't, the cost of switching balloons
+because you're rewriting *and* redesigning at once.
+
+## Axes that matter for this project
+
+The relevant criteria, in roughly the order they bite:
+
+1. **Distribution friction** — how easy is it to install the tool on a
+   new dev machine.
+2. **Orchestration ergonomics** — 90% of the work is shelling out to
+   `docker`, `flyctl`, and `ssh`, so impedance match with subprocess
+   invocation matters.
+3. **JSON manifest handling** — the manifest is structured config with
+   nested fields, validation rules, and per-agent overrides.
+4. **Cross-platform behavior** — must work the same on macOS and Linux,
+   ideally without per-platform shims.
+5. **Test-matrix burden** — local-mac × local-linux × generic-remote × fly
+   is already a wide test surface. The language choice should minimize
+   what it adds to that matrix, not expand it.
+6. **Onboarding new contributors** — single maintainer today, but the
+   project is published and may attract drive-by PRs.
+
+## Comparison
+
+| | Bash | Python | Go |
+|---|---|---|---|
+| Distribution | Zero runtime — `chmod +x && run` | `uv run` is good; bare `python3` is "which one" | Single static binary, `go install` |
+| Orchestration ergonomics | Native — pipes, `set -e`, no marshaling | Verbose (`subprocess.run(...)` × 50) | Verbose (`if err != nil { return err }` × 50) |
+| JSON manifest | Painful past trivial — `jq` for nested writes is ugly | Excellent — `json` + dataclasses/pydantic | Excellent — struct tags, compile-time schema |
+| Cross-platform | macOS bash 3.2 vs Linux 5.x + BSD/GNU userland = real hazard | Mostly a non-issue with `uv` | Best — same binary everywhere |
+| Testability | Hard. No good unit story | Mature (pytest, subprocess mocking) | Mature (table-driven, `exec.Command` is mockable) |
+| Startup time | ~5ms | ~100-200ms cold | ~10ms |
+| Onboarding new contributors | High barrier past ~500 lines | Largest pool | Smaller but technical pool |
+| Rewrite cost from current state | Already done once, regretted | Sunk cost — zero | ~1 focused week for ~1200 lines |
+
+## Bash
+
+Right tool *if the project stays under ~500 lines*. claude-bottle has
+already crossed that threshold (~1250 lines), and the orchestration is no
+longer "stitch CLIs together" — it has manifest validation, env-var
+resolution, network and sidecar lifecycle, and SSH provisioning. Bash
+scales badly for all four:
+
+- **macOS bash 3.2 ceiling.** No associative arrays, no `wait -n`, no
+  `${var,,}`, no namerefs. Anything written for Linux bash 5.x has to be
+  back-ported by hand.
+- **BSD vs GNU userland divergence.** `sed -i`, `date`, `readlink`, `mktemp`
+  flags all behave differently. Every script grows portability shims.
+- **JSON via `jq` is workable but ugly.** Nested writes and per-agent
+  override merges become unreadable.
+- **No real test story.** Black-box integration tests only.
+- **Silent failure modes.** `set -euo pipefail` does not catch every case
+  (e.g. `null` propagating through a pipeline into a `docker run` flag),
+  and command substitutions can lose `set -e`. Subtle, hard to audit.
+
+The original project was written in bash and rewritten to Python for
+exactly these reasons. Reverting would not be a portability win; it would
+be a portability loss.
+
+## Python
+
+Right tool *for where the project is now*. The sunk cost is real and
+accurate: it's testable, the orchestration code reads cleanly, the
+manifest layer benefits from structured types, and `uv` plus PEP 723
+inline metadata makes the historical "which python3 with which deps"
+complaint mostly historical.
+
+Real costs that remain:
+
+- **Startup latency** — ~100-200ms cold is noticeable but not bad for an
+  interactive tool that runs a single command and hands off to `docker
+  exec -it`.
+- **Distribution to non-developer audiences** — has rough edges if the
+  user doesn't have `uv`. Acceptable for the current audience (developers
+  who already have a Python).
+- **Interpreter version drift** — Ubuntu 22.04 ships 3.10, fresh distros
+  ship 3.13. Behavior deltas between minor versions exist but are rare in
+  the standard library surface this project uses.
+
+## Go
+
+Right tool *if and when distribution becomes the dominant pain*. Single
+static binary works identically across macOS arm64, macOS amd64, Linux
+amd64, Linux arm64 — which neutralizes most of the cross-platform leg of
+the test matrix. Startup is fast enough that the tool feels native.
+
+Costs:
+
+- **Rewrite cost** — roughly one focused week for ~1200 lines of mostly
+  mechanical orchestration. Not interesting work.
+- **Verbosity** — `if err != nil { return err }` is similar in volume to
+  Python's `subprocess.run(..., check=True)` plumbing. No win on terseness.
+- **Smaller AI-tooling ecosystem** — most Claude Code-adjacent helpers
+  and skills are Python or JS. Drive-by contributors are a smaller pool.
+  Any future "import a third-party Python skill package" idea gets harder.
+- **Iteration loop** — no "edit the script, rerun" — you build, then run.
+  Minor; not load-bearing for a single maintainer.
+
+## Recommendation
+
+Stay on Python. The signal to watch for, before reconsidering, is bug
+reports about Python interpreter or venv behavior outnumbering bug reports
+about claude-bottle's actual logic. Until that pattern shows up, the Go
+rewrite isn't paying for itself.
+
+Independent of language: invest in the backend abstraction now. A clean
+`Backend` interface (with `run`, `exec`, `cp`, `build`, `network_*`)
+makes the language choice reversible. A leaky abstraction makes it
+expensive in any direction.