docs: add research note comparing bash, Python, and Go for the CLI
test / run tests/run_tests.py (push) Successful in 14s
test / run tests/run_tests.py (push) Successful in 14s
Captures the reasoning for staying on Python, the conditions under which a Go rewrite would pay for itself, and why bash isn't viable at the project's current size. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,128 @@
|
||||
# Implementation language: bash vs. Python vs. Go
|
||||
|
||||
Research into which runtime claude-bottle should be implemented in, given
|
||||
where the project is today (~1250 lines, Python, mostly orchestration of
|
||||
`docker` / `flyctl` / `ssh`). The project started in bash and was rewritten
|
||||
to Python; this note evaluates whether either of the other two options
|
||||
would be a better fit going forward.
|
||||
|
||||
## Summary
|
||||
|
||||
Stay on Python. Switch to Go if and when distribution friction becomes the
|
||||
dominant pain — i.e., when bug reports about Python interpreter / venv
|
||||
behavior start outweighing bug reports about claude-bottle itself. Bash is
|
||||
not the right tool at the project's current size; reverting would be a
|
||||
regression.
|
||||
|
||||
The single thing worth doing *now* regardless of language is keeping the
|
||||
backend abstraction (local-docker / generic-remote / fly) clean enough that
|
||||
a future Go rewrite would be mechanical. If the abstraction is right, the
|
||||
language choice is reversible. If it isn't, the cost of switching balloons
|
||||
because you're rewriting *and* redesigning at once.
|
||||
|
||||
## Axes that matter for this project
|
||||
|
||||
The relevant criteria, in roughly the order they bite:
|
||||
|
||||
1. **Distribution friction** — how easy is it to install the tool on a
|
||||
new dev machine.
|
||||
2. **Orchestration ergonomics** — 90% of the work is shelling out to
|
||||
`docker`, `flyctl`, and `ssh`, so impedance match with subprocess
|
||||
invocation matters.
|
||||
3. **JSON manifest handling** — the manifest is structured config with
|
||||
nested fields, validation rules, and per-agent overrides.
|
||||
4. **Cross-platform behavior** — must work the same on macOS and Linux,
|
||||
ideally without per-platform shims.
|
||||
5. **Test-matrix burden** — local-mac × local-linux × generic-remote × fly
|
||||
is already a wide test surface. The language choice should minimize
|
||||
what it adds to that matrix, not expand it.
|
||||
6. **Onboarding new contributors** — single maintainer today, but the
|
||||
project is published and may attract drive-by PRs.
|
||||
|
||||
## Comparison
|
||||
|
||||
| | Bash | Python | Go |
|
||||
|---|---|---|---|
|
||||
| Distribution | Zero runtime — `chmod +x && run` | `uv run` is good; bare `python3` is "which one" | Single static binary, `go install` |
|
||||
| Orchestration ergonomics | Native — pipes, `set -e`, no marshaling | Verbose (`subprocess.run(...)` × 50) | Verbose (`if err != nil { return err }` × 50) |
|
||||
| JSON manifest | Painful past trivial — `jq` for nested writes is ugly | Excellent — `json` + dataclasses/pydantic | Excellent — struct tags, compile-time schema |
|
||||
| Cross-platform | macOS bash 3.2 vs Linux 5.x + BSD/GNU userland = real hazard | Mostly a non-issue with `uv` | Best — same binary everywhere |
|
||||
| Testability | Hard. No good unit story | Mature (pytest, subprocess mocking) | Mature (table-driven, `exec.Command` is mockable) |
|
||||
| Startup time | ~5ms | ~100-200ms cold | ~10ms |
|
||||
| Onboarding new contributors | High barrier past ~500 lines | Largest pool | Smaller but technical pool |
|
||||
| Rewrite cost from current state | Already done once, regretted | Sunk cost — zero | ~1 focused week for ~1200 lines |
|
||||
|
||||
## Bash
|
||||
|
||||
Right tool *if the project stays under ~500 lines*. claude-bottle has
|
||||
already crossed that threshold (~1250 lines), and the orchestration is no
|
||||
longer "stitch CLIs together" — it has manifest validation, env-var
|
||||
resolution, network and sidecar lifecycle, and SSH provisioning. Bash
|
||||
scales badly for all four:
|
||||
|
||||
- **macOS bash 3.2 ceiling.** No associative arrays, no `wait -n`, no
|
||||
`${var,,}`, no namerefs. Anything written for Linux bash 5.x has to be
|
||||
back-ported by hand.
|
||||
- **BSD vs GNU userland divergence.** `sed -i`, `date`, `readlink`, `mktemp`
|
||||
flags all behave differently. Every script grows portability shims.
|
||||
- **JSON via `jq` is workable but ugly.** Nested writes and per-agent
|
||||
override merges become unreadable.
|
||||
- **No real test story.** Black-box integration tests only.
|
||||
- **Silent failure modes.** `set -euo pipefail` does not catch every case
|
||||
(e.g. `null` propagating through a pipeline into a `docker run` flag),
|
||||
and command substitutions can lose `set -e`. Subtle, hard to audit.
|
||||
|
||||
The original project was written in bash and rewritten to Python for
|
||||
exactly these reasons. Reverting would not be a portability win; it would
|
||||
be a portability loss.
|
||||
|
||||
## Python
|
||||
|
||||
Right tool *for where the project is now*. The sunk cost is real and
|
||||
accurate: it's testable, the orchestration code reads cleanly, the
|
||||
manifest layer benefits from structured types, and `uv` plus PEP 723
|
||||
inline metadata makes the historical "which python3 with which deps"
|
||||
complaint mostly historical.
|
||||
|
||||
Real costs that remain:
|
||||
|
||||
- **Startup latency** — ~100-200ms cold is noticeable but not bad for an
|
||||
interactive tool that runs a single command and hands off to `docker
|
||||
exec -it`.
|
||||
- **Distribution to non-developer audiences** — has rough edges if the
|
||||
user doesn't have `uv`. Acceptable for the current audience (developers
|
||||
who already have a Python).
|
||||
- **Interpreter version drift** — Ubuntu 22.04 ships 3.10, fresh distros
|
||||
ship 3.13. Behavior deltas between minor versions exist but are rare in
|
||||
the standard library surface this project uses.
|
||||
|
||||
## Go
|
||||
|
||||
Right tool *if and when distribution becomes the dominant pain*. Single
|
||||
static binary works identically across macOS arm64, macOS amd64, Linux
|
||||
amd64, Linux arm64 — which neutralizes most of the cross-platform leg of
|
||||
the test matrix. Startup is fast enough that the tool feels native.
|
||||
|
||||
Costs:
|
||||
|
||||
- **Rewrite cost** — roughly one focused week for ~1200 lines of mostly
|
||||
mechanical orchestration. Not interesting work.
|
||||
- **Verbosity** — `if err != nil { return err }` is similar in volume to
|
||||
Python's `subprocess.run(..., check=True)` plumbing. No win on terseness.
|
||||
- **Smaller AI-tooling ecosystem** — most Claude Code-adjacent helpers
|
||||
and skills are Python or JS. Drive-by contributors are a smaller pool.
|
||||
Any future "import a third-party Python skill package" idea gets harder.
|
||||
- **Iteration loop** — no "edit the script, rerun" — you build, then run.
|
||||
Minor; not load-bearing for a single maintainer.
|
||||
|
||||
## Recommendation
|
||||
|
||||
Stay on Python. The signal to watch for, before reconsidering, is bug
|
||||
reports about Python interpreter or venv behavior outnumbering bug reports
|
||||
about claude-bottle's actual logic. Until that pattern shows up, the Go
|
||||
rewrite isn't paying for itself.
|
||||
|
||||
Independent of language: invest in the backend abstraction now. A clean
|
||||
`Backend` interface (with `run`, `exec`, `cp`, `build`, `network_*`)
|
||||
makes the language choice reversible. A leaky abstraction makes it
|
||||
expensive in any direction.
|
||||
Reference in New Issue
Block a user