Assisted-by: Codex
Built-in Supervisor Design
Question
Can bot-bottle grow a built-in supervisor — TUI inventory plus PR-feedback routing — without breaking the per-bottle isolation model, and without departing from the Python-stdlib-first, low-dependency posture?
Context
bot-bottle today is a fleet executor: ./cli.py start <agent> brings up one bottle (agent container + pipelock + optional git-gate + optional cred-proxy on a per-bottle internal network), and cli.py tears it down when the session ends. There is no inventory view, no idle-detection, no automated reaction to PR or CI events. In parallel use, a human is the supervisor — opening one terminal per bottle, switching between them, and watching upstream PR state by hand.
A separate survey of the broader ecosystem (agent control dashboards research, mid-2026) sorts dashboards into five tiers (session managers, parallel runners, Kanban boards, mission-control SPAs, observability backends). The earlier first-pass conclusion was that a full SPA tier conflicts with bot-bottle's isolation model. This doc reconsiders the smaller question: a TUI supervisor in the existing Python CLI.
What I got wrong the first time
The earlier framing treated "add a supervisor" as synonymous with "adopt something Composio-AO-shaped" — a Next.js SPA with plugins, dashboards, and a long-running web server. On that framing, the answer is correctly "no, that's too heavy and breaks isolation."
But the framing collapses two different costs that aren't actually coupled:
- The runtime cost of each bottle (already paid: container + 1–3 sidecars + 2 networks).
- The runtime cost of a supervisor that watches and controls bottles.
A supervisor doesn't have to be heavy. A TUI built into the existing Python CLI, reading docker ps and host-side log files, is closer in spirit to tmux-agent-status than to Mission Control. The trust analysis below is what actually matters.
Proposed design
Three layers, each independently useful, in order of ambition:
1. ./cli.py status — read-only inventory
Reads docker ps filtered by a bottle label and tails each bottle's session log. Reports per bottle: name, agent, uptime, last-activity timestamp, token spend if available, associated PR/branch if recorded.
No new daemons. No new ports. No new credentials. ~100 lines.
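A minimal sketch of the inventory read, assuming bottles carry a Docker label. The label key `bot-bottle.agent` and the function names are hypothetical; only `docker ps --filter` and `--format '{{json .}}'` are real Docker CLI features. Parsing is kept separate from the subprocess call so it can be exercised without a running Docker daemon.

```python
import json
import subprocess

# Hypothetical label key; the real bottle label is not specified in this doc.
BOTTLE_LABEL = "bot-bottle.agent"


def parse_labels(raw: str) -> dict[str, str]:
    """Split docker's comma-separated "k=v,k2=v2" label string into a dict."""
    labels = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if sep:  # skip empty or malformed fragments
            labels[key] = value
    return labels


def parse_docker_ps(output: str) -> list[dict[str, str]]:
    """Turn `docker ps --format '{{json .}}'` output (one JSON object per line) into rows."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        container = json.loads(line)
        labels = parse_labels(container.get("Labels", ""))
        rows.append({
            "name": container.get("Names", ""),
            "agent": labels.get(BOTTLE_LABEL, "?"),
            "uptime": container.get("RunningFor", ""),
        })
    return rows


def list_bottles() -> list[dict[str, str]]:
    """Ask the Docker CLI for labelled containers; read-only, no daemon of our own."""
    result = subprocess.run(
        ["docker", "ps", "--filter", f"label={BOTTLE_LABEL}", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_docker_ps(result.stdout)
```

Label values that themselves contain commas would confuse `parse_labels`; a real implementation might prefer `docker inspect` for those cases.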
2. ./cli.py watch — TUI over the same data
Same data as status, rendered with auto-refresh and keyboard shortcuts that shell out to the existing cli.py attach / stop / start commands.
Library choice: prefer the stdlib curses module to stay stdlib-first; fall back to rich or textual only if the curses path proves painful. Both rich and textual are single-purpose, pure-Python deps with no transitive bloat, but they are still new deps and per the project conventions warrant a deliberate decision.
This is the Claude Squad / tmux-agent-status pattern, applied to bottles instead of tmux sessions. The whole category exists because a TUI is the lightweight shape that doesn't require what the SPA tier requires.
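As a rough illustration of the curses path, the sketch below redraws a bottle list on a timer and moves a selection with the arrow keys. The row shape (`name`, `agent`, `uptime`), the `fetch` callable, and both function names are assumptions made for the example, not an existing bot-bottle interface.

```python
import curses


def format_row(bottle: dict[str, str], width: int) -> str:
    """Render one inventory row, truncated so it never reaches the last column."""
    line = f"{bottle['name']:<20} {bottle['agent']:<12} {bottle['uptime']}"
    return line[: max(width - 1, 0)]


def run_tui(stdscr, fetch, refresh_ms: int = 2000) -> None:
    """Redraw the inventory every refresh_ms milliseconds until 'q' is pressed.

    fetch is any zero-argument callable returning a list of row dicts.
    """
    stdscr.timeout(refresh_ms)  # getch() returns -1 once the timeout elapses
    selected = 0
    while True:
        bottles = fetch()
        height, width = stdscr.getmaxyx()
        stdscr.erase()
        # Leave the bottom line free so writes never hit the final cell.
        for i, bottle in enumerate(bottles[: max(height - 1, 0)]):
            attr = curses.A_REVERSE if i == selected else curses.A_NORMAL
            stdscr.addstr(i, 0, format_row(bottle, width), attr)
        stdscr.refresh()
        key = stdscr.getch()
        if key == ord("q"):
            return
        if key in (curses.KEY_DOWN, ord("j")) and bottles:
            selected = min(selected + 1, len(bottles) - 1)
        elif key in (curses.KEY_UP, ord("k")):
            selected = max(selected - 1, 0)
```

It would be started with `curses.wrapper(run_tui, fetch)`, which sets up and restores the terminal and passes the screen object as the first argument.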
3. ./cli.py supervise — PR feedback router
The optional, more ambitious layer. The bottle manifest gains an optional field:
pr_watch:
  upstream: gitea.dideric.is/didericis/myproject
  branch: agent/task-42
./cli.py supervise polls the named upstream for new review comments and CI failures on branch. When one fires, it surfaces as a desktop notification or a flash in the TUI. The human decides what to do with the feedback — there is no autonomous loop that feeds the comment back into a bottle's next prompt (see "Where to be conservative" for why).
The polling token is a host token (the same GH_PAT / Gitea token the host already keeps in shell env), not a bottle credential. The supervisor never holds bottle secrets.
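The notify-only polling step could look roughly like the sketch below. The event shape, the `fetch` and `notify` callables, and the function names are all hypothetical; the actual upstream API call is deliberately left behind the `fetch` parameter rather than guessed at. The one detail taken from the text is that the token comes from the host environment.

```python
import os
from typing import Callable, Iterable


def new_events(seen_ids: set[str], fetched: Iterable[dict]) -> list[dict]:
    """Return events whose id has not been seen, recording them in seen_ids."""
    fresh = []
    for event in fetched:
        if event["id"] not in seen_ids:
            seen_ids.add(event["id"])
            fresh.append(event)
    return fresh


def poll_once(
    watch: dict,
    fetch: Callable[[str, str, str], Iterable[dict]],
    notify: Callable[[str], None],
    seen_ids: set[str],
) -> int:
    """Poll one pr_watch entry and notify a human about anything new.

    Assumed event shape: {"id": ..., "kind": "comment" or "ci_failure", "summary": ...}.
    Nothing here writes into a bottle; the only output is the notify callback.
    """
    token = os.environ.get("GH_PAT", "")  # host token, never a bottle credential
    events = fetch(watch["upstream"], watch["branch"], token)
    fresh = new_events(seen_ids, events)
    for event in fresh:
        notify(f"[{watch['branch']}] {event['kind']}: {event['summary']}")
    return len(fresh)
```

Passing `notify` in as a callable keeps the choice between a desktop notification and a TUI flash out of the polling logic.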
Why this doesn't break the trust model
The load-bearing question is whether the supervisor introduces the privileged-channel-into-every-bottle problem that disqualifies the SPA tier. It does not, for four reasons:
| Concern | Mitigation |
|---|---|
| Reaching into running bottles | Supervisor reads docker ps and host-side log files. The host already sees both — Docker is the trust boundary, the supervisor is on the host side of it. |
| Holding bottle credentials | The polling token is a host token. The supervisor never receives bottle.cred_proxy.routes entries; it has no path to them. |
| Bridging between bottles | The supervisor does not relay state from bottle A to bottle B. It relays upstream PR state to the human operator, never into a bottle's prompt, and only if the manifest opts in. |
| New attack surface | All "control" actions go through ./cli.py start <agent>, which already enforces the manifest. The supervisor is an automated caller of the existing CLI, not a parallel control plane. |
The boundary stays at the bottle wall. The supervisor looks outward at git/PR state and downward at Docker; it does not look inward through pipelock.
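To make the "automated caller of the existing CLI" point concrete, here is a small sketch of a dispatcher that can only produce invocations of the existing subcommands. The function names are hypothetical, and the assumption that `attach` and `stop` take the agent name as their single argument (as `start` does) is mine, not confirmed by this doc.

```python
import subprocess

# The only subcommands the supervisor is allowed to invoke.
ALLOWED_ACTIONS = ("attach", "stop", "start")


def build_command(action: str, agent: str) -> list[str]:
    """Build an argv list for the existing CLI, refusing anything off the allowlist."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if not agent or agent.startswith("-"):
        # Reject empty names and anything that could be parsed as a flag.
        raise ValueError(f"invalid agent name: {agent!r}")
    return ["./cli.py", action, agent]


def run_action(action: str, agent: str) -> int:
    """Shell out to the CLI (argv list, no shell) and return its exit code."""
    return subprocess.run(build_command(action, agent)).returncode
```

Because the supervisor never talks to Docker for control actions in this sketch, any manifest enforcement in the CLI applies unchanged.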
This also doesn't conflict with the "lean on git history for auditing" non-goal. The supervisor is using git/PR state as the input to its loop, not constructing a separate audit log. Git history remains the source of truth for what happened.
Where to be conservative
A few design defaults worth holding:
- No auto-respawn. The supervisor surfaces PR feedback to a human, never to the bottle's next prompt. The autonomous flow (review-comment → tear down → relaunch with the comment prepended) was considered and rejected: in a public-ish repo, any commenter could inject content that the next launch would treat as system instructions, with the agent's full bottle privileges. Available mitigations — commenter allowlists, prompt-injection regex screens, private-repo defaults — are all soft. The load-bearing defense is to keep the human between the review comment and any agent prompt. Notify-only is the only mode.
- Idle detection is harder than it looks. Last-log-line-age works ~80% of the time. Codeman's Ralph Loop tracker (watching for <promise> tags) is more accurate but adds complexity and tooling-coupling. Start with the dumb version; add heuristics only when actual confusion arises.
- No web UI. A browser UI reintroduces the privileged-channel problem: the browser talks to a server that talks to all bottles. TUI sidesteps it because the supervisor runs in the user's own shell context, not as a long-running daemon serving multiple consumers.
- State file in ~/.bot-bottle/, not inside any bottle. The mapping of bottle → PR → status lives next to the manifest. Nothing about the supervisor's bookkeeping enters a bottle.
- No new credentials on bottles. PR-watch is a host-side concern. A bottle's manifest names the upstream/branch to watch; it does not grant the bottle the ability to read PR state itself.
Trust-model edge cases worth flagging
- Cross-host supervisor. If the supervisor ever grows to coordinate bottles on multiple hosts, the trust analysis changes — the polling token now has to travel, and the "host" boundary is no longer one machine. Out of scope for v1; flag in any future design doc that contemplates it.
- Native Claude Code OpenTelemetry as an alternative observability path. Setting CLAUDE_CODE_ENABLE_TELEMETRY=1 inside a bottle would emit OTel data, but the bottle's pipelock allowlist must then include the OTel collector's host, and the trace data is generated by the (untrusted) agent rather than by the (trusted) host-side perimeter. The supervisor-via-Docker-plus-pipelock-logs path is the cleaner observability spine for this project's threat model; in-bottle OTel is an extra signal worth weighing case-by-case, not the default.
Scope estimate
The full status / watch / supervise trio is plausibly ~500 lines of Python on top of the existing CLI, no new runtimes, no new daemons, no new ports, and (with curses) no new deps. That fits "Low dependencies by default. The project is Python, stdlib-first; ask before adding new tools, runtimes, or package managers" without requiring an exception.
Phased: status first (purely additive, no design decisions), then watch (the design decisions are mostly UX, not architecture), then supervise (the only layer that introduces a new behavioral default and warrants a PRD of its own).
Conclusion
A supervisor that respects the bottle wall is a small natural extension of what bot-bottle already is, not a category shift toward Mission Control / Codeman / Composio AO. The mistake in the earlier framing was treating "supervisor" as synonymous with "dashboard SPA." The trust-model question that disqualifies the SPA tier (privileged channel into every bottle) does not apply to a TUI that reads host-side signals and shells out to the existing CLI.
Recommendation: build status and watch opportunistically when the pain is felt; treat supervise as a separate PRD before implementation, scoped to notify-only (no autonomous loop from review comment to next agent prompt — see "Where to be conservative").