From 4cce5350089349f0f2293a676304da7ea4231b95 Mon Sep 17 00:00:00 2001
From: didericis <eric@dideric.is>
Date: Mon, 25 May 2026 00:50:41 -0400
Subject: [PATCH] docs(research): drop auto-respawn from the supervisor design
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The autonomous "review comment → respawn bottle with comment as
next prompt" loop is the one feature that opens a prompt-injection
vector the bottle wall can't close (a public commenter would get
to issue instructions inside the agent's perimeter on every
launch). The available mitigations — commenter allowlists,
prompt-injection regex screens, private-repo defaults — are all
soft. The durable defense is to keep the human between the
review comment and any next agent prompt.

So `supervise` is now strictly notify-only. The `auto_respawn`
manifest field, the "with auto_respawn: true" behavior paragraph,
and the matching trust-model edge case all go. The reasoning
stays in the "Where to be conservative" bullet so the decision
isn't re-litigated later.
---
 docs/research/built-in-supervisor-design.md | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/docs/research/built-in-supervisor-design.md b/docs/research/built-in-supervisor-design.md
index 9d3313d..724ef41 100644
--- a/docs/research/built-in-supervisor-design.md
+++ b/docs/research/built-in-supervisor-design.md
@@ -47,10 +47,9 @@ The optional, more ambitious layer. The bottle manifest gains an optional field:
 pr_watch:
   upstream: gitea.dideric.is/didericis/myproject
   branch: agent/task-42
-auto_respawn: false   # opt-in; default notify-only
 ```
 
-`./cli.py supervise` polls the named upstream for new review comments and CI failures on `branch`. When one fires, default behavior is a desktop notification (or a flash in the TUI). With `auto_respawn: true`, the supervisor tears down the bottle and re-runs it with the feedback prepended to the next prompt.
+`./cli.py supervise` polls the named upstream for new review comments and CI failures on `branch`. When one fires, it surfaces as a desktop notification or a flash in the TUI. The human decides what to do with the feedback — there is no autonomous loop that feeds the comment back into a bottle's next prompt (see "Where to be conservative" for why).
 
 The polling token is a **host** token (the same `GH_PAT` / Gitea token the host already keeps in shell env), not a bottle credential. The supervisor never holds bottle secrets.
 
@@ -73,7 +72,7 @@ This also doesn't conflict with the "lean on git history for auditing" non-goal.
 
 A few design defaults worth holding:
 
-- **Default to notify, not auto-respawn.** Review-comment → re-run is a fully autonomous flow. Make it opt-in per bottle. The notify-only mode is safe enough to be the default; the auto-respawn mode crosses into agent-autonomy territory that deserves an explicit decision per agent.
+- **No auto-respawn.** The supervisor surfaces PR feedback to a human, never to the bottle's next prompt. The autonomous flow (review-comment → tear down → relaunch with the comment prepended) was considered and rejected: in a public-ish repo, any commenter could inject content that the next launch would treat as system instructions, with the agent's full bottle privileges. Available mitigations — commenter allowlists, prompt-injection regex screens, private-repo defaults — are all soft. The load-bearing defense is to keep the human between the review comment and any agent prompt. Notify-only is the only mode.
 - **Idle detection is harder than it looks.** Last-log-line-age works ~80% of the time. Codeman's Ralph Loop tracker (watching for `<promise>` tags) is more accurate but adds complexity and tooling-coupling. Start with the dumb version; add heuristics only when actual confusion arises.
 - **No web UI.** A browser UI reintroduces the privileged-channel problem — the browser talks to a server that talks to all bottles. TUI sidesteps it because the supervisor runs in the user's own shell context, not as a long-running daemon serving multiple consumers.
 - **State file in `~/.claude-bottle/`, not inside any bottle.** The mapping of bottle → PR → status lives next to the manifest. Nothing about the supervisor's bookkeeping enters a bottle.
@@ -82,7 +81,6 @@ A few design defaults worth holding:
 ## Trust-model edge cases worth flagging
 
 - **Cross-host supervisor.** If the supervisor ever grows to coordinate bottles on multiple hosts, the trust analysis changes — the polling token now has to travel, and the "host" boundary is no longer one machine. Out of scope for v1; flag in any future design doc that contemplates it.
-- **Auto-respawn with attacker-controlled review content.** If `auto_respawn: true` is on and the upstream is one where untrusted parties can post review comments, those comments become a prompt-injection vector. Mitigations: limit auto-respawn to private repos by default; document the threat; consider a per-route allowlist of trusted commenters.
 - **Native Claude Code OpenTelemetry as an alternative observability path.** Setting `CLAUDE_CODE_ENABLE_TELEMETRY=1` inside a bottle would emit OTel data — but the bottle's pipelock allowlist must then include the OTel collector's host, and the trace data is generated by the (untrusted) agent rather than by the (trusted) host-side perimeter. The supervisor-via-Docker-plus-pipelock-logs path is the cleaner observability spine for this project's threat model; in-bottle OTel is an extra signal worth weighing case-by-case, not the default.
 
 ## Scope estimate
@@ -95,4 +93,4 @@ Phased: `status` first (purely additive, no design decisions), then `watch` (the
 
 A supervisor that respects the bottle wall is a small natural extension of what claude-bottle already is, not a category shift toward Mission Control / Codeman / Composio AO. The mistake in earlier framing was treating "supervisor" as synonymous with "dashboard SPA." The trust-model question that disqualifies the SPA tier (privileged channel into every bottle) does not apply to a TUI that reads host-side signals and shells out to the existing CLI.
 
-Recommendation: build `status` and `watch` opportunistically when the pain is felt; treat `supervise` as a separate PRD before implementation, with `auto_respawn` defaulting to off.
+Recommendation: build `status` and `watch` opportunistically when the pain is felt; treat `supervise` as a separate PRD before implementation, scoped to notify-only (no autonomous loop from review comment to next agent prompt — see "Where to be conservative").