PRD 0013: supervise plane foundation #19

Merged
didericis merged 7 commits from prd-0013-supervise-foundation into main 2026-05-25 04:35:57 -04:00
Owner

Summary

Adds PRD 0013 as the shared foundation for the stuck-agent recovery flow (overview in PRD 0012). Defines the MCP sidecar, the three tool definitions (cred-proxy-block, pipelock-block, capability-block), the host-mounted proposal queue, the read-only current-config mount in the agent container, the minimal TUI dashboard, and the audit log format.

Approval handlers are deliberately no-ops in this PRD — after 0013 the operator can see proposals and approve/reject them but no host-side config change happens. The actual remediations land in PRDs 0014 (cred-proxy block), 0015 (pipelock block), and 0016 (capability block).

0013 is a hard prerequisite for 0014–0016.

## Summary Adds PRD 0013 as the shared foundation for the stuck-agent recovery flow (overview in PRD 0012). Defines the MCP sidecar, the three tool definitions (`cred-proxy-block`, `pipelock-block`, `capability-block`), the host-mounted proposal queue, the read-only current-config mount in the agent container, the minimal TUI dashboard, and the audit log format. Approval handlers are deliberately no-ops in this PRD — after 0013 the operator can see proposals and approve/reject them but no host-side config change happens. The actual remediations land in PRDs 0014 (cred-proxy block), 0015 (pipelock block), and 0016 (capability block). 0013 is a hard prerequisite for 0014–0016.
didericis added 6 commits 2026-05-25 04:21:08 -04:00
Adds PRD 0013, the shared foundation for the stuck-agent recovery flow
(overview in PRD 0012). Defines the MCP sidecar, the three tool
definitions, the proposal queue, the read-only current-config mount,
the minimal TUI, and the audit log format. Approval handlers are
deliberately no-ops; the actual remediations land in PRDs 0014, 0015,
and 0016.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 1 of PRD 0013. Adds claude_bottle/supervise.py with:

- Proposal / Response / AuditEntry dataclasses
- Per-bottle queue dir under ~/.claude-bottle/queue/<slug>/
- write/read/list/archive proposal helpers + wait_for_response
- Audit log writer (JSON-Lines under ~/.claude-bottle/audit/)
- Unified-diff rendering + sha256 helper for stale-proposal detection

Stdlib-only; in-container code (Phase 2) and Docker lifecycle
(Phase 3) follow. Tests cover queue, audit, and diff/hash helpers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 2 of PRD 0013. Adds the in-container MCP server:

- claude_bottle/supervise_server.py: minimal JSON-RPC over HTTP MCP
  server. Handles initialize / notifications/initialized / tools/list /
  tools/call. Each tools/call validates the proposed file syntactically,
  writes a Proposal to the host-mounted queue, blocks waiting for a
  Response, archives both files, returns the operator's {status, notes}
  wrapped in MCP content.
- Three tool definitions with JSON Schema inputs: cred-proxy-block
  (routes.json), pipelock-block (allowlist), capability-block
  (Dockerfile).
- Dockerfile.supervise mirroring the cred-proxy pattern: same pinned
  python:3.13-alpine, copies supervise.py + supervise_server.py into
  /app, exposes port 9100.

Stdlib-only. Tests cover JSON-RPC parsing, per-tool validation, all
three handlers, the queue round-trip via a background responder
thread, and an end-to-end HTTP sanity check on a random port.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 3 of PRD 0013. Wires the supervise sidecar into bottle launch:

- Manifest: bottle.supervise (bool, default False). Opt-in for v1 so
  existing bottles are unchanged.
- supervise.py: adds SupervisePlan + abstract Supervise(ABC) with a
  prepare template that stages the per-bottle queue dir on the host
  and the current-config dir under stage_dir (routes.json + allowlist
  + Dockerfile). Stdlib-only so it still runs as the in-container
  shared helper.
- backend/docker/supervise.py: DockerSupervise concrete start/stop.
  No egress network (the sidecar doesn't make outbound calls); just
  the bottle's internal network with network-alias "supervise" and a
  bind-mount of the host queue dir at /run/supervise/queue.
- Prepare wires supervise.prepare into the DockerBottlePlan, derives
  routes_content from cred_proxy_plan, allowlist_content from
  pipelock_effective_allowlist, and dockerfile_content from the
  repo's Dockerfile. supervise sidecar added to the orphan probe.
- Launch starts the supervise sidecar after pipelock + cred-proxy
  but before the agent (so DNS resolution for `supervise` is up on
  the agent's first tool call).
- Agent container gets a read-only bind-mount of the current-config
  dir at /etc/claude-bottle/current-config when supervise is enabled.
- bottle_plan print + to_dict surface the supervise state.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 4 of PRD 0013. Adds `claude-bottle dashboard` subcommand:

- discover_pending() walks ~/.claude-bottle/queue/* and gathers
  pending proposals across all bottles, sorted FIFO by arrival.
- approve / approve-with-final-file / reject helpers write the
  Response file the sidecar polls, and append an AuditEntry for
  cred-proxy and pipelock tools. capability-block proposals don't
  write to an audit log here (PRD 0016 captures via rebuild record).
- Stdlib-curses TUI: list view, detail view, $EDITOR shellout for
  modify-then-approve, inline prompt for reject reason.
- `dashboard --once` dumps pending proposals to stdout without
  bringing up curses — useful for scripted checks and tests.

For 0013 the audit entry's diff field is render_diff("", proposed)
because we don't yet have access to the live on-disk current file;
PRDs 0014 / 0015 fill in real before→after diffs once they own the
host-side config writes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
test(supervise): docker integration test for the sidecar (PRD 0013)
test / unit (pull_request) Successful in 16s
test / integration (pull_request) Failing after 1m25s
9f445d61be
Phase 5 of PRD 0013. End-to-end integration test against real Docker:

- Brings up the supervise sidecar on a per-bottle internal network.
- A curl-image "agent" on the same network does tools/list and gets
  back the three PRD 0013 tool names over real MCP wire format.
- A tools/call round-trips through the queue: agent blocks on the
  call, host watches the queue, dashboard.approve writes a Response,
  agent receives the approval payload (status, notes) in MCP content.
- Documents the orphan-sidecar name-collision behavior so a future
  auto-cleanup change can flip the assertion.

Skips if docker is unreachable, matching the existing integration
pattern.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
didericis force-pushed prd-0013-supervise-foundation from 7b1d280088 to 9f445d61be 2026-05-25 04:21:08 -04:00 Compare
didericis added 1 commit 2026-05-25 04:26:10 -04:00
test(supervise): skip queue round-trip test in docker-in-docker (PRD 0013)
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 41s
92fee89e20
The integration test test_tools_call_round_trips_through_queue
relies on a host bind-mount to share the queue dir between the
sidecar (writing proposals) and the test process (approving via
dashboard helpers). In the Gitea Actions runner the docker socket
forwards to the outer host's daemon, so bind-mount paths are
resolved against the outer host's fs — not the runner container's.
The sidecar writes its proposal where the test can't see it; the
test times out.

Add a one-shot probe that does docker run -v <tmp>:<container> and
checks both directions of fs visibility. Skip the round-trip test
when the probe fails. tools_list and the orphan-name test are
unaffected — they don't touch the queue.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
didericis merged commit 609c1a6448 into main 2026-05-25 04:35:57 -04:00
didericis deleted branch prd-0013-supervise-foundation 2026-05-25 04:47:50 -04:00
Sign in to join this conversation.