Commit Graph

698 Commits

Author SHA1 Message Date
didericis 27b05f9452 Merge pull request 'PRD 0016: capability block remediation' (#22) from prd-0016-capability-block into main
test / unit (push) Successful in 18s
test / integration (push) Successful in 1m34s
2026-05-25 06:14:39 -04:00
didericis 4032e04a9c feat(bottle): random-suffix identity + cli.py resume <identity>
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m30s
Replaces the cwd-hash identity with a random 5-char base36 suffix
per launch, so two simultaneous `start <agent>` invocations against
the same cwd no longer collide on container names. Each launch is
its own bottle.

State carries metadata: every prepare step writes
~/.claude-bottle/state/<identity>/metadata.json with the
(agent_name, cwd, copy_cwd, started_at) the bottle was launched
with. The new `cli.py resume <identity>` reads this metadata and
re-launches a bottle pinned to the same identity — picking up the
per-bottle Dockerfile (from a prior capability-block apply) and
the transcript snapshot under the same state dir.

- bottle_state.py: bottle_identity(agent_name) drops the cwd param
  and gains a random suffix; BottleMetadata dataclass +
  read/write/metadata_path helpers.
- BottleSpec gains an optional identity field — resume sets it to
  pin the identity; start leaves it empty so prepare mints fresh.
- prepare.py: writes metadata at launch time; uses spec.identity if
  provided (resume) else bottle_identity(agent_name) (fresh start).
- start.py: extracted _launch_bottle from cmd_start so resume can
  share the launch core; prints `./cli.py resume <identity>` hint
  at session end.
- cli/resume.py (new): reads metadata, reconstructs BottleSpec
  with the recorded identity + cwd, delegates to _launch_bottle.
  Errors clearly when no state exists for the given identity.
- cli/__init__.py: registers `resume` in COMMANDS + usage.
- dashboard.py: capability-block approval status line now appends
  the `resume <identity>` hint so the operator can copy-paste the
  rebuild command without leaving the TUI.

Closes the rebuild loop in PRD 0016: agent calls capability-block →
operator approves → bottle torn down with state preserved → status
line shows resume command → operator runs it → replacement bottle
boots with the new Dockerfile and prior transcript.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 06:09:45 -04:00
didericis e996f72532 fix(bottle): identity-key all per-bottle resources by (agent, cwd)
test / unit (pull_request) Successful in 16s
test / integration (pull_request) Successful in 1m30s
The single point that computed `slug = slugify(agent_name)` in
prepare.py is now `slug = bottle_identity(agent_name, cwd)`. With
--cwd the identity has a sha256(resolved-cwd)[:12] suffix, so the
same agent against different projects gets distinct container
names, network names, queue dir, audit log paths, and per-bottle
state (Dockerfile + transcript). Without --cwd the identity is
just slugify(agent_name), unchanged from before — no-cwd bottles
look the same as today.

The downstream `slug` field on DockerBottlePlan keeps its name —
every module already threads it under "slug" and the value flowing
through is now the bottle's full identity. A comment in prepare.py
flags the change.

Fixes the bug surfaced in PR #22 review: running the same agent
against project-A's cwd then project-B's would silently share
project-A's per-bottle Dockerfile + transcript snapshot, container
name (forcing serialized runs), and queue/audit history.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 05:46:26 -04:00
didericis ac8f14ae6f test(capability): integration test for apply_capability_change (PRD 0016)
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m30s
Phase 4 of PRD 0016. End-to-end test against real Docker:

- Stages a fake bottle: alpine:latest container named
  claude-bottle-<slug> with a marker file at
  /home/node/.claude/sessions.json, plus a fake supervise sidecar.
- Calls apply_capability_change with a new Dockerfile.
- Verifies: per-bottle Dockerfile written, agent + sidecars
  removed, networks removed, transcript snapshot dir on host
  contains the marker file (proving docker cp transferred bytes).
- Subsequent-apply test proves the per-bottle Dockerfile state
  persists across rebuilds (before-diff uses the prior override,
  not the repo Dockerfile).
- Teardown-idempotent test: apply against a never-started bottle
  doesn't raise.

docker exec / cp / rm / network rm work fine across the docker
socket boundary, so this runs in DinD too — no act_runner skip
needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 05:30:04 -04:00
didericis d9c47d0fbe feat(dashboard): wire capability-block approval to real apply (PRD 0016)
Phase 3 of PRD 0016. dashboard.approve() now dispatches to
apply_capability_change when the proposal is a capability-block:

  cred-proxy-block → apply_routes_change
  pipelock-block   → apply_allowlist_change
  capability-block → apply_capability_change   (new in PRD 0016)

CapabilityApplyError joins the ApplyError tuple, so the TUI's key
handlers catch it the same way and surface failures in the status
line.

After a successful capability-block apply, dashboard archives the
proposal+response itself — the supervise sidecar was torn down by
apply_capability_change and can't archive its own queue file.
Without this, dashboard.discover_pending would keep surfacing the
resolved proposal forever.

No audit log for capability-block per PRD 0013 — its record lives
in the per-bottle Dockerfile state + transcript snapshot.

Tests stub apply_capability_change at the dashboard module level,
add TestCapabilityApplyWiring (call wiring, failure-keeps-pending,
no-audit invariant, archive-after-apply), and update TestApproveReject
to stub the capability path too so it stays docker-independent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 05:28:35 -04:00
didericis 0899a898e0 feat(capability): host-side apply_capability_change orchestrator (PRD 0016)
Phase 2 of PRD 0016. New module
claude_bottle/backend/docker/capability_apply.py:

- apply_capability_change(slug, new_dockerfile): snapshot transcript
  → push working tree → write per-bottle Dockerfile → teardown.
  Returns (before, after) for the dashboard's audit/diff render.
- fetch_current_dockerfile(slug): per-bottle Dockerfile if set,
  else the repo's Dockerfile.
- Internal helpers _snapshot_transcript, _push_working_tree are
  best-effort (log + return on failure); _teardown_bottle is
  idempotent (force-rm + network rm silently ignore missing names).

Fire-and-forget from the agent's perspective: by the time the
dashboard writes the response file the supervise sidecar is already
gone (it was torn down), so the agent's tool call connection drops
without receiving the response. The replacement agent (next manual
`cli.py start <agent>`) sees the new per-bottle Dockerfile and the
transcript snapshot for resume. v1 does not auto-relaunch.

Tests cover sequencing (snapshot → push → teardown order), the
per-bottle vs repo Dockerfile fallback chain, empty-input rejection,
and the per-bottle-Dockerfile write. The docker exec / cp / rm
plumbing is covered by the Phase 4 integration test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 05:26:38 -04:00
didericis 02811e0417 feat(bottle): per-bottle Dockerfile state + image build hook (PRD 0016)
Phase 1 of PRD 0016. Lays the per-bottle state plumbing that
capability-block remediation will write into:

- claude_bottle/backend/docker/bottle_state.py: bottle_state_dir,
  per_bottle_dockerfile (read), write_per_bottle_dockerfile,
  per_bottle_image_tag (unique per slug), transcript_snapshot_dir.
  Stores under ~/.claude-bottle/state/<slug>/.
- prepare.py: when a per-bottle Dockerfile exists, use
  per_bottle_image_tag(slug) as the base image and pass the
  per-bottle Dockerfile path through DockerBottlePlan.dockerfile_path.
  --cwd still layers a derived image on top.
- launch.py: passes plan.dockerfile_path to build_image so the
  per-bottle Dockerfile is what docker build reads.
- DockerBottlePlan gains dockerfile_path field; print() surfaces it
  in the preflight summary so the operator can see at-a-glance that
  this bottle is running on a rebuilt image.

Phase 2 will write to write_per_bottle_dockerfile (capability-block
approval); Phase 3 wires it into the dashboard.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 05:23:31 -04:00
didericis de87f21ff8 docs(prd-0016): capability block remediation
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m13s
Adds PRD 0016, the heaviest of the three remediation engines in the
stuck-agent recovery flow (overview in PRD 0012, foundation in PRD
0013). Wires the capability block path: rebuild orchestrator,
state-preservation helper, capability-block end-to-end. On approval
the orchestrator tears down the bottle, builds from the new
Dockerfile, and starts a replacement on the same branch via
state-preservation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 05:15:32 -04:00
didericis b5d6320320 Merge pull request 'PRD 0015: pipelock block remediation' (#21) from prd-0015-pipelock-block into main
test / unit (push) Successful in 17s
test / integration (push) Successful in 1m9s
2026-05-25 05:15:16 -04:00
didericis 4fada1651b test(pipelock): integration test for apply_allowlist_change (PRD 0015)
test / unit (pull_request) Successful in 16s
test / integration (pull_request) Successful in 1m8s
Phase 4 of PRD 0015. End-to-end test against real Docker:

- Brings up a real pipelock sidecar via the production
  DockerPipelockProxy bring-up + pipelock_tls_init.
- Calls apply_allowlist_change to add a new host.
- Polls the live /etc/pipelock.yaml until the new host shows up
  (bridging the docker-restart window).
- Verifies api_allowlist contains both old + new hosts and
  tls_interception block is preserved.
- Smaller cases: invalid hostname raises, missing sidecar raises,
  fetch_current_allowlist returns one-per-line format.

Skipped under GITEA_ACTIONS because pipelock_tls_init bind-mounts a
host path that doesn't share fs in the runner, matching the
existing pipelock smoke test's skip pattern.

Drive-by fix: fetch_current_yaml now uses `docker cp` (daemon-API
tarball copy) instead of `docker exec cat` because the pipelock
image is distroless and has no shell utilities.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 05:07:26 -04:00
didericis 1d58d62c47 feat(dashboard): pipelock edit TUI verb (PRD 0015)
Phase 3 of PRD 0015. Adds the proactive `pipelock edit` path,
mirroring routes edit from PRD 0014:

- discover_pipelock_slugs() lists running pipelock sidecars.
- operator_edit_allowlist(slug, new) wraps apply_allowlist_change
  and writes an audit entry tagged ACTION_OPERATOR_EDIT.
- New 'p' keybinding in the main TUI: discover slugs, prompt if
  multiple, fetch current allowlist, open in $EDITOR, apply on
  save.
- Extracts shared scaffolding into _operator_edit_flow used by
  both routes-edit and pipelock-edit — DRY without sacrificing
  the per-verb status-line copy.
- Footer updated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 05:03:20 -04:00
didericis 5a6c4be342 feat(dashboard): wire pipelock-block approval to real apply (PRD 0015)
Phase 2 of PRD 0015. dashboard.approve() now dispatches on the
proposal's tool:

  cred-proxy-block → apply_routes_change   (from PRD 0014)
  pipelock-block   → apply_allowlist_change (new in PRD 0015)
  capability-block → no-op (lands in PRD 0016)

PipelockApplyError joins CredProxyApplyError under the ApplyError
tuple the TUI catches: failures keep the proposal pending and the
status line surfaces the message; no response is written and no
audit entry is appended.

Tests: existing TestApproveReject stubs both apply paths; new
TestPipelockApplyWiring covers the call wiring, failure-propagation,
and real-diff-in-audit invariants.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 05:01:18 -04:00
didericis c05457fbef feat(pipelock): host-side apply_allowlist_change helper (PRD 0015)
Phase 1 of PRD 0015. New module
claude_bottle/backend/docker/pipelock_apply.py:

- fetch_current_yaml(slug): docker exec cat of the live
  /etc/pipelock.yaml.
- fetch_current_allowlist(slug): parses the yaml, extracts
  api_allowlist, renders as one-per-line for the operator/agent.
- parse_allowlist_content / render_allowlist_content: one-per-line
  with `#` comments + blank-line tolerance, conservative hostname
  validation.
- apply_allowlist_change(slug, new): parses new hosts, fetches +
  parses current yaml, swaps api_allowlist, re-renders via
  pipelock_render_yaml, docker cp into sidecar, docker restart.
  Returns (before, after) as one-per-line strings for the audit diff.
- PipelockApplyError: caller surfaces to operator without crashing
  the dashboard.

v1 uses restart, not SIGHUP — pipelock has no in-process reload
hook; adding one is the PRD's open question. Restart drops in-flight
outbound calls and the agent retries pick up the restarted proxy.

Yaml roundtrip is covered by tests: parse(render(cfg)) preserves
all fields pipelock_render_yaml emits, including tls_interception
+ passthrough_domains.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:59:13 -04:00
didericis 0197599e49 docs(prd-0015): pipelock block remediation
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m12s
Adds PRD 0015, the second remediation engine in the stuck-agent
recovery flow (overview in PRD 0012, foundation in PRD 0013). Wires
the pipelock block path with restart-based reload: supervisor writes
the new allowlist on approval and restarts pipelock, proactive
pipelock edit TUI verb, pipelock audit log filled in. SIGHUP reload
for pipelock is deferred to a follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:54:25 -04:00
didericis 31ceac0436 Merge pull request 'PRD 0014: cred-proxy block remediation' (#20) from prd-0014-cred-proxy-block into main
test / unit (push) Successful in 17s
test / integration (push) Successful in 1m8s
2026-05-25 04:54:05 -04:00
didericis 70f43d8c4f test(cred-proxy): integration test for SIGHUP + apply round-trip (PRD 0014)
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m12s
Phase 5 of PRD 0014. End-to-end test against real Docker:

- Brings up a cred-proxy sidecar with route /a/ → unreachable
  upstream (so 502 = route matched, 404 = no route).
- Calls apply_routes_change to swap to /b/ only.
- Polls until the route table flips: /a/ now 404s, /b/ now 502s.
- Separately verifies fetch_current_routes returns the live file,
  apply with invalid JSON raises, and apply against a non-existent
  sidecar raises.

No fake-upstream container needed: unreachable hostnames give the
502 signal directly. apply_routes_change uses docker exec / cp / kill
(not bind mounts), so this should work in docker-in-docker too —
no DinD skip needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:50:29 -04:00
didericis 81277e9d81 feat(dashboard): routes edit TUI verb for operator-initiated changes (PRD 0014)
Phase 4 of PRD 0014. Adds the proactive routes-edit path that
doesn't require a pending proposal:

- discover_cred_proxy_slugs() lists running cred-proxy sidecars by
  parsing docker ps output. Returns [] when docker is unreachable
  or not installed (no exception escapes).
- operator_edit_routes(slug, new_content) wraps apply_routes_change
  and writes an audit entry tagged ACTION_OPERATOR_EDIT (so a
  future reader can distinguish operator-initiated changes from
  agent-proposal approvals in the log).
- New 'e' keybinding in the main TUI: discover slugs, prompt if
  multiple (or use the only one directly), fetch current routes,
  open in $EDITOR, apply on save. CredProxyApplyError lands in the
  status line; the operator can retry.

Tests cover audit-entry shape, failure path, and docker-missing
recovery for slug discovery.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:47:22 -04:00
didericis f3a1b4d667 feat(dashboard): wire cred-proxy-block approval to real apply (PRD 0014)
Phase 3 of PRD 0014. dashboard.approve() now does the real
remediation for cred-proxy-block proposals:

- Calls apply_routes_change(slug, file_to_apply) which fetches the
  current routes.json from the running sidecar, validates the new
  JSON, docker cp's it in, and SIGHUPs the sidecar.
- Audit entry's diff is now the real before→after from the apply
  return — not the empty-string placeholder 0013 wrote.
- On apply failure (CredProxyApplyError): no response file, no
  audit entry. Proposal stays pending so the operator can fix the
  input and retry. The TUI's key handlers catch the exception and
  surface the message in the status line.
- pipelock-block + capability-block remain no-op approvals; their
  remediation lands in PRDs 0015 + 0016 and the audit diff stays
  empty until then.
- reject path unchanged: no apply, audit entry with empty diff.

Tests stub apply_routes_change at the dashboard module level so the
unit suite doesn't need a running sidecar; integration test in
Phase 5 covers the real docker exec/cp/SIGHUP plumbing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:44:33 -04:00
didericis f7f1a7d5da feat(cred-proxy): host-side apply_routes_change helper (PRD 0014)
Phase 2 of PRD 0014. New module
claude_bottle/backend/docker/cred_proxy_apply.py:

- fetch_current_routes(slug): docker exec cat of the live
  routes.json from the running cred-proxy sidecar.
- validate_routes_json(content): syntactic check before SIGHUP so
  failures keep the old routes live and surface a clearer error
  than 'reload failed' in the sidecar logs.
- apply_routes_change(slug, new): fetch current → validate new →
  write to temp → docker cp into sidecar → docker kill --signal HUP.
  Returns (before, after) so the caller can render a real audit diff.
- CredProxyApplyError: caller surfaces to operator without crashing
  the dashboard.

docker exec / cp / kill paths are covered by the integration test
in Phase 5; unit tests here cover the validator.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:41:18 -04:00
didericis ee60b09816 feat(cred-proxy): SIGHUP reload of routes.json (PRD 0014)
Phase 1 of PRD 0014. Adds the in-sidecar SIGHUP signal handler that
re-reads routes.json + re-resolves tokens from env without dropping
in-flight connections:

- reload_routes(server, path, environ=...) does the atomic swap.
  Returns (ok, message) so the caller can log/surface failures.
  On failure (bad JSON, missing file) the server keeps serving the
  old routes rather than dying — typos shouldn't crash the sidecar.
- install_sighup_handler wires SIGHUP → reload_routes. No-op on
  platforms without SIGHUP (Windows).
- serve() now installs the handler at startup.

Atomicity: Python attribute reassignment is atomic, and the request
handler reads server.routes/tokens once at the top of _proxy() so
an in-flight request keeps the version it captured.

Tests cover successful reload, JSON-parse failure, and missing-file
failure (both verify the old routes survive).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:39:54 -04:00
didericis 76a9bd2586 docs(prd-0014): cred-proxy block remediation
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 45s
Adds PRD 0014, the first end-to-end remediation engine in the
stuck-agent recovery flow (overview in PRD 0012, foundation in PRD
0013). Wires the cred-proxy block path: SIGHUP-based hot reload of
routes.json on cred-proxy, supervisor write-on-approval, proactive
routes edit TUI verb, cred-proxy audit log filled in.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:37:09 -04:00
didericis 609c1a6448 Merge pull request 'PRD 0013: supervise plane foundation' (#19) from prd-0013-supervise-foundation into main
test / unit (push) Successful in 16s
test / integration (push) Successful in 41s
2026-05-25 04:35:56 -04:00
didericis 92fee89e20 test(supervise): skip queue round-trip test in docker-in-docker (PRD 0013)
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 41s
The integration test test_tools_call_round_trips_through_queue
relies on a host bind-mount to share the queue dir between the
sidecar (writing proposals) and the test process (approving via
dashboard helpers). In the Gitea Actions runner the docker socket
forwards to the outer host's daemon, so bind-mount paths are
resolved against the outer host's fs — not the runner container's.
The sidecar writes its proposal where the test can't see it; the
test times out.

Add a one-shot probe that does docker run -v <tmp>:<container> and
checks both directions of fs visibility. Skip the round-trip test
when the probe fails. tools_list and the orphan-name test are
unaffected — they don't touch the queue.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:26:06 -04:00
didericis 9f445d61be test(supervise): docker integration test for the sidecar (PRD 0013)
test / unit (pull_request) Successful in 16s
test / integration (pull_request) Failing after 1m25s
Phase 5 of PRD 0013. End-to-end integration test against real Docker:

- Brings up the supervise sidecar on a per-bottle internal network.
- A curl-image "agent" on the same network does tools/list and gets
  back the three PRD 0013 tool names over real MCP wire format.
- A tools/call round-trips through the queue: agent blocks on the
  call, host watches the queue, dashboard.approve writes a Response,
  agent receives the approval payload (status, notes) in MCP content.
- Documents the orphan-sidecar name-collision behavior so a future
  auto-cleanup change can flip the assertion.

Skips if docker is unreachable, matching the existing integration
pattern.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:20:57 -04:00
didericis 0aecb41e33 feat(supervise): TUI dashboard for approve/modify/reject (PRD 0013)
Phase 4 of PRD 0013. Adds `claude-bottle dashboard` subcommand:

- discover_pending() walks ~/.claude-bottle/queue/* and gathers
  pending proposals across all bottles, sorted FIFO by arrival.
- approve / approve-with-final-file / reject helpers write the
  Response file the sidecar polls, and append an AuditEntry for
  cred-proxy and pipelock tools. capability-block proposals don't
  write to an audit log here (PRD 0016 captures via rebuild record).
- Stdlib-curses TUI: list view, detail view, $EDITOR shellout for
  modify-then-approve, inline prompt for reject reason.
- `dashboard --once` dumps pending proposals to stdout without
  bringing up curses — useful for scripted checks and tests.

For 0013 the audit entry's diff field is render_diff("", proposed)
because we don't yet have access to the live on-disk current file;
PRDs 0014 / 0015 fill in real before→after diffs once they own the
host-side config writes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:20:57 -04:00
didericis 4b2dbcdefd feat(supervise): Docker lifecycle + bottle integration (PRD 0013)
Phase 3 of PRD 0013. Wires the supervise sidecar into bottle launch:

- Manifest: bottle.supervise (bool, default False). Opt-in for v1 so
  existing bottles are unchanged.
- supervise.py: adds SupervisePlan + abstract Supervise(ABC) with a
  prepare template that stages the per-bottle queue dir on the host
  and the current-config dir under stage_dir (routes.json + allowlist
  + Dockerfile). Stdlib-only so it still runs as the in-container
  shared helper.
- backend/docker/supervise.py: DockerSupervise concrete start/stop.
  No egress network (the sidecar doesn't make outbound calls); just
  the bottle's internal network with network-alias "supervise" and a
  bind-mount of the host queue dir at /run/supervise/queue.
- Prepare wires supervise.prepare into the DockerBottlePlan, derives
  routes_content from cred_proxy_plan, allowlist_content from
  pipelock_effective_allowlist, and dockerfile_content from the
  repo's Dockerfile. supervise sidecar added to the orphan probe.
- Launch starts the supervise sidecar after pipelock + cred-proxy
  but before the agent (so DNS resolution for `supervise` is up on
  the agent's first tool call).
- Agent container gets a read-only bind-mount of the current-config
  dir at /etc/claude-bottle/current-config when supervise is enabled.
- bottle_plan print + to_dict surface the supervise state.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:20:57 -04:00
didericis d5ba253878 feat(supervise): MCP sidecar HTTP server + Dockerfile (PRD 0013)
Phase 2 of PRD 0013. Adds the in-container MCP server:

- claude_bottle/supervise_server.py: minimal JSON-RPC over HTTP MCP
  server. Handles initialize / notifications/initialized / tools/list /
  tools/call. Each tools/call validates the proposed file syntactically,
  writes a Proposal to the host-mounted queue, blocks waiting for a
  Response, archives both files, returns the operator's {status, notes}
  wrapped in MCP content.
- Three tool definitions with JSON Schema inputs: cred-proxy-block
  (routes.json), pipelock-block (allowlist), capability-block
  (Dockerfile).
- Dockerfile.supervise mirroring the cred-proxy pattern: same pinned
  python:3.13-alpine, copies supervise.py + supervise_server.py into
  /app, exposes port 9100.

Stdlib-only. Tests cover JSON-RPC parsing, per-tool validation, all
three handlers, the queue round-trip via a background responder
thread, and an end-to-end HTTP sanity check on a random port.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:20:57 -04:00
didericis 2e06090464 feat(supervise): host-side queue + audit log primitives (PRD 0013)
Phase 1 of PRD 0013. Adds claude_bottle/supervise.py with:

- Proposal / Response / AuditEntry dataclasses
- Per-bottle queue dir under ~/.claude-bottle/queue/<slug>/
- write/read/list/archive proposal helpers + wait_for_response
- Audit log writer (JSON-Lines under ~/.claude-bottle/audit/)
- Unified-diff rendering + sha256 helper for stale-proposal detection

Stdlib-only; in-container code (Phase 2) and Docker lifecycle
(Phase 3) follow. Tests cover queue, audit, and diff/hash helpers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:20:57 -04:00
didericis 578363bea3 docs(prd-0013): supervise plane foundation
Adds PRD 0013, the shared foundation for the stuck-agent recovery flow
(overview in PRD 0012). Defines the MCP sidecar, the three tool
definitions, the proposal queue, the read-only current-config mount,
the minimal TUI, and the audit log format. Approval handlers are
deliberately no-ops; the actual remediations land in PRDs 0014, 0015,
and 0016.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:20:57 -04:00
didericis 4079678ceb docs(prd-0012): split into overview + 4 implementation PRDs
test / unit (push) Successful in 13s
test / integration (push) Successful in 22s
PRD 0012 becomes the cross-cutting overview (stuck categories taxonomy,
sidecar-vs-in-container rationale, implementation chunk pointers).
Implementation detail moves into four follow-on PRDs that 0012
references: 0013 (supervise plane foundation), 0014 (cred-proxy block
remediation), 0015 (pipelock block remediation), 0016 (capability
block remediation).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:19:50 -04:00
didericis 58acdcac87 docs(prd-0012): explain why the MCP server is a sidecar, not in-container
Captures the rationale for placing the MCP server outside the agent
container. The bottle wall doesn't strictly require it (the operator
TUI is the actual gate), but pattern consistency, audit metadata
trust, connection lifecycle, future enforcement headroom, and
pipelock cleanliness all argue for sidecar placement.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:19:50 -04:00
didericis 6e4bb3ba8d docs(prd-0012): switch /stuck to three structured MCP tool calls
Replaces the text-only /supervise/notify protocol with three MCP tools
the agent calls directly: cred-proxy-block, pipelock-block, and
capability-block. Each tool carries the agent's proposed config file
(routes.json, pipelock allowlist, or Dockerfile) plus a justification.
Adds a new MCP sidecar, a read-only current-config mount in the agent
container, and renames "capability gap" to "capability block" to match
the tool name. The text-only-vs-structured tradeoff is captured as an
Open question with pros/cons on both sides.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:19:50 -04:00
didericis 66fc29c72e docs(prd-0012): name the three stuck categories and add pipelock path
Introduces cred-proxy block, pipelock block, and capability gap as the
three named categories of stuck. Adds pipelock-edit support (restart-
based for v1) parallel to the existing cred-proxy routes-edit path,
plus a pipelock audit log. Broadens Goals to cover all three paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:19:50 -04:00
didericis a6222aaa57 docs(prd-0012): adopt text-only notify protocol + SIGHUP routes reload
Rewrites Scope, Proposed Design, Data model, and Open questions to
match the model where /supervise/notify is text-in/text-out, routes
edits + SIGHUP reload are supervisor-side tooling, and manifest
rebuilds are the heavy path. Adds the per-bottle routes-edit audit log.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:19:50 -04:00
didericis 4cce535008 docs(research): drop auto-respawn from the supervisor design
The autonomous "review comment → respawn bottle with comment as
next prompt" loop is the one feature that opens a prompt-injection
vector the bottle wall can't close (a public commenter would get
to issue instructions inside the agent's perimeter on every
launch). The available mitigations — commenter allowlists,
prompt-injection regex screens, private-repo defaults — are all
soft. The durable defense is to keep the human between the
review comment and any next agent prompt.

So `supervise` is now strictly notify-only. The `auto_respawn`
manifest field, the "with auto_respawn: true" behavior paragraph,
and the matching trust-model edge case all go. The reasoning
stays in the "Where to be conservative" bullet so the decision
isn't re-litigated later.
2026-05-25 04:19:50 -04:00
didericis afbb77b040 docs(research): built-in supervisor design (TUI + PR feedback) 2026-05-25 04:19:50 -04:00
didericis 1f9722ae27 docs(research): add Betterleaks switching analysis
test / unit (pull_request) Successful in 13s
test / integration (pull_request) Successful in 28s
2026-05-24 23:59:42 -04:00
didericis c33930290f docs(research): survey gitleaks dashboards + add baseline-file primitive
test / unit (pull_request) Successful in 13s
test / integration (pull_request) Successful in 24s
2026-05-24 23:54:46 -04:00
didericis a74dd2b97f docs: research on git-gate commit approval; link from PRD 0012
test / unit (pull_request) Successful in 12s
test / integration (pull_request) Successful in 22s
2026-05-24 23:39:17 -04:00
didericis 83756fa8c9 docs(prd-0012): open question for gitlock/pipelock exception flow
test / unit (pull_request) Successful in 12s
test / integration (pull_request) Successful in 22s
2026-05-24 23:12:55 -04:00
didericis b4c9e149b0 docs: add PRD 0012 — stuck-agent recovery flow
test / unit (pull_request) Successful in 12s
test / integration (pull_request) Successful in 22s
2026-05-24 23:10:30 -04:00
didericis b0581e60d7 Merge pull request 'PRD 0011: Per-file Markdown manifest' (#17) from md-manifest into main
test / unit (push) Successful in 12s
test / integration (push) Successful in 22s
2026-05-24 22:43:44 -04:00
didericis 958a8845a6 docs: rewrite README manifest section + ship MD examples (PRD 0011)
test / unit (pull_request) Successful in 12s
test / integration (pull_request) Successful in 22s
The "Manifest" section now describes the per-file MD layout under
~/.claude-bottle/{bottles,agents}/, the filename-as-key convention,
the YAML subset constraints, and the trust boundary (bottles are
home-only by filesystem layout). Includes a working bottle example
with comments inside the frontmatter and a working agent example
showing the Markdown body as the system prompt.

Drops claude-bottle.example.json. The new examples/ tree —
examples/bottles/dev.md, examples/agents/implementer.md,
examples/agents/researcher.md — verifies the parser end-to-end via
Manifest.from_md_dirs(examples/, None).
2026-05-24 22:19:44 -04:00
didericis 6ba5f9a9d3 feat(manifest): per-file MD directory loader (PRD 0011)
test / unit (pull_request) Successful in 13s
test / integration (pull_request) Successful in 22s
Manifest.resolve walks $HOME/.claude-bottle/{bottles,agents}/ and
$CWD/.claude-bottle/agents/ instead of reading claude-bottle.json.
A bottles/ subdir under $CWD is logged as a warn and ignored —
the filesystem layout IS the trust boundary, no resolver check
needed.

If claude-bottle.json exists alongside no .claude-bottle/ dir at
either location, dies with a clear pointer at the README — the
manifest format changed and we don't silently fall back.

Manifest.from_md_dirs(home, cwd) is the programmatic entry point
tests use to build a Manifest from fixture directories without
touching os.environ. Manifest.from_json_obj is preserved for
tests that still want to build manifests in-memory.

Bottle / agent frontmatter goes through Bottle.from_dict /
Agent.from_dict — same validators as today's JSON path. Unknown
top-level frontmatter keys die with a "did you mean" pointer
listing accepted keys. Filenames that don't match [a-z][a-z0-9-]*
are skipped with a warn.

Agent files accept the Claude Code subagent passthrough fields
(name, description, model, color, memory) so the same file can
drop into ~/.claude/agents/ — claude-bottle ignores them at
launch but doesn't reject.

The dry-run integration test ships a real MD fixture tree now;
all 200 unit + 17 integration tests stay green.
2026-05-24 22:15:02 -04:00
didericis 8c1e4d0220 feat(yaml_subset): hand-rolled YAML-subset + frontmatter parser
test / unit (pull_request) Successful in 12s
test / integration (pull_request) Successful in 25s
claude_bottle/yaml_subset.py — stdlib-only, ~450 lines. Parses the
bounded shape claude-bottle's manifest files use:

  - Block mappings (top-level + nested via indentation)
  - Block lists (under a key, items can be scalars or block-style
    mappings whose keys align with the rest after the dash)
  - Inline lists `[a, b]` and inline dicts `{a: 1}` for one-level
    leaves
  - Quoted (single + double) and bare strings
  - Scalars: string, int, true/false, null/~

Rejects, each with a clear pointer at the line number:

  - `yes`/`no`/`on`/`off`/`Y`/`N`/`TRUE`/`FALSE` — only literal
    `true` / `false` are bools (the Norway problem stays solved by
    "quote your strings if they look like bools")
  - Bare strings that look like dates / octals / hex / floats
  - Anchors (`&`/`*`), aliases, YAML tags (`!!str`)
  - Multi-line block scalars (`|`, `>`)
  - Tabs in indentation
  - Nested flow style (only one level allowed)

Public API:

  parse_yaml_subset(text) -> dict[str, object]
    Top level must be a mapping.

  parse_frontmatter(text) -> (dict, body_text)
    Strips `---` delimiters, parses content as YAML subset, returns
    the verbatim body text after the closing fence.

46 unit tests covering every construct the real manifest files use
(the cred_proxy.routes structure, role-as-inline-list, nested
ExtraHosts dicts) plus every rejection case listed in PRD 0011.
2026-05-24 21:59:34 -04:00
didericis afa8ca67a4 docs(prd-0011): drop the migration command requirement
test / unit (pull_request) Successful in 13s
test / integration (pull_request) Successful in 23s
claude-bottle has a single primary user today; an automated
JSON → MD migration tool is overkill. Hand-rewriting one file
is the migration cost. The resolver still dies with a pointer
at the README's manifest section if a stale claude-bottle.json
is found alongside no .claude-bottle/ directory, so the breaking
change isn't silent.

Drops: SC #6 (migration tool), the "Migration command" In Scope
sub-bullet, the migrate_manifest.py / cli wiring entries from
Existing code touched, the tests/integration/test_migrate_manifest.py
entry from Tests, the destructive-vs-additive open question.
Renumbers the remaining success criteria 6, 7 (formerly 7, 8).
Backward-compat section rewritten around hand-rewrite.
2026-05-24 21:46:22 -04:00
didericis 894bdea288 docs: add PRD 0011 — per-file Markdown manifest
test / unit (pull_request) Successful in 12s
test / integration (pull_request) Successful in 22s
Specs the implementation chosen in the PR #16 closing comment:
per-file MD-with-YAML-frontmatter layout for both bottles and
agents, with a hand-rolled YAML subset parser (no PyYAML).

Layout:
- $HOME/.claude-bottle/bottles/<name>.md   (home-only)
- $HOME/.claude-bottle/agents/<name>.md    (home agents)
- $CWD/.claude-bottle/agents/<name>.md     (repo-supplied agents)

The trust boundary that PRD-0011-v1 (closed PR #15) tried to
enforce in the resolver now falls out of filesystem layout —
$CWD/.claude-bottle/ has no bottles/ subdir, the loader doesn't
look there. Filesystem layout IS the enforcement.

Eight success criteria, including: stdlib-only (no new runtime
dep), idempotent migration command, agent files shaped close to
Claude Code's existing subagent spec so the same file can drop
into ~/.claude/agents/.

PRD-only; no implementation in this commit. PRD slot 0011 is
intentionally reused — the v1 file was never merged to main.
2026-05-24 21:39:58 -04:00
didericis b6046df5fb Merge pull request 'Research: manifest format + grouping options' (#16) from manifest-format-research into main
test / unit (push) Successful in 13s
test / integration (push) Successful in 21s
2026-05-24 21:31:45 -04:00
didericis da969a503d docs(research): manifest format + grouping options
test / unit (pull_request) Successful in 12s
test / integration (pull_request) Successful in 25s
Captures the two open questions surfaced by PRD 0011: should bottles and agents stay grouped in one file or split per file, and should the format stay JSON or move to YAML / MD-with-frontmatter.

Recommends per-file MD-with-frontmatter (with agents shaped close to Claude Code's subagent spec so they can drop into ~/.claude/agents/ as a side effect), explicitly flags the PyYAML runtime dependency as a user-decision crossing the project's "low deps by default" line, and leaves several other choices (hidden dotdir vs visible, migration tooling) as open questions.

Companion to docs/prds/0011-cwd-manifest-trust-boundary.md (which solves the trust problem at the resolver layer); this doc explores a structural alternative that would make the boundary self-documenting on disk.
2026-05-24 21:12:43 -04:00
didericis 93aaa29158 Merge pull request 'PRD 0010: Credential proxy for agent-bound API tokens' (#14) from cred-proxy into main
test / unit (push) Successful in 12s
test / integration (push) Successful in 23s
2026-05-24 14:24:51 -04:00