bot-bottle

Author	SHA1	Message	Date
didericis-codex	18e3b62b72	docs: rename CLAUDE.md to AGENTS.md and rebrand provider-agnostic Delete CLAUDE.md in favor of AGENTS.md as the orientation doc, rebrand the project from Codex-bottle to provider-agnostic bot-bottle, and repoint every CLAUDE.md reference across PRDs, research notes, the implementer agent example, and the yaml_subset comment. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-28 20:36:47 -04:00
didericis-codex	cdb1870b1c	docs(agent): clarify claude oauth env	2026-05-28 18:20:09 -04:00
didericis-codex	cacba087c9	docs(agent): document provider base bottles Assisted-by: Codex	2026-05-28 18:00:38 -04:00
didericis-codex	1cbedc91c0	refactor(agent): use agent-neutral runtime names Assisted-by: Codex	2026-05-28 17:59:24 -04:00
didericis-codex	c08b09dc9f	refactor!: rename project to bot-bottle Assisted-by: Codex	2026-05-28 17:56:14 -04:00
didericis-codex	500fd910c4	feat(agent): add provider templates Assisted-by: Codex	2026-05-28 02:18:53 -04:00
didericis-codex	e03d90962d	docs(prd): scaffold PRD 0026 — Agent Provider Templates Assisted-by: Codex	2026-05-28 02:05:09 -04:00
didericis-codex	59ee32cc8d	refactor(manifest): key git config by host	2026-05-28 00:49:34 -04:00
didericis-claude	4f7a506a9e	docs(prd): 0025 — bottle composition via `extends:` (issue #88)	2026-05-27 23:27:04 -04:00
didericis-claude	7eda2a66ec	feat(smolmachines): patch smolvm state DB to actually enforce per-bottle allowlist Earlier commit framed this PR as "infrastructure landed, TSI enforcement blocked on upstream smolvm 0.8.0." Found a clean workaround that lets us enforce now. Smolvm persists each machine's config (including `allowed_cidrs`) as a JSON BLOB in `~/Library/Application Support/smolvm/server/smolvm.db`, `vms.data`. `machine create --allow-cidr X/32` silently writes `allowed_cidrs: null` to that row when combined with `--from`, but smolvm reads the row at `machine start` — so patching the row between create and start sets the allowlist for real. New `loopback_alias.force_allowlist(machine_name, cidrs)` opens the SQLite DB, JSON-decodes the row, sets `allowed_cidrs`, and writes back as BLOB (Text type silently corrupts smolvm's later reads). launch.py calls it immediately after `machine_create` and before `machine_start`. Verified end-to-end on macOS / Docker Desktop: VM allowlist after start: ["127.0.0.16/32"]; VM → 127.0.0.1:3000 → BLOCKED (Permission denied); VM → 8.8.8.8:53 → BLOCKED (Permission denied); VM → 127.0.0.16:<bundle> → CONNECTED. The DB-patch hack is correct only because smolvm reads `allowed_cidrs` from the row at start time (not derived in-process). When upstream honors `--allow-cidr` with `--from`, the call becomes redundant — drop the call and the workaround is gone. Tests: 4 new for `force_allowlist` (BLOB round-trip; Linux no-op; missing DB; missing row). Total 593 unit tests pass. README + PRD updated to reflect the fix landed (no longer "infrastructure pending upstream"). gitea#75 can close. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 16:55:03 -04:00
didericis-claude	a919268d5e	docs: honest framing of upstream smolvm 0.8.0 allowlist bug PR #76 originally claimed the per-bottle alias scoping closed gitea#75 ("agent can reach host loopback"). Verified empirically that's not actually true: `smolvm 0.8.0 machine create --from <smolmachine> --net --allow-cidr X/32` silently drops the allowlist (`agent.config.json` shows `allowed_cidrs: null`, and the running VM reaches all of `127.0.0.0/8` regardless). So the alias-allocation + alias-bind infrastructure is correct pre-work, but the actual TSI enforcement is blocked on an upstream smolvm bug. README + PRD 0023 + the module docstring get reworded to say so plainly. gitea#75 stays open. Workarounds tried (all dead-ends): - `machine update --allow-cidr` doesn't exist - stop-edit-`agent.config.json`-restart fails (smolvm removes the file on stop) - `--smolfile` is mutually exclusive with `--from` - `--image localhost:<port>/...` fails because smolvm's agent process can't reach host loopback during pull When upstream lands a fix, our existing code (alias allocation, port-bind, --allow-cidr in launch) will scope correctly without further changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 16:37:56 -04:00
didericis-claude	2edc1abb9a	feat(smolmachines): per-bottle loopback alias scopes TSI to single /32 PR #74's Docker-Desktop fix routed the agent through `127.0.0.1:<random>` loopback forwards, but TSI filters by IP only — so the allowlist `127.0.0.1/32` let the agent VM reach any host service on macOS loopback (postgres, dev servers, other bottles' published ports, mDNSResponder, ...). Real downgrade vs the docker backend's `--internal` network. Resolution: per-bottle loopback alias. - New `loopback_alias` module manages a pool of `127.0.0.16` .. `127.0.0.31` on `lo0`. macOS only routes `127.0.0.1` by default; the extras need `sudo ifconfig lo0 alias`. `ensure_pool()` lazily adds the missing entries via one sudo prompt on first launch per reboot — aliases persist on `lo0` until reboot, so subsequent launches skip the prompt entirely. - `allocate(slug)` picks the lowest-numbered unused alias by inspecting running bundle containers' port-binding HostIps. No on-disk reservation — docker is the source of truth. - Bundle bringup binds published ports to the allocated alias (`docker run -p <alias>::<port>`) instead of `127.0.0.1`. - TSI allowlist becomes the alias's /32 — narrows reachability to this bottle's bundle only. - Linux native daemons share the host's network namespace; `127.0.0.0/8` works without aliases, so the module no-ops on non-Darwin and returns `127.0.0.1` from `allocate`. Tracking issue closed: gitea/issues/75. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 16:23:17 -04:00
didericis-claude	45c821a8f3	docs(smolmachines): note loopback-scope limitation + tracking issue PR #74's Docker-Desktop pivot widened the smolmachines TSI allowlist from `<bundle-ip>/32` to `127.0.0.1/32` (TSI can't filter by port, and docker bridge IPs aren't reachable from macOS networking). The agent VM can therefore reach any service on macOS's loopback while the bottle is running — not just the bundle's published ports. README gets a "Smolmachines backend" subsection under Quickstart spelling this out as a known v1 limitation. PRD 0023 grows a new open question #8 with the proposed v2 fix (per-bottle loopback alias + TSI allowlist scoped to that /32, via sudo `ifconfig lo0 alias`). Tracking issue: gitea.dideric.is/didericis/claude-bottle/issues/75. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 15:58:30 -04:00
didericis-claude	1fa17d1822	feat(smolmachines): build agent image from repo Dockerfile (PRD 0023 chunk 4c) Replaces the alpine:latest placeholder with a real claude-bottle agent image, converted into a .smolmachine artifact via an ephemeral local OCI registry. Why the registry hop: smolvm pack create only accepts OCI registry refs. Empirically it rejects docker-daemon://, oci-layout://, docker-archive: tarballs, and every other transport tested — the crane backend treats anything with a scheme prefix as a registry hostname. To convert a locally-built docker image into a .smolmachine we have to push it somewhere smolvm can pull from. Smallest path: bring up registry:2.8.3 bound to 127.0.0.1:<random>, docker tag + docker push into it, smolvm pack create --image localhost:<port>/claude-bottle:<id>, tear down the registry. The .smolmachine is cached under ~/.cache/claude-bottle/smolmachines/ keyed by the docker image ID (first 16 hex chars of the sha256), so a Dockerfile change picks up a new image ID and invalidates the cache. Unchanged rebuilds skip the whole build → registry → pack pipeline. This puts `docker build` in smolmachines prepare (the docker backend defers it to launch). Necessary because pack_create needs the image ID to derive the cache key, and prepare is the only hook ahead of launch that runs once per slug. Adds: - claude_bottle/backend/docker/util.py: image_id / tag / push helpers (thin docker CLI wrappers). - claude_bottle/backend/smolmachines/local_registry.py: ephemeral_registry() context manager; pins registry:2.8.3 by digest, binds 127.0.0.1::5000 (loopback-only), force-removes on exit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 13:51:02 -04:00
didericis-claude	5929caa219	docs(prd-0023): pivot to smolvm + TSI single-IP allowlist Chunk-1's empirical spike against smolvm 0.8.0 contradicted the research note that motivated the gvproxy network design: smolvm exposes no virtio-net-over-unixgram attachment. The first draft's "why gvproxy, not TSI" argument turns out to apply only to `--outbound-localhost-only`, not to TSI generally. New design: - Bundle (PRD 0024) runs on a dedicated per-bottle docker bridge with a pinned IP. Smolfile sets `[network] allow_cidrs = ["<bundle-ip>/32"]` and nothing else. Agent can reach the bundle and nothing else — host loopback, LAN, public internet directly are all refused at the VMM (TSI) layer. - Bind-address mitigation: egress binds 127.0.0.1:9099 inside the bundle (pipelock-internal); pipelock / git-gate / supervise bind 0.0.0.0 so the agent (across the TSI allowlist) can reach them. This is the port-granularity TSI's IP-only allowlist doesn't provide. - Smolfile renderer rewritten in chunk 2 to smolvm 0.8.0's actual schema (image / entrypoint / cmd / env / [network] allow_cidrs). The chunk-1 renderer (name= / [[net]]= under the gvproxy design) emits the wrong shape and will be replaced. - Drop gvproxy + VZFileHandleNetworkDeviceAttachment + the PyObjC fallback. Backend layout loses gvproxy_config.py, gvproxy.py, vfkit_attach.py. - Acceptance plan adds an egress-port-bypass probe in addition to the localhost-reach probe. - Chunks reshape: chunk 1 stays (renderer rewrite is part of chunk 2's cost); chunk 2 covers VM lifecycle + bundle + new Smolfile renderer; chunk 3 is the bundle bind-address change; chunks 4-5 unchanged in spirit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 03:47:03 -04:00
didericis	bce1ea21db	Merge pull request 'docs(prd-0023): smolmachines bottle backend' (#53) from prd-0023-smolmachines-backend into main	2026-05-27 02:16:11 -04:00
didericis	539234f29e	refactor(sidecars): drop vestigial start/stop methods (PRD 0024 chunk 3) Compose-up has owned per-container lifecycle since PRD 0018 ch3; the .start() / .stop() methods on DockerPipelockProxy / DockerEgress / DockerGitGate / DockerSupervise (and their abstractmethod declarations in the four base ABCs) were already documented as vestigial. With the bundle path in flight (PRD 0024 ch2), they are truly dead — collapse to nothing. Changes: - Removed start/stop methods from the four DockerSidecar classes. Plan dataclasses, image/path constants, container-name helpers, and the .prepare() methods all stay (the renderer + apply path still need them). - Removed the matching @abstractmethod declarations in the base ABCs so concrete subclasses don't have to stub them. - launch.launch() and prepare.resolve_plan() no longer take proxy/git_gate/egress/supervise instance parameters. backend.py loses the four instance attributes it threaded through. prepare.resolve_plan() instantiates the four classes itself to call their .prepare() methods. - Deleted four integration tests that only exercised the removed lifecycle: test_pipelock_sidecar_smoke, test_supervise_sidecar, test_git_gate_sidecar, test_git_gate_mirror. - Dropped the .stop-idempotency case in test_orphan_cleanup; the network-cleanup cases stay (those test real production code). - Marked test_pipelock_apply @skip pending chunk 4 — its bringup helper used .start; chunk 4 rewrites it with direct `docker run`. Dockerfile deletion deferred to chunk 5 (when the bundle flag default flips) — the legacy compose path still needs Dockerfile.{egress,git-gate,supervise} until then. Net: 708 lines removed, 80 added. 533 unit tests + 27 integration tests passing (5 skipped: the chunk-4-pending case + existing GITEA_ACTIONS guards). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 01:01:10 -04:00
didericis	62109a1caf	fix(sidecars): child death no longer tears down the bundle Reverses chunk 1's "any unexpected child death tears down the rest" policy. New behavior: a daemon dying is logged but does NOT initiate shutdown — the surviving daemons keep running and whatever the dead one served starts failing visibly on the agent side. The supervisor exits only when (a) it receives SIGTERM/SIGINT, or (b) every child has died on its own. Eventual design is restart-the-dead-daemon plus a notification to the supervise sidecar so the operator sees the event explicitly; this commit ships only the "log and leave alone" half. PRD 0024 open question 1 updated to reflect the new intent. Tests updated: replaced "crash propagates exit code via auto-teardown" with three cases that exercise the new policy (crash without shutdown leaves survivors up, crash-then-signal surfaces the nonzero code, all-children-die-unattended still converges the loop). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 00:19:50 -04:00
didericis	1894f621dd	docs(prd-0024): consolidate per-bottle sidecars into a single bundle Replace pipelock + egress + git-gate + supervise as four separate containers with one bundle image (claude-bottle-sidecars) running all four daemons under a small stdlib Python init supervisor. Compose file collapses from five services to two; same daemons, same ports, same protocols, one container. Sized: bundle image + init → renderer collapse (feature-flagged) → backend Python trim → integration sweep → flag removal. Prerequisite for PRD 0023 chunk 3 (smolmachines backend reuses the same bundle as its sole host-side sidecar container). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-26 23:54:29 -04:00
didericis	4e00430c6e	docs(prd-0023): consume PRD 0024's bundle as the single sidecar Replace the four host-side sidecar processes (pipelock + egress + git-gate + supervise) with a single bundled container per bottle, defined in PRD 0024 and consumed here. egress is internal to the bundle as pipelock's upstream; only pipelock, git-gate, and supervise are externally addressable, and only when the bottle uses them. gvproxy port_forwards collapse from one-per-process to one-per-external-port, all pointing into the one bundle container. Sizing: chunk 3 becomes "sidecar bundle lifecycle" and depends on PRD 0024 having landed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-26 23:51:57 -04:00
didericis	041da1d7af	docs(prd-0023): make gvproxy the network primitive; reject TSI TSI's --outbound-localhost-only is permissive on all of 127.0.0.0/8 with no destination-port filter, so any host loopback service (local Postgres, IDE plugins, another bottle's sidecar) is reachable from the guest. That's the wrong default for the malicious-agent threat model. Reworked the network design around gvproxy + VFKT unixgram attachment: the guest gets a virtio-net device, gvproxy is the userspace TCP/IP stack on the host side, and the only thing reachable from the guest is the explicit port-forward list (typically just pipelock). Host LAN, host loopback, and the public internet directly are gone by construction. VMM choice (smolmachines vs PyObjC + Virtualization.framework) is an open question contingent on whether libkrun's virtio-net mode lets us point at a custom unixgram socket. Backend name stays "smolmachines" either way per the original spec. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-26 23:41:32 -04:00
didericis	a2ac124d5c	docs(prd-0023): smolmachines bottle backend Specs a second concrete BottleBackend selectable via CLAUDE_BOTTLE_BACKEND=smolmachines: per-agent libkrun microVM on macOS, sidecars relocated to host-side loopback ports plumbed via Smolfile env, PRD 0022's sandbox-escape suite as the acceptance gate (the env-var flip is the only change required). Docker backend ships unchanged and remains default. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-26 23:19:08 -04:00
didericis	1111ced04d	docs(prd-0022): resolve remaining open Qs All seven open questions now have decisions baked in: - Q1 (HTTP-exfil scope): authoritative. Every shape MUST block; chunk 3 expands into remediation sub-PRDs if any of path/query/header leak today. - Q3 (fake secret): multiple shapes, parameterized. Three env vars (TEST_SECRET_ANTHROPIC, _AWS, _GENERIC); test 5 loops via subTest. Resilient to gitleaks rule renames. - Q6 (missing backend): die. `get_bottle_backend()`'s current behavior surfaces clearly; surprise-skips are worse than loud failures for new-backend branches. - Q7 (tool deps): preflight check. setUpClass runs `which curl && which git && which dig`; SkipTest with the missing list catches future backends shipping thinner base images. Updated implementation chunks + test-5 sketch to match. No remaining open questions.	2026-05-26 22:11:32 -04:00
didericis	73939861f9	docs(prd-0022): resolve open Qs 2, 4, 5 (DNS, gitleaks order, CI) User feedback: - Q2 (direct DNS resolver test): yes — test 4 grows a second sub-assertion verifying `dig @8.8.8.8` from the agent has no path out, alongside the existing crafted-subdomain check. - Q4 (gitleaks ordering): test 5 grows an ordering check — asserts the rejection mentions `gitleaks` AND does NOT mention upstream-network-phase phrases (resolve / refused / unreachable / upstream). Confirms gitleaks rejects BEFORE git-gate tries any upstream push. - Q5 (CI): try it, accept fallback. New chunk 6 adds a Gitea Actions job marked `continue-on-error: true` — runs the suite if the runner can host compose, doesn't block the workflow if docker-in-docker prevents it. Three open questions remain (1: pipelock's actual DLP coverage for non-body shapes; 3: realistic fake secret shape vs. gitleaks regex; 6+7: backend-agnostic invocation + required tools — for the smolmachines work).	2026-05-26 22:04:46 -04:00
didericis	62f6716e8d	docs(prd-0022): end-to-end sandbox-escape integration test Draft a PRD for a composite integration test that brings up a real bottle with a known allowlist + planted secret and runs five attacks from inside the agent container: 1. Request to non-allowlisted hostname 2. Request to non-allowlisted IP (incl. host-header spoof) 3. Secret exfil via HTTP — path / query / body / headers 4. Secret exfil via crafted DNS subdomain 5. Secret exfil via README link pushed through git-gate Each attack passes only when blocked with a permissions error. The suite is backend-agnostic — runs against whatever CLAUDE_BOTTLE_BACKEND selects — so it becomes the gate the upcoming smolmachines spike has to pass before that backend can substitute for Docker. Sized into 5 chunks (fixture → attacks 1+2 → attack 3 → attack 4 → attack 5). Seven open questions called out, biggest being: today's pipelock probably leaks via header / path / query because DLP only scans bodies — the test will expose this as a real gap (chunk 3 lands with `expectedFailure` markers if so).	2026-05-26 21:52:24 -04:00
didericis	e5316be454	docs(prd-0021): rewrite as standalone — no references to closed PR #48 PR #48 closed; treat the implementation as starting from main, where no tmux integration exists yet. The PRD now describes the full design (including the `_in_tmux` detection + helper scaffolding) as fresh work. Sized into 4 chunks: `claude_docker_argv` refactor → tmux helpers + pane state + `_attach_to_bottle` dispatch → new-agent flow → stop + indicator. Same design as before — opt-in by `\$TMUX`, split-window-then-respawn, falls back to handoff on tmux failure or missing binary. No external references to PR #48.	2026-05-26 14:18:24 -04:00
didericis	8b8d668602	docs(prd-0021): dashboard as left tmux pane, selected agent as right pane Draft a PRD that tightens PR #48's tmux integration from "one new window per attach" to "one persistent right pane that the dashboard's selection drives." Inside tmux (`\$TMUX` set): dashboard in the left pane; pressing Enter or `n` spawns claude in the right pane via `tmux split-window` on first attach, then `tmux respawn-pane` on subsequent attaches so the operator-focused agent is always the visible one. Outside tmux: falls back to today's handoff. Opt-in by environment; no flag. Sized into 4 chunks (pane state + create → respawn → stop integration → supersede PR #48's new-window). Seven open questions called out, the biggest being whether the dashboard should auto-exec into a fresh tmux session when launched outside one (v1 says no — operators start tmux themselves).	2026-05-26 14:14:02 -04:00
didericis	26322bdfd5	docs(prd-0020): record answers to open questions, switch to no-teardown-on-quit	2026-05-26 03:10:26 -04:00
didericis	ec20293c0a	docs(prd-0020): start + attach to agents from the dashboard Draft a PRD that turns the dashboard into the operator's single surface — collapses today's two-terminal workflow (one for `./cli.py start`, one for `./cli.py dashboard`) into a single dashboard invocation that can spin up new agents, re-attach to ones it already spun up, and explicitly stop them. Picks the "handoff" mechanism from `docs/research/claude-code-pane-in-dashboard.md` (curses.endwin → docker exec -it claude → stdscr.refresh) and crucially decouples the bottle's lifetime from any single claude session: exit claude → back to dashboard with the bottle still running; quit dashboard → tear down every bottle the dashboard owns. Sized into 5 chunks (refactor → picker + new-agent → re-attach → explicit stop → quit-cleanup). Seven open questions called out, the biggest being modal-vs-drop-and-resume for the preflight Y/N inside curses.	2026-05-26 02:59:42 -04:00
didericis	8cd867f3d2	docs(research): claude-code pane in the dashboard Survey the three realistic ways to surface a claude-code session inside the dashboard TUI: 1. Handoff — drop curses, foreground claude, restore on exit (the existing `e`/`p` pattern, extended). Minimal code, side-by-time rather than side-by-side. 2. Embedded emulator — own a PTY, parse claude-code's ANSI stream via `pyte`, paint it into a curses pane. Real "pane in the dashboard" but a six-week build with one new dep and several integration trap-doors (alt-screen, resize, input routing, multi-PTY state). 3. External multiplexer — delegate pane creation to tmux / iTerm / wezterm when detected. Tiny code, but splits the operator's mental model and gives up layout control. Recommendation: ship Option 1 first; defer Option 2 to "only if Option 1 is observably insufficient"; treat Option 3 as a niche augmentation for power users. Calls out four followups worth verifying before committing (PTY behavior at small sizes, attach-to-existing-exec, SIGWINCH handling, `-it` vs `-i` for the embedded path).	2026-05-26 02:51:08 -04:00
didericis	9c9c32a941	docs(prd-0019): drop e/p fallback — selection-only, no-op otherwise When no agent is selected, `e` / `p` do nothing (status line shows "no agent selected") rather than falling back to today's global discover-and-prompt. The discover-and-prompt scaffolding in `_operator_edit_routes_flow` / `_operator_edit_allowlist_flow` comes out entirely — selection in the agents pane is now the only way to scope an edit. Old open-question #4 (single-bottle shortcut behavior in proposals-pane mode) is moot and removed.	2026-05-26 01:03:23 -04:00
didericis	9539982d3f	docs(prd-0019): active agents in dashboard + agent-scoped edit verbs Draft a PRD that adds an "active agents" pane to the dashboard TUI (below the existing proposals pane) and reshapes the operator `routes edit` (e) / `pipelock edit` (p) verbs to be agent-scoped when the cursor is in the agents pane — no more global discover + disambiguation prompt on every press. Tab toggles which pane nav keys move through. Sized into 4 chunks (discovery helper → render pane → selection state → agent-scoped verbs). Six open questions called out, the biggest being whether per-bottle `compose ps` on every 1s tick scales for hosts with many bottles (answer leans toward one label-filtered `docker ps`).	2026-05-26 00:58:34 -04:00
didericis	3386cabe62	docs(prd-0018): resolve TTY open question — keep exec -it	2026-05-25 22:34:26 -04:00
didericis	3251ee1394	docs(prd-0018): one compose project per bottle instance Draft a PRD that replaces the chain of per-sidecar docker SDK calls in `claude-bottle start` with a single `docker compose` project per instance. Each `state/<slug>/` dir gets a self-describing set of artifacts: metadata.json, docker-compose.yml, compose.log, and the existing transcript/ + live-config/.	2026-05-25 22:15:32 -04:00
didericis	9cd583fbbb	feat(egress-proxy): retarget remediation at egress-proxy (PRD 0017 chunk 3) Finishes PRD 0017. The `cred-proxy-block` MCP tool is renamed and its remediation apply path is repointed at egress-proxy. - `claude_bottle/supervise.py` — `TOOL_CRED_PROXY_BLOCK` → `TOOL_EGRESS_PROXY_BLOCK`; `COMPONENT_FOR_TOOL` maps the new tool ID to `egress-proxy` for audit-log routing. - `claude_bottle/supervise_server.py` — tool definition renamed + description rewritten: "Call when egress-proxy refused your HTTPS request ... Read the current routes.yaml from /etc/claude-bottle/current-config/routes.yaml, compose a modified version, pass the full new file plus a justification." The syntactic validator dispatches on the new tool ID. - `claude_bottle/backend/docker/egress_proxy_apply.py` — renamed from `cred_proxy_apply.py`. Reads routes.yaml from /etc/egress-proxy/routes.yaml via `docker exec cat`; validates via `egress_proxy_addon_core.load_routes` (so both sides use the same parser); writes via `docker cp`; SIGHUPs egress-proxy with `docker kill --signal HUP`. `EgressProxyApplyError` replaces `CredProxyApplyError`. - `claude_bottle/cli/dashboard.py` — wires the new apply + `discover_egress_proxy_slugs` helper; the operator-initiated `routes edit <bottle>` verb now writes to egress-proxy with `.yaml` suffix. Stale follow-up comment about path-aware filtering removed — PRD 0017 settled that question. - `tests/integration/test_supervise_sidecar.py` — restores the approval round-trip test (chunk 2 had switched it to a reject path because no cred-proxy existed). Approval stubs `apply_routes_change` so the test focuses on the supervise queue/response plumbing rather than docker-exec into a real egress-proxy sidecar (that's covered separately). - `tests/unit/test_egress_proxy_apply.py` — rewritten against the new validator; covers JSON shape, missing routes key, partial-auth-pair rejection (the addon-core parser catches these before SIGHUP). - PRDs 0010 + 0014 — status headers updated to Superseded / Retargeted with a callout block pointing at PRD 0017's migration section. Historical text preserved. 384 unit + integration tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 15:13:44 -04:00
didericis	a79b2b7be0	docs(prd-0017): nest auth.scheme + auth.token_ref under optional `auth` Earlier draft had `auth_scheme: "none"` as the unauthenticated signal — awkward sentinel. Nest the two credential-injection fields under an optional `auth` key instead. Presence of the key = authenticated; absence = unauthenticated. Empty `auth: {}` is an error (omission is what means "no auth"). Touches: scope bullet, manifest example, mitmproxy addon description's auth-handling step. Two trailing `auth_scheme: "none"` references kept as historical context for what the new shape replaces. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:35:47 -04:00
didericis	b0d9802469	docs(prd-0017): pivot to mitmproxy-based egress-proxy Significant rewrite of PRD 0017 based on PR #25 design discussion. Original draft proposed adding `path_allowlist` to the existing cred-proxy. That bought opt-in path filtering for tools that voluntarily routed through cred-proxy (Claude Code, git, npm) — but raw `curl https://github.com/foo` from the agent goes to HTTPS_PROXY=pipelock and bypasses cred-proxy entirely, so any universal enforcement claim was a lie. New design: replace cred-proxy with a mitmproxy-based egress-proxy that becomes the agent's HTTP_PROXY/HTTPS_PROXY. Every agent HTTP/HTTPS request flows through it before reaching pipelock. Path-level allow/deny enforcement is universal because the proxy is on every leg. The proxy also absorbs cred-proxy's credential injection role (mitmproxy addon hooks request → strip + inject Authorization). Net sidecar count: unchanged. cred-proxy is replaced 1:1 by egress-proxy. Pipelock stays as hostname allow + DLP downstream of egress-proxy. Decisions baked in per PR-#25 discussion: - Tool: mitmproxy (designed for this; Python addons; well-maintained). - CA custody: egress-proxy holds the per-bottle MITM CA key (concentration accepted; documented in trust-domain section). - Migration: hard cutover. Existing `bottle.cred_proxy.routes[]` manifests fail-fast at load time with a pointer at this PRD. Open questions retained for the implementation PRs: addon distribution (bake vs mount), prefix-vs-glob match, double-strip of Authorization between egress-proxy and pipelock, whether pipelock keeps TLS interception or stays hostname-only post-cutover, performance under two-MITM-hops. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 13:28:53 -04:00
didericis	5b925a6699	docs(prd-0017): path-aware egress filtering via cred-proxy Extends cred-proxy to filter (not just route) paths, including for unauthenticated upstreams via a new `auth_scheme: "none"` mode and `path_allowlist` field per route. Pipelock keeps its hostname allowlist + DLP role; cred-proxy adds path-level enforcement for routes that opt in. Motivated by PR #25's follow-up note in _apply_pipelock_url: pipelock 2.3.0's api_allowlist is hostname-only, so approving pipelock-block opens the entire host. For shared platforms (github.com, gitlab.com, public registries) operators usually want narrower-than-host granularity. Draft status; open questions on match semantics, allow-route-with-empty-allowlist edge case, and the eventual MCP tool shape for agent-proposed path additions. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 08:33:01 -04:00
didericis	5e8ca21669	docs: replace stale bash-first framing with Python-stdlib-first The project started life as bash scripts and got rewritten to Python (documented in docs/research/bash-vs-python-vs-go.md). Several docs still carried the old "bash-first" framing — misleading for anyone reading them now (8.7k lines of Python vs. ~130 lines of bash, all in scripts/demo*.sh). - CLAUDE.md "What this is" + "Conventions": orchestrator is Python, posture is stdlib-first. - docs/prds/0010-cred-proxy.md, docs/research/manifest-format-and-grouping.md: quoted CLAUDE.md's old wording — re-quote. - docs/research/built-in-supervisor-design.md, landscape-containerized-claude.md, agent-sandbox-landscape.md, pipelock-assessment.md, network-egress-guard.md: drop "bash-first" claims about the project, keep accurate descriptions of external tools' bash usage. Leaves untouched: bash code-fence syntax in examples, README's literal `bash scripts/demo.sh` invocation (the demo IS bash), Claude Code's "Bash tool" references, IVIJL/devbox bash description (that project actually is bash), and the bash-vs-python-vs-go research note that records the rewrite decision. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 06:32:42 -04:00
didericis	de87f21ff8	docs(prd-0016): capability block remediation Adds PRD 0016, the heaviest of the three remediation engines in the stuck-agent recovery flow (overview in PRD 0012, foundation in PRD 0013). Wires the capability block path: rebuild orchestrator, state-preservation helper, capability-block end-to-end. On approval the orchestrator tears down the bottle, builds from the new Dockerfile, and starts a replacement on the same branch via state-preservation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 05:15:32 -04:00
didericis	0197599e49	docs(prd-0015): pipelock block remediation Adds PRD 0015, the second remediation engine in the stuck-agent recovery flow (overview in PRD 0012, foundation in PRD 0013). Wires the pipelock block path with restart-based reload: supervisor writes the new allowlist on approval and restarts pipelock, proactive pipelock edit TUI verb, pipelock audit log filled in. SIGHUP reload for pipelock is deferred to a follow-up. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 04:54:25 -04:00
didericis	76a9bd2586	docs(prd-0014): cred-proxy block remediation Adds PRD 0014, the first end-to-end remediation engine in the stuck-agent recovery flow (overview in PRD 0012, foundation in PRD 0013). Wires the cred-proxy block path: SIGHUP-based hot reload of routes.json on cred-proxy, supervisor write-on-approval, proactive routes edit TUI verb, cred-proxy audit log filled in. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 04:37:09 -04:00
didericis	578363bea3	docs(prd-0013): supervise plane foundation Adds PRD 0013, the shared foundation for the stuck-agent recovery flow (overview in PRD 0012). Defines the MCP sidecar, the three tool definitions, the proposal queue, the read-only current-config mount, the minimal TUI, and the audit log format. Approval handlers are deliberately no-ops; the actual remediations land in PRDs 0014, 0015, and 0016. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 04:20:57 -04:00
didericis	4079678ceb	docs(prd-0012): split into overview + 4 implementation PRDs PRD 0012 becomes the cross-cutting overview (stuck categories taxonomy, sidecar-vs-in-container rationale, implementation chunk pointers). Implementation detail moves into four follow-on PRDs that 0012 references: 0013 (supervise plane foundation), 0014 (cred-proxy block remediation), 0015 (pipelock block remediation), 0016 (capability block remediation). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 04:19:50 -04:00
didericis	58acdcac87	docs(prd-0012): explain why the MCP server is a sidecar, not in-container Captures the rationale for placing the MCP server outside the agent container. The bottle wall doesn't strictly require it (the operator TUI is the actual gate), but pattern consistency, audit metadata trust, connection lifecycle, future enforcement headroom, and pipelock cleanliness all argue for sidecar placement. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 04:19:50 -04:00
didericis	6e4bb3ba8d	docs(prd-0012): switch /stuck to three structured MCP tool calls Replaces the text-only /supervise/notify protocol with three MCP tools the agent calls directly: cred-proxy-block, pipelock-block, and capability-block. Each tool carries the agent's proposed config file (routes.json, pipelock allowlist, or Dockerfile) plus a justification. Adds a new MCP sidecar, a read-only current-config mount in the agent container, and renames "capability gap" to "capability block" to match the tool name. The text-only-vs-structured tradeoff is captured as an Open question with pros/cons on both sides. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 04:19:50 -04:00
didericis	66fc29c72e	docs(prd-0012): name the three stuck categories and add pipelock path Introduces cred-proxy block, pipelock block, and capability gap as the three named categories of stuck. Adds pipelock-edit support (restart- based for v1) parallel to the existing cred-proxy routes-edit path, plus a pipelock audit log. Broadens Goals to cover all three paths. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 04:19:50 -04:00
didericis	a6222aaa57	docs(prd-0012): adopt text-only notify protocol + SIGHUP routes reload Rewrites Scope, Proposed Design, Data model, and Open questions to match the model where /supervise/notify is text-in/text-out, routes edits + SIGHUP reload are supervisor-side tooling, and manifest rebuilds are the heavy path. Adds the per-bottle routes-edit audit log. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-25 04:19:50 -04:00
didericis	4cce535008	docs(research): drop auto-respawn from the supervisor design The autonomous "review comment → respawn bottle with comment as next prompt" loop is the one feature that opens a prompt-injection vector the bottle wall can't close (a public commenter would get to issue instructions inside the agent's perimeter on every launch). The available mitigations — commenter allowlists, prompt-injection regex screens, private-repo defaults — are all soft. The durable defense is to keep the human between the review comment and any next agent prompt. So `supervise` is now strictly notify-only. The `auto_respawn` manifest field, the "with auto_respawn: true" behavior paragraph, and the matching trust-model edge case all go. The reasoning stays in the "Where to be conservative" bullet so the decision isn't re-litigated later.	2026-05-25 04:19:50 -04:00
didericis	afbb77b040	docs(research): built-in supervisor design (TUI + PR feedback)	2026-05-25 04:19:50 -04:00

107 Commits