Commit Graph

100 Commits

Author SHA1 Message Date
didericis-codex 59ee32cc8d refactor(manifest): key git config by host
test / unit (pull_request) Successful in 33s
test / integration (pull_request) Successful in 42s
2026-05-28 00:49:34 -04:00
didericis-claude 4f7a506a9e docs(prd): 0025 — bottle composition via extends: (issue #88)
test / unit (pull_request) Successful in 27s
test / integration (pull_request) Successful in 40s
2026-05-27 23:27:04 -04:00
didericis-claude 7eda2a66ec feat(smolmachines): patch smolvm state DB to actually enforce per-bottle allowlist
test / unit (pull_request) Successful in 26s
test / integration (pull_request) Successful in 44s
Earlier commit framed this PR as "infrastructure landed, TSI
enforcement blocked on upstream smolvm 0.8.0." Found a clean
workaround that lets us enforce now.

Smolvm persists each machine's config (including
`allowed_cidrs`) as a JSON BLOB in
`~/Library/Application Support/smolvm/server/smolvm.db`,
`vms.data`. `machine create --allow-cidr X/32` silently writes
`allowed_cidrs: null` to that row when combined with `--from`,
but smolvm reads the row at `machine start` — so patching the
row between create and start sets the allowlist for real.

New `loopback_alias.force_allowlist(machine_name, cidrs)` opens
the SQLite DB, JSON-decodes the row, sets `allowed_cidrs`, and
writes back as BLOB (Text type silently corrupts smolvm's
later reads). launch.py calls it immediately after
`machine_create` and before `machine_start`.

Verified end-to-end on macOS / Docker Desktop:

  VM allowlist after start: ["127.0.0.16/32"]
  VM → 127.0.0.1:3000      → BLOCKED (Permission denied)
  VM → 8.8.8.8:53          → BLOCKED (Permission denied)
  VM → 127.0.0.16:<bundle> → CONNECTED

The DB-patch hack is correct only because smolvm reads
`allowed_cidrs` from the row at start time (not derived in-
process). When upstream honors `--allow-cidr` with `--from`,
the call becomes redundant — drop the call and the workaround
is gone.

Tests: 4 new for `force_allowlist` (BLOB round-trip; Linux
no-op; missing DB; missing row). Total 593 unit tests pass.

README + PRD updated to reflect the fix landed (no longer
"infrastructure pending upstream"). gitea#75 can close.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 16:55:03 -04:00
didericis-claude a919268d5e docs: honest framing of upstream smolvm 0.8.0 allowlist bug
test / unit (pull_request) Successful in 26s
test / integration (pull_request) Successful in 40s
PR #76 originally claimed the per-bottle alias scoping closed
gitea#75 ("agent can reach host loopback"). Verified
empirically that's not actually true: `smolvm 0.8.0 machine
create --from <smolmachine> --net --allow-cidr X/32` silently
drops the allowlist (`agent.config.json` shows `allowed_cidrs:
null`, and the running VM reaches all of `127.0.0.0/8`
regardless).

So the alias-allocation + alias-bind infrastructure is correct
pre-work, but the actual TSI enforcement is blocked on an
upstream smolvm bug. README + PRD 0023 + the module docstring
get reworded to say so plainly. gitea#75 stays open.

Workarounds tried (all dead-ends):
- `machine update --allow-cidr` doesn't exist
- stop-edit-`agent.config.json`-restart fails (smolvm removes
  the file on stop)
- `--smolfile` is mutually exclusive with `--from`
- `--image localhost:<port>/...` fails because smolvm's agent
  process can't reach host loopback during pull

When upstream lands a fix, our existing code (alias allocation,
port-bind, --allow-cidr in launch) will scope correctly without
further changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 16:37:56 -04:00
didericis-claude 2edc1abb9a feat(smolmachines): per-bottle loopback alias scopes TSI to single /32
test / unit (pull_request) Successful in 27s
test / integration (pull_request) Successful in 41s
PR #74's Docker-Desktop fix routed the agent through
`127.0.0.1:<random>` loopback forwards, but TSI filters by IP
only — so the allowlist `127.0.0.1/32` let the agent VM reach
**any** host service on macOS loopback (postgres, dev servers,
other bottles' published ports, mDNSResponder, ...). Real
downgrade vs the docker backend's `--internal` network.

Resolution: per-bottle loopback alias.

- New `loopback_alias` module manages a pool of
  `127.0.0.16` .. `127.0.0.31` on `lo0`. macOS only routes
  `127.0.0.1` by default; the extras need `sudo ifconfig lo0
  alias`. `ensure_pool()` lazily adds the missing entries via
  one sudo prompt on first launch per reboot — aliases persist
  on `lo0` until reboot, so subsequent launches skip the
  prompt entirely.
- `allocate(slug)` picks the lowest-numbered unused alias by
  inspecting running bundle containers' port-binding HostIps.
  No on-disk reservation — docker is the source of truth.
- Bundle bringup binds published ports to the allocated alias
  (`docker run -p <alias>::<port>`) instead of `127.0.0.1`.
- TSI allowlist becomes the alias's /32 — narrows reachability
  to this bottle's bundle only.
- Linux native daemons share the host's network namespace;
  `127.0.0.0/8` works without aliases, so the module no-ops on
  non-Darwin and returns `127.0.0.1` from `allocate`.

Tracking issue closed: gitea/issues/75.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 16:23:17 -04:00
didericis-claude 45c821a8f3 docs(smolmachines): note loopback-scope limitation + tracking issue
test / unit (pull_request) Successful in 26s
test / integration (pull_request) Successful in 43s
PR #74's Docker-Desktop pivot widened the smolmachines TSI
allowlist from `<bundle-ip>/32` to `127.0.0.1/32` (TSI can't
filter by port, and docker bridge IPs aren't reachable from
macOS networking). The agent VM can therefore reach any service
on macOS's loopback while the bottle is running — not just the
bundle's published ports.

README gets a "Smolmachines backend" subsection under Quickstart
spelling this out as a known v1 limitation. PRD 0023 grows a new
open question #8 with the proposed v2 fix (per-bottle loopback
alias + TSI allowlist scoped to that /32, via sudo
`ifconfig lo0 alias`).

Tracking issue: gitea.dideric.is/didericis/claude-bottle/issues/75.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 15:58:30 -04:00
didericis-claude 1fa17d1822 feat(smolmachines): build agent image from repo Dockerfile (PRD 0023 chunk 4c)
test / unit (pull_request) Successful in 21s
test / unit (push) Successful in 21s
test / integration (push) Successful in 42s
test / integration (pull_request) Successful in 41s
Replaces the alpine:latest placeholder with a real claude-bottle
agent image, converted into a .smolmachine artifact via an
ephemeral local OCI registry.

Why the registry hop: smolvm pack create only accepts OCI registry
refs. Empirically it rejects docker-daemon://, oci-layout://,
docker-archive: tarballs, and every other transport tested — the
crane backend treats anything with a scheme prefix as a registry
hostname. To convert a locally-built docker image into a
.smolmachine we have to push it somewhere smolvm can pull from.
Smallest path: bring up registry:2.8.3 bound to 127.0.0.1:<random>,
docker tag + docker push into it, smolvm pack create --image
localhost:<port>/claude-bottle:<id>, tear down the registry.

The .smolmachine is cached under
~/.cache/claude-bottle/smolmachines/ keyed by the docker image ID
(first 16 hex chars of the sha256), so a Dockerfile change picks
up a new image ID and invalidates the cache. Unchanged rebuilds
skip the whole build → registry → pack pipeline.

This puts `docker build` in smolmachines prepare (the docker
backend defers it to launch). Necessary because pack_create needs
the image ID to derive the cache key, and prepare is the only
hook ahead of launch that runs once per slug.

Adds:
- claude_bottle/backend/docker/util.py: image_id / tag / push
  helpers (thin docker CLI wrappers).
- claude_bottle/backend/smolmachines/local_registry.py:
  ephemeral_registry() context manager; pins registry:2.8.3 by
  digest, binds 127.0.0.1::5000 (loopback-only), force-removes on
  exit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 13:51:02 -04:00
didericis-claude 5929caa219 docs(prd-0023): pivot to smolvm + TSI single-IP allowlist
test / unit (pull_request) Successful in 22s
test / integration (pull_request) Successful in 43s
Chunk-1's empirical spike against smolvm 0.8.0 contradicted the
research note that motivated the gvproxy network design: smolvm
exposes no virtio-net-over-unixgram attachment. The first draft's
"why gvproxy, not TSI" argument turns out to apply only to
`--outbound-localhost-only`, not to TSI generally.

New design:

- Bundle (PRD 0024) runs on a dedicated per-bottle docker bridge
  with a pinned IP. Smolfile sets `[network] allow_cidrs =
  ["<bundle-ip>/32"]` and nothing else. Agent can reach the bundle
  and nothing else — host loopback, LAN, public internet directly
  are all refused at the VMM (TSI) layer.
- Bind-address mitigation: egress binds 127.0.0.1:9099 inside the
  bundle (pipelock-internal); pipelock / git-gate / supervise
  bind 0.0.0.0 so the agent (across the TSI allowlist) can reach
  them. This is the port-granularity TSI's IP-only allowlist
  doesn't provide.
- Smolfile renderer rewritten in chunk 2 to smolvm 0.8.0's actual
  schema (image / entrypoint / cmd / env / [network] allow_cidrs).
  The chunk-1 renderer (name= / [[net]]= under the gvproxy
  design) emits the wrong shape and will be replaced.
- Drop gvproxy + VZFileHandleNetworkDeviceAttachment + the
  PyObjC fallback. Backend layout loses gvproxy_config.py,
  gvproxy.py, vfkit_attach.py.
- Acceptance plan adds an egress-port-bypass probe in addition
  to the localhost-reach probe.
- Chunks reshape: chunk 1 stays (renderer rewrite is part of
  chunk 2's cost); chunk 2 covers VM lifecycle + bundle + new
  Smolfile renderer; chunk 3 is the bundle bind-address change;
  chunks 4-5 unchanged in spirit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 03:47:03 -04:00
didericis bce1ea21db Merge pull request 'docs(prd-0023): smolmachines bottle backend' (#53) from prd-0023-smolmachines-backend into main
test / unit (push) Successful in 21s
test / integration (push) Successful in 40s
2026-05-27 02:16:11 -04:00
didericis 539234f29e refactor(sidecars): drop vestigial start/stop methods (PRD 0024 chunk 3)
test / unit (pull_request) Successful in 21s
test / integration (pull_request) Successful in 41s
Compose-up has owned per-container lifecycle since PRD 0018 ch3;
the .start() / .stop() methods on DockerPipelockProxy /
DockerEgress / DockerGitGate / DockerSupervise (and their
abstractmethod declarations in the four base ABCs) were already
documented as vestigial. With the bundle path in flight
(PRD 0024 ch2), they are truly dead — collapse to nothing.

Changes:
- Removed start/stop methods from the four DockerSidecar
  classes. Plan dataclasses, image/path constants,
  container-name helpers, and the .prepare() methods all stay
  (the renderer + apply path still need them).
- Removed the matching @abstractmethod declarations in the
  base ABCs so concrete subclasses don't have to stub them.
- launch.launch() and prepare.resolve_plan() no longer take
  proxy/git_gate/egress/supervise instance parameters. backend.py
  loses the four instance attributes it threaded through.
  prepare.resolve_plan() instantiates the four classes itself
  to call their .prepare() methods.
- Deleted four integration tests that only exercised the
  removed lifecycle: test_pipelock_sidecar_smoke,
  test_supervise_sidecar, test_git_gate_sidecar,
  test_git_gate_mirror.
- Dropped the .stop-idempotency case in test_orphan_cleanup;
  the network-cleanup cases stay (those test real production
  code).
- Marked test_pipelock_apply @skip pending chunk 4 — its
  bringup helper used .start; chunk 4 rewrites it with direct
  `docker run`.

Dockerfile deletion deferred to chunk 5 (when the bundle flag
default flips) — the legacy compose path still needs
Dockerfile.{egress,git-gate,supervise} until then.

Net: 708 lines removed, 80 added.

533 unit tests + 27 integration tests passing (5 skipped: the
chunk-4-pending case + existing GITEA_ACTIONS guards).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 01:01:10 -04:00
didericis 62109a1caf fix(sidecars): child death no longer tears down the bundle
test / unit (pull_request) Successful in 20s
test / integration (pull_request) Successful in 1m8s
Reverses chunk 1's "any unexpected child death tears down the
rest" policy. New behavior: a daemon dying is logged but does
NOT initiate shutdown — the surviving daemons keep running and
whatever the dead one served starts failing visibly on the
agent side. The supervisor exits only when (a) it receives
SIGTERM/SIGINT, or (b) every child has died on its own.

Eventual design is restart-the-dead-daemon plus a notification
to the supervise sidecar so the operator sees the event
explicitly; this commit ships only the "log and leave alone"
half. PRD 0024 open question 1 updated to reflect the new
intent.

Tests updated: replaced "crash propagates exit code via
auto-teardown" with three cases that exercise the new policy
(crash without shutdown leaves survivors up, crash-then-signal
surfaces the nonzero code, all-children-die-unattended still
converges the loop).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 00:19:50 -04:00
didericis 1894f621dd docs(prd-0024): consolidate per-bottle sidecars into a single bundle
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m11s
Replace pipelock + egress + git-gate + supervise as four
separate containers with one bundle image
(claude-bottle-sidecars) running all four daemons under a small
stdlib Python init supervisor. Compose file collapses from five
services to two; same daemons, same ports, same protocols, one
container.

Sized: bundle image + init → renderer collapse (feature-flagged)
→ backend Python trim → integration sweep → flag removal.

Prerequisite for PRD 0023 chunk 3 (smolmachines backend reuses
the same bundle as its sole host-side sidecar container).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 23:54:29 -04:00
didericis 4e00430c6e docs(prd-0023): consume PRD 0024's bundle as the single sidecar
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m11s
Replace the four host-side sidecar processes (pipelock + egress +
git-gate + supervise) with a single bundled container per bottle,
defined in PRD 0024 and consumed here. egress is internal to the
bundle as pipelock's upstream; only pipelock, git-gate, and
supervise are externally addressable, and only when the bottle
uses them.

gvproxy port_forwards collapse from one-per-process to one-per-
external-port, all pointing into the one bundle container.
Sizing: chunk 3 becomes "sidecar bundle lifecycle" and depends
on PRD 0024 having landed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 23:51:57 -04:00
didericis 041da1d7af docs(prd-0023): make gvproxy the network primitive; reject TSI
test / unit (pull_request) Successful in 19s
test / integration (pull_request) Successful in 1m9s
TSI's --outbound-localhost-only is permissive on all of
127.0.0.0/8 with no destination-port filter, so any host
loopback service (local Postgres, IDE plugins, another bottle's
sidecar) is reachable from the guest. That's the wrong default
for the malicious-agent threat model.

Reworked the network design around gvproxy + VFKT unixgram
attachment: the guest gets a virtio-net device, gvproxy is the
userspace TCP/IP stack on the host side, and the only thing
reachable from the guest is the explicit port-forward list
(typically just pipelock). Host LAN, host loopback, and the
public internet directly are gone by construction.

VMM choice (smolmachines vs PyObjC + Virtualization.framework)
is an open question contingent on whether libkrun's virtio-net
mode lets us point at a custom unixgram socket. Backend name
stays "smolmachines" either way per the original spec.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 23:41:32 -04:00
didericis a2ac124d5c docs(prd-0023): smolmachines bottle backend
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
Specs a second concrete BottleBackend selectable via
CLAUDE_BOTTLE_BACKEND=smolmachines: per-agent libkrun microVM on
macOS, sidecars relocated to host-side loopback ports plumbed via
Smolfile env, PRD 0022's sandbox-escape suite as the acceptance
gate (the env-var flip is the only change required). Docker
backend ships unchanged and remains default.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 23:19:08 -04:00
didericis 1111ced04d docs(prd-0022): resolve remaining open Qs
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
All seven open questions now have decisions baked in:

  - Q1 (HTTP-exfil scope): authoritative. Every shape MUST
    block; chunk 3 expands into remediation sub-PRDs if
    any of path/query/header leak today.
  - Q3 (fake secret): multiple shapes, parameterized.
    Three env vars (TEST_SECRET_ANTHROPIC, _AWS, _GENERIC);
    test 5 loops via subTest. Resilient to gitleaks rule
    renames.
  - Q6 (missing backend): die. `get_bottle_backend()`'s
    current behavior surfaces clearly; surprise-skips are
    worse than loud failures for new-backend branches.
  - Q7 (tool deps): preflight check. setUpClass runs
    `which curl && which git && which dig`; SkipTest with
    the missing list catches future backends shipping
    thinner base images.

Updated implementation chunks + test-5 sketch to match.
No remaining open questions.
2026-05-26 22:11:32 -04:00
didericis 73939861f9 docs(prd-0022): resolve open Qs 2, 4, 5 (DNS, gitleaks order, CI)
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
User feedback:

  - Q2 (direct DNS resolver test): yes — test 4 grows a
    second sub-assertion verifying `dig @8.8.8.8` from the
    agent has no path out, alongside the existing
    crafted-subdomain check.
  - Q4 (gitleaks ordering): test 5 grows an ordering check
    — asserts the rejection mentions `gitleaks` AND does
    NOT mention upstream-network-phase phrases (resolve /
    refused / unreachable / upstream). Confirms gitleaks
    rejects BEFORE git-gate tries any upstream push.
  - Q5 (CI): try it, accept fallback. New chunk 6 adds a
    Gitea Actions job marked `continue-on-error: true` —
    runs the suite if the runner can host compose, doesn't
    block the workflow if docker-in-docker prevents it.

Three open questions remain (1: pipelock's actual DLP
coverage for non-body shapes; 3: realistic fake secret
shape vs. gitleaks regex; 6+7: backend-agnostic invocation
+ required tools — for the smolmachines work).
2026-05-26 22:04:46 -04:00
didericis 62f6716e8d docs(prd-0022): end-to-end sandbox-escape integration test
test / unit (pull_request) Successful in 19s
test / integration (pull_request) Successful in 1m9s
Draft a PRD for a composite integration test that brings up
a real bottle with a known allowlist + planted secret and
runs five attacks from inside the agent container:

  1. Request to non-allowlisted hostname
  2. Request to non-allowlisted IP (incl. host-header spoof)
  3. Secret exfil via HTTP — path / query / body / headers
  4. Secret exfil via crafted DNS subdomain
  5. Secret exfil via README link pushed through git-gate

Each attack passes only when blocked with a permissions
error. The suite is backend-agnostic — runs against
whatever CLAUDE_BOTTLE_BACKEND selects — so it becomes the
gate the upcoming smolmachines spike has to pass before that
backend can substitute for Docker.

Sized into 5 chunks (fixture → attacks 1+2 → attack 3 →
attack 4 → attack 5). Seven open questions called out,
biggest being: today's pipelock probably leaks via header /
path / query because DLP only scans bodies — the test will
expose this as a real gap (chunk 3 lands with
`expectedFailure` markers if so).
2026-05-26 21:52:24 -04:00
didericis e5316be454 docs(prd-0021): rewrite as standalone — no references to closed PR #48
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m6s
PR #48 closed; treat the implementation as starting from
main, where no tmux integration exists yet. The PRD now
describes the full design (including the `_in_tmux` detection
+ helper scaffolding) as fresh work. Sized into 4 chunks:
`claude_docker_argv` refactor → tmux helpers + pane state +
`_attach_to_bottle` dispatch → new-agent flow → stop +
indicator.

Same design as before — opt-in by `\$TMUX`, split-window-then-
respawn, falls back to handoff on tmux failure or missing
binary. No external references to PR #48.
2026-05-26 14:18:24 -04:00
didericis 8b8d668602 docs(prd-0021): dashboard as left tmux pane, selected agent as right pane
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m8s
Draft a PRD that tightens PR #48's tmux integration from
"one new window per attach" to "one persistent right pane that
the dashboard's selection drives." Inside tmux (`\$TMUX` set):
dashboard in the left pane; pressing Enter or `n` spawns
claude in the right pane via `tmux split-window` on first
attach, then `tmux respawn-pane` on subsequent attaches so the
operator-focused agent is always the visible one.

Outside tmux: falls back to today's handoff. Opt-in by
environment; no flag.

Sized into 4 chunks (pane state + create → respawn → stop
integration → supersede PR #48's new-window). Seven open
questions called out, the biggest being whether the dashboard
should auto-exec into a fresh tmux session when launched
outside one (v1 says no — operators start tmux themselves).
2026-05-26 14:14:02 -04:00
didericis 26322bdfd5 docs(prd-0020): record answers to open questions, switch to no-teardown-on-quit 2026-05-26 03:10:26 -04:00
didericis ec20293c0a docs(prd-0020): start + attach to agents from the dashboard
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
Draft a PRD that turns the dashboard into the operator's single
surface — collapses today's two-terminal workflow (one for
`./cli.py start`, one for `./cli.py dashboard`) into a single
dashboard invocation that can spin up new agents, re-attach to
ones it already spun up, and explicitly stop them.

Picks the "handoff" mechanism from `docs/research/claude-code-
pane-in-dashboard.md` (curses.endwin → docker exec -it claude
→ stdscr.refresh) and crucially decouples the bottle's lifetime
from any single claude session: exit claude → back to dashboard
with the bottle still running; quit dashboard → tear down every
bottle the dashboard owns.

Sized into 5 chunks (refactor → picker + new-agent → re-attach
→ explicit stop → quit-cleanup). Seven open questions called
out, the biggest being modal-vs-drop-and-resume for the
preflight Y/N inside curses.
2026-05-26 02:59:42 -04:00
didericis 8cd867f3d2 docs(research): claude-code pane in the dashboard
test / integration (pull_request) Successful in 1m8s
test / unit (pull_request) Successful in 17s
test / unit (push) Successful in 17s
test / integration (push) Successful in 1m2s
Survey the three realistic ways to surface a claude-code session
inside the dashboard TUI:

  1. Handoff — drop curses, foreground claude, restore on exit
     (the existing `e`/`p` pattern, extended). Minimal code,
     side-by-time rather than side-by-side.
  2. Embedded emulator — own a PTY, parse claude-code's ANSI
     stream via `pyte`, paint it into a curses pane. Real
     "pane in the dashboard" but a six-week build with one new
     dep and several integration trap-doors (alt-screen, resize,
     input routing, multi-PTY state).
  3. External multiplexer — delegate pane creation to tmux /
     iTerm / wezterm when detected. Tiny code, but splits the
     operator's mental model and gives up layout control.

Recommendation: ship Option 1 first; defer Option 2 to "only if
Option 1 is observably insufficient"; treat Option 3 as a
niche augmentation for power users.

Calls out four followups worth verifying before committing
(PTY behavior at small sizes, attach-to-existing-exec, SIGWINCH
handling, `-it` vs `-i` for the embedded path).
2026-05-26 02:51:08 -04:00
didericis 9c9c32a941 docs(prd-0019): drop e/p fallback — selection-only, no-op otherwise
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m6s
When no agent is selected, `e` / `p` do nothing (status line
shows "no agent selected") rather than falling back to today's
global discover-and-prompt. The discover-and-prompt scaffolding
in `_operator_edit_routes_flow` / `_operator_edit_allowlist_flow`
comes out entirely — selection in the agents pane is now the
only way to scope an edit. Old open-question #4 (single-bottle
shortcut behavior in proposals-pane mode) is moot and removed.
2026-05-26 01:03:23 -04:00
didericis 9539982d3f docs(prd-0019): active agents in dashboard + agent-scoped edit verbs
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m3s
Draft a PRD that adds an "active agents" pane to the dashboard
TUI (below the existing proposals pane) and reshapes the operator
`routes edit` (e) / `pipelock edit` (p) verbs to be agent-scoped
when the cursor is in the agents pane — no more global discover
+ disambiguation prompt on every press. Tab toggles which pane
nav keys move through.

Sized into 4 chunks (discovery helper → render pane → selection
state → agent-scoped verbs). Six open questions called out, the
biggest being whether per-bottle `compose ps` on every 1s tick
scales for hosts with many bottles (answer leans toward one
label-filtered `docker ps`).
2026-05-26 00:58:34 -04:00
didericis 3386cabe62 docs(prd-0018): resolve TTY open question — keep exec -it
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m3s
2026-05-25 22:34:26 -04:00
didericis 3251ee1394 docs(prd-0018): one compose project per bottle instance
test / unit (pull_request) Successful in 16s
test / integration (pull_request) Successful in 1m3s
Draft a PRD that replaces the chain of per-sidecar docker SDK calls
in `claude-bottle start` with a single `docker compose` project per
instance. Each `state/<slug>/` dir gets a self-describing set of
artifacts: metadata.json, docker-compose.yml, compose.log, and the
existing transcript/ + live-config/.
2026-05-25 22:15:32 -04:00
didericis 9cd583fbbb feat(egress-proxy): retarget remediation at egress-proxy (PRD 0017 chunk 3)
test / unit (pull_request) Successful in 19s
test / integration (pull_request) Successful in 1m6s
Finishes PRD 0017. The `cred-proxy-block` MCP tool is renamed and
its remediation apply path is repointed at egress-proxy.

  - `claude_bottle/supervise.py` — `TOOL_CRED_PROXY_BLOCK` →
    `TOOL_EGRESS_PROXY_BLOCK`; `COMPONENT_FOR_TOOL` maps the new
    tool ID to `egress-proxy` for audit-log routing.

  - `claude_bottle/supervise_server.py` — tool definition renamed
    + description rewritten: "Call when egress-proxy refused your
    HTTPS request ... Read the current routes.yaml from /etc/
    claude-bottle/current-config/routes.yaml, compose a modified
    version, pass the full new file plus a justification." The
    syntactic validator dispatches on the new tool ID.

  - `claude_bottle/backend/docker/egress_proxy_apply.py` — renamed
    from `cred_proxy_apply.py`. Reads routes.yaml from
    /etc/egress-proxy/routes.yaml via `docker exec cat`; validates
    via `egress_proxy_addon_core.load_routes` (so both sides use
    the same parser); writes via `docker cp`; SIGHUPs egress-proxy
    with `docker kill --signal HUP`. `EgressProxyApplyError`
    replaces `CredProxyApplyError`.

  - `claude_bottle/cli/dashboard.py` — wires the new apply +
    `discover_egress_proxy_slugs` helper; the operator-initiated
    `routes edit <bottle>` verb now writes to egress-proxy with
    `.yaml` suffix. Stale follow-up comment about path-aware
    filtering removed — PRD 0017 settled that question.

  - `tests/integration/test_supervise_sidecar.py` — restores the
    approval round-trip test (chunk 2 had switched it to a reject
    path because no cred-proxy existed). Approval stubs
    `apply_routes_change` so the test focuses on the supervise
    queue/response plumbing rather than docker-exec into a real
    egress-proxy sidecar (that's covered separately).

  - `tests/unit/test_egress_proxy_apply.py` — rewritten against
    the new validator; covers JSON shape, missing routes key,
    partial-auth-pair rejection (the addon-core parser catches
    these before SIGHUP).

  - PRDs 0010 + 0014 — status headers updated to
    Superseded / Retargeted with a callout block pointing at PRD
    0017's migration section. Historical text preserved.

384 unit + integration tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 15:13:44 -04:00
didericis a79b2b7be0 docs(prd-0017): nest auth.scheme + auth.token_ref under optional auth
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m38s
Earlier draft had `auth_scheme: "none"` as the unauthenticated
signal — awkward sentinel. Nest the two credential-injection
fields under an optional `auth` key instead. Presence of the key
= authenticated; absence = unauthenticated. Empty `auth: {}` is
an error (omission is what means "no auth").

Touches: scope bullet, manifest example, mitmproxy addon
description's auth-handling step. Two trailing `auth_scheme:
"none"` references kept as historical context for what the new
shape replaces.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:35:47 -04:00
didericis b0d9802469 docs(prd-0017): pivot to mitmproxy-based egress-proxy
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m34s
Significant rewrite of PRD 0017 based on PR #25 design discussion.

Original draft proposed adding `path_allowlist` to the existing
cred-proxy. That bought opt-in path filtering for tools that
voluntarily routed through cred-proxy (Claude Code, git, npm) —
but raw `curl https://github.com/foo` from the agent goes to
HTTPS_PROXY=pipelock and bypasses cred-proxy entirely, so any
universal enforcement claim was a lie.

New design: replace cred-proxy with a mitmproxy-based egress-proxy
that becomes the agent's HTTP_PROXY/HTTPS_PROXY. Every agent
HTTP/HTTPS request flows through it before reaching pipelock.
Path-level allow/deny enforcement is universal because the proxy
is on every leg. The proxy also absorbs cred-proxy's credential
injection role (mitmproxy addon hooks request → strip + inject
Authorization).

Net sidecar count: unchanged. cred-proxy is replaced 1:1 by
egress-proxy. Pipelock stays as hostname allow + DLP downstream
of egress-proxy.

Decisions baked in per PR-#25 discussion:
- Tool: mitmproxy (designed for this; Python addons; well-maintained).
- CA custody: egress-proxy holds the per-bottle MITM CA key
  (concentration accepted; documented in trust-domain section).
- Migration: hard cutover. Existing `bottle.cred_proxy.routes[]`
  manifests fail-fast at load time with a pointer at this PRD.

Open questions retained for the implementation PRs: addon
distribution (bake vs mount), prefix-vs-glob match, double-strip
of Authorization between egress-proxy and pipelock, whether
pipelock keeps TLS interception or stays hostname-only post-cutover,
performance under two-MITM-hops.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:28:53 -04:00
didericis 5b925a6699 docs(prd-0017): path-aware egress filtering via cred-proxy
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m34s
Extends cred-proxy to filter (not just route) paths, including for
unauthenticated upstreams via a new `auth_scheme: "none"` mode and
`path_allowlist` field per route. Pipelock keeps its hostname
allowlist + DLP role; cred-proxy adds path-level enforcement for
routes that opt in.

Motivated by PR #25's follow-up note in _apply_pipelock_url: pipelock
2.3.0's api_allowlist is hostname-only, so approving pipelock-block
opens the entire host. For shared platforms (github.com, gitlab.com,
public registries) operators usually want narrower-than-host
granularity.

Draft status; open questions on match semantics, allow-route-with-
empty-allowlist edge case, and the eventual MCP tool shape for
agent-proposed path additions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 08:33:01 -04:00
didericis 5e8ca21669 docs: replace stale bash-first framing with Python-stdlib-first
test / unit (pull_request) Successful in 16s
test / integration (pull_request) Successful in 1m32s
The project started life as bash scripts and got rewritten to Python
(documented in docs/research/bash-vs-python-vs-go.md). Several docs
still carried the old "bash-first" framing — misleading for anyone
reading them now (8.7k lines of Python vs. ~130 lines of bash, all
in scripts/demo*.sh).

- CLAUDE.md "What this is" + "Conventions": orchestrator is Python,
  posture is stdlib-first.
- docs/prds/0010-cred-proxy.md, docs/research/manifest-format-and-
  grouping.md: quoted CLAUDE.md's old wording — re-quote.
- docs/research/built-in-supervisor-design.md, landscape-containerized-
  claude.md, agent-sandbox-landscape.md, pipelock-assessment.md,
  network-egress-guard.md: drop "bash-first" claims about the project,
  keep accurate descriptions of external tools' bash usage.

Leaves untouched: bash code-fence syntax in examples, README's
literal `bash scripts/demo.sh` invocation (the demo IS bash),
Claude Code's "Bash tool" references, IVIJL/devbox bash description
(that project actually is bash), and the bash-vs-python-vs-go
research note that records the rewrite decision.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 06:32:42 -04:00
didericis de87f21ff8 docs(prd-0016): capability block remediation
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m13s
Adds PRD 0016, the heaviest of the three remediation engines in the
stuck-agent recovery flow (overview in PRD 0012, foundation in PRD
0013). Wires the capability block path: rebuild orchestrator,
state-preservation helper, capability-block end-to-end. On approval
the orchestrator tears down the bottle, builds from the new
Dockerfile, and starts a replacement on the same branch via
state-preservation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 05:15:32 -04:00
didericis 0197599e49 docs(prd-0015): pipelock block remediation
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m12s
Adds PRD 0015, the second remediation engine in the stuck-agent
recovery flow (overview in PRD 0012, foundation in PRD 0013). Wires
the pipelock block path with restart-based reload: supervisor writes
the new allowlist on approval and restarts pipelock, proactive
pipelock edit TUI verb, pipelock audit log filled in. SIGHUP reload
for pipelock is deferred to a follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:54:25 -04:00
didericis 76a9bd2586 docs(prd-0014): cred-proxy block remediation
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 45s
Adds PRD 0014, the first end-to-end remediation engine in the
stuck-agent recovery flow (overview in PRD 0012, foundation in PRD
0013). Wires the cred-proxy block path: SIGHUP-based hot reload of
routes.json on cred-proxy, supervisor write-on-approval, proactive
routes edit TUI verb, cred-proxy audit log filled in.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:37:09 -04:00
didericis 578363bea3 docs(prd-0013): supervise plane foundation
Adds PRD 0013, the shared foundation for the stuck-agent recovery flow
(overview in PRD 0012). Defines the MCP sidecar, the three tool
definitions, the proposal queue, the read-only current-config mount,
the minimal TUI, and the audit log format. Approval handlers are
deliberately no-ops; the actual remediations land in PRDs 0014, 0015,
and 0016.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:20:57 -04:00
didericis 4079678ceb docs(prd-0012): split into overview + 4 implementation PRDs
test / unit (push) Successful in 13s
test / integration (push) Successful in 22s
PRD 0012 becomes the cross-cutting overview (stuck categories taxonomy,
sidecar-vs-in-container rationale, implementation chunk pointers).
Implementation detail moves into four follow-on PRDs that 0012
references: 0013 (supervise plane foundation), 0014 (cred-proxy block
remediation), 0015 (pipelock block remediation), 0016 (capability
block remediation).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:19:50 -04:00
didericis 58acdcac87 docs(prd-0012): explain why the MCP server is a sidecar, not in-container
Captures the rationale for placing the MCP server outside the agent
container. The bottle wall doesn't strictly require it (the operator
TUI is the actual gate), but pattern consistency, audit metadata
trust, connection lifecycle, future enforcement headroom, and
pipelock cleanliness all argue for sidecar placement.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:19:50 -04:00
didericis 6e4bb3ba8d docs(prd-0012): switch /stuck to three structured MCP tool calls
Replaces the text-only /supervise/notify protocol with three MCP tools
the agent calls directly: cred-proxy-block, pipelock-block, and
capability-block. Each tool carries the agent's proposed config file
(routes.json, pipelock allowlist, or Dockerfile) plus a justification.
Adds a new MCP sidecar, a read-only current-config mount in the agent
container, and renames "capability gap" to "capability block" to match
the tool name. The text-only-vs-structured tradeoff is captured as an
Open question with pros/cons on both sides.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:19:50 -04:00
didericis 66fc29c72e docs(prd-0012): name the three stuck categories and add pipelock path
Introduces cred-proxy block, pipelock block, and capability gap as the
three named categories of stuck. Adds pipelock-edit support (restart-
based for v1) parallel to the existing cred-proxy routes-edit path,
plus a pipelock audit log. Broadens Goals to cover all three paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:19:50 -04:00
didericis a6222aaa57 docs(prd-0012): adopt text-only notify protocol + SIGHUP routes reload
Rewrites Scope, Proposed Design, Data model, and Open questions to
match the model where /supervise/notify is text-in/text-out, routes
edits + SIGHUP reload are supervisor-side tooling, and manifest
rebuilds are the heavy path. Adds the per-bottle routes-edit audit log.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 04:19:50 -04:00
didericis 4cce535008 docs(research): drop auto-respawn from the supervisor design
The autonomous "review comment → respawn bottle with comment as
next prompt" loop is the one feature that opens a prompt-injection
vector the bottle wall can't close (a public commenter would get
to issue instructions inside the agent's perimeter on every
launch). The available mitigations — commenter allowlists,
prompt-injection regex screens, private-repo defaults — are all
soft. The durable defense is to keep the human between the
review comment and any next agent prompt.

So `supervise` is now strictly notify-only. The `auto_respawn`
manifest field, the "with auto_respawn: true" behavior paragraph,
and the matching trust-model edge case all go. The reasoning
stays in the "Where to be conservative" bullet so the decision
isn't re-litigated later.
2026-05-25 04:19:50 -04:00
didericis afbb77b040 docs(research): built-in supervisor design (TUI + PR feedback) 2026-05-25 04:19:50 -04:00
didericis 1f9722ae27 docs(research): add Betterleaks switching analysis
test / unit (pull_request) Successful in 13s
test / integration (pull_request) Successful in 28s
2026-05-24 23:59:42 -04:00
didericis c33930290f docs(research): survey gitleaks dashboards + add baseline-file primitive
test / unit (pull_request) Successful in 13s
test / integration (pull_request) Successful in 24s
2026-05-24 23:54:46 -04:00
didericis a74dd2b97f docs: research on git-gate commit approval; link from PRD 0012
test / unit (pull_request) Successful in 12s
test / integration (pull_request) Successful in 22s
2026-05-24 23:39:17 -04:00
didericis 83756fa8c9 docs(prd-0012): open question for gitlock/pipelock exception flow
test / unit (pull_request) Successful in 12s
test / integration (pull_request) Successful in 22s
2026-05-24 23:12:55 -04:00
didericis b4c9e149b0 docs: add PRD 0012 — stuck-agent recovery flow
test / unit (pull_request) Successful in 12s
test / integration (pull_request) Successful in 22s
2026-05-24 23:10:30 -04:00
didericis afa8ca67a4 docs(prd-0011): drop the migration command requirement
test / unit (pull_request) Successful in 13s
test / integration (pull_request) Successful in 23s
claude-bottle has a single primary user today; an automated
JSON → MD migration tool is overkill. Hand-rewriting one file
is the migration cost. The resolver still dies with a pointer
at the README's manifest section if a stale claude-bottle.json
is found alongside no .claude-bottle/ directory, so the breaking
change isn't silent.

Drops: SC #6 (migration tool), the "Migration command" In Scope
sub-bullet, the migrate_manifest.py / cli wiring entries from
Existing code touched, the tests/integration/test_migrate_manifest.py
entry from Tests, the destructive-vs-additive open question.
Renumbers the remaining success criteria 6, 7 (formerly 7, 8).
Backward-compat section rewritten around hand-rewrite.
2026-05-24 21:46:22 -04:00
didericis 894bdea288 docs: add PRD 0011 — per-file Markdown manifest
test / unit (pull_request) Successful in 12s
test / integration (pull_request) Successful in 22s
Specs the implementation chosen in the PR #16 closing comment:
per-file MD-with-YAML-frontmatter layout for both bottles and
agents, with a hand-rolled YAML subset parser (no PyYAML).

Layout:
- $HOME/.claude-bottle/bottles/<name>.md   (home-only)
- $HOME/.claude-bottle/agents/<name>.md    (home agents)
- $CWD/.claude-bottle/agents/<name>.md     (repo-supplied agents)

The trust boundary that PRD-0011-v1 (closed PR #15) tried to
enforce in the resolver now falls out of filesystem layout —
$CWD/.claude-bottle/ has no bottles/ subdir, the loader doesn't
look there. Filesystem layout IS the enforcement.

Eight success criteria, including: stdlib-only (no new runtime
dep), idempotent migration command, agent files shaped close to
Claude Code's existing subagent spec so the same file can drop
into ~/.claude/agents/.

PRD-only; no implementation in this commit. PRD slot 0011 is
intentionally reused — the v1 file was never merged to main.
2026-05-24 21:39:58 -04:00