Commit Graph

119 Commits

Author SHA1 Message Date
didericis 2ea73e40a8 docs(decisions): ADR 0003 — system prompts stay user-directed
test / integration (pull_request) Successful in 41s
test / integration (push) Successful in 42s
test / unit (pull_request) Successful in 28s
test / unit (push) Successful in 26s
Record that we considered auto-generating an agent's system prompt from
its bottle's egress/git config (so it would know its access up front)
but opted to keep prompts operator-authored: we may want to withhold
that information from the agent directly, and the agent can infer its
access on its own regardless.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 00:40:19 -04:00
didericis ae1531835d docs: drop "forge" jargon for concrete Gitea wording
test / integration (pull_request) Successful in 53s
test / integration (push) Successful in 57s
test / unit (pull_request) Successful in 33s
test / unit (push) Successful in 36s
We use Gitea, not an abstract forge. Reword the docs added in this
branch: "forge thread" -> "Gitea thread", and the research note's
generic "forge" -> "Gitea" / "hosting provider" as context demands,
keeping its portability argument coherent.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 23:05:02 -04:00
didericis 5c5f576df0 docs(research): add README describing research notes
Document what research notes are (opinionated investigations of a
question/design space), their unnumbered kebab-case naming, and their
loose verdict-first shape — explicitly freeform, not a template. Point
the AGENTS.md research line at it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 23:05:02 -04:00
didericis d329e511fd docs: drop docs/INDEX.md, add PRD README with format
Remove the one-line docs/INDEX.md (its directory pointers are covered
by docs/README.md's "when to write which document" table). Add
docs/prds/README.md documenting the PRD naming, Status lifecycle, and
section format. Repoint the AGENTS.md repository-layout list at the
new READMEs and add the decisions/ dir.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 23:05:02 -04:00
didericis 1308e61c7e docs: hoist "when to write which document" to docs/README.md
Move the document-type comparison out of docs/decisions/README.md
(where it only surfaced if you were already in the decisions dir) up
to a new docs/README.md, renamed "When to write which document".
Leave a pointer from the decisions README.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 23:05:02 -04:00
didericis 2141a85884 docs(decisions): drop hand-maintained index from README
Per review on PR #97: an index that lists every ADR is a sync
burden. The files in docs/decisions/ are the index.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 23:05:02 -04:00
didericis ccbed97776 docs(prd): inline #88 rationale into PRD 0025
Add an "Alternatives considered" section enumerating the design
options from issue #88 (duplicate bottles / agent-side bottle_config
/ bottle-side extends) and why extends won, so the PRD stands without
the forge thread. Repoint the two phrases that depended on the #88
comment thread at the new section.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 23:05:02 -04:00
didericis 1df78ee77f docs(decisions): add ADR-lite decision log
Add docs/decisions/ with a convention README and back-fill two
decisions that previously had no in-repo home: merging PRs with
rebase (ADR 0001) and the agent-identity claimed-not-vouched trust
posture from PRD 0027 (ADR 0002). Point docs/INDEX.md at it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 23:05:02 -04:00
didericis c840182d12 docs(research): issue tracking vs in-repo decision history
Analyze tracking feature requests in Gitea against the project's
in-repo PRDs/research notes, given the goal of keeping decision
history portable and not provider-locked. Recommends demoting issues
to an ephemeral inbox and reifying durable rationale into the repo.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 23:05:02 -04:00
didericis 7b4c1cd091 docs: drop "forge" jargon for concrete wording
test / unit (push) Successful in 28s
test / integration (push) Successful in 42s
test / unit (pull_request) Successful in 26s
test / integration (pull_request) Successful in 43s
We use Gitea, not an abstract forge. Reword the pre-existing research
and PRD docs: the generic "Forge-API gate"/"forge tokens" become
"Git-host-API gate"/"Git-host tokens" (the gate still spans Gitea /
GitHub / GitLab), "Git/forge history" -> "Git/Gitea history", and the
KNOWN_FORGE_HOSTS / forge: manifest-field examples -> KNOWN_GIT_HOSTS
/ git_host:. Meaning preserved; only the word "forge" is dropped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 22:57:20 -04:00
didericis 47c3ba63f8 docs(prd): mark merged PRDs as Active
test / unit (pull_request) Successful in 36s
test / integration (pull_request) Successful in 58s
test / integration (push) Successful in 54s
test / unit (push) Successful in 32s
Flip Status: Draft -> Active for the 23 PRDs whose work has shipped to
main (including 0027, now that PR #95 has merged). Leaves the
terminal-status PRDs unchanged: 0007 and 0010 (Superseded) and 0014
(Retargeted) were replaced, not shipped as-is.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 22:12:03 -04:00
didericis f9e3b6adda docs(prd): add PRD 0027 agent-level git user identity
test / unit (pull_request) Successful in 27s
test / integration (pull_request) Successful in 43s
Lift git.user (name/email) to the agent layer with a per-field
overlay onto the referenced bottle, mirroring the extends: merge.
git.remotes stays bottle-only. Includes identity provenance in
preflight/info and an example collapse.

Refs #94

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 20:58:00 -04:00
didericis-codex 18e3b62b72 docs: rename CLAUDE.md to AGENTS.md and rebrand provider-agnostic
test / unit (pull_request) Successful in 28s
test / integration (pull_request) Successful in 40s
test / unit (push) Successful in 31s
test / integration (push) Successful in 44s
Delete CLAUDE.md in favor of AGENTS.md as the orientation doc, rebrand
the project from claude-bottle to provider-agnostic bot-bottle, and
repoint every CLAUDE.md reference across PRDs, research notes, the
implementer agent example, and the yaml_subset comment.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 20:36:47 -04:00
didericis-codex cdb1870b1c docs(agent): clarify claude oauth env
test / unit (pull_request) Successful in 29s
test / integration (pull_request) Successful in 43s
2026-05-28 18:20:09 -04:00
didericis-codex cacba087c9 docs(agent): document provider base bottles
test / unit (pull_request) Successful in 34s
test / integration (pull_request) Successful in 53s
Assisted-by: Codex
2026-05-28 18:00:38 -04:00
didericis-codex 1cbedc91c0 refactor(agent): use agent-neutral runtime names
Assisted-by: Codex
2026-05-28 17:59:24 -04:00
didericis-codex c08b09dc9f refactor!: rename project to bot-bottle
Assisted-by: Codex
2026-05-28 17:56:14 -04:00
didericis-codex 500fd910c4 feat(agent): add provider templates
test / unit (pull_request) Successful in 28s
test / integration (pull_request) Successful in 40s
Assisted-by: Codex
2026-05-28 02:18:53 -04:00
didericis-codex e03d90962d docs(prd): scaffold PRD 0026 — Agent Provider Templates
test / unit (pull_request) Successful in 27s
test / integration (pull_request) Successful in 45s
Assisted-by: Codex
2026-05-28 02:05:09 -04:00
didericis-codex 59ee32cc8d refactor(manifest): key git config by host
test / unit (pull_request) Successful in 33s
test / integration (pull_request) Successful in 42s
2026-05-28 00:49:34 -04:00
didericis-claude 4f7a506a9e docs(prd): 0025 — bottle composition via extends: (issue #88)
test / unit (pull_request) Successful in 27s
test / integration (pull_request) Successful in 40s
2026-05-27 23:27:04 -04:00
didericis-claude 7eda2a66ec feat(smolmachines): patch smolvm state DB to actually enforce per-bottle allowlist
test / unit (pull_request) Successful in 26s
test / integration (pull_request) Successful in 44s
Earlier commit framed this PR as "infrastructure landed, TSI
enforcement blocked on upstream smolvm 0.8.0." Found a clean
workaround that lets us enforce now.

Smolvm persists each machine's config (including
`allowed_cidrs`) as a JSON BLOB in
`~/Library/Application Support/smolvm/server/smolvm.db`,
`vms.data`. `machine create --allow-cidr X/32` silently writes
`allowed_cidrs: null` to that row when combined with `--from`,
but smolvm reads the row at `machine start` — so patching the
row between create and start sets the allowlist for real.

New `loopback_alias.force_allowlist(machine_name, cidrs)` opens
the SQLite DB, JSON-decodes the row, sets `allowed_cidrs`, and
writes back as BLOB (Text type silently corrupts smolvm's
later reads). launch.py calls it immediately after
`machine_create` and before `machine_start`.
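
A minimal sketch of the row patch described above, not the project's actual `force_allowlist`. It assumes the `vms` table keys rows by a `name` column (hypothetical; the message only names `vms.data`) and takes the DB path as an explicit argument so it can be exercised against a throwaway database. The Linux no-op and missing-DB cases covered by the real tests are left out.

```python
import json
import sqlite3


def force_allowlist(db_path: str, machine_name: str, cidrs: list[str]) -> bool:
    """Set allowed_cidrs in one machine's JSON config row.

    Returns False when no row exists for machine_name.
    """
    conn = sqlite3.connect(db_path)
    try:
        # ASSUMPTION: rows are keyed by a `name` column; only `vms.data`
        # is confirmed by the commit message.
        row = conn.execute(
            "SELECT data FROM vms WHERE name = ?", (machine_name,)
        ).fetchone()
        if row is None:
            return False
        raw = row[0]
        text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        config = json.loads(text)
        config["allowed_cidrs"] = list(cidrs)
        # Bind bytes rather than str so SQLite stores the value as a BLOB.
        blob = json.dumps(config).encode("utf-8")
        conn.execute(
            "UPDATE vms SET data = ? WHERE name = ?", (blob, machine_name)
        )
        conn.commit()
        return True
    finally:
        conn.close()
```

Binding `bytes` is what keeps the column a BLOB; binding a `str` would store TEXT, which is the corruption case the message warns about.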

Verified end-to-end on macOS / Docker Desktop:

  VM allowlist after start: ["127.0.0.16/32"]
  VM → 127.0.0.1:3000      → BLOCKED (Permission denied)
  VM → 8.8.8.8:53          → BLOCKED (Permission denied)
  VM → 127.0.0.16:<bundle> → CONNECTED

The DB-patch hack is correct only because smolvm reads
`allowed_cidrs` from the row at start time (not derived in-
process). When upstream honors `--allow-cidr` with `--from`,
the call becomes redundant — drop the call and the workaround
is gone.

Tests: 4 new for `force_allowlist` (BLOB round-trip; Linux
no-op; missing DB; missing row). Total 593 unit tests pass.

README + PRD updated to reflect the fix landed (no longer
"infrastructure pending upstream"). gitea#75 can close.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 16:55:03 -04:00
didericis-claude a919268d5e docs: honest framing of upstream smolvm 0.8.0 allowlist bug
test / unit (pull_request) Successful in 26s
test / integration (pull_request) Successful in 40s
PR #76 originally claimed the per-bottle alias scoping closed
gitea#75 ("agent can reach host loopback"). Verified
empirically that's not actually true: `smolvm 0.8.0 machine
create --from <smolmachine> --net --allow-cidr X/32` silently
drops the allowlist (`agent.config.json` shows `allowed_cidrs:
null`, and the running VM reaches all of `127.0.0.0/8`
regardless).

So the alias-allocation + alias-bind infrastructure is correct
pre-work, but the actual TSI enforcement is blocked on an
upstream smolvm bug. README + PRD 0023 + the module docstring
get reworded to say so plainly. gitea#75 stays open.

Workarounds tried (all dead-ends):
- `machine update --allow-cidr` doesn't exist
- stop-edit-`agent.config.json`-restart fails (smolvm removes
  the file on stop)
- `--smolfile` is mutually exclusive with `--from`
- `--image localhost:<port>/...` fails because smolvm's agent
  process can't reach host loopback during pull

When upstream lands a fix, our existing code (alias allocation,
port-bind, --allow-cidr in launch) will scope correctly without
further changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 16:37:56 -04:00
didericis-claude 2edc1abb9a feat(smolmachines): per-bottle loopback alias scopes TSI to single /32
test / unit (pull_request) Successful in 27s
test / integration (pull_request) Successful in 41s
PR #74's Docker-Desktop fix routed the agent through
`127.0.0.1:<random>` loopback forwards, but TSI filters by IP
only — so the allowlist `127.0.0.1/32` let the agent VM reach
**any** host service on macOS loopback (postgres, dev servers,
other bottles' published ports, mDNSResponder, ...). Real
downgrade vs the docker backend's `--internal` network.

Resolution: per-bottle loopback alias.

- New `loopback_alias` module manages a pool of
  `127.0.0.16` .. `127.0.0.31` on `lo0`. macOS only routes
  `127.0.0.1` by default; the extras need `sudo ifconfig lo0
  alias`. `ensure_pool()` lazily adds the missing entries via
  one sudo prompt on first launch per reboot — aliases persist
  on `lo0` until reboot, so subsequent launches skip the
  prompt entirely.
- `allocate(slug)` picks the lowest-numbered unused alias by
  inspecting running bundle containers' port-binding HostIps.
  No on-disk reservation — docker is the source of truth.
- Bundle bringup binds published ports to the allocated alias
  (`docker run -p <alias>::<port>`) instead of `127.0.0.1`.
- TSI allowlist becomes the alias's /32 — narrows reachability
  to this bottle's bundle only.
- Linux native daemons share the host's network namespace;
  `127.0.0.0/8` works without aliases, so the module no-ops on
  non-Darwin and returns `127.0.0.1` from `allocate`.
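
The allocation rule in the list above can be sketched as a pure function. This is an illustration under stated assumptions: the set of in-use addresses is passed in directly (the real module reads it from running bundle containers' port bindings), and the `is_darwin` flag and the exhaustion error are hypothetical.

```python
import ipaddress

# Pool bounds as stated in the commit message.
POOL_FIRST = ipaddress.IPv4Address("127.0.0.16")
POOL_LAST = ipaddress.IPv4Address("127.0.0.31")


def allocate(in_use: set[str], is_darwin: bool = True) -> str:
    """Return the lowest-numbered pool alias that is not in in_use."""
    if not is_darwin:
        # Non-Darwin hosts need no alias and use plain loopback.
        return "127.0.0.1"
    for n in range(int(POOL_FIRST), int(POOL_LAST) + 1):
        candidate = str(ipaddress.IPv4Address(n))
        if candidate not in in_use:
            return candidate
    # ASSUMPTION: the behavior on a full pool is not described in the source.
    raise RuntimeError("loopback alias pool exhausted")
```

For example, with `127.0.0.16` and `127.0.0.18` already bound, the function returns `127.0.0.17`.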

Tracking issue closed: gitea/issues/75.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 16:23:17 -04:00
didericis-claude 45c821a8f3 docs(smolmachines): note loopback-scope limitation + tracking issue
test / unit (pull_request) Successful in 26s
test / integration (pull_request) Successful in 43s
PR #74's Docker-Desktop pivot widened the smolmachines TSI
allowlist from `<bundle-ip>/32` to `127.0.0.1/32` (TSI can't
filter by port, and docker bridge IPs aren't reachable from
macOS networking). The agent VM can therefore reach any service
on macOS's loopback while the bottle is running — not just the
bundle's published ports.

README gets a "Smolmachines backend" subsection under Quickstart
spelling this out as a known v1 limitation. PRD 0023 grows a new
open question #8 with the proposed v2 fix (per-bottle loopback
alias + TSI allowlist scoped to that /32, via sudo
`ifconfig lo0 alias`).

Tracking issue: gitea.dideric.is/didericis/claude-bottle/issues/75.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 15:58:30 -04:00
didericis-claude 1fa17d1822 feat(smolmachines): build agent image from repo Dockerfile (PRD 0023 chunk 4c)
test / unit (pull_request) Successful in 21s
test / unit (push) Successful in 21s
test / integration (push) Successful in 42s
test / integration (pull_request) Successful in 41s
Replaces the alpine:latest placeholder with a real claude-bottle
agent image, converted into a .smolmachine artifact via an
ephemeral local OCI registry.

Why the registry hop: smolvm pack create only accepts OCI registry
refs. Empirically it rejects docker-daemon://, oci-layout://,
docker-archive: tarballs, and every other transport tested — the
crane backend treats anything with a scheme prefix as a registry
hostname. To convert a locally-built docker image into a
.smolmachine we have to push it somewhere smolvm can pull from.
Smallest path: bring up registry:2.8.3 bound to 127.0.0.1:<random>,
docker tag + docker push into it, smolvm pack create --image
localhost:<port>/claude-bottle:<id>, tear down the registry.
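
As a rough illustration of the registry hop, the commands can be laid out as argv lists. The helper names, the container name, and the `tag` parameter are hypothetical; only the `--image` flag of `smolvm pack create` comes from the message, so any other arguments that command may need are omitted.

```python
REGISTRY_IMAGE = "registry:2.8.3"


def start_registry_argv(name: str) -> list[str]:
    """argv that starts the throwaway registry on a random loopback port."""
    # The empty host port in 127.0.0.1::5000 lets docker pick a free port;
    # the chosen port can then be read back with `docker port <name> 5000`.
    return [
        "docker", "run", "-d", "--name", name,
        "-p", "127.0.0.1::5000", REGISTRY_IMAGE,
    ]


def hop_argvs(image_id: str, tag: str, port: int, name: str) -> list[list[str]]:
    """Commands to run, in order, once the registry's host port is known."""
    ref = f"localhost:{port}/claude-bottle:{tag}"
    return [
        ["docker", "tag", image_id, ref],
        ["docker", "push", ref],
        # Only --image is taken from the commit message.
        ["smolvm", "pack", "create", "--image", ref],
        # Tear the registry down whether or not the pack step succeeded.
        ["docker", "rm", "-f", name],
    ]
```

Each list is suitable for `subprocess.run(argv, check=True)`; keeping the plan as data makes the ordering easy to unit-test without docker installed.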

The .smolmachine is cached under
~/.cache/claude-bottle/smolmachines/ keyed by the docker image ID
(first 16 hex chars of the sha256), so a Dockerfile change picks
up a new image ID and invalidates the cache. Unchanged rebuilds
skip the whole build → registry → pack pipeline.
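
A small sketch of the cache-key derivation, assuming the image ID arrives in docker's usual `sha256:<hex>` form. The `.smolmachine` file name built from the key is an assumption; the message only states the directory and the 16-character key.

```python
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "claude-bottle" / "smolmachines"


def cache_key(image_id: str) -> str:
    """First 16 hex chars of the image's sha256 digest."""
    digest = image_id.removeprefix("sha256:")
    return digest[:16]


def cached_artifact(image_id: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Path where the packed artifact for this image would be cached."""
    # ASSUMPTION: one file per key, named <key>.smolmachine.
    return cache_dir / f"{cache_key(image_id)}.smolmachine"
```

Because the key comes from the image ID, any Dockerfile change that produces a new image yields a new path, and an unchanged rebuild resolves to the existing file.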

This puts `docker build` in smolmachines prepare (the docker
backend defers it to launch). Necessary because pack_create needs
the image ID to derive the cache key, and prepare is the only
hook ahead of launch that runs once per slug.

Adds:
- claude_bottle/backend/docker/util.py: image_id / tag / push
  helpers (thin docker CLI wrappers).
- claude_bottle/backend/smolmachines/local_registry.py:
  ephemeral_registry() context manager; pins registry:2.8.3 by
  digest, binds 127.0.0.1::5000 (loopback-only), force-removes on
  exit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 13:51:02 -04:00
didericis-claude 5929caa219 docs(prd-0023): pivot to smolvm + TSI single-IP allowlist
test / unit (pull_request) Successful in 22s
test / integration (pull_request) Successful in 43s
Chunk-1's empirical spike against smolvm 0.8.0 contradicted the
research note that motivated the gvproxy network design: smolvm
exposes no virtio-net-over-unixgram attachment. The first draft's
"why gvproxy, not TSI" argument turns out to apply only to
`--outbound-localhost-only`, not to TSI generally.

New design:

- Bundle (PRD 0024) runs on a dedicated per-bottle docker bridge
  with a pinned IP. Smolfile sets `[network] allow_cidrs =
  ["<bundle-ip>/32"]` and nothing else. Agent can reach the bundle
  and nothing else — host loopback, LAN, public internet directly
  are all refused at the VMM (TSI) layer.
- Bind-address mitigation: egress binds 127.0.0.1:9099 inside the
  bundle (pipelock-internal); pipelock / git-gate / supervise
  bind 0.0.0.0 so the agent (across the TSI allowlist) can reach
  them. This is the port-granularity TSI's IP-only allowlist
  doesn't provide.
- Smolfile renderer rewritten in chunk 2 to smolvm 0.8.0's actual
  schema (image / entrypoint / cmd / env / [network] allow_cidrs).
  The chunk-1 renderer (name= / [[net]]= under the gvproxy
  design) emits the wrong shape and will be replaced.
- Drop gvproxy + VZFileHandleNetworkDeviceAttachment + the
  PyObjC fallback. Backend layout loses gvproxy_config.py,
  gvproxy.py, vfkit_attach.py.
- Acceptance plan adds an egress-port-bypass probe in addition
  to the localhost-reach probe.
- Chunks reshape: chunk 1 stays (renderer rewrite is part of
  chunk 2's cost); chunk 2 covers VM lifecycle + bundle + new
  Smolfile renderer; chunk 3 is the bundle bind-address change;
  chunks 4-5 unchanged in spirit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 03:47:03 -04:00
didericis bce1ea21db Merge pull request 'docs(prd-0023): smolmachines bottle backend' (#53) from prd-0023-smolmachines-backend into main
test / unit (push) Successful in 21s
test / integration (push) Successful in 40s
2026-05-27 02:16:11 -04:00
didericis 539234f29e refactor(sidecars): drop vestigial start/stop methods (PRD 0024 chunk 3)
test / unit (pull_request) Successful in 21s
test / integration (pull_request) Successful in 41s
Compose-up has owned per-container lifecycle since PRD 0018 ch3;
the .start() / .stop() methods on DockerPipelockProxy /
DockerEgress / DockerGitGate / DockerSupervise (and their
abstractmethod declarations in the four base ABCs) were already
documented as vestigial. With the bundle path in flight
(PRD 0024 ch2), they are truly dead — collapse to nothing.

Changes:
- Removed start/stop methods from the four DockerSidecar
  classes. Plan dataclasses, image/path constants,
  container-name helpers, and the .prepare() methods all stay
  (the renderer + apply path still need them).
- Removed the matching @abstractmethod declarations in the
  base ABCs so concrete subclasses don't have to stub them.
- launch.launch() and prepare.resolve_plan() no longer take
  proxy/git_gate/egress/supervise instance parameters. backend.py
  loses the four instance attributes it threaded through.
  prepare.resolve_plan() instantiates the four classes itself
  to call their .prepare() methods.
- Deleted four integration tests that only exercised the
  removed lifecycle: test_pipelock_sidecar_smoke,
  test_supervise_sidecar, test_git_gate_sidecar,
  test_git_gate_mirror.
- Dropped the .stop-idempotency case in test_orphan_cleanup;
  the network-cleanup cases stay (those test real production
  code).
- Marked test_pipelock_apply @skip pending chunk 4 — its
  bringup helper used .start; chunk 4 rewrites it with direct
  `docker run`.

Dockerfile deletion deferred to chunk 5 (when the bundle flag
default flips) — the legacy compose path still needs
Dockerfile.{egress,git-gate,supervise} until then.

Net: 708 lines removed, 80 added.

533 unit tests + 27 integration tests passing (5 skipped: the
chunk-4-pending case + existing GITEA_ACTIONS guards).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 01:01:10 -04:00
didericis 62109a1caf fix(sidecars): child death no longer tears down the bundle
test / unit (pull_request) Successful in 20s
test / integration (pull_request) Successful in 1m8s
Reverses chunk 1's "any unexpected child death tears down the
rest" policy. New behavior: a daemon dying is logged but does
NOT initiate shutdown — the surviving daemons keep running and
whatever the dead one served starts failing visibly on the
agent side. The supervisor exits only when (a) it receives
SIGTERM/SIGINT, or (b) every child has died on its own.

Eventual design is restart-the-dead-daemon plus a notification
to the supervise sidecar so the operator sees the event
explicitly; this commit ships only the "log and leave alone"
half. PRD 0024 open question 1 updated to reflect the new
intent.

Tests updated: replaced "crash propagates exit code via
auto-teardown" with three cases that exercise the new policy
(crash without shutdown leaves survivors up, crash-then-signal
surfaces the nonzero code, all-children-die-unattended still
converges the loop).
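
The "log and leave alone" policy can be sketched as a polling loop. This is an illustration, not the bundle's init supervisor: the function name, the `stop` event standing in for SIGTERM/SIGINT, and the choice to return the first nonzero code from a child that died on its own are all assumptions.

```python
import subprocess
import threading
import time


def supervise(children, stop, poll_s=0.05, log=print):
    """Run until `stop` is set or every child has exited on its own.

    children maps a name to a subprocess.Popen. A child dying is logged
    but never triggers shutdown of the others.
    """
    died = {}  # name -> exit code, only for children that exited unprompted
    while True:
        for name, proc in children.items():
            if name not in died and proc.poll() is not None:
                died[name] = proc.returncode
                log(f"{name} exited with code {proc.returncode}; survivors keep running")
        if len(died) == len(children):
            break  # case (b): every child died on its own
        if stop.is_set():
            # case (a): shutdown requested, stop whatever is still alive
            for name, proc in children.items():
                if name not in died:
                    proc.terminate()
                    proc.wait()
            break
        time.sleep(poll_s)
    # Surface a crash that happened before shutdown, else report success.
    return next((code for code in died.values() if code != 0), 0)
```

A real entrypoint would typically set the event from a handler, for example `signal.signal(signal.SIGTERM, lambda *_: stop.set())`.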

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 00:19:50 -04:00
didericis 1894f621dd docs(prd-0024): consolidate per-bottle sidecars into a single bundle
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m11s
Replace pipelock + egress + git-gate + supervise as four
separate containers with one bundle image
(claude-bottle-sidecars) running all four daemons under a small
stdlib Python init supervisor. Compose file collapses from five
services to two; same daemons, same ports, same protocols, one
container.

Sized: bundle image + init → renderer collapse (feature-flagged)
→ backend Python trim → integration sweep → flag removal.

Prerequisite for PRD 0023 chunk 3 (smolmachines backend reuses
the same bundle as its sole host-side sidecar container).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 23:54:29 -04:00
didericis 4e00430c6e docs(prd-0023): consume PRD 0024's bundle as the single sidecar
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m11s
Replace the four host-side sidecar processes (pipelock + egress +
git-gate + supervise) with a single bundled container per bottle,
defined in PRD 0024 and consumed here. egress is internal to the
bundle as pipelock's upstream; only pipelock, git-gate, and
supervise are externally addressable, and only when the bottle
uses them.

gvproxy port_forwards collapse from one-per-process to one-per-
external-port, all pointing into the one bundle container.
Sizing: chunk 3 becomes "sidecar bundle lifecycle" and depends
on PRD 0024 having landed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 23:51:57 -04:00
didericis 041da1d7af docs(prd-0023): make gvproxy the network primitive; reject TSI
test / unit (pull_request) Successful in 19s
test / integration (pull_request) Successful in 1m9s
TSI's --outbound-localhost-only is permissive on all of
127.0.0.0/8 with no destination-port filter, so any host
loopback service (local Postgres, IDE plugins, another bottle's
sidecar) is reachable from the guest. That's the wrong default
for the malicious-agent threat model.

Reworked the network design around gvproxy + vfkit unixgram
attachment: the guest gets a virtio-net device, gvproxy is the
userspace TCP/IP stack on the host side, and the only thing
reachable from the guest is the explicit port-forward list
(typically just pipelock). Host LAN, host loopback, and the
public internet directly are gone by construction.

VMM choice (smolmachines vs PyObjC + Virtualization.framework)
is an open question contingent on whether libkrun's virtio-net
mode lets us point at a custom unixgram socket. Backend name
stays "smolmachines" either way per the original spec.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 23:41:32 -04:00
didericis a2ac124d5c docs(prd-0023): smolmachines bottle backend
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
Specs a second concrete BottleBackend selectable via
CLAUDE_BOTTLE_BACKEND=smolmachines: per-agent libkrun microVM on
macOS, sidecars relocated to host-side loopback ports plumbed via
Smolfile env, PRD 0022's sandbox-escape suite as the acceptance
gate (the env-var flip is the only change required). Docker
backend ships unchanged and remains default.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 23:19:08 -04:00
didericis 1111ced04d docs(prd-0022): resolve remaining open Qs
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
All seven open questions now have decisions baked in:

  - Q1 (HTTP-exfil scope): authoritative. Every shape MUST
    block; chunk 3 expands into remediation sub-PRDs if
    any of path/query/header leak today.
  - Q3 (fake secret): multiple shapes, parameterized.
    Three env vars (TEST_SECRET_ANTHROPIC, _AWS, _GENERIC);
    test 5 loops via subTest. Resilient to gitleaks rule
    renames.
  - Q6 (missing backend): die. `get_bottle_backend()`'s
    current behavior surfaces clearly; surprise-skips are
    worse than loud failures for new-backend branches.
  - Q7 (tool deps): preflight check. setUpClass runs
    `which curl && which git && which dig`; SkipTest with
    the missing list catches future backends shipping
    thinner base images.
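
The Q7 preflight could look roughly like the sketch below. It checks the local PATH with `shutil.which` as a stand-in; the suite described above would presumably run the check inside the agent, and the class and helper names here are hypothetical.

```python
import shutil
import unittest

REQUIRED_TOOLS = ("curl", "git", "dig")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


class SandboxEscapePreflight(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        missing = missing_tools()
        if missing:
            # Skip the whole class and name exactly what is absent.
            raise unittest.SkipTest("missing tools: " + ", ".join(missing))
```

Raising `SkipTest` from `setUpClass` skips every test in the class with one message, so a thinner base image shows up as a named gap rather than a wall of failures.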

Updated implementation chunks + test-5 sketch to match.
No remaining open questions.
2026-05-26 22:11:32 -04:00
didericis 73939861f9 docs(prd-0022): resolve open Qs 2, 4, 5 (DNS, gitleaks order, CI)
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
User feedback:

  - Q2 (direct DNS resolver test): yes — test 4 grows a
    second sub-assertion verifying `dig @8.8.8.8` from the
    agent has no path out, alongside the existing
    crafted-subdomain check.
  - Q4 (gitleaks ordering): test 5 grows an ordering check
    — asserts the rejection mentions `gitleaks` AND does
    NOT mention upstream-network-phase phrases (resolve /
    refused / unreachable / upstream). Confirms gitleaks
    rejects BEFORE git-gate tries any upstream push.
  - Q5 (CI): try it, accept fallback. New chunk 6 adds a
    Gitea Actions job marked `continue-on-error: true` —
    runs the suite if the runner can host compose, doesn't
    block the workflow if docker-in-docker prevents it.
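
The Q4 ordering check amounts to a string predicate over the push rejection. A minimal sketch, assuming a case-insensitive substring match (the function name is hypothetical; the phrase list is the one given above):

```python
# Phrases that would indicate git-gate reached its upstream-network phase.
NETWORK_PHASE_WORDS = ("resolve", "refused", "unreachable", "upstream")


def rejected_by_gitleaks_first(push_output: str) -> bool:
    """True when the rejection names gitleaks and shows no sign of an upstream attempt."""
    text = push_output.lower()
    mentions_gitleaks = "gitleaks" in text
    mentions_network = any(word in text for word in NETWORK_PHASE_WORDS)
    return mentions_gitleaks and not mentions_network
```

Inside the test this would back a single `assertTrue(rejected_by_gitleaks_first(stderr))`.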

Three open questions remain (1: pipelock's actual DLP
coverage for non-body shapes; 3: realistic fake secret
shape vs. gitleaks regex; 6+7: backend-agnostic invocation
+ required tools — for the smolmachines work).
2026-05-26 22:04:46 -04:00
didericis 62f6716e8d docs(prd-0022): end-to-end sandbox-escape integration test
test / unit (pull_request) Successful in 19s
test / integration (pull_request) Successful in 1m9s
Draft a PRD for a composite integration test that brings up
a real bottle with a known allowlist + planted secret and
runs five attacks from inside the agent container:

  1. Request to non-allowlisted hostname
  2. Request to non-allowlisted IP (incl. host-header spoof)
  3. Secret exfil via HTTP — path / query / body / headers
  4. Secret exfil via crafted DNS subdomain
  5. Secret exfil via README link pushed through git-gate

Each attack passes only when blocked with a permissions
error. The suite is backend-agnostic — runs against
whatever CLAUDE_BOTTLE_BACKEND selects — so it becomes the
gate the upcoming smolmachines spike has to pass before that
backend can substitute for Docker.

Sized into 5 chunks (fixture → attacks 1+2 → attack 3 →
attack 4 → attack 5). Seven open questions called out,
biggest being: today's pipelock probably leaks via header /
path / query because DLP only scans bodies — the test will
expose this as a real gap (chunk 3 lands with
`expectedFailure` markers if so).
2026-05-26 21:52:24 -04:00
didericis e5316be454 docs(prd-0021): rewrite as standalone — no references to closed PR #48
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m6s
PR #48 closed; treat the implementation as starting from
main, where no tmux integration exists yet. The PRD now
describes the full design (including the `_in_tmux` detection
+ helper scaffolding) as fresh work. Sized into 4 chunks:
`claude_docker_argv` refactor → tmux helpers + pane state +
`_attach_to_bottle` dispatch → new-agent flow → stop +
indicator.

Same design as before — opt-in by `$TMUX`, split-window-then-
respawn, falls back to handoff on tmux failure or missing
binary. No external references to PR #48.
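
The split-then-respawn dispatch can be sketched as a function that picks a tmux command. This is a hedged illustration: `attach_argv` and its parameters are hypothetical, and it covers only the `$TMUX` opt-in, not the fallback on tmux failure or a missing binary.

```python
import shlex


def attach_argv(env, right_pane, agent_cmd):
    """Return the tmux argv for showing agent_cmd in the right pane.

    Returns None outside tmux, signalling the caller to use the handoff path.
    right_pane is the pane id from a previous attach, or None on first attach.
    """
    if not env.get("TMUX"):
        return None
    # Pass the agent command as one shell-command string.
    command = shlex.join(agent_cmd)
    if right_pane is None:
        # First attach: split side by side and print the new pane's id
        # so it can be remembered for later respawns.
        return ["tmux", "split-window", "-h", "-P", "-F", "#{pane_id}", command]
    # Later attaches: replace whatever is running in the existing pane.
    return ["tmux", "respawn-pane", "-k", "-t", right_pane, command]
```

A caller would run the first form with captured stdout to learn the pane id, then pass that id back in on every subsequent attach.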
2026-05-26 14:18:24 -04:00
didericis 8b8d668602 docs(prd-0021): dashboard as left tmux pane, selected agent as right pane
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m8s
Draft a PRD that tightens PR #48's tmux integration from
"one new window per attach" to "one persistent right pane that
the dashboard's selection drives." Inside tmux (`$TMUX` set):
dashboard in the left pane; pressing Enter or `n` spawns
claude in the right pane via `tmux split-window` on first
attach, then `tmux respawn-pane` on subsequent attaches so the
operator-focused agent is always the visible one.

Outside tmux: falls back to today's handoff. Opt-in by
environment; no flag.

Sized into 4 chunks (pane state + create → respawn → stop
integration → supersede PR #48's new-window). Seven open
questions called out, the biggest being whether the dashboard
should auto-exec into a fresh tmux session when launched
outside one (v1 says no — operators start tmux themselves).
2026-05-26 14:14:02 -04:00
didericis 26322bdfd5 docs(prd-0020): record answers to open questions, switch to no-teardown-on-quit
2026-05-26 03:10:26 -04:00
didericis ec20293c0a docs(prd-0020): start + attach to agents from the dashboard
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m7s
Draft a PRD that turns the dashboard into the operator's single
surface — collapses today's two-terminal workflow (one for
`./cli.py start`, one for `./cli.py dashboard`) into a single
dashboard invocation that can spin up new agents, re-attach to
ones it already spun up, and explicitly stop them.

Picks the "handoff" mechanism from `docs/research/claude-code-
pane-in-dashboard.md` (curses.endwin → docker exec -it claude
→ stdscr.refresh) and crucially decouples the bottle's lifetime
from any single claude session: exit claude → back to dashboard
with the bottle still running; quit dashboard → tear down every
bottle the dashboard owns.

Sized into 5 chunks (refactor → picker + new-agent → re-attach
→ explicit stop → quit-cleanup). Seven open questions called
out, the biggest being modal-vs-drop-and-resume for the
preflight Y/N inside curses.
2026-05-26 02:59:42 -04:00
didericis 8cd867f3d2 docs(research): claude-code pane in the dashboard
test / integration (pull_request) Successful in 1m8s
test / unit (pull_request) Successful in 17s
test / unit (push) Successful in 17s
test / integration (push) Successful in 1m2s
Survey the three realistic ways to surface a claude-code session
inside the dashboard TUI:

  1. Handoff — drop curses, foreground claude, restore on exit
     (the existing `e`/`p` pattern, extended). Minimal code,
     side-by-time rather than side-by-side.
  2. Embedded emulator — own a PTY, parse claude-code's ANSI
     stream via `pyte`, paint it into a curses pane. Real
     "pane in the dashboard" but a six-week build with one new
     dep and several integration trap-doors (alt-screen, resize,
     input routing, multi-PTY state).
  3. External multiplexer — delegate pane creation to tmux /
     iTerm / wezterm when detected. Tiny code, but splits the
     operator's mental model and gives up layout control.

Recommendation: ship Option 1 first; defer Option 2 to "only if
Option 1 is observably insufficient"; treat Option 3 as a
niche augmentation for power users.

Calls out four followups worth verifying before committing
(PTY behavior at small sizes, attach-to-existing-exec, SIGWINCH
handling, `-it` vs `-i` for the embedded path).
2026-05-26 02:51:08 -04:00
didericis 9c9c32a941 docs(prd-0019): drop e/p fallback — selection-only, no-op otherwise
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m6s
When no agent is selected, `e` / `p` do nothing (status line
shows "no agent selected") rather than falling back to today's
global discover-and-prompt. The discover-and-prompt scaffolding
in `_operator_edit_routes_flow` / `_operator_edit_allowlist_flow`
comes out entirely — selection in the agents pane is now the
only way to scope an edit. Old open-question #4 (single-bottle
shortcut behavior in proposals-pane mode) is moot and removed.
2026-05-26 01:03:23 -04:00
didericis 9539982d3f docs(prd-0019): active agents in dashboard + agent-scoped edit verbs
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m3s
Draft a PRD that adds an "active agents" pane to the dashboard
TUI (below the existing proposals pane) and reshapes the operator
`routes edit` (e) / `pipelock edit` (p) verbs to be agent-scoped
when the cursor is in the agents pane — no more global discover
+ disambiguation prompt on every press. Tab toggles which pane
nav keys move through.

Sized into 4 chunks (discovery helper → render pane → selection
state → agent-scoped verbs). Six open questions called out, the
biggest being whether per-bottle `compose ps` on every 1s tick
scales for hosts with many bottles (answer leans toward one
label-filtered `docker ps`).
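
To illustrate the "one label-filtered `docker ps`" direction, here is a rough sketch that lists every bottle's containers in a single call per tick. It assumes each bottle is its own compose project (per PRD 0018) and that compose's standard `com.docker.compose.project` label identifies it. The label the project actually uses may differ, and all function names are hypothetical.

```python
import subprocess

# Assumption: the compose project name doubles as the bottle identifier.
LABEL = "com.docker.compose.project"
FORMAT = '{{.Label "%s"}}\t{{.Names}}\t{{.State}}' % LABEL


def docker_ps_argv():
    # One call for the whole host instead of one `compose ps` per bottle.
    return ["docker", "ps", "--filter", f"label={LABEL}", "--format", FORMAT]


def group_by_bottle(output):
    """Group tab-separated `docker ps` rows by compose project."""
    bottles = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        project, name, state = line.split("\t")
        bottles.setdefault(project, []).append((name, state))
    return bottles


def list_active_bottles(run=subprocess.check_output):
    return group_by_bottle(run(docker_ps_argv(), text=True))
```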
2026-05-26 00:58:34 -04:00
didericis 3386cabe62 docs(prd-0018): resolve TTY open question — keep exec -it
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m3s
2026-05-25 22:34:26 -04:00
didericis 3251ee1394 docs(prd-0018): one compose project per bottle instance
test / unit (pull_request) Successful in 16s
test / integration (pull_request) Successful in 1m3s
Draft a PRD that replaces the chain of per-sidecar docker SDK calls
in `claude-bottle start` with a single `docker compose` project per
instance. Each `state/<slug>/` dir gets a self-describing set of
artifacts: metadata.json, docker-compose.yml, compose.log, and the
existing transcript/ + live-config/.
2026-05-25 22:15:32 -04:00
didericis 9cd583fbbb feat(egress-proxy): retarget remediation at egress-proxy (PRD 0017 chunk 3)
test / unit (pull_request) Successful in 19s
test / integration (pull_request) Successful in 1m6s
Finishes PRD 0017. The `cred-proxy-block` MCP tool is renamed and
its remediation apply path is repointed at egress-proxy.

  - `claude_bottle/supervise.py` — `TOOL_CRED_PROXY_BLOCK` →
    `TOOL_EGRESS_PROXY_BLOCK`; `COMPONENT_FOR_TOOL` maps the new
    tool ID to `egress-proxy` for audit-log routing.

  - `claude_bottle/supervise_server.py` — tool definition renamed
    + description rewritten: "Call when egress-proxy refused your
    HTTPS request ... Read the current routes.yaml from
    /etc/claude-bottle/current-config/routes.yaml, compose a
    modified version, pass the full new file plus a justification."
    The syntactic validator dispatches on the new tool ID.

  - `claude_bottle/backend/docker/egress_proxy_apply.py` — renamed
    from `cred_proxy_apply.py`. Reads routes.yaml from
    /etc/egress-proxy/routes.yaml via `docker exec cat`; validates
    via `egress_proxy_addon_core.load_routes` (so both sides use
    the same parser); writes via `docker cp`; SIGHUPs egress-proxy
    with `docker kill --signal HUP`. `EgressProxyApplyError`
    replaces `CredProxyApplyError`.

  - `claude_bottle/cli/dashboard.py` — wires the new apply +
    `discover_egress_proxy_slugs` helper; the operator-initiated
    `routes edit <bottle>` verb now writes to egress-proxy with
    `.yaml` suffix. Stale follow-up comment about path-aware
    filtering removed — PRD 0017 settled that question.

  - `tests/integration/test_supervise_sidecar.py` — restores the
    approval round-trip test (chunk 2 had switched it to a reject
    path because no cred-proxy existed). Approval stubs
    `apply_routes_change` so the test focuses on the supervise
    queue/response plumbing rather than docker-exec into a real
    egress-proxy sidecar (that's covered separately).

  - `tests/unit/test_egress_proxy_apply.py` — rewritten against
    the new validator; covers JSON shape, missing routes key,
    partial-auth-pair rejection (the addon-core parser catches
    these before SIGHUP).

  - PRDs 0010 + 0014 — status headers updated to
    Superseded / Retargeted with a callout block pointing at PRD
    0017's migration section. Historical text preserved.
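
The read, validate, write, and reload steps described for `egress_proxy_apply.py` can be sketched roughly as follows. This is not the module's real code: `apply_routes`, its parameters, and the temp-file handling are assumptions, and `validate` stands in for the shared parser.

```python
import subprocess
import tempfile

ROUTES_PATH = "/etc/egress-proxy/routes.yaml"  # path named in the commit message


def apply_routes(container, new_text, validate, run=subprocess.run):
    """Validate new_text, copy it into the container, then SIGHUP the proxy.

    `validate` is a stand-in for the shared parser and should raise on bad
    input. Returns the previous file contents.
    """
    current = run(
        ["docker", "exec", container, "cat", ROUTES_PATH],
        check=True, capture_output=True, text=True,
    ).stdout
    validate(new_text)  # reject before anything is written or signalled
    with tempfile.NamedTemporaryFile("w", suffix=".yaml") as tmp:
        tmp.write(new_text)
        tmp.flush()
        run(["docker", "cp", tmp.name, f"{container}:{ROUTES_PATH}"], check=True)
    run(["docker", "kill", "--signal", "HUP", container], check=True)
    return current
```

Validating before the `docker cp` step means a malformed file never reaches the sidecar, so the SIGHUP only ever reloads a config that the parser has already accepted.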

384 unit + integration tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 15:13:44 -04:00
didericis a79b2b7be0 docs(prd-0017): nest auth.scheme + auth.token_ref under optional auth
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m38s
Earlier draft had `auth_scheme: "none"` as the unauthenticated
signal — awkward sentinel. Nest the two credential-injection
fields under an optional `auth` key instead. Presence of the key
= authenticated; absence = unauthenticated. Empty `auth: {}` is
an error (omission is what means "no auth").
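
A small sketch of the presence/absence rule, assuming a route has already been parsed into a plain dict. `parse_auth` and the error wording are hypothetical. Rejecting a half-filled pair is an added assumption, based on the "partial-auth-pair rejection" that the chunk 3 tests mention.

```python
def parse_auth(route):
    """Return (scheme, token_ref) for an authenticated route, else None."""
    if "auth" not in route:
        return None  # absence of the key is what means "no auth"
    auth = route["auth"]
    if not isinstance(auth, dict) or not auth:
        raise ValueError("empty `auth`: omit the key for an unauthenticated route")
    missing = [key for key in ("scheme", "token_ref") if not auth.get(key)]
    if missing:
        raise ValueError("`auth` is missing " + ", ".join(missing))
    return auth["scheme"], auth["token_ref"]
```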

Touches: scope bullet, manifest example, mitmproxy addon
description's auth-handling step. Two trailing `auth_scheme:
"none"` references kept as historical context for what the new
shape replaces.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:35:47 -04:00
didericis b0d9802469 docs(prd-0017): pivot to mitmproxy-based egress-proxy
test / unit (pull_request) Successful in 18s
test / integration (pull_request) Successful in 1m34s
Significant rewrite of PRD 0017 based on PR #25 design discussion.

Original draft proposed adding `path_allowlist` to the existing
cred-proxy. That bought opt-in path filtering for tools that
voluntarily routed through cred-proxy (Claude Code, git, npm) —
but raw `curl https://github.com/foo` from the agent goes to
HTTPS_PROXY=pipelock and bypasses cred-proxy entirely, so any
universal enforcement claim was a lie.

New design: replace cred-proxy with a mitmproxy-based egress-proxy
that becomes the agent's HTTP_PROXY/HTTPS_PROXY. Every agent
HTTP/HTTPS request flows through it before reaching pipelock.
Path-level allow/deny enforcement is universal because the proxy
is on every leg. The proxy also absorbs cred-proxy's credential
injection role (mitmproxy addon hooks request → strip + inject
Authorization).
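
As a rough illustration of the per-request decision, the pure function below checks the path and then strips and re-injects `Authorization`. It is deliberately independent of mitmproxy. The `path_allowlist` field name, prefix matching (the commit leaves prefix-vs-glob open), the `<scheme> <token>` header format, and the `resolve` callback are all assumptions.

```python
def handle_request(route, path, headers, resolve):
    """Return (allowed, headers) for one request already matched to `route`.

    `resolve` maps a token_ref to the secret value (hypothetical callback).
    """
    prefixes = route.get("path_allowlist")
    if prefixes is not None and not any(path.startswith(p) for p in prefixes):
        return False, headers  # path is outside what the route permits
    # Strip whatever the agent sent so it cannot supply its own credential.
    out = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    auth = route.get("auth")
    if auth:
        out["Authorization"] = f"{auth['scheme']} {resolve(auth['token_ref'])}"
    return True, out
```

In a mitmproxy addon, logic like this would typically be called from the `request` hook, with a denied request answered by setting `flow.response` instead of forwarding it.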

Net sidecar count: unchanged. cred-proxy is replaced 1:1 by
egress-proxy. Pipelock stays as hostname allow + DLP downstream
of egress-proxy.

Decisions baked in per PR-#25 discussion:
- Tool: mitmproxy (designed for this; Python addons; well-maintained).
- CA custody: egress-proxy holds the per-bottle MITM CA key
  (concentration accepted; documented in trust-domain section).
- Migration: hard cutover. Existing `bottle.cred_proxy.routes[]`
  manifests fail-fast at load time with a pointer at this PRD.

Open questions retained for the implementation PRs: addon
distribution (bake vs mount), prefix-vs-glob match, double-strip
of Authorization between egress-proxy and pipelock, whether
pipelock keeps TLS interception or stays hostname-only post-cutover,
performance under two-MITM-hops.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 13:28:53 -04:00
didericis 5b925a6699 docs(prd-0017): path-aware egress filtering via cred-proxy
test / unit (pull_request) Successful in 17s
test / integration (pull_request) Successful in 1m34s
Extends cred-proxy to filter (not just route) paths, including for
unauthenticated upstreams via a new `auth_scheme: "none"` mode and
`path_allowlist` field per route. Pipelock keeps its hostname
allowlist + DLP role; cred-proxy adds path-level enforcement for
routes that opt in.

Motivated by PR #25's follow-up note in _apply_pipelock_url: pipelock
2.3.0's api_allowlist is hostname-only, so approving pipelock-block
opens the entire host. For shared platforms (github.com, gitlab.com,
public registries) operators usually want narrower-than-host
granularity.

Draft status; open questions on match semantics, allow-route-with-
empty-allowlist edge case, and the eventual MCP tool shape for
agent-proposed path additions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 08:33:01 -04:00