Commit Graph

38 Commits

Author SHA1 Message Date
didericis-claude 8f05226a4a docs(research): local ollama deployment, harness selection, and model sizing
test / unit (pull_request) Successful in 38s
test / integration (pull_request) Successful in 51s
2026-06-04 01:26:11 +00:00
didericis ae1531835d docs: drop "forge" jargon for concrete Gitea wording
test / integration (pull_request) Successful in 53s
test / integration (push) Successful in 57s
test / unit (pull_request) Successful in 33s
test / unit (push) Successful in 36s
We use Gitea, not an abstract forge. Reword the docs added in this
branch: "forge thread" -> "Gitea thread", and the research note's
generic "forge" -> "Gitea" / "hosting provider" as context demands,
keeping its portability argument coherent.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 23:05:02 -04:00
didericis 5c5f576df0 docs(research): add README describing research notes
Document what research notes are (opinionated investigations of a
question/design space), their unnumbered kebab-case naming, and their
loose verdict-first shape — explicitly freeform, not a template. Point
the AGENTS.md research line at it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 23:05:02 -04:00
didericis c840182d12 docs(research): issue tracking vs in-repo decision history
Analyze tracking feature requests in Gitea against the project's
in-repo PRDs/research notes, given the goal of keeping decision
history portable and not provider-locked. Recommends demoting issues
to an ephemeral inbox and reifying durable rationale into the repo.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 23:05:02 -04:00
didericis 7b4c1cd091 docs: drop "forge" jargon for concrete wording
test / unit (push) Successful in 28s
test / integration (push) Successful in 42s
test / unit (pull_request) Successful in 26s
test / integration (pull_request) Successful in 43s
We use Gitea, not an abstract forge. Reword the pre-existing research
and PRD docs: the generic "Forge-API gate"/"forge tokens" become
"Git-host-API gate"/"Git-host tokens" (the gate still spans Gitea /
GitHub / GitLab), "Git/forge history" -> "Git/Gitea history", and the
KNOWN_FORGE_HOSTS / forge: manifest-field examples -> KNOWN_GIT_HOSTS
/ git_host:. Meaning preserved; only the word "forge" is dropped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 22:57:20 -04:00
didericis-codex 18e3b62b72 docs: rename CLAUDE.md to AGENTS.md and rebrand provider-agnostic
test / unit (pull_request) Successful in 28s
test / integration (pull_request) Successful in 40s
test / unit (push) Successful in 31s
test / integration (push) Successful in 44s
Delete CLAUDE.md in favor of AGENTS.md as the orientation doc, rebrand
the project from Codex-bottle to provider-agnostic bot-bottle, and
repoint every CLAUDE.md reference across PRDs, research notes, the
implementer agent example, and the yaml_subset comment.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 20:36:47 -04:00
didericis-codex cdb1870b1c docs(agent): clarify claude oauth env
test / unit (pull_request) Successful in 29s
test / integration (pull_request) Successful in 43s
2026-05-28 18:20:09 -04:00
didericis-codex c08b09dc9f refactor!: rename project to bot-bottle
Assisted-by: Codex
2026-05-28 17:56:14 -04:00
didericis 8cd867f3d2 docs(research): claude-code pane in the dashboard
test / integration (pull_request) Successful in 1m8s
test / unit (pull_request) Successful in 17s
test / unit (push) Successful in 17s
test / integration (push) Successful in 1m2s
Survey the three realistic ways to surface a claude-code session
inside the dashboard TUI:

  1. Handoff — drop curses, foreground claude, restore on exit
     (the existing `e`/`p` pattern, extended). Minimal code,
     side-by-time rather than side-by-side.
  2. Embedded emulator — own a PTY, parse claude-code's ANSI
     stream via `pyte`, paint it into a curses pane. Real
     "pane in the dashboard" but a six-week build with one new
     dep and several integration trap-doors (alt-screen, resize,
     input routing, multi-PTY state).
  3. External multiplexer — delegate pane creation to tmux /
     iTerm / wezterm when detected. Tiny code, but splits the
     operator's mental model and gives up layout control.

Recommendation: ship Option 1 first; defer Option 2 to "only if
Option 1 is observably insufficient"; treat Option 3 as a
niche augmentation for power users.

Calls out four followups worth verifying before committing
(PTY behavior at small sizes, attach-to-existing-exec, SIGWINCH
handling, `-it` vs `-i` for the embedded path).
2026-05-26 02:51:08 -04:00
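
The handoff pattern described as Option 1 in commit 8cd867f3d2 (drop curses, foreground the child, restore on exit) might look roughly like the sketch below. It is a minimal illustration, not the project's code: `handoff`, `curses_handoff`, and the injected `suspend`/`resume`/`run` callables are hypothetical names, and only the `curses` and `subprocess` calls come from the standard library.

```python
import curses
import subprocess
from typing import Callable, Sequence


def handoff(
    argv: Sequence[str],
    suspend: Callable[[], None],
    resume: Callable[[], None],
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> int:
    """Give the terminal to a foreground child, then take it back.

    `resume` runs even if the child fails to start, so the TUI is
    never left with the terminal in the wrong mode.
    """
    suspend()
    try:
        return run(list(argv))
    finally:
        resume()


def curses_handoff(stdscr, argv: Sequence[str]) -> int:
    """Hypothetical wiring of `handoff` to a live curses screen."""

    def suspend() -> None:
        curses.def_prog_mode()  # remember the current curses terminal modes
        curses.endwin()         # return the terminal to normal (shell) mode

    def resume() -> None:
        curses.reset_prog_mode()  # restore the modes saved above
        stdscr.refresh()          # repaint the dashboard

    return handoff(argv, suspend, resume)
```

Keeping the suspend/resume steps injectable is a design choice for this sketch only: it lets the ordering be exercised without a real terminal. The result is side-by-time, as the commit notes, since the dashboard is not visible while the child owns the terminal.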
didericis 5e8ca21669 docs: replace stale bash-first framing with Python-stdlib-first
test / unit (pull_request) Successful in 16s
test / integration (pull_request) Successful in 1m32s
The project started life as bash scripts and got rewritten to Python
(documented in docs/research/bash-vs-python-vs-go.md). Several docs
still carried the old "bash-first" framing — misleading for anyone
reading them now (8.7k lines of Python vs. ~130 lines of bash, all
in scripts/demo*.sh).

- CLAUDE.md "What this is" + "Conventions": orchestrator is Python,
  posture is stdlib-first.
- docs/prds/0010-cred-proxy.md,
  docs/research/manifest-format-and-grouping.md: quoted CLAUDE.md's old
  wording; re-quote.
- docs/research/built-in-supervisor-design.md,
  landscape-containerized-claude.md, agent-sandbox-landscape.md,
  pipelock-assessment.md, network-egress-guard.md: drop "bash-first"
  claims about the project, keep accurate descriptions of external
  tools' bash usage.

Leaves untouched: bash code-fence syntax in examples, README's
literal `bash scripts/demo.sh` invocation (the demo IS bash),
Claude Code's "Bash tool" references, IVIJL/devbox bash description
(that project actually is bash), and the bash-vs-python-vs-go
research note that records the rewrite decision.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 06:32:42 -04:00
didericis 4cce535008 docs(research): drop auto-respawn from the supervisor design
The autonomous "review comment → respawn bottle with comment as
next prompt" loop is the one feature that opens a prompt-injection
vector the bottle wall can't close (a public commenter would get
to issue instructions inside the agent's perimeter on every
launch). The available mitigations — commenter allowlists,
prompt-injection regex screens, private-repo defaults — are all
soft. The durable defense is to keep the human between the
review comment and any next agent prompt.

So `supervise` is now strictly notify-only. The `auto_respawn`
manifest field, the "with auto_respawn: true" behavior paragraph,
and the matching trust-model edge case all go. The reasoning
stays in the "Where to be conservative" bullet so the decision
isn't re-litigated later.
2026-05-25 04:19:50 -04:00
didericis afbb77b040 docs(research): built-in supervisor design (TUI + PR feedback)
2026-05-25 04:19:50 -04:00
didericis 1f9722ae27 docs(research): add Betterleaks switching analysis
test / unit (pull_request) Successful in 13s
test / integration (pull_request) Successful in 28s
2026-05-24 23:59:42 -04:00
didericis c33930290f docs(research): survey gitleaks dashboards + add baseline-file primitive
test / unit (pull_request) Successful in 13s
test / integration (pull_request) Successful in 24s
2026-05-24 23:54:46 -04:00
didericis a74dd2b97f docs: research on git-gate commit approval; link from PRD 0012
test / unit (pull_request) Successful in 12s
test / integration (pull_request) Successful in 22s
2026-05-24 23:39:17 -04:00
didericis da969a503d docs(research): manifest format + grouping options
test / unit (pull_request) Successful in 12s
test / integration (pull_request) Successful in 25s
Captures the two open questions surfaced by PRD 0011: should bottles and agents stay grouped in one file or split per file, and should the format stay JSON or move to YAML / MD-with-frontmatter.

Recommends per-file MD-with-frontmatter (with agents shaped close to Claude Code's subagent spec so they can drop into ~/.claude/agents/ as a side effect), explicitly flags the PyYAML runtime dependency as a user-decision crossing the project's "low deps by default" line, and leaves several other choices (hidden dotdir vs visible, migration tooling) as open questions.

Companion to docs/prds/0011-cwd-manifest-trust-boundary.md (which solves the trust problem at the resolver layer); this doc explores a structural alternative that would make the boundary self-documenting on disk.
2026-05-24 21:12:43 -04:00
didericis 00649d27e9 docs(research): add credential-proxy landscape and DLP-minimization framing
test / unit (push) Successful in 14s
test / integration (push) Successful in 29s
Consolidates oauth-token-exposure-to-claude.md and
tea-token-isolation-via-proxy.md into agent-credential-proxy-landscape.md,
adding a May-2026 survey of existing tools (Docker AI Sandboxes,
Cloudflare Sandbox Auth, Infisical Agent Vault, nono, Aembit, LiteLLM
CVE-2026-42208, Portkey, Helicone, etc.) and a build-vs-adopt verdict.

Adds secret-minimization-over-dlp.md explaining why pipelock's body
DLP and gitleaks's pre-receive scan cannot stop encoding/splitting
exfil, and why moving credentials out of the bottle (the git-gate
pattern, generalized) is the only robust answer.

Updates git-secret-scanning-hardening.md's reference to point at
the new consolidated landscape doc.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 23:25:12 -04:00
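
The claim in commit 00649d27e9 that body DLP and pattern scans cannot stop encoding or splitting exfil can be illustrated with a toy scanner. This is a sketch under stated assumptions: `TOKEN_RE` is an illustrative pattern (the literal prefix `ghp_` followed by 36 alphanumerics), not the actual rule used by pipelock or gitleaks, and `FAKE_TOKEN` is a dummy value.

```python
import base64
import re

# Illustrative pattern only; real scanners ship their own rule sets.
TOKEN_RE = re.compile(r"ghp_[A-Za-z0-9]{36}")

# Dummy value shaped like the pattern above; not a real credential.
FAKE_TOKEN = "ghp_" + "A" * 36


def dlp_flags(body: str) -> bool:
    """Return True if the toy scanner would flag this request body."""
    return TOKEN_RE.search(body) is not None


def encode_exfil(secret: str) -> str:
    """Base64-encode the secret; the receiver decodes it trivially."""
    return base64.b64encode(secret.encode()).decode()


def split_exfil(secret: str, parts: int = 2) -> list[str]:
    """Cut the secret into chunks to be sent in separate requests."""
    size = -(-len(secret) // parts)  # ceiling division
    return [secret[i:i + size] for i in range(0, len(secret), size)]
```

With these definitions, `dlp_flags("token=" + FAKE_TOKEN)` is true, while `dlp_flags(encode_exfil(FAKE_TOKEN))` and every chunk from `split_exfil(FAKE_TOKEN)` pass unflagged, even though the receiver can reassemble the secret exactly. Adding more decoders to the scanner only moves the line, which is the argument for keeping the credential out of the bottle in the first place.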
didericis 96d2c7b7a1 docs(research): add note on git secret-scanning as defense-in-depth
test / unit (push) Successful in 12s
test / integration (push) Successful in 15s
Threat-models the case where a credential ends up in a tracked
file and is git-pushed to a public remote — the secret is
compromised the instant the push lands (events API, scrapers),
not at merge time. Recommends gitleaks as the smallest-blast-radius
layer to add: Go binary, MIT, offline, scans full history,
hookable from the existing .githooks/.

No code or workflow change; just the research note.
2026-05-12 16:24:06 -04:00
didericis 6716f091c1 docs(prd): add 0006, enable pipelock's native TLS interception
test / unit (pull_request) Successful in 12s
test / integration (pull_request) Successful in 13s
Supersedes the abandoned PR #8 (`mitmproxy-tls-interception`),
which built a mitmproxy + addon chain on the (falsified) premise
that pipelock could not MITM. Empirical proof from the impl-time
spike: with `tls_interception: { enabled: true, ca_cert, ca_key }`
in pipelock's config, pipelock answered a credential POST over
HTTPS with `STATUS=403 / body: blocked: request body contains
secret: GitHub Token` and emitted both `scanner:"tls_intercept"`
and `scanner:"body_dlp"` events. Standalone, no second proxy.

Net change vs PR #8: one sidecar instead of two, no vendored
addon, no addon-verdict pattern matching, no HTTPS-trust /
DNS / lookup workarounds. Same end-state behavior — pipelock's
DLP fires on plaintext for HTTPS hosts in the allowlist.

Also cleaning up the now-stale TLS-research notes:

- `docs/research/tls-mitm-for-pipelock.md` is removed. Its
  entire premise (mitmproxy in front of pipelock) is moot now
  that pipelock does the work natively. The mechanics of CONNECT
  bumping and the CA-lifecycle considerations it documented are
  the same as what pipelock implements; the PRD restates the
  parts that matter for the integration.
- `docs/research/pipelock-assessment.md` had two stale claims
  corrected: the "Pipelock does not perform TLS inspection (no
  CA trust injection)" line in §Scope gaps and the
  "no TLS termination" cell in the comparison table. Both now
  point at the `tls_interception` config and `pipelock tls`
  CLI instead.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 14:15:44 -04:00
didericis 8e261563dc docs(research): TLS interception topologies for pipelock content scanning
test / unit (push) Successful in 14s
test / integration (push) Failing after 13s
Survey of TLS-MITM tools (mitmproxy, Squid+ssl_bump, Go libraries) and
five candidate topologies for adding TLS termination to the egress path
so pipelock's DLP, subdomain-entropy, and MCP scanners can fire on
plaintext bodies. Recommends mitmproxy in front of pipelock for v1
with a per-bottle ephemeral CA.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-12 11:41:34 -04:00
didericis b97807ac71 docs(research): evaluate smolmachines as VM backend
test / run tests/run_tests.py (push) Successful in 16s
Compares smolmachines against the six subsystems in
agent-vm-isolation.md. smolmachines replaces the microVM runtime,
network attachment (libkrun TSI with built-in DNS-over-vsock filter),
vsock control plane, and Python lifecycle wrapper. Pipelock stays;
disk-image story shifts to OCI + writable overlay. Recommends adopting
smolmachines as the macOS VM backend after smoke-testing TSI
passthrough to a host-side pipelock.
2026-05-11 16:32:04 -04:00
didericis aba9a823ba docs(research): document macOS agent VM isolation approach
Transcript-style notes on running an agent in a hardware-isolated
microVM on macOS. Covers Virtualization.framework / vfkit / libkrun
choices, hardware-isolation guarantees, driving VMs from Python
(subprocess or PyObjC), pipelock as the egress proxy, vsock for the
control channel, and egress enforcement via
VZFileHandleNetworkDeviceAttachment + gvisor-tap-vsock.
2026-05-11 16:31:40 -04:00
didericis 08159e1031 docs(research): survey AI-agent sandbox tools
test / run tests/run_tests.py (push) Successful in 19s
Compares claude-bottle to endo-familiar, litterbox, agent-safehouse,
matchlock, tilde.run, boxlite, microsandbox, and smolmachines. Covers
isolation primitive, locality, agent integration, network policy, and
maturity, and notes three borrowable ideas (per-use SSH confirmation,
in-flight secret injection, microVM backend) that fit the current
bash-first / local-Docker stance.
2026-05-11 15:56:23 -04:00
didericis 7e0e256370 docs: add research note on polish priorities to close the maturity gap
test / run tests/run_tests.py (push) Successful in 21s
Captures the ranked list of changes that would move the project from
"works for me" toward the perceived maturity of comparable tools —
onboarding friction, error messages, distribution, versioning, schema
validation, starter library, docs site, cross-platform CI. Includes
effort estimates and an explicit "what polish is not" section so the
roadmap doesn't drift into feature work.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 20:38:44 -04:00
didericis e1efc64862 docs: add research note on Apple container as an alternative backend
test / run tests/run_tests.py (push) Successful in 14s
Captures the surface area of the current Docker integration, how it
maps to Apple's `container` framework, the dominant networking risk
(pipelock multi-network attach), and the cost difference between a
faithful port and a simplified VM-firewall variant.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 02:36:11 -04:00
didericis 1e6f254db5 docs: add research note comparing bash, Python, and Go for the CLI
test / run tests/run_tests.py (push) Successful in 14s
Captures the reasoning for staying on Python, the conditions under which
a Go rewrite would pay for itself, and why bash isn't viable at the
project's current size.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 02:34:40 -04:00
didericis ec6261cd77 docs: add Fly Machines case study to remote-docker-vm-isolation note
test / run tests/run_tests.py (push) Successful in 13s
Concrete worked example covering image strategy (with the
bake-the-claude-bottle-image-in optimization that elides 30-90s of
in-VM build), cold/warm/hot boot-to-prompt timing, standby vs ephemeral
cost breakdown, three workflow patterns, and Fly-specific gotchas
(DinD kernel requirements, the y/N preflight blocking automated
launch, pricing-may-have-moved hedge).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 01:18:08 -04:00
didericis 43453c66ea docs: add research note on remote Docker VM as an isolation upgrade
test / run tests/run_tests.py (push) Successful in 15s
Argues that running claude-bottle unchanged on a remote Linux VM with
dockerd is the cheapest practical path to stronger isolation than
local Docker — preserves the v1 pipelock topology, requires zero code
changes, and shrinks the agent's blast radius from the developer
laptop to a disposable VM. Cross-references the existing
stronger-isolation-alternatives and local-vs-remote-agent-execution
notes so the research set composes cleanly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 01:07:17 -04:00
didericis 7986f2bd23 docs: add research note on stronger isolation alternatives
test / run tests/run_tests.py (push) Successful in 19s
Surveys gVisor, Kata, Firecracker, and Apple Container as replacements
or complements to Docker+runc, with concrete file-level migration notes
for this codebase and a recommended rung-by-rung path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 00:38:46 -04:00
didericis cc5e772519 docs: replace stale .sh paths with claude_bottle/*.py equivalents
test / run tests/run_tests.py (push) Successful in 13s
Cleans up references to the pre-refactor bash layout (cli.sh,
lib/*.sh, scripts/*.sh) across README, Dockerfile, the pipelock PRD,
and research notes. Refreshes line numbers in the oauth-token note
against the current cli/start.py.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 00:27:25 -04:00
didericis 08597ebcf8 docs: add redundancy analysis to pipelock assessment
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-08 00:25:01 -04:00
didericis b36e6da0b3 docs: add research note assessing pipelock for egress/exfil control
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-08 00:15:11 -04:00
didericis c74bd5cf26 docs: add research note on multi-encoding secret exfil tripwires
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-08 00:00:51 -04:00
didericis bc7f506311 docs: add research note on isolating tea token via proxy
Investigates whether the Gitea `tea` CLI can be authenticated via a
header-injecting proxy so the token never enters the container — even as
an env var. Parallels the OAuth-token research note. Recommends an
in-container root-owned reverse proxy as the lowest-friction shape, and
flags the unavoidable tradeoff that the agent retains the token's full
API scope (no exfil ≠ no harm).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:30:06 -04:00
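
The core step of the header-injecting proxy discussed in commit bc7f506311 is small enough to sketch. The function below is hypothetical (`inject_token` is not a name from the repository) and assumes the upstream accepts Gitea's `Authorization: token <value>` header form; the listener and forwarding parts of a real proxy are omitted.

```python
from typing import Mapping


def inject_token(headers: Mapping[str, str], token: str) -> dict[str, str]:
    """Build upstream headers with the proxy-held credential swapped in.

    Any Authorization header supplied by the client (the agent) is
    dropped, so the agent never needs to hold the token and cannot
    substitute one of its own.
    """
    upstream = {
        name: value
        for name, value in headers.items()
        if name.lower() != "authorization"  # header names are case-insensitive
    }
    upstream["Authorization"] = f"token {token}"
    return upstream
```

As the commit message flags, this only keeps the token value out of the container. The agent can still make any API call the token's scope allows through the proxy.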
didericis edf79b3880 docs: add research note on container network egress guards
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:27:18 -04:00
didericis 7a38b8da23 docs: add research note on OAuth token exposure to claude
Walks the current `docker run -e CLAUDE_CODE_OAUTH_TOKEN` flow, why claude
can read the token trivially via its Bash tool, why no Linux primitive
hides an env var from its own process, and why a root-owned localhost
auth-injecting reverse proxy (paired with an egress allowlist) is the
realistic mitigation. Documents `ANTHROPIC_BASE_URL` caveats (SSE,
header passthrough, issue #36998, out-of-band traffic).
2026-05-07 23:24:39 -04:00
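
The point in commit 7a38b8da23 that a process (and anything it spawns) can trivially read its own environment can be shown with a short probe. This is a minimal sketch: `read_env_from_child` is a hypothetical helper, the value passed in should be a dummy, and a Python child process stands in for a shell command run through the agent's Bash tool.

```python
import os
import subprocess
import sys


def read_env_from_child(name: str, value: str) -> str:
    """Set an env var for a child process and return what the child sees."""
    env = dict(os.environ)
    env[name] = value
    # The child simply echoes the variable named in its first argument.
    probe = "import os, sys; sys.stdout.write(os.environ[sys.argv[1]])"
    result = subprocess.run(
        [sys.executable, "-c", probe, name],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `read_env_from_child("CLAUDE_CODE_OAUTH_TOKEN", "dummy-value")` returns the value unchanged, which is why the note looks to a proxy that holds the credential outside the agent's process instead of trying to hide the variable.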
didericis 9b4ff29f49 docs: add research note on revoking Claude Code OAuth tokens
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:13:42 -04:00
didericis c45f384fb8 Initial commit
2026-05-07 22:45:36 -04:00