docs(prd): add PRD 0049 — named/labelled agents

Draft PRD for prompting operators for a custom label and optional ANSI color at agent launch time, storing both in metadata.json, and surfacing the label (in color) in the dashboard's active-agents pane. Closes #171
refactor(backend): hoist guest_home to BottlePlan base
2026-06-03 21:38:38 -04:00 · 2026-06-03 21:38:13 -04:00 · 2026-06-03 21:38:13 -04:00 · 2026-06-03 21:38:13 -04:00 · 2026-06-03 21:38:13 -04:00 · 2026-06-03 21:38:13 -04:00
4 changed files with 760 additions and 438 deletions
@@ -6,96 +6,26 @@
 [![test](https://gitea.dideric.is/didericis/bot-bottle/actions/workflows/test.yml/badge.svg?branch=main)](https://gitea.dideric.is/didericis/bot-bottle/actions?workflow=test.yml)
-Run multiple Claude Code agents on your own machine, each scoped to its own secrets, skills, and egress allowlist.
+**Problem:** Developer wants to run a coding agent without supervision, but they don't want a prompt injected or misbehaving agent wrecking their environment or exfiltrating sensitive data.
-![pipelock and git-gate blocking exfil attempts against a live bottle](docs/demo.gif)
+**Solution:** Ephemeral, per agent "bottles" the agent cannot modify that scan all traffic for data exfiltration and limit capabilities and egress to only what the agent needs.
-Four prompts to the agent inside a real bottle:
+## Features
 claude replies to `hello there` — proof api.anthropic.com routes
 through pipelock's bumped TLS end-to-end;
 asked to GET a non-allowlisted host, the agent's curl gets 403 back
 from pipelock;
 asked to POST a credential-shaped body to an allowlisted host, the
 same 403 — pipelock's DLP body scanner caught it;
 asked to commit and push an AKIA-shaped key, git-gate's gitleaks
 pre-receive hook rejects the ref.
 Run it yourself with `bash scripts/demo.sh`.
-## Why "bot-bottle"?
+- **Per-bottle egress allowlist** — TLS-bumped HTTP/HTTPS chokepoint with a per-manifest host allowlist and request-body DLP scanner; DoH and arbitrary hosts blocked by default.
-
+- **Tokens the agent never sees** — host secrets live in a sidecar; the agent dials `http://sidecar:9099/<path>` and the proxy strips inbound `Authorization` and injects the real token before forwarding. `printenv` in the agent shows proxy URLs only.
-Each container is a bottle; Claude is the genie inside. The genie's
+- **Gitleaks-scanned push (git-gate)** — `bottle.git` remotes route through a per-bottle `git daemon` that gitleaks-scans incoming refs pre-receive and forwards clean refs upstream over SSH. The agent never holds the upstream credential.
-powers are exactly what the manifest grants it — a specific set of
+- **Manifest-scoped skills + secrets** — each bottle declares its skills, env, git identity, remotes, and egress routes; unknown keys die at load.
-skills, a specific set of secrets, and a specific set of hosts it can
+- **Trust boundary at `$HOME`** — bottles (credentials, egress, remotes) live only under `~/.bot-bottle/bottles/`. Repos may ship agents but not bottles, so a cloned repo can't redirect an env var to an attacker host.
-reach — nothing more. You uncork one bottle per agent
+- **Composable bottles (`extends:`)** — keep provider/runtime policy in one base bottle (e.g. `claude.md`) and overlay task bottles on top.
-(`./cli.py start <agent>`), many bottles run in parallel, and each is
+- **Parallel, isolated bottles** — each bottle is its own per-agent Docker `--internal` network; bottles don't share state or talk to each other.
-scoped to its task. When the session ends the bottle is destroyed and
+- **Provider templates (Claude, Codex)** — `Dockerfile.claude` / `Dockerfile.codex`, or a bottle-supplied Dockerfile. Claude auth via long-lived OAuth token; Codex via opt-in host device-auth forwarding.
-the genie does not persist.
+- **gVisor auto-detect** — on Linux hosts where `runsc` is registered with Docker, every bottle launches under it for a userspace syscall barrier; no manifest config required.
-
+- **Smolmachines backend (macOS)** — opt-in `BOT_BOTTLE_BACKEND=smolmachines` runs the agent in a libkrun micro-VM with the sidecar bundle still in Docker.
 ## Goals
 - Scope each agent to the minimum credentials and network egress its task actually needs
 - Run multiple agents in parallel, isolated from each other
 - Keep code, credentials, and agent activity on infrastructure I control — no third-party agent runtime
 ## Project status
 bot-bottle is a self-hosted secure runtime for AI coding agents.
 Each agent runs in an isolated container or micro-VM-backed bottle with
 scoped secrets, allowlisted egress, TLS-aware proxying, DLP checks, and
 a git-gate that withholds upstream credentials and scans pushes before
 forwarding. The project includes a documented threat model, PRD-driven
 development history, Docker and smolmachines backends, dashboard and
 remediation flows, and unit/integration tests covering exfiltration and
 sandbox escape scenarios.
 ## Security model
 Each agent runs in its own bottle: its own container, its own internal
 Docker network, and its own pipelock sidecar. Bottles don't share
 state, don't talk to each other, and only get the env vars, skills,
 SSH identities, and egress hosts the manifest grants them — nothing
 more. Any one agent only has the access it needs to do its job.
 The bottle limits both what an agent can see and where it can send
 it. Each bottle gets only the secrets and SSH identities the manifest
 grants it — a Gitea token but not a GitHub token, a deploy key but
 not a personal SSH key — so even a compromised or misbehaving agent
 only handles credentials it was already trusted with for its job.
 Egress flows through pipelock, which constrains where those
 credentials can travel: an agent with a Gitea token can reach
 `gitea.dideric.is`, not arbitrary attacker-controlled hosts. The same
 constraint blocks DNS-over-HTTPS as an exfil channel — a DoH resolver
 like `cloudflare-dns.com` would have to be on the allowlist for the
 agent to reach it at all. The container itself adds a layer between
 the agent and the host, but the v1 design leans more on secret
 minimization and egress allowlisting than on the container as a
 hardened boundary. On Linux hosts where [gVisor](https://gvisor.dev/)
 is registered with Docker, bot-bottle auto-detects it and launches
 every bottle under `runsc` for a userspace syscall barrier — no
 manifest configuration required. The broader v2 discussion lives in
 `docs/research/stronger-isolation-alternatives.md`.
 The egress proxy and OAuth-token handling below are the load-bearing
 pieces of v1.
 ## Architecture
-A bottle is two containers per agent: an `agent` container, and a
+A bottle is two containers per agent: an `agent` container, and a `sidecars` container that bundles pipelock + cred-proxy + git-gate + supervise behind a Python init supervisor. They share a per-agent Docker `--internal` network; the agent has no default route off-box.
 `sidecars` container that bundles pipelock + egress + git-gate +
 supervise behind a Python init supervisor (PRD 0024). They share a
 per-agent Docker `--internal` network; the agent has no default
 route off-box. All HTTP and HTTPS egress funnels through pipelock,
 where the egress allowlist, TLS interception, and request-body DLP
 scanner enforce the manifest before any byte leaves the host. The
 only egress that doesn't traverse pipelock is git-gate's SSH
 push/fetch to `bottle.git` upstreams — pipelock can't proxy SSH,
 so git-gate is its own L4-style egress path with gitleaks doing
 the pre-receive scan.
 The agent dials the bundle by the legacy short names (`pipelock`,
 `egress`, `git-gate`, `supervise`); the renderer registers those as
 docker-network aliases on the bundle so existing HTTPS_PROXY URLs
 and MCP endpoints resolve without an agent-side change.
 ```
                            host  ( ./cli.py )
@@ -104,26 +34,21 @@ and MCP endpoints resolve without an agent-side change.
                                  ▼
   ┌─────────────────────────── bottle ──────────────────────────────────┐
   │                                                                     │
-   │   ┌──────────────────┐                                              │
+   │   ┌──────────────────┐                   ┌──────────────┐           │
-   │   │ agent image      │  HTTPS_PROXY                                 │
+   │   │ agent image      │   HTTP(S) proxy   │ cred-proxy   │           │
-   │   │ (claude-code,    │ ────────────────────────┐                    │
+   │   │ (claude-code,    │ ─────────────────►│ (strips/inj  │           │
-   │   │  built locally)  │                         │                    │
+   │   │  codex, etc)     │                   │  Authoriz.)  │           │
-   │   │                  │   plain HTTP            │                    │
+   │   │                  │                   └──────┬───────┘           │
-   │   │ skills, env,     │  (token injection) ┌────▼─────────┐          │
+   │   │ environ: URLs    │                          │                   │
-   │   │ ~/.gitconfig,    │ ──────────────────►│ cred-proxy   │          │
+   │   │ only, no real    │                          ▼                   │
-   │   │ ~/.npmrc, tea    │                    │ (strips/inj  │          │
+   │   │ tokens           │                  ┌────────────────┐          │  HTTPS to
   │   │                  │                    │  Authoriz.)  │          │
   │   │ environ: URLs    │                    └─────┬────────┘          │
   │   │ only, no real    │     HTTPS_PROXY          │                   │
   │   │ tokens           │                          ▼                   │
   │   │                  │                  ┌────────────────┐          │  HTTPS to
   │   │                  │                  │ pipelock image │──────────┼──►  allowlisted
   │   │                  │                  │ (TLS bump, DLP │          │     hosts (incl.
   │   │                  │                  │  body scan,    │          │      cred-proxy
   │   │                  │                  │  allowlist)    │          │      upstreams)
   │   │                  │                  └────────────────┘          │
   │   │                  │                                              │
-   │   │                  │   git://         ┌────────────────┐          │  SSH push/fetch
+   │   │                  │    git proxy     ┌────────────────┐          │  SSH push/fetch
   │   │                  │ ────────────────►│ git-gate image │──────────┼──►  to bottle.git
   │   │                  │                  │ (gitleaks +    │          │      upstreams
   │   └──────────────────┘                  │  git daemon)   │          │     (direct — not
@@ -137,192 +62,25 @@ and MCP endpoints resolve without an agent-side change.
   └─────────────────────────────────────────────────────────────────────┘
 ```
- **agent image** — built from the provider template Dockerfile
+When the agent exits, `cli.py` tears down every sidecar and both networks; nothing about a bottle persists between runs.
  (`Dockerfile.claude` for Claude, `Dockerfile.codex` for Codex, or
  `agent_provider.dockerfile`) on first run; runs the selected agent
  CLI with the manifest-granted skills, env vars, and `~/.gitconfig`
  (the latter for the git-gate's `insteadOf` rules when `bottle.git`
  is set).
 - **pipelock image** — per-agent sidecar. Terminates the agent's
  outbound HTTP/HTTPS, enforces the resolved allowlist, runs DLP
  scanning. Design in `docs/prds/0001-per-agent-egress-proxy-via-pipelock.md`
  and `docs/prds/0006-pipelock-tls-interception.md`.
 - **git-gate image** — per-agent sidecar built on `zricethezav/gitleaks`
  (alpine + gitleaks + git-daemon + openssh-client). Runs
  `git daemon` over `git://` as a bidirectional mirror of each
  declared upstream. A pre-receive hook gitleaks-scans incoming
  refs and forwards clean refs to the real upstream over SSH; an
  access-hook runs `git fetch origin --prune` against the upstream
  before every upload-pack so an agent fetch returns whatever the
  upstream has *now* (fail-closed if unreachable). The agent's
  `~/.gitconfig` rewrites the real URL to the gate via `insteadOf`,
  so push, fetch, clone, and pull all route through. The agent
  never sees the upstream credential. Brought up only when
  `bottle.git` has entries. Design in `docs/prds/0008-git-gate.md`.
 - **cred-proxy image** — per-bottle sidecar (`python:3.13-alpine`
  base, stdlib-only) that holds API tokens declared in
  `bottle.cred_proxy.routes`. Each route names a `path`,
  `upstream`, `auth_scheme`, and `token_ref` (host env var); the
  agent dials `http://cred-proxy:9099<path>...` over plain HTTP
  and the proxy strips any inbound `Authorization`, injects
  `<auth_scheme> <token>` using the value held only in its own
  container's environ, and forwards to the real upstream over
  HTTPS. SSE responses stream back unbuffered. The cred-proxy's
  outbound HTTPS routes through pipelock (it trusts pipelock's
  per-bottle CA), so pipelock's egress allowlist + body scanner
  apply to cred-proxy traffic the same way they apply to direct
  agent traffic. Smart-HTTP push paths (`/git-receive-pack`,
  `/info/refs?service=git-receive-pack`) are refused at the
  proxy — push must go through `bottle.git` / git-gate where
  gitleaks runs. Optional per-route `role` tags drive agent-side
  rewrites: `anthropic-base-url`, `npm-registry`, `git-insteadof`,
  `tea-login`. The agent's `printenv` shows only proxy URLs —
  none of the real token values. Design in
  `docs/prds/0010-cred-proxy.md`.
 When the agent exits, `cli.py` tears down every sidecar that was
 brought up and the two networks; nothing about a bottle persists
 between runs.
 ## Quickstart
-Requires Docker on the host and a long-lived Claude Code OAuth token in
+Requires Docker on the host and a long-lived Claude Code OAuth token (`claude setup-token`) exported as `BOT_BOTTLE_CLAUDE_OAUTH_TOKEN`.
 your shell env.
 ```sh
 ./cli.py start <agent>   # builds the image on first run, drops you into claude
 ```
 The container is removed automatically when the session ends. If the script
 is killed with SIGKILL the exit trap won't fire and the container may be
 left running; remove it with `docker rm -f <container-name>`.
 ### Smolmachines backend (experimental, macOS-only)
 A second backend runs the agent in a smolvm micro-VM (libkrun) with the
 sidecar bundle still in Docker. Selected via
 `BOT_BOTTLE_BACKEND=smolmachines ./cli.py start <agent>`. Requires
 `smolvm` on PATH (`curl -sSL https://smolmachines.com/install.sh | sh`).
 The integration tests run against whichever backend the env var
 selects and skip cleanly when its prerequisites are missing.
 **One-time sudo on first launch (macOS):** smolmachines bottles
 each reserve a loopback alias from a pool (`127.0.0.16` ..
 `127.0.0.31`) and bind their bundle's port-forwards to it; the
 first `./cli.py start` after each reboot prompts for sudo to add
 missing aliases via `ifconfig lo0 alias`. Aliases persist until
 reboot; subsequent launches don't prompt. The agent's TSI
 allowlist is the alias's `/32`, so each bottle can only reach
 its own bundle's published ports — not other bottles' ports,
 not other host loopback services (postgres, dev servers, etc.).
 This enforcement requires a workaround for a smolvm 0.8.0 bug:
 the CLI's `--allow-cidr` flag is silently dropped when combined
 with `--from <smolmachine>`. The launcher patches smolvm's
 persistent state DB
 (`~/Library/Application Support/smolvm/server/smolvm.db`)
 directly between `machine create` and `machine start` to set
 the allowlist. The hack falls away automatically when smolvm
 honors the flag upstream — see the `loopback_alias` module's
 docstring for the investigation trail.
 ## Manifest
-Bottles and agents live as Markdown files with YAML frontmatter under
+Bottles and agents are Markdown files with YAML frontmatter under `~/.bot-bottle/`. The Markdown body is the system prompt. Bottles live in `~/.bot-bottle/bottles/`; agents may also be shipped by a repo at `<repo>/.bot-bottle/agents/<name>.md`.
 `~/.bot-bottle/`. Each bottle is one file in `bottles/`, each agent
 is one file in `agents/`:
-```
+**Bottle** (`~/.bot-bottle/bottles/gitea-dev.md`):
 ~/.bot-bottle/
 ├── bottles/
 │   ├── dev.md
 │   └── gitea-dev.md
 └── agents/
    ├── implementer.md
    └── researcher.md
 ```
 The filename (without `.md`) is the entity's name. Filenames must
 match `[a-z][a-z0-9-]*`; files that don't are skipped with a warning.
 A repo can ship its own agent files alongside its code at
 `<repo>/.bot-bottle/agents/<name>.md`. Those agents reference
 bottles defined in `~/.bot-bottle/bottles/` (the only place
 bottles can come from); a `bottles/` subdir in a repo is ignored
 with a warning. **This is the trust boundary**: bottle infrastructure
 — credentials, egress allowlists, git remotes — comes from your home
 directory only. A cloned repo cannot redirect a host env var to an
 attacker-named upstream because it has no way to declare a bottle.
 ### Bottle composition with `extends:`
 A bottle can inherit from another via `extends: <bottle-name>` so
 operators don't have to duplicate a whole bottle file to vary one
 field (PRD 0025). The parent's resolved config is the base; the
 child's declared fields overlay. Merge rules:
 - `env:` — dict merge, child wins on key collision.
 - `git.user:` — per-field overlay (child's non-empty `name` /
  `email` wins; empty falls through to parent).
 - `git.remotes:` — dict merge by host, child wins on host collision.
  An explicit `git.remotes: {}` clears the parent's remotes; omitting
  `git.remotes` inherits the parent's remotes.
 - `agent_provider:`, `egress:`, `supervise:` — full replace when the
  child declares the field.
 ```yaml
 ---
 extends: dev          # inherit everything from bottles/dev.md
 egress:
  routes:
    - host: staging.example.com
      auth:
        scheme: Bearer
        token_ref: STAGING_TOKEN
 ---
 ```
 Cycles (`A extends B extends A`), self-references, and missing
 parents die at parse with a clear pointer. Bottles remain
 `$HOME`-only — `extends:` preserves the trust boundary above.
 ### Provider base bottles
 Keep provider/runtime policy in one home-owned base bottle, then have
 task bottles extend it. That keeps provider egress/auth in one place
 without hiding security-relevant routes behind `agent_provider.template`.
 For example, `~/.bot-bottle/bottles/claude.md` can hold the Claude
 provider selection and Anthropic API egress:
 ````markdown
 ---
-agent_provider:
+extends: claude          # inherit the Claude provider boundary
  template: claude
 egress:
  routes:
    - host: api.anthropic.com
      role: claude_code_oauth
      auth:
        scheme: Bearer
        token_ref: BOT_BOTTLE_CLAUDE_OAUTH_TOKEN
      pipelock:
        tls_passthrough: true
 ---
 Common Claude provider boundary.
 ````
 Task bottles can then inherit that provider boundary and add their own
 env/git configuration without repeating the Claude route.
 ### Example bottle (`~/.bot-bottle/bottles/gitea-dev.md`)
 ````markdown
 ---
 extends: claude
 env:
  GIT_AUTHOR_NAME: didericis
@@ -337,148 +95,7 @@ git:
      Upstream: ssh://git@gitea.dideric.is:30009/didericis/bot-bottle.git
      IdentityFile: /Users/didericis/.ssh/id_ed25519_gitea
      KnownHostKey: ssh-ed25519 AAAA...
 ---
 The `gitea-dev` bottle. Backs my work on personal projects: provider
 auth through egress and gitea.dideric.is over SSH.
 ````
 For a Codex-backed base bottle, set `agent_provider.template: codex`.
 The Codex template expects ChatGPT/device login state instead of an
 `OPENAI_API_KEY` env var; no API-key placeholder is forwarded into the
 agent. To let bot-bottle read the host's current Codex ChatGPT access
 token and inject it from egress only for Codex's API calls, opt in
 explicitly:
 ```yaml
 agent_provider:
  template: codex
  forward_host_credentials: true
 egress:
  routes:
    - host: auth.openai.com
      path_allowlist:
        - /api/accounts/deviceauth/
 ```
 Run `codex login --device-auth` on the host before launch. The
 launcher reads `tokens.access_token` from the host's
 `~/.codex/auth.json`, verifies it is fresh user/device auth, and passes
 it to the sidecar's `EGRESS_TOKEN_N` env slot. The agent container gets
 a dummy `~/.codex/auth.json` that preserves the host auth-mode shape
 but replaces credential values with placeholders. It keeps the selected
 ChatGPT account id so Codex sends requests for the same account while
 egress owns the real bearer token. The agent never receives real access
 tokens, refresh tokens, or `OPENAI_API_KEY`. The effective egress table
 automatically adds or upgrades `api.openai.com` and `chatgpt.com` to
 authenticated routes when `forward_host_credentials` is true.
 The built-in Codex template uses `Dockerfile.codex`; set
 `agent_provider.dockerfile` to build the agent from a custom Dockerfile
 while keeping the bot-bottle sidecars in place.
 ### Example agent (`~/.bot-bottle/agents/gitea-helper.md`)
 ````markdown
 ---
 bottle: gitea-dev
 skills:
  - init-prd
 git:
  user:
    name: gitea-helper
    email: eric+gitea-helper@dideric.is
 ---
 You help maintain Gitea-hosted projects.
 ````
 The agent's Markdown body is its system prompt (whitespace
 stripped). The frontmatter declares the bottle to launch in and any
 skills to mount. You can also include Claude Code subagent fields
 (`name`, `description`, `model`, `color`, `memory`) in the
 frontmatter — bot-bottle ignores them at launch but doesn't
 reject them, so the same file can drop into `~/.claude/agents/` as a
 Claude Code subagent.
 An agent may also declare `git.user` (`name` / `email`). It overlays
 the referenced bottle's `git.user` per-field — the agent's non-empty
 fields win, the rest fall through to the bottle — so two agents can
 share one bottle and still commit under distinct identities without
 an identity-only bottle (PRD 0027). Only `git.user` is allowed at the
 agent level; `git.remotes` stays bottle-only because it carries
 credentials and host trust. The launch preflight and `cli.py info`
 print the effective identity annotated `(agent)` / `(bottle)` so you
 can see where each field came from. Git authorship is not a
 credential — push auth is the bottle's remote key/token — so a
 repo-shipped agent setting its own identity grants no access; treat
 an agent identity as *claimed, not vouched*.
 Unknown top-level frontmatter keys die at load with a "did you mean"
 pointer; typos don't silently ghost into an empty config.
 The YAML subset the frontmatter accepts is bounded (flat keys,
 strings / ints / true-or-false bools / null / lists / one-level
 nested dicts). Anchors, multi-line block scalars, tags, and
 ambiguous bare strings (`yes` / `NO` / `2026-05-24` /
 `0x...`) all die with a clear pointer at the spec — quote your
 strings when in doubt. The full schema lives in
 `bot_bottle/yaml_subset.py` (~450 lines, stdlib-only, no PyYAML).
 Working examples live under `examples/`. Pipelock's design lives in
 `docs/prds/0001-per-agent-egress-proxy-via-pipelock.md` and the
 rationale in `docs/research/pipelock-assessment.md`. The trust
 boundary rationale lives in `docs/prds/0011-per-file-md-manifest.md`.
 ## Auth: Claude OAuth token, not API key
 Bottles that use `agent_provider.template: claude` authenticate
 `claude` inside the container with the same Pro/Max subscription you
 already use on the host, via a long-lived OAuth token. No
 `ANTHROPIC_API_KEY` is needed.
 **Why a token instead of mounting `~/.claude.json`:** on macOS, Claude
 Code stores OAuth credentials in the encrypted Keychain, not in
 `~/.claude.json`. Mounting that file into a Linux container does not
 carry the credentials with it. Linux hosts keep credentials in
 `~/.claude/.credentials.json`, but to keep the launcher portable
 bot-bottle uses the env-var path on every host.
 **One-time setup on the host:**
 ```sh
 claude setup-token   # browser login, prints a ~1-year OAuth token
 ```
 Stash the token in your shell env (e.g. `~/.zshrc` or a secret manager)
 as `BOT_BOTTLE_CLAUDE_OAUTH_TOKEN`:
 ```sh
 export BOT_BOTTLE_CLAUDE_OAUTH_TOKEN="<token>"
 ```
 The Claude bottle reaches the Anthropic API only through the cred-proxy
 sidecar. To let `claude` authenticate, declare an egress route with
 `role: claude_code_oauth` and
 `token_ref: BOT_BOTTLE_CLAUDE_OAUTH_TOKEN`:
 ```yaml
 egress:
  routes:
    - host: api.anthropic.com
      role: claude_code_oauth
      auth:
        scheme: Bearer
        token_ref: BOT_BOTTLE_CLAUDE_OAUTH_TOKEN
      pipelock:
        tls_passthrough: true
 ```
 Routes that resolve to private or Tailscale addresses can opt into
 pipelock's SSRF destination allowlist explicitly:
 ```yaml
 egress:
  routes:
    - host: gitea.dideric.is
@@ -486,38 +103,31 @@ egress:
        scheme: token
        token_ref: BOT_BOTTLE_GITEA_TOKEN
      pipelock:
-        ssrf_ip_allowlist:
+        ssrf_ip_allowlist: [100.78.141.42/32]
-          - 100.78.141.42/32
+---
 ```
-At launch, `cli.py` reads `BOT_BOTTLE_CLAUDE_OAUTH_TOKEN` from the host
+The `gitea-dev` bottle. Provider auth via the inherited Claude route;
-env and forwards it into the cred-proxy container's environ — never
+gitea over SSH for push, token over HTTPS for the API.
-into the agent's. The agent receives `ANTHROPIC_BASE_URL` pointing at
+````
 `http://cred-proxy:9099/anthropic` and a non-secret placeholder for
 `CLAUDE_CODE_OAUTH_TOKEN` (claude-code refuses to start without one;
 the proxy strips and replaces the header on every request). `printenv`
 inside the agent does not surface the real token, and the value is
 never written to disk or placed on argv on the host.
-A Claude bottle without a `claude_code_oauth` route has no path to the
+**Agent** (`~/.bot-bottle/agents/gitea-helper.md`):
-Anthropic API — there is no fallback that forwards the token directly
+
-to the agent. Caveats: the token is bound to your subscription tier
+````markdown
-(Pro/Max/Team/Enterprise), it does not work with `claude --bare`
+---
-(which only reads `ANTHROPIC_API_KEY`), and if it leaks, regenerate
+bottle: gitea-dev
-via `claude setup-token` again. Reference:
+skills:
-<https://code.claude.com/docs/en/authentication>.
+  - init-prd
 ---
 You help maintain Gitea-hosted projects.
 ````
 More examples in `examples/`. Full design lives under `docs/prds/`; the trust-boundary rationale is in `docs/prds/0011-per-file-md-manifest.md`.
 ## Trademarks
-bot-bottle is an independent project and is not affiliated with,
+bot-bottle is an independent project and is not affiliated with, endorsed by, or sponsored by Anthropic, PBC. "Claude" and "Claude Code" are trademarks of Anthropic, PBC; the project name uses "claude" descriptively to indicate that the tool runs Claude Code inside a sandbox.
 endorsed by, or sponsored by Anthropic, PBC. "Claude" and "Claude
 Code" are trademarks of Anthropic, PBC; the project name uses
 "claude" descriptively to indicate that the tool runs Claude Code
 inside a sandbox.
 ## License
-Copyright 2026 Eric Bauerfeld
+Copyright 2026 Eric Bauerfeld. Licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) for the full text.
 Licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE)
 for the full text.
@@ -0,0 +1,283 @@
 # PRD 0049: Named / Labelled Agents
 - **Status:** Draft
 - **Author:** didericis
 - **Created:** 2026-06-03
 - **Issue:** #171
 ## Summary
 At agent launch time, prompt the operator for a short human-readable label
 (defaulting to the manifest agent key) and an optional color from the 16-color
 ANSI palette. Store both in the bottle's `metadata.json`. Display the label —
 rendered in the chosen color — in the dashboard's active-agents pane, replacing
 the bare manifest key. Inject the label and color into the in-container
 `claude.json` as `name` / `color` so Claude Code can surface them in its own
 harness when upstream support lands.
 ## Problem
 The dashboard's agents pane identifies each running instance by its manifest
 agent key (e.g., `implementer`) plus a random slug suffix. When an operator
 runs three `implementer` bottles simultaneously — one each for three different
 repos — the pane shows:
 ```
  [docker] a3f9  implementer  started 14:02:11  [egress,pipelock]
  [docker] b81c  implementer  started 14:03:45  [egress,pipelock]
  [docker] d220  implementer  started 14:05:01  [egress,pipelock]
 ```
 There is no way to tell which bottle is working on which task without attaching
 to each one in turn. The slug is opaque; the manifest key is shared. Operators
 working a multi-bottle session resort to keeping a mental map of slug→task,
 which breaks the moment they switch windows.
 ## Goals / Success Criteria
 1. After the operator selects an agent name (dashboard picker or CLI argument),
   they are prompted for a label. The prompt suggests the manifest key as the
   default; pressing Enter (or providing no input) accepts it. The label may
   contain any printable characters up to 64 bytes.
 2. After the label prompt, the operator is optionally prompted for a color from
   the 16-color ANSI palette (names: `black`, `red`, `green`, `yellow`, `blue`,
   `magenta`, `cyan`, `white`, `bright-black`, `bright-red`, `bright-green`,
   `bright-yellow`, `bright-blue`, `bright-magenta`, `bright-cyan`,
   `bright-white`). Pressing Enter without a selection skips color entirely.
 3. `label` and `color` are stored in `BottleMetadata` and written to the
   bottle's `metadata.json`. Both fields default to `""` (empty / unset).
 4. `ActiveAgent` carries `label` and `color`; `enumerate_active()` reads them
   from `metadata.json`.
 5. `_format_agent_row` uses the label when non-empty (falling back to
   `agent_name`). If a non-empty color is set and the terminal supports it, the
   label substring is rendered in that color.
 6. `BottleSpec` carries `label` and `color`; the docker backend's `prepare`
   step copies them into `BottleMetadata`.
 7. `agent_provider.py` writes `label` → `"name"` and `color` → `"color"` into
   the generated `claude.json`, alongside the existing fields. Fields are
   omitted when empty.
 8. The dashboard's `_new_agent_flow` (PRD 0020) includes the label+color step
   between agent selection and the backend picker.
 9. `cmd_start` (CLI) includes the label+color step after argument validation
   and before prepare-with-preflight.
 10. All existing unit tests stay green; no new tests are required for this
    change (the label/color fields are thin plumbing with no branching logic
    worth unit-testing beyond the already-tested metadata read/write path).
 ## Non-goals
 - Showing the agent label inside the Claude Code TUI (status line, terminal
  title, custom header). That requires upstream Claude Code / codex support.
  Writing to `claude.json` is best-effort scaffolding for when that lands.
 - Per-bottle color affecting anything outside the dashboard agents pane (e.g.,
  proposal-pane highlights, log prefixes).
 - Validating or constraining label content beyond the 64-byte printable cap.
 - Persisting color-pair state across dashboard restarts (color pairs are
  initialized fresh each session).
 - Editing the label or color of an already-running bottle.
 - Exposing label/color via `./cli.py list` (out of scope for v1; trivial to
  add later since the field will be in metadata).
 ## Design
 ### Data flow
 ```
 operator input
     │
     ▼
 BottleSpec.label, BottleSpec.color
     │
     ├─► docker/prepare.py → BottleMetadata.label / .color → metadata.json
     │
     └─► agent_provider.py → claude.json {"name": label, "color": color}
                                              (omitted when empty)
 dashboard refresh
     │
     ▼
 enumerate_active() → read_metadata(slug) → ActiveAgent.label / .color
     │
     ▼
 _format_agent_row → label (colored) in the row string
 ```
 ### BottleSpec changes
 ```python
@dataclass(frozen=True)
 class BottleSpec:
    manifest: Manifest
    agent_name: str
    copy_cwd: bool
    user_cwd: str
    identity: str = ""
    label: str = ""   # operator-chosen display name; defaults to agent_name at render time
    color: str = ""   # one of the 16 ANSI color names, or "" for terminal default
 ```
 `label` and `color` default to `""` so all existing callers remain valid with
 no changes.
 ### BottleMetadata changes
 Add two new fields with backward-compatible defaults:
 ```python
@dataclass
 class BottleMetadata:
    identity: str
    agent_name: str
    cwd: str
    copy_cwd: bool
    started_at: str
    compose_project: str
    backend: str
    label: str = ""
    color: str = ""
 ```
 `metadata.json` written by older bot-bottle versions won't have these keys;
 `read_metadata` already uses `dict.get` with defaults, so existing slugs load
 cleanly with `label=""`, `color=""`.
 ### ActiveAgent changes
 ```python
@dataclass(frozen=True)
 class ActiveAgent:
    backend_name: str
    slug: str
    agent_name: str
    started_at: str
    services: tuple[str, ...]
    label: str = ""
    color: str = ""
 ```
 `enumerate_active()` copies `label` and `color` out of `BottleMetadata` when
 constructing each `ActiveAgent`. The smolmachines backend gets the same
 additions for symmetry; it reads from its own metadata path.
 ### Dashboard row rendering
 `_format_agent_row` already falls through cleanly on missing fields. The
 change is:
 ```python
 display_name = a.label if a.label else a.agent_name
 ```
 Color rendering uses the existing `_try_init_green()` pattern as a model.
 A `_color_pair_for(color_name)` helper initialises a fresh curses color pair
 for the requested named color and returns its attr (or 0 on failure). Each
 unique color in the active agent list gets its own pair index. Color pairs are
 allocated lazily and cached in a `dict[str, int]` that lives for the duration
 of the dashboard session.
 The 16 ANSI color name → curses constant mapping:
 | Name | curses constant |
 |------|----------------|
 | `black` | `curses.COLOR_BLACK` |
 | `red` | `curses.COLOR_RED` |
 | `green` | `curses.COLOR_GREEN` |
 | `yellow` | `curses.COLOR_YELLOW` |
 | `blue` | `curses.COLOR_BLUE` |
 | `magenta` | `curses.COLOR_MAGENTA` |
 | `cyan` | `curses.COLOR_CYAN` |
 | `white` | `curses.COLOR_WHITE` |
 | `bright-*` | same constant + `curses.A_BOLD` |
 Terminals that don't support color fall back to plain text (the helper returns
 0, which ORed in is a no-op — same pattern as `_try_init_green`).
 ### Label + color prompt — dashboard
 In `_new_agent_flow`, after `_picker_modal` returns a non-None name and before
 `_backend_picker_modal`:
 ```python
 label, color = _label_color_modal(stdscr, default_label=picked)
 ```
 `_label_color_modal` uses `curses.endwin()` → text-mode prompts → restore
 (the same drop-and-resume pattern as the existing editor flow and preflight
 Y/N). Two sequential prompts:
 ```
 bot-bottle: agent label [implementer]: <operator types>
 bot-bottle: color (red/green/blue/… or Enter to skip): <operator types>
 ```
 Invalid color names are silently ignored (treated as empty). The function
 returns `(label, color)` — both strings, both possibly `""`.
 ### Label + color prompt — CLI
 In `cmd_start`, after argument parsing and before `_launch_bottle`:
 ```python
 label = _text_prompt_label(args.name)
 color = _text_prompt_color()
 ```
 `_text_prompt_label(default)` writes `"bot-bottle: agent label [{default}]: "`
 to stderr and returns the stripped input (or `default` if blank).
 `_text_prompt_color()` writes the color prompt and returns the stripped input
 (or `""` if blank or invalid).
 Both use `read_tty_line()` (already in `start.py`) for the read.
 ### Claude Code config injection
 In `agent_provider.py`, where `claude_config.write_text(...)` is called,
 expand the JSON dict conditionally:
 ```python
 payload = {
    "hasCompletedOnboarding": True,
    "theme": "dark",
    "bypassPermissionsModeAccepted": True,
    "projects": claude_projects,
 }
 if spec.label:
    payload["name"] = spec.label
 if spec.color:
    payload["color"] = spec.color
 claude_config.write_text(json.dumps(payload, indent=2) + "\n")
 ```
 `spec` here is the `AgentProvisionSpec` (or equivalent) that `agent_provider`
 already receives; it needs `label` and `color` threaded in from `BottleSpec`
 through whatever plan/provision object the provider operates on.
 ## Implementation chunks
 Two PRs, each independently mergeable.
 ### Chunk 1 — schema + storage
 - Add `label: str = ""` and `color: str = ""` to `BottleSpec`,
  `BottleMetadata`, and `ActiveAgent`.
 - `docker/prepare.py`: copy `spec.label` / `spec.color` into `BottleMetadata`.
 - `docker/enumerate.py`: copy `metadata.label` / `metadata.color` into
  `ActiveAgent`.
 - `agent_provider.py` (or the plan object it reads): thread label/color through
  to `claude.json` write.
 - Smolmachines backend: parallel changes to metadata read/write and
  `ActiveAgent` construction.
 - No prompt changes; no UI changes. All existing behavior is identical.
 ### Chunk 2 — prompts + display
 - `start.py`: add `_text_prompt_label` and `_text_prompt_color`; call them in
  `cmd_start` before `_launch_bottle`; pass `label` / `color` into `BottleSpec`.
 - `dashboard.py`: add `_label_color_modal` (drop-and-resume); call it in
  `_new_agent_flow`; pass label/color into `BottleSpec`; add
  `_color_pair_for` helper; update `_format_agent_row` to use `a.label` with
  color rendering.
 ## Open questions
 None.
@@ -0,0 +1,151 @@
 # Gitea Webhook Agent Dispatch
 ## Question
 How should bot-bottle spawn and manage agents in response to Gitea PR events — and how do we reuse the same agent (with its full session context) across every event in a PR's lifecycle?
 ## Summary
 A lightweight webhook receiver maps Gitea PR events to `cli.py` invocations. Spawning is straightforward: the existing work on non-interactive run mode (see [host-dispatch-to-container-agents.md](host-dispatch-to-container-agents.md)) is the missing piece. Session continuity is harder: it requires tracking two identifiers per open PR — the **bottle identity** (bot-bottle's slug for the container state dir) and the **Claude session ID** (the UUID Claude writes to its JSONL transcript). The transcript snapshot mechanism already used by capability-block is the right foundation; it just needs a non-interactive path and a PR-keyed store.
 ## Gitea Webhook Events for PR Lifecycle
 Gitea fires `X-Gitea-Event: pull_request` (with an `action` field) for most PR state changes. The payload always includes `pull_request.number`, which is the stable key for correlating events to a running agent.
 | `X-Gitea-Event` value | Relevant `action` values | When it fires |
 |---|---|---|
 | `pull_request` | `opened`, `reopened`, `closed`, `synchronized` | PR created, closed, or pushed to |
 | `pull_request_comment` | `created`, `edited` | Timeline comment posted |
 | `pull_request_review_approved` | — | Review submitted with approval |
 | `pull_request_review_rejected` | — | Review submitted requesting changes |
 | `pull_request_review_comment` | — | Inline code review comment |
 | `pull_request_sync` | — | New commits pushed to the PR branch |
 `pull_request` with `action: synchronized` and `pull_request_sync` both fire on push; they carry the same information but are separate subscriptions in the webhook config UI. Subscribe to `pull_request` and `pull_request_review` (the umbrella) plus `pull_request_comment` to cover the full lifecycle.
 The webhook receiver validates the `X-Gitea-Signature-256` HMAC header (SHA-256 of the raw body, keyed by the configured secret) before dispatching.
 ## Spawning an Agent From a Webhook
 ### What we need from bot-bottle
 The current `cli.py start` is interactive — it prompts y/N and attaches a tty. A webhook handler needs a non-interactive mode that:
 1. Starts the container for a named agent.
 2. Runs `claude -p "<task>" --output-format json --dangerously-skip-permissions` inside it (no tty, no session picker).
 3. Captures stdout as JSON, extracts `session_id`.
 4. Blocks until Claude exits, then tears down.
 The [host-dispatch-to-container-agents](host-dispatch-to-container-agents.md) research proposes `cli.py run <agent> <task>` for exactly this. That command is the prerequisite for everything below. It should return the Claude JSON output so callers can extract `session_id`.
 ### Webhook receiver sketch
 The receiver is a small HTTP service (Flask, FastAPI, or a Go net/http handler) running alongside bot-bottle on the host. It:
 1. Validates the HMAC signature.
 2. Extracts `pull_request.number` and `X-Gitea-Event` / `action`.
 3. Looks up whether a bottle already exists for this PR number.
 4. Spawns or resumes accordingly (see next section).
 5. Optionally posts a comment back to the PR via Gitea API once Claude finishes.
 The receiver does not need to be async or queue-based for a single-repo bot, but should at minimum serialize events for the same PR number (a per-PR lock) to avoid two concurrent sessions clobbering each other's transcript.
 ## Reusing the Same Agent Across a PR
 This is the harder problem. Two separate identities need to be tracked and connected:
 ### Identity 1: bottle identity (bot-bottle slug)
 The slug is the per-bottle state directory name (`~/.bot-bottle/state/<slug>/`). It's what `cli.py resume <slug>` uses to relaunch a container and mount the preserved state — including the transcript snapshot. This already works for the capability-block flow.
 ### Identity 2: Claude session ID
 Claude Code's `--output-format json` response includes a `session_id` UUID. Passing `--resume <session_id>` on a subsequent non-interactive run makes Claude continue from exactly that conversation, with full memory of prior tool calls. `--continue` (which maps to `resume_args` in `agent_provider.py`) only picks up the *most recent* session in the project directory — unsafe when multiple sessions may be running concurrently.
 The session JSONL lives at `~/.claude/projects/<encoded-cwd>/<session_id>.jsonl` inside the container guest. The transcript snapshot (`snapshot_transcript(slug)` in `capability_apply.py`) copies all of `~/.claude` out of the container before teardown, so the JSONL is preserved in `~/.bot-bottle/state/<slug>/transcript/.claude/`. When the bottle is relaunched and the transcript remounted, `claude --resume <session_id>` can find the JSONL at the right path.
 ### Per-PR session registry
 The receiver needs a small persistent map:
 ```
 PR number → { bottle_identity: str, claude_session_id: str, agent_name: str }
 ```
 The simplest implementation is a JSON file at `~/.bot-bottle/pr-sessions.json`, written after each successful first-run and updated with each resume. A sqlite database is better if concurrent multi-repo support is needed.
 ### Full lifecycle flow
 ```
 PR opened
  → webhook: action=opened
  → no entry in pr-sessions.json
  → cli.py run <agent> "Review PR #N: <title>\n<diff URL>"
      → starts container, runs claude -p ... --output-format json
      → on success: captures session_id from JSON output
      → snapshot_transcript(slug)
      → tears down container
  → write pr-sessions.json: { pr: N, slug: <slug>, session_id: <uuid> }
 PR gets new commit
  → webhook: action=synchronized OR pull_request_sync
  → look up pr-sessions.json: found slug + session_id
  → cli.py run-resume <slug> --claude-session <session_id> "New commits pushed. Review the diff."
      → relaunches container with transcript snapshot mounted
      → runs claude -p ... --resume <session_id> --output-format json
      → captures new session_id (same or rotated)
      → snapshot_transcript(slug) again
  → update pr-sessions.json with latest session_id
 Comment @-mentions bot
  → webhook: pull_request_comment, action=created
  → extract comment body, check for bot mention
  → same resume flow as above with comment as the prompt
 PR closed / merged
  → webhook: action=closed
  → cli.py cleanup <slug> (or equivalent)
  → remove from pr-sessions.json
 ```
 ### What needs to be built
 | Piece | Status | Notes |
 |---|---|---|
 | `cli.py run <agent> <task>` | Missing | Non-interactive start; see host-dispatch research |
 | `cli.py run-resume <slug> --claude-session <id> <task>` | Missing | Like `resume` but non-interactive, passes `--resume <id>` to claude |
 | `snapshot_transcript` on clean exit | Exists (PRD 0012) | Already called from `start.py`'s session-end path |
 | Transcript remount on resume | Exists | `bottle_state.py::transcript_snapshot_dir` → docker cp in on launch |
 | PR session registry | Missing | Needs to be designed; `~/.bot-bottle/pr-sessions.json` is the simplest start |
 | Webhook receiver service | Missing | New service; needs to be a declared bottle or run as a host process |
 ## Known Rough Edges
 **Session ID is not available from within the session.** The ID is only in the `--output-format json` result, readable after the process exits. There is no env var or hook that exposes it mid-session ([upstream issue #44607](https://github.com/anthropics/claude-code/issues/44607)). For the webhook bot this is fine — the outer receiver reads it from the subprocess result.
 **`--continue` vs `--resume <id>`:** The existing `resume_args = ("--continue",)` in `agent_provider.py` picks up the *most recent* session. For an interactive single-user resume this is fine. For a webhook bot that may have multiple open PRs, it is not safe — two PRs' transcripts would collide if they share a project directory encoding. Use `--resume <session_id>` explicitly.
 **Project directory encoding.** Claude stores sessions keyed by the absolute cwd, encoded as a path. Inside the container the cwd is always `/home/node` or a subdir. As long as every run for the same PR uses the same cwd, `--resume <session_id>` will find the right JSONL. The cwd should be pinned per PR entry in the session registry.
 **Concurrent events for the same PR.** If two webhooks arrive close together (e.g., push + CI comment), the receiver must serialize them. A per-PR asyncio lock or a simple file lock on the session registry entry is enough.
 **Context window growth.** Each resume appends to the same session. A PR with many round trips will eventually hit the context limit. Mitigation options: start a fresh Claude session (new `cli.py run`) periodically and carry forward a summary; or rely on Claude's built-in compaction. The session registry could include a turn count to trigger rotation.
 **Webhook delivery ordering.** Gitea does not guarantee ordered delivery or exactly-once delivery. The receiver should be idempotent (same PR event processed twice should not create two bottles) and should ignore events for closed PRs.
 ## Relationship to Existing Bot-Bottle Infrastructure
 The transcript snapshot + bottle identity system (PRD 0012, `capability_apply.py`) was designed for the capability-block flow: an operator-triggered resume after a security event. The webhook flow is the same mechanism on a faster loop driven by Gitea events instead of operator action. The implementation delta is:
 1. Non-interactive run mode (the `cli.py run` gap already identified in host-dispatch research).
 2. Passing `--resume <session_id>` explicitly rather than `--continue`.
 3. A PR-keyed registry to connect PR numbers to bottle identities and session IDs.
 4. A webhook receiver to drive the loop.
 These are additive changes that sit on top of the existing transcript preservation machinery without altering it.
 ## Recommendation
 Start with the non-interactive run mode (`cli.py run`) since everything else depends on it. Once that exists, the webhook receiver and session registry are straightforward glue. The receiver should run as a host process (not inside a bottle) since it needs to call `cli.py` and manage the session registry file. Serialize per-PR to avoid concurrency bugs. Use `--resume <session_id>` (not `--continue`) for all resume paths.
 The PR session registry is deliberately minimal to start — a JSON file is fine. If multi-repo or multi-agent scenarios appear, migrating to sqlite is a one-file change.
@@ -0,0 +1,278 @@
 # Local Ollama: Deployment Topology, Harness Selection, and Model Sizing
 Research notes on running Ollama locally for a bot-bottle coding agent workflow.
 Covers the native-vs-VM question, which harness integrates best with an agent loop,
 and which models make sense on an RTX 3070 (8 GB VRAM / 30 GB RAM) machine.
 ---
 ## 1. Deployment topology: native, container, or VM?
 The core question is whether running Ollama in a VM significantly degrades inference
 performance. The short answer: a full KVM/QEMU VM with GPU passthrough adds roughly
 2–5% overhead, Docker on Linux adds roughly 1–2%, and LXC containers add sub-1%. None
 of these are significant for interactive coding use.
 ### Native (bare metal)
 Zero overhead, immediate GPU access, simplest setup. The right default for a solo
 developer doing inference on their own workstation.
 ### Docker containers on Linux + NVIDIA
 With `nvidia-container-toolkit` and `--gpus all`, containerized Ollama runs at
 essentially native speed (~1–2% overhead on Linux). The dramatic exception is macOS,
 where Docker Desktop runs a Linux VM with no access to Apple's Metal/GPU — inference
 is 5–6× slower. On Linux/Windows with NVIDIA hardware, Docker is fine.
 Common pitfall: if `docker exec ollama ollama ps` shows 0 GPU layers, the container
 fell back to CPU. Usual causes: stale VRAM allocation, missing `nvidia-container-toolkit`,
 or a host driver too old for the container's CUDA version.
 ### KVM/QEMU VM with full PCIe passthrough
 Full GPU passthrough makes the GPU invisible to the host while the VM owns it. Overhead
 from the IOMMU translation layer and virtualized PCIe bus is ~2–5%. This is viable if
 you need VM-level isolation (snapshotting, migration, separate kernel). Setup complexity
 is non-trivial: BIOS IOMMU, IOMMU group management, VFIO driver binding. Once configured
 it is stable.
 **Critical gotcha:** set the VM's CPU type to `host`. If left at the default
 (`x86-64-v2-AES` / "QEMU Virtual CPU version 2.5+"), Ollama may silently disable GPU
 support even when drivers appear correct.
 ### LXC containers (Proxmox et al.)
 The sweet spot for isolation without overhead. Sub-1% performance difference from bare
 metal because LXC shares the host kernel; GPU device files are bind-mounted into the
 container. The tradeoff is weaker isolation (shared kernel) and the requirement that
 host and container driver versions match. Not suitable if you need VM-level snapshots
 or live migration.
 ### Summary
 | Topology | GPU overhead | Isolation | Complexity |
 |---|---|---|---|
 | Native | 0% | None | Low |
 | Docker (Linux) | ~1–2% | Process | Low |
 | LXC | <1% | Namespace | Medium |
 | KVM passthrough | 2–5% | Full VM | High |
 | VM no passthrough | CPU-only | Full VM | Medium |
 Running Ollama in a VM will **not** significantly slow inference as long as GPU passthrough
 is configured. Without passthrough (software rendering / CPU fallback) performance
 collapses — that is what the user is rightly worried about.
 ### Local vs. remote server
 | Factor | Local machine | Remote server |
 |---|---|---|
 | Latency | Near-zero | Network round-trip; cumulative in agent loops |
 | Cost | Zero after hardware | Per-token or subscription |
 | Privacy | 100% on-device | Data leaves the machine |
 | Model size ceiling | VRAM-limited | No hard limit (671B+ feasible) |
 | Offline use | Yes | No |
 | Concurrency under load | Sequential by default | Scales horizontally |
 For agentic coding workflows making 20–50 tool calls per session, network latency
 accumulates quickly. Local inference eliminates this. A practical hybrid pattern:
 use the local GPU for routine coding loops; route only to a remote API for tasks
 requiring a 70B+ model or very long context (>128K tokens).
 ---
 ## 2. Harness selection
 The landscape in 2026 has settled into three categories: IDE plugins, terminal agents,
 and chat UIs.
 ### Continue.dev — recommended IDE plugin
 Open-source VS Code / JetBrains / Zed / Vim extension. Routes autocomplete, chat, and
 refactoring commands to any configured LLM backend (Ollama, cloud APIs). The recommended
 setup uses two models: a small FIM-capable model for inline autocomplete (Qwen2.5-Coder 7B)
 and a larger model for chat/edit. Handles inline completions, multi-file edits, and
 codebase-aware chat. No API key, no data leaving the machine.
 ### Aider — recommended for git-native terminal workflows
 Terminal-based coding agent. Builds a codebase map before editing, makes changes
 directly, and auto-commits to git with readable messages. Every change is one
 `git revert` away. Supports 100+ languages; connects to any Ollama-served model
 via the OpenAI-compatible API. Best for terminal-first developers who want
 version-controlled agent interactions. Does not do inline autocomplete.
 ### OpenCode — recommended for bot-bottle–style agent loops
 Terminal-based coding agent with 15 built-in tools (bash execution, file read/write/edit,
 grep, glob, web fetch, MCP support) and connections to 75+ model providers including
 local Ollama models. This is the closest open-source equivalent to a Claude Code–style
 plan → tool-call → execute → observe → loop. Native Ollama integration.
 **Critical setup note:** Ollama defaults to a 4096-token context window, which is
 completely insufficient for an agent loop carrying conversation history, tool schemas,
 a system prompt, and code simultaneously. Configure at least 64K tokens explicitly
 in the model's context settings.
 ### Cline — agentic VS Code assistant
 VS Code extension that operates as an autonomous agent: plans, edits files, runs commands
 in a loop, connects to Ollama's local endpoint. Compared to OpenCode it lives inside the
 IDE rather than the terminal; compared to Continue.dev it is a full agent rather than a
 plugin. Its system prompt overhead is higher (~7,000–10,000 tokens) than minimal harnesses.
 ### Open WebUI / Jan / LM Studio — chat UIs, not coding harnesses
 These are browser or desktop chat interfaces useful for ad-hoc conversations (explaining
 APIs, drafting documentation, exploring ideas) but without IDE integration, autocomplete,
 or git integration. LM Studio offers the smoothest onboarding (visual model browser with
 VRAM estimates). Jan is the most privacy-auditable (fully open-source, Apache 2.0, no
 telemetry). Neither is a replacement for a coding harness.
 ### Harness comparison
 | Harness | Type | Autocomplete | Agent loop | Ollama | Git integration |
 |---|---|---|---|---|---|
 | Continue.dev | IDE plugin | Yes (FIM) | Basic | Native | No |
 | Aider | Terminal agent | No | Multi-turn | Via API | Auto-commit |
 | OpenCode | Terminal agent | No | Full tools | Native | Via bash |
 | Cline | IDE agent | No | Full tools | Via API | Via bash |
 | Open WebUI | Chat UI | No | No | Native | No |
 | Jan | Chat UI | No | No | Native | No |
 For a bot-bottle workflow (an isolated sandbox running an agentic loop with tool access),
 **OpenCode** is the closest open-source match. For an IDE-first developer who wants
 autocomplete + chat, **Continue.dev + Qwen2.5-Coder 7B** is the recommended pair.
 ---
 ## 3. Model selection: RTX 3070 (8 GB VRAM / 30 GB RAM)
 ### VRAM hard limits at Q4_K_M quantization
 | Model size | Approx. VRAM (Q4_K_M) | Fits in 8 GB? | Tokens/sec (RTX 3070) |
 |---|---|---|---|
 | 3–4B | 2.5–3.5 GB | Yes, with headroom | 60–90 |
 | 7–8B | 5–6 GB | Yes | 35–55 |
 | 12–14B | 7.5–9 GB | Edge / RAM offload | 8–18 |
 | 22B+ | 14+ GB | No | — |
 The RTX 3070 has high memory bandwidth for its VRAM tier and consistently outperforms
 the newer RTX 4060 Ti on token generation speed. Bandwidth matters more than raw compute
 for inference.
 ### Does Gemma 4 exist?
 Yes. Google released **Gemma 4** on 2 April 2026 (Apache 2.0). The family includes
 E2B (2B), E4B (4B), a 26B MoE, and a 31B Dense. A 12B multimodal variant was announced
 2026-06-04. The 31B scores 80.0% on LiveCodeBench v6 — a major jump from Gemma 3 27B
 at 29.1%. However, only the E4B fits comfortably within 8 GB VRAM:
 | Variant | VRAM (approx.) | Fits? |
 |---|---|---|
 | Gemma 4 E2B | ~2 GB | Yes |
 | Gemma 4 E4B | ~5 GB | Yes |
 | Gemma 4 12B | ~8–9 GB (Q4) | Edge |
 | Gemma 4 26B MoE | 14–18 GB | No |
 | Gemma 4 31B Dense | ~20 GB | No |
 ### Model-by-model evaluation
 **Qwen2.5-Coder 7B — primary recommendation**
 The strongest purpose-built coding model that fits fully within 8 GB VRAM. Leads
 HumanEval among 7–8B-class models. Strong on Python, JavaScript, TypeScript. Has
 FIM (fill-in-the-middle) support for inline autocomplete. 35–55 tok/sec on RTX 3070.
 ```
 ollama pull qwen2.5-coder:7b
 ```
 **Qwen2.5-Coder 14B — secondary, with RAM offloading**
 At Q4_K_M this needs ~8.7 GB, just over the 8 GB limit. With 30 GB system RAM, Ollama
 automatically offloads the overflow layers to CPU. Performance drops to ~8–18 tok/sec
 versus 35–55 tok/sec for the 7B fully in VRAM. Quality is noticeably better for complex
 multi-file reasoning. Viable for chat-based coding tasks where quality matters more than
 speed; too slow for live autocomplete. Keep context window at 8K tokens to minimize
 VRAM pressure during offloaded inference.
 ```
 ollama pull qwen2.5-coder:14b
 ```
 **Gemma 4 E4B (~5 GB VRAM)**
 Fits comfortably with 3 GB to spare. Strong on reasoning, multimodal, and general-purpose
 tasks. Less specialized for coding than Qwen2.5-Coder 7B. Good choice for one model that
 covers coding + general reasoning + image analysis. The E4B outperforms Gemma 3 equivalents
 significantly on coding benchmarks.
 ```
 ollama pull gemma4:e4b
 ```
 **Phi-4 Mini 3.8B (~3 GB VRAM)**
 Best reasoning-per-VRAM model; leaves ~5 GB free for other applications. Strong on math,
 logic, and structured output. Good for agentic sub-tasks requiring tight reasoning. Not the
 strongest at raw code synthesis but excellent for reasoning-heavy parts of a coding loop.
 Viable as the autocomplete model in a two-model Continue.dev setup.
 ```
 ollama pull phi4-mini
 ```
 **DeepSeek-R1 8B (~5–6 GB VRAM)**
 Strong reasoning model for logic-heavy code (algorithms, correctness proofs). The full
 DeepSeek-Coder-V2 (236B MoE) is impractical here — only the 8B distilled variants are
 relevant. Outperforms Gemma 4 E4B on reasoning-heavy benchmarks; weaker on raw code
 generation than Qwen2.5-Coder 7B.
 **Codestral — not viable at 8 GB**
 The top FIM autocomplete model on HumanEval-FIM benchmarks, but requires 12–16 GB VRAM
 minimum. Not an option here. Worth revisiting if upgrading to a 12 GB+ card (RTX 4070
 Super or newer).
 ### RAM offloading: does 30 GB help?
 Yes, meaningfully. Ollama automatically splits layers between GPU and system RAM when
 VRAM is exceeded. With 30 GB RAM, models up to ~14B at Q4_K_M run with partial offloading.
 The tradeoff is a 2–5× throughput penalty (8–18 tok/sec vs 35–55 tok/sec). Acceptable
 for batch tasks (reviewing a PR, generating an algorithm); too slow for live autocomplete.
 ### Recommended setup
 **Autocomplete (fast, always-in-VRAM):** `qwen2.5-coder:7b`
 - Configure in Continue.dev as the tab-completion model
 - FIM-capable; 35–55 tok/sec; fits with 2–3 GB VRAM to spare
 **Chat / agent loop (quality-first):** `qwen2.5-coder:14b` or `gemma4:e4b`
 - 14B for strongest multi-file coding; expect 8–18 tok/sec with RAM offload
 - Gemma 4 E4B if you want vision + general reasoning + coding in one model; ~60 tok/sec
 **Two-model Continue.dev config (lower VRAM pressure):**
 `phi4-mini` (autocomplete) + `qwen2.5-coder:7b` (chat) — both fit simultaneously with
 ~1–2 GB to spare, keeping the OS and IDE from contending for VRAM.
 ---
 ## Sources
 - [Ollama on Proxmox: GPU Passthrough for LXC and VM AI Workloads](https://linuxprofessional.ie/article.php?slug=ollama-proxmox-gpu-passthrough-lxc-vm)
 - [Run Ollama with NVIDIA GPU in Proxmox VMs and LXC containers](https://www.virtualizationhowto.com/2025/05/run-ollama-with-nvidia-gpu-in-proxmox-vms-and-lxc-containers/)
 - [Ollama Performance Tuning: Getting Maximum Speed from Local LLMs](https://dasroot.net/posts/2026/01/ollama-performance-tuning-gpu-acceleration-model-quantization/)
 - [Pros and Cons: Containerized Ollama vs. Local Setup](https://alain-airom.medium.com/pros-and-cons-using-containerized-ollama-vs-local-setup-d9bdf225bbb5)
 - [Best Local Coding Models Ranked: Every VRAM Tier (2026)](https://insiderllm.com/guides/best-local-coding-models-2026/)
 - [Best Local LLMs for RTX 4060, RTX 3070, and RTX 5060](https://aiagentskit.com/blog/best-local-llms-rtx-4060-3070-5060/)
 - [Best Local LLMs for 8GB VRAM: Real Hardware Benchmarks (2026)](https://localllm.in/blog/best-local-llms-8gb-vram-2025)
 - [Self-Hosted AI Coding Agent: Ollama + Continue + Open WebUI Setup in 2026](https://www.web3aiblog.com/blog/self-hosted-ai-coding-agent-ollama-continue-2026)
 - [Best Local-First AI Coding Tools 2026: 14 Compared](https://nimbalyst.com/blog/best-local-first-ai-coding-tools-2026/)
 - [OpenCode + Ollama: Private Local AI Coding Agent Setup](https://lushbinary.com/blog/opencode-ollama-local-ai-coding-privacy-guide/)
 - [Gemma 4: Google DeepMind](https://deepmind.google/models/gemma/gemma-4/)
 - [Running Gemma 4 Locally: VRAM Requirements](https://knightli.com/en/2026/05/01/gemma-4-local-vram-quantization-table/)
 - [Phi-4 Mini vs. Gemma 3 vs. Qwen 2.5: Best SLM for Coding Tasks in 2026](https://botmonster.com/ai/phi-4-mini-vs-gemma-3-vs-qwen-25-best-slm-coding-2026/)
 - [Qwen2.5-Coder 14B VRAM Requirements Guide](https://willitrunai.com/blog/qwen-2-5-coder-14b-vram-requirements)
 - [Comparing AI Harnesses: OpenCode, Ollama, LM Studio, Claude Code, Open WebUI, and VS Code](https://jace.pro/blog/comparing-ai-harnesses-opencode-ollama-lm-studio-claude-code-open-webui-and-vs-code/)
Author	SHA1	Message	Date
didericis-claude	3997a0a721	docs(prd): add PRD 0049 — named/labelled agents test / unit (pull_request) Successful in 39s Details test / integration (pull_request) Successful in 53s Details Draft PRD for prompting operators for a custom label and optional ANSI color at agent launch time, storing both in metadata.json, and surfacing the label (in color) in the dashboard's active-agents pane. Closes #171	2026-06-03 21:38:38 -04:00
didericis-claude	ea66f63d45	refactor(backend): hoist guest_home to BottlePlan base test / unit (push) Successful in 37s Details test / integration (push) Successful in 54s Details Per PR review feedback (review #132): guest_home shouldn't be buried inside workspace_plan / read from a hardcoded literal in each provision module. It's a cross-cutting bottle property — the backend's prepare step knows it, and every downstream consumer (contrib providers, git provisioning, gitconfig path) should read it from one place. - Adds guest_home: str to BottlePlan base dataclass. - Both backends' prepare steps populate plan.guest_home. - contrib/{claude,codex}/agent_provider.py read plan.guest_home (was plan.workspace_plan.guest_home). - bot_bottle/backend/docker/provision/git.py reads plan.guest_home for the gitconfig destination (was hardcoded "/home/node"). - bot_bottle/backend/smolmachines/provision/git.py drops the _GUEST_HOME / _guest_home() helpers and reads plan.guest_home. - Tests that construct BottlePlan subclasses directly pass guest_home="/home/node" explicitly.	2026-06-03 21:38:13 -04:00
didericis-claude	83db7336c8	refactor(agent_provider): drop GUEST_HOME default, backend drives guest_home Per PR review feedback (review #130): the GUEST_HOME = '/home/node' default in agent_provider.py was driving the wrong direction — the agent provider shouldn't ship its own opinion about the guest home, the backend should. - Removes the GUEST_HOME constant. - Makes guest_home a required kwarg on AgentProvider.provision_plan and the agent_provision_plan shim (no default). - Drops module-level _SKILLS_DIR / _PROMPT_PATH constants from contrib/{claude,codex}/agent_provider.py; both providers now derive the in-guest paths from plan.workspace_plan.guest_home at call time, which the backend's prepare step populated. - Updates tests/unit/test_agent_provider.py callers to pass guest_home explicitly. The backend prepare paths already pass it; no production-code call sites changed.	2026-06-03 21:38:13 -04:00
didericis-claude	bcdffc8400	refactor(contrib): inline provision steps per-provider, drop shared apply module Each AgentProvider now owns its skills / prompt / provision / supervise_mcp end-to-end. The base ABC declares all four as abstract; ClaudeAgentProvider and CodexAgentProvider each carry their own copy loop. Per PR review feedback (review #128): the shared _provision_apply.py abstraction was weak — Claude and Codex harnesses already diverge (codex's dummy-auth + login-status verify has no claude analogue) and forcing both onto one helper just postpones the split. Duplication is intentional. Deletes bot_bottle/_provision_apply.py and consolidates testing under tests/unit/test_contrib_{claude,codex}_provider.py (one file per provider, covering all four methods).	2026-06-03 21:38:13 -04:00
didericis-claude	f44751c4b8	feat(agent_provider): migrate tests, drop guest-home/skills-dir env knobs, activate PRD 0050 - tests/unit/test_provision_apply.py covers the new shared apply helpers (apply_skills / apply_prompt / apply_provision) that replace the per-backend modules deleted in the prior commit. - tests/unit/test_contrib_supervise_mcp.py covers both providers' provision_supervise_mcp behavior — confirms the codex bottle now runs `codex mcp add` symmetrically with claude. - tests/unit/test_smolmachines_provision.py drops the four test classes whose subjects moved (TestProvisionPrompt / TestProvisionProviderAuth / TestProvisionSkills / TestProvisionSupervise); the backend-side CA / git / workspace classes stay. - tests/unit/test_docker_provision_provider_auth.py removed; its coverage now lives in tests/unit/test_provision_apply.py (apply_provision is backend-agnostic, one test file suffices). Drops the BOT_BOTTLE_CONTAINER_HOME, BOT_BOTTLE_GUEST_HOME, BOT_BOTTLE_CONTAINER_SKILLS_DIR, and BOT_BOTTLE_GUEST_SKILLS_DIR env knobs the deleted provision modules used to read. /home/node is hardcoded everywhere the knobs lived; the values were effectively constants today and removing them keeps the PRD-0050 surface area honest. Flips PRD 0050 Status: Draft → Active. Closes #177 on merge.	2026-06-03 21:38:13 -04:00
didericis-claude	3d557beeee	refactor(backend): move per-provider provisioning onto AgentProvider BottleBackend.provision now resolves the provider plugin from the plan and dispatches prompt / skills / declarative-apply / supervise-mcp through it. The four hooks the docker + smolmachines backends used to override (provision_skills, provision_prompt, provision_provider_auth, provision_supervise) are gone — the duplicated 50-line implementations under backend/{docker,smolmachines}/provision/{skills,prompt, provider_auth,supervise}.py are deleted. Each backend gains a small supervise_mcp_url(plan) override so the provider plugin can run `claude mcp add` / `codex mcp add` against the right URL: docker returns http://{SUPERVISE_HOSTNAME}:{SUPERVISE_PORT}/ on the compose network alias; smolmachines returns plan.agent_supervise_url which launch.py already pins to a host-loopback port. Removes tests/unit/test_provision_supervise.py — the URL it asserted on now lives on the backend, with no equivalent standalone surface to test against (it's covered by the broader plan / launch integration tests).	2026-06-03 21:38:13 -04:00
didericis-claude	44365ecf68	refactor(agent_provider): introduce AgentProvider ABC + contrib plugins Lift the provider-specific blocks of agent_provision_plan into contrib/claude/agent_provider.py and contrib/codex/agent_provider.py, behind a new AgentProvider ABC and a lazy get_provider() registry (mirrors PRD 0048's contrib convention). agent_provision_plan and runtime_for stay as thin shims so existing callers in backend/{docker,smolmachines}/prepare.py and cli/start.py keep working without per-call edits — the shipping diff in this commit is purely 'who owns the producer'. Adds bot_bottle/_provision_apply.py — the backend-agnostic skills / prompt / declarative-plan apply loops the per-provider default methods will dispatch through in the next commit.	2026-06-03 21:38:13 -04:00
didericis-claude	703b12ee9a	docs(prd): draft PRD 0050 — move provider logic into contrib	2026-06-03 21:38:13 -04:00
didericis-claude	d1556f4659	docs(research): local ollama deployment, harness selection, and model sizing test / unit (push) Successful in 41s Details test / integration (push) Successful in 48s Details	2026-06-03 21:37:55 -04:00
didericis-claude	06eed5b236	docs(research): gitea webhook agent dispatch and PR session continuity test / unit (push) Successful in 38s Details test / integration (push) Successful in 51s Details Research note covering how to spawn bot-bottle agents from Gitea webhook events and reuse the same session (bottle identity + Claude session ID) across an entire PR lifecycle. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-03 21:37:40 -04:00
didericis	98e4e2b7dc	docs(readme): additional tweaks test / unit (pull_request) Successful in 34s Details test / integration (pull_request) Successful in 50s Details test / unit (push) Successful in 42s Details test / integration (push) Successful in 52s Details	2026-06-03 21:19:00 -04:00
didericis-claude	9eca46b408	docs: slim README to threat model, features, one diagram, one manifest test / unit (pull_request) Successful in 45s Details test / integration (pull_request) Successful in 56s Details	2026-06-03 21:29:32 +00:00