Compare commits
12 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 3997a0a721 | |||
| ea66f63d45 | |||
| 83db7336c8 | |||
| bcdffc8400 | |||
| f44751c4b8 | |||
| 3d557beeee | |||
| 44365ecf68 | |||
| 703b12ee9a | |||
| d1556f4659 | |||
| 06eed5b236 | |||
| 98e4e2b7dc | |||
| 9eca46b408 |
@@ -6,96 +6,26 @@
|
|||||||
|
|
||||||
[](https://gitea.dideric.is/didericis/bot-bottle/actions?workflow=test.yml)
|
[](https://gitea.dideric.is/didericis/bot-bottle/actions?workflow=test.yml)
|
||||||
|
|
||||||
Run multiple Claude Code agents on your own machine, each scoped to its own secrets, skills, and egress allowlist.
|
**Problem:** Developer wants to run a coding agent without supervision, but they don't want a prompt injected or misbehaving agent wrecking their environment or exfiltrating sensitive data.
|
||||||
|
|
||||||

|
**Solution:** Ephemeral, per agent "bottles" the agent cannot modify that scan all traffic for data exfiltration and limit capabilities and egress to only what the agent needs.
|
||||||
|
|
||||||
Four prompts to the agent inside a real bottle:
|
## Features
|
||||||
claude replies to `hello there` — proof api.anthropic.com routes
|
|
||||||
through pipelock's bumped TLS end-to-end;
|
|
||||||
asked to GET a non-allowlisted host, the agent's curl gets 403 back
|
|
||||||
from pipelock;
|
|
||||||
asked to POST a credential-shaped body to an allowlisted host, the
|
|
||||||
same 403 — pipelock's DLP body scanner caught it;
|
|
||||||
asked to commit and push an AKIA-shaped key, git-gate's gitleaks
|
|
||||||
pre-receive hook rejects the ref.
|
|
||||||
Run it yourself with `bash scripts/demo.sh`.
|
|
||||||
|
|
||||||
## Why "bot-bottle"?
|
- **Per-bottle egress allowlist** — TLS-bumped HTTP/HTTPS chokepoint with a per-manifest host allowlist and request-body DLP scanner; DoH and arbitrary hosts blocked by default.
|
||||||
|
- **Tokens the agent never sees** — host secrets live in a sidecar; the agent dials `http://sidecar:9099/<path>` and the proxy strips inbound `Authorization` and injects the real token before forwarding. `printenv` in the agent shows proxy URLs only.
|
||||||
Each container is a bottle; Claude is the genie inside. The genie's
|
- **Gitleaks-scanned push (git-gate)** — `bottle.git` remotes route through a per-bottle `git daemon` that gitleaks-scans incoming refs pre-receive and forwards clean refs upstream over SSH. The agent never holds the upstream credential.
|
||||||
powers are exactly what the manifest grants it — a specific set of
|
- **Manifest-scoped skills + secrets** — each bottle declares its skills, env, git identity, remotes, and egress routes; unknown keys die at load.
|
||||||
skills, a specific set of secrets, and a specific set of hosts it can
|
- **Trust boundary at `$HOME`** — bottles (credentials, egress, remotes) live only under `~/.bot-bottle/bottles/`. Repos may ship agents but not bottles, so a cloned repo can't redirect an env var to an attacker host.
|
||||||
reach — nothing more. You uncork one bottle per agent
|
- **Composable bottles (`extends:`)** — keep provider/runtime policy in one base bottle (e.g. `claude.md`) and overlay task bottles on top.
|
||||||
(`./cli.py start <agent>`), many bottles run in parallel, and each is
|
- **Parallel, isolated bottles** — each bottle is its own per-agent Docker `--internal` network; bottles don't share state or talk to each other.
|
||||||
scoped to its task. When the session ends the bottle is destroyed and
|
- **Provider templates (Claude, Codex)** — `Dockerfile.claude` / `Dockerfile.codex`, or a bottle-supplied Dockerfile. Claude auth via long-lived OAuth token; Codex via opt-in host device-auth forwarding.
|
||||||
the genie does not persist.
|
- **gVisor auto-detect** — on Linux hosts where `runsc` is registered with Docker, every bottle launches under it for a userspace syscall barrier; no manifest config required.
|
||||||
|
- **Smolmachines backend (macOS)** — opt-in `BOT_BOTTLE_BACKEND=smolmachines` runs the agent in a libkrun micro-VM with the sidecar bundle still in Docker.
|
||||||
## Goals
|
|
||||||
|
|
||||||
- Scope each agent to the minimum credentials and network egress its task actually needs
|
|
||||||
- Run multiple agents in parallel, isolated from each other
|
|
||||||
- Keep code, credentials, and agent activity on infrastructure I control — no third-party agent runtime
|
|
||||||
|
|
||||||
## Project status
|
|
||||||
|
|
||||||
bot-bottle is a self-hosted secure runtime for AI coding agents.
|
|
||||||
Each agent runs in an isolated container or micro-VM-backed bottle with
|
|
||||||
scoped secrets, allowlisted egress, TLS-aware proxying, DLP checks, and
|
|
||||||
a git-gate that withholds upstream credentials and scans pushes before
|
|
||||||
forwarding. The project includes a documented threat model, PRD-driven
|
|
||||||
development history, Docker and smolmachines backends, dashboard and
|
|
||||||
remediation flows, and unit/integration tests covering exfiltration and
|
|
||||||
sandbox escape scenarios.
|
|
||||||
|
|
||||||
## Security model
|
|
||||||
|
|
||||||
Each agent runs in its own bottle: its own container, its own internal
|
|
||||||
Docker network, and its own pipelock sidecar. Bottles don't share
|
|
||||||
state, don't talk to each other, and only get the env vars, skills,
|
|
||||||
SSH identities, and egress hosts the manifest grants them — nothing
|
|
||||||
more. Any one agent only has the access it needs to do its job.
|
|
||||||
|
|
||||||
The bottle limits both what an agent can see and where it can send
|
|
||||||
it. Each bottle gets only the secrets and SSH identities the manifest
|
|
||||||
grants it — a Gitea token but not a GitHub token, a deploy key but
|
|
||||||
not a personal SSH key — so even a compromised or misbehaving agent
|
|
||||||
only handles credentials it was already trusted with for its job.
|
|
||||||
Egress flows through pipelock, which constrains where those
|
|
||||||
credentials can travel: an agent with a Gitea token can reach
|
|
||||||
`gitea.dideric.is`, not arbitrary attacker-controlled hosts. The same
|
|
||||||
constraint blocks DNS-over-HTTPS as an exfil channel — a DoH resolver
|
|
||||||
like `cloudflare-dns.com` would have to be on the allowlist for the
|
|
||||||
agent to reach it at all. The container itself adds a layer between
|
|
||||||
the agent and the host, but the v1 design leans more on secret
|
|
||||||
minimization and egress allowlisting than on the container as a
|
|
||||||
hardened boundary. On Linux hosts where [gVisor](https://gvisor.dev/)
|
|
||||||
is registered with Docker, bot-bottle auto-detects it and launches
|
|
||||||
every bottle under `runsc` for a userspace syscall barrier — no
|
|
||||||
manifest configuration required. The broader v2 discussion lives in
|
|
||||||
`docs/research/stronger-isolation-alternatives.md`.
|
|
||||||
|
|
||||||
The egress proxy and OAuth-token handling below are the load-bearing
|
|
||||||
pieces of v1.
|
|
||||||
|
|
||||||
## Architecture
|
## Architecture
|
||||||
|
|
||||||
A bottle is two containers per agent: an `agent` container, and a
|
A bottle is two containers per agent: an `agent` container, and a `sidecars` container that bundles pipelock + cred-proxy + git-gate + supervise behind a Python init supervisor. They share a per-agent Docker `--internal` network; the agent has no default route off-box.
|
||||||
`sidecars` container that bundles pipelock + egress + git-gate +
|
|
||||||
supervise behind a Python init supervisor (PRD 0024). They share a
|
|
||||||
per-agent Docker `--internal` network; the agent has no default
|
|
||||||
route off-box. All HTTP and HTTPS egress funnels through pipelock,
|
|
||||||
where the egress allowlist, TLS interception, and request-body DLP
|
|
||||||
scanner enforce the manifest before any byte leaves the host. The
|
|
||||||
only egress that doesn't traverse pipelock is git-gate's SSH
|
|
||||||
push/fetch to `bottle.git` upstreams — pipelock can't proxy SSH,
|
|
||||||
so git-gate is its own L4-style egress path with gitleaks doing
|
|
||||||
the pre-receive scan.
|
|
||||||
|
|
||||||
The agent dials the bundle by the legacy short names (`pipelock`,
|
|
||||||
`egress`, `git-gate`, `supervise`); the renderer registers those as
|
|
||||||
docker-network aliases on the bundle so existing HTTPS_PROXY URLs
|
|
||||||
and MCP endpoints resolve without an agent-side change.
|
|
||||||
|
|
||||||
```
|
```
|
||||||
host ( ./cli.py )
|
host ( ./cli.py )
|
||||||
@@ -104,26 +34,21 @@ and MCP endpoints resolve without an agent-side change.
|
|||||||
▼
|
▼
|
||||||
┌─────────────────────────── bottle ──────────────────────────────────┐
|
┌─────────────────────────── bottle ──────────────────────────────────┐
|
||||||
│ │
|
│ │
|
||||||
│ ┌──────────────────┐ │
|
│ ┌──────────────────┐ ┌──────────────┐ │
|
||||||
│ │ agent image │ HTTPS_PROXY │
|
│ │ agent image │ HTTP(S) proxy │ cred-proxy │ │
|
||||||
│ │ (claude-code, │ ────────────────────────┐ │
|
│ │ (claude-code, │ ─────────────────►│ (strips/inj │ │
|
||||||
│ │ built locally) │ │ │
|
│ │ codex, etc) │ │ Authoriz.) │ │
|
||||||
│ │ │ plain HTTP │ │
|
│ │ │ └──────┬───────┘ │
|
||||||
│ │ skills, env, │ (token injection) ┌────▼─────────┐ │
|
│ │ environ: URLs │ │ │
|
||||||
│ │ ~/.gitconfig, │ ──────────────────►│ cred-proxy │ │
|
│ │ only, no real │ ▼ │
|
||||||
│ │ ~/.npmrc, tea │ │ (strips/inj │ │
|
│ │ tokens │ ┌────────────────┐ │ HTTPS to
|
||||||
│ │ │ │ Authoriz.) │ │
|
|
||||||
│ │ environ: URLs │ └─────┬────────┘ │
|
|
||||||
│ │ only, no real │ HTTPS_PROXY │ │
|
|
||||||
│ │ tokens │ ▼ │
|
|
||||||
│ │ │ ┌────────────────┐ │ HTTPS to
|
|
||||||
│ │ │ │ pipelock image │──────────┼──► allowlisted
|
│ │ │ │ pipelock image │──────────┼──► allowlisted
|
||||||
│ │ │ │ (TLS bump, DLP │ │ hosts (incl.
|
│ │ │ │ (TLS bump, DLP │ │ hosts (incl.
|
||||||
│ │ │ │ body scan, │ │ cred-proxy
|
│ │ │ │ body scan, │ │ cred-proxy
|
||||||
│ │ │ │ allowlist) │ │ upstreams)
|
│ │ │ │ allowlist) │ │ upstreams)
|
||||||
│ │ │ └────────────────┘ │
|
│ │ │ └────────────────┘ │
|
||||||
│ │ │ │
|
│ │ │ │
|
||||||
│ │ │ git:// ┌────────────────┐ │ SSH push/fetch
|
│ │ │ git proxy ┌────────────────┐ │ SSH push/fetch
|
||||||
│ │ │ ────────────────►│ git-gate image │──────────┼──► to bottle.git
|
│ │ │ ────────────────►│ git-gate image │──────────┼──► to bottle.git
|
||||||
│ │ │ │ (gitleaks + │ │ upstreams
|
│ │ │ │ (gitleaks + │ │ upstreams
|
||||||
│ └──────────────────┘ │ git daemon) │ │ (direct — not
|
│ └──────────────────┘ │ git daemon) │ │ (direct — not
|
||||||
@@ -137,192 +62,25 @@ and MCP endpoints resolve without an agent-side change.
|
|||||||
└─────────────────────────────────────────────────────────────────────┘
|
└─────────────────────────────────────────────────────────────────────┘
|
||||||
```
|
```
|
||||||
|
|
||||||
- **agent image** — built from the provider template Dockerfile
|
When the agent exits, `cli.py` tears down every sidecar and both networks; nothing about a bottle persists between runs.
|
||||||
(`Dockerfile.claude` for Claude, `Dockerfile.codex` for Codex, or
|
|
||||||
`agent_provider.dockerfile`) on first run; runs the selected agent
|
|
||||||
CLI with the manifest-granted skills, env vars, and `~/.gitconfig`
|
|
||||||
(the latter for the git-gate's `insteadOf` rules when `bottle.git`
|
|
||||||
is set).
|
|
||||||
- **pipelock image** — per-agent sidecar. Terminates the agent's
|
|
||||||
outbound HTTP/HTTPS, enforces the resolved allowlist, runs DLP
|
|
||||||
scanning. Design in `docs/prds/0001-per-agent-egress-proxy-via-pipelock.md`
|
|
||||||
and `docs/prds/0006-pipelock-tls-interception.md`.
|
|
||||||
- **git-gate image** — per-agent sidecar built on `zricethezav/gitleaks`
|
|
||||||
(alpine + gitleaks + git-daemon + openssh-client). Runs
|
|
||||||
`git daemon` over `git://` as a bidirectional mirror of each
|
|
||||||
declared upstream. A pre-receive hook gitleaks-scans incoming
|
|
||||||
refs and forwards clean refs to the real upstream over SSH; an
|
|
||||||
access-hook runs `git fetch origin --prune` against the upstream
|
|
||||||
before every upload-pack so an agent fetch returns whatever the
|
|
||||||
upstream has *now* (fail-closed if unreachable). The agent's
|
|
||||||
`~/.gitconfig` rewrites the real URL to the gate via `insteadOf`,
|
|
||||||
so push, fetch, clone, and pull all route through. The agent
|
|
||||||
never sees the upstream credential. Brought up only when
|
|
||||||
`bottle.git` has entries. Design in `docs/prds/0008-git-gate.md`.
|
|
||||||
- **cred-proxy image** — per-bottle sidecar (`python:3.13-alpine`
|
|
||||||
base, stdlib-only) that holds API tokens declared in
|
|
||||||
`bottle.cred_proxy.routes`. Each route names a `path`,
|
|
||||||
`upstream`, `auth_scheme`, and `token_ref` (host env var); the
|
|
||||||
agent dials `http://cred-proxy:9099<path>...` over plain HTTP
|
|
||||||
and the proxy strips any inbound `Authorization`, injects
|
|
||||||
`<auth_scheme> <token>` using the value held only in its own
|
|
||||||
container's environ, and forwards to the real upstream over
|
|
||||||
HTTPS. SSE responses stream back unbuffered. The cred-proxy's
|
|
||||||
outbound HTTPS routes through pipelock (it trusts pipelock's
|
|
||||||
per-bottle CA), so pipelock's egress allowlist + body scanner
|
|
||||||
apply to cred-proxy traffic the same way they apply to direct
|
|
||||||
agent traffic. Smart-HTTP push paths (`/git-receive-pack`,
|
|
||||||
`/info/refs?service=git-receive-pack`) are refused at the
|
|
||||||
proxy — push must go through `bottle.git` / git-gate where
|
|
||||||
gitleaks runs. Optional per-route `role` tags drive agent-side
|
|
||||||
rewrites: `anthropic-base-url`, `npm-registry`, `git-insteadof`,
|
|
||||||
`tea-login`. The agent's `printenv` shows only proxy URLs —
|
|
||||||
none of the real token values. Design in
|
|
||||||
`docs/prds/0010-cred-proxy.md`.
|
|
||||||
|
|
||||||
When the agent exits, `cli.py` tears down every sidecar that was
|
|
||||||
brought up and the two networks; nothing about a bottle persists
|
|
||||||
between runs.
|
|
||||||
|
|
||||||
## Quickstart
|
## Quickstart
|
||||||
|
|
||||||
Requires Docker on the host and a long-lived Claude Code OAuth token in
|
Requires Docker on the host and a long-lived Claude Code OAuth token (`claude setup-token`) exported as `BOT_BOTTLE_CLAUDE_OAUTH_TOKEN`.
|
||||||
your shell env.
|
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
./cli.py start <agent> # builds the image on first run, drops you into claude
|
./cli.py start <agent> # builds the image on first run, drops you into claude
|
||||||
```
|
```
|
||||||
|
|
||||||
The container is removed automatically when the session ends. If the script
|
|
||||||
is killed with SIGKILL the exit trap won't fire and the container may be
|
|
||||||
left running; remove it with `docker rm -f <container-name>`.
|
|
||||||
|
|
||||||
### Smolmachines backend (experimental, macOS-only)
|
|
||||||
|
|
||||||
A second backend runs the agent in a smolvm micro-VM (libkrun) with the
|
|
||||||
sidecar bundle still in Docker. Selected via
|
|
||||||
`BOT_BOTTLE_BACKEND=smolmachines ./cli.py start <agent>`. Requires
|
|
||||||
`smolvm` on PATH (`curl -sSL https://smolmachines.com/install.sh | sh`).
|
|
||||||
|
|
||||||
The integration tests run against whichever backend the env var
|
|
||||||
selects and skip cleanly when its prerequisites are missing.
|
|
||||||
|
|
||||||
**One-time sudo on first launch (macOS):** smolmachines bottles
|
|
||||||
each reserve a loopback alias from a pool (`127.0.0.16` ..
|
|
||||||
`127.0.0.31`) and bind their bundle's port-forwards to it; the
|
|
||||||
first `./cli.py start` after each reboot prompts for sudo to add
|
|
||||||
missing aliases via `ifconfig lo0 alias`. Aliases persist until
|
|
||||||
reboot; subsequent launches don't prompt. The agent's TSI
|
|
||||||
allowlist is the alias's `/32`, so each bottle can only reach
|
|
||||||
its own bundle's published ports — not other bottles' ports,
|
|
||||||
not other host loopback services (postgres, dev servers, etc.).
|
|
||||||
|
|
||||||
This enforcement requires a workaround for a smolvm 0.8.0 bug:
|
|
||||||
the CLI's `--allow-cidr` flag is silently dropped when combined
|
|
||||||
with `--from <smolmachine>`. The launcher patches smolvm's
|
|
||||||
persistent state DB
|
|
||||||
(`~/Library/Application Support/smolvm/server/smolvm.db`)
|
|
||||||
directly between `machine create` and `machine start` to set
|
|
||||||
the allowlist. The hack falls away automatically when smolvm
|
|
||||||
honors the flag upstream — see the `loopback_alias` module's
|
|
||||||
docstring for the investigation trail.
|
|
||||||
|
|
||||||
## Manifest
|
## Manifest
|
||||||
|
|
||||||
Bottles and agents live as Markdown files with YAML frontmatter under
|
Bottles and agents are Markdown files with YAML frontmatter under `~/.bot-bottle/`. The Markdown body is the system prompt. Bottles live in `~/.bot-bottle/bottles/`; agents may also be shipped by a repo at `<repo>/.bot-bottle/agents/<name>.md`.
|
||||||
`~/.bot-bottle/`. Each bottle is one file in `bottles/`, each agent
|
|
||||||
is one file in `agents/`:
|
|
||||||
|
|
||||||
```
|
**Bottle** (`~/.bot-bottle/bottles/gitea-dev.md`):
|
||||||
~/.bot-bottle/
|
|
||||||
├── bottles/
|
|
||||||
│ ├── dev.md
|
|
||||||
│ └── gitea-dev.md
|
|
||||||
└── agents/
|
|
||||||
├── implementer.md
|
|
||||||
└── researcher.md
|
|
||||||
```
|
|
||||||
|
|
||||||
The filename (without `.md`) is the entity's name. Filenames must
|
|
||||||
match `[a-z][a-z0-9-]*`; files that don't are skipped with a warning.
|
|
||||||
|
|
||||||
A repo can ship its own agent files alongside its code at
|
|
||||||
`<repo>/.bot-bottle/agents/<name>.md`. Those agents reference
|
|
||||||
bottles defined in `~/.bot-bottle/bottles/` (the only place
|
|
||||||
bottles can come from); a `bottles/` subdir in a repo is ignored
|
|
||||||
with a warning. **This is the trust boundary**: bottle infrastructure
|
|
||||||
— credentials, egress allowlists, git remotes — comes from your home
|
|
||||||
directory only. A cloned repo cannot redirect a host env var to an
|
|
||||||
attacker-named upstream because it has no way to declare a bottle.
|
|
||||||
|
|
||||||
### Bottle composition with `extends:`
|
|
||||||
|
|
||||||
A bottle can inherit from another via `extends: <bottle-name>` so
|
|
||||||
operators don't have to duplicate a whole bottle file to vary one
|
|
||||||
field (PRD 0025). The parent's resolved config is the base; the
|
|
||||||
child's declared fields overlay. Merge rules:
|
|
||||||
|
|
||||||
- `env:` — dict merge, child wins on key collision.
|
|
||||||
- `git.user:` — per-field overlay (child's non-empty `name` /
|
|
||||||
`email` wins; empty falls through to parent).
|
|
||||||
- `git.remotes:` — dict merge by host, child wins on host collision.
|
|
||||||
An explicit `git.remotes: {}` clears the parent's remotes; omitting
|
|
||||||
`git.remotes` inherits the parent's remotes.
|
|
||||||
- `agent_provider:`, `egress:`, `supervise:` — full replace when the
|
|
||||||
child declares the field.
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
---
|
|
||||||
extends: dev # inherit everything from bottles/dev.md
|
|
||||||
egress:
|
|
||||||
routes:
|
|
||||||
- host: staging.example.com
|
|
||||||
auth:
|
|
||||||
scheme: Bearer
|
|
||||||
token_ref: STAGING_TOKEN
|
|
||||||
---
|
|
||||||
```
|
|
||||||
|
|
||||||
Cycles (`A extends B extends A`), self-references, and missing
|
|
||||||
parents die at parse with a clear pointer. Bottles remain
|
|
||||||
`$HOME`-only — `extends:` preserves the trust boundary above.
|
|
||||||
|
|
||||||
### Provider base bottles
|
|
||||||
|
|
||||||
Keep provider/runtime policy in one home-owned base bottle, then have
|
|
||||||
task bottles extend it. That keeps provider egress/auth in one place
|
|
||||||
without hiding security-relevant routes behind `agent_provider.template`.
|
|
||||||
|
|
||||||
For example, `~/.bot-bottle/bottles/claude.md` can hold the Claude
|
|
||||||
provider selection and Anthropic API egress:
|
|
||||||
|
|
||||||
````markdown
|
````markdown
|
||||||
---
|
---
|
||||||
agent_provider:
|
extends: claude # inherit the Claude provider boundary
|
||||||
template: claude
|
|
||||||
|
|
||||||
egress:
|
|
||||||
routes:
|
|
||||||
- host: api.anthropic.com
|
|
||||||
role: claude_code_oauth
|
|
||||||
auth:
|
|
||||||
scheme: Bearer
|
|
||||||
token_ref: BOT_BOTTLE_CLAUDE_OAUTH_TOKEN
|
|
||||||
pipelock:
|
|
||||||
tls_passthrough: true
|
|
||||||
---
|
|
||||||
|
|
||||||
Common Claude provider boundary.
|
|
||||||
````
|
|
||||||
|
|
||||||
Task bottles can then inherit that provider boundary and add their own
|
|
||||||
env/git configuration without repeating the Claude route.
|
|
||||||
|
|
||||||
### Example bottle (`~/.bot-bottle/bottles/gitea-dev.md`)
|
|
||||||
|
|
||||||
````markdown
|
|
||||||
---
|
|
||||||
extends: claude
|
|
||||||
|
|
||||||
env:
|
env:
|
||||||
GIT_AUTHOR_NAME: didericis
|
GIT_AUTHOR_NAME: didericis
|
||||||
@@ -337,148 +95,7 @@ git:
|
|||||||
Upstream: ssh://git@gitea.dideric.is:30009/didericis/bot-bottle.git
|
Upstream: ssh://git@gitea.dideric.is:30009/didericis/bot-bottle.git
|
||||||
IdentityFile: /Users/didericis/.ssh/id_ed25519_gitea
|
IdentityFile: /Users/didericis/.ssh/id_ed25519_gitea
|
||||||
KnownHostKey: ssh-ed25519 AAAA...
|
KnownHostKey: ssh-ed25519 AAAA...
|
||||||
---
|
|
||||||
|
|
||||||
The `gitea-dev` bottle. Backs my work on personal projects: provider
|
|
||||||
auth through egress and gitea.dideric.is over SSH.
|
|
||||||
````
|
|
||||||
|
|
||||||
For a Codex-backed base bottle, set `agent_provider.template: codex`.
|
|
||||||
The Codex template expects ChatGPT/device login state instead of an
|
|
||||||
`OPENAI_API_KEY` env var; no API-key placeholder is forwarded into the
|
|
||||||
agent. To let bot-bottle read the host's current Codex ChatGPT access
|
|
||||||
token and inject it from egress only for Codex's API calls, opt in
|
|
||||||
explicitly:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
agent_provider:
|
|
||||||
template: codex
|
|
||||||
forward_host_credentials: true
|
|
||||||
|
|
||||||
egress:
|
|
||||||
routes:
|
|
||||||
- host: auth.openai.com
|
|
||||||
path_allowlist:
|
|
||||||
- /api/accounts/deviceauth/
|
|
||||||
```
|
|
||||||
|
|
||||||
Run `codex login --device-auth` on the host before launch. The
|
|
||||||
launcher reads `tokens.access_token` from the host's
|
|
||||||
`~/.codex/auth.json`, verifies it is fresh user/device auth, and passes
|
|
||||||
it to the sidecar's `EGRESS_TOKEN_N` env slot. The agent container gets
|
|
||||||
a dummy `~/.codex/auth.json` that preserves the host auth-mode shape
|
|
||||||
but replaces credential values with placeholders. It keeps the selected
|
|
||||||
ChatGPT account id so Codex sends requests for the same account while
|
|
||||||
egress owns the real bearer token. The agent never receives real access
|
|
||||||
tokens, refresh tokens, or `OPENAI_API_KEY`. The effective egress table
|
|
||||||
automatically adds or upgrades `api.openai.com` and `chatgpt.com` to
|
|
||||||
authenticated routes when `forward_host_credentials` is true.
|
|
||||||
|
|
||||||
The built-in Codex template uses `Dockerfile.codex`; set
|
|
||||||
`agent_provider.dockerfile` to build the agent from a custom Dockerfile
|
|
||||||
while keeping the bot-bottle sidecars in place.
|
|
||||||
|
|
||||||
### Example agent (`~/.bot-bottle/agents/gitea-helper.md`)
|
|
||||||
|
|
||||||
````markdown
|
|
||||||
---
|
|
||||||
bottle: gitea-dev
|
|
||||||
skills:
|
|
||||||
- init-prd
|
|
||||||
git:
|
|
||||||
user:
|
|
||||||
name: gitea-helper
|
|
||||||
email: eric+gitea-helper@dideric.is
|
|
||||||
---
|
|
||||||
|
|
||||||
You help maintain Gitea-hosted projects.
|
|
||||||
````
|
|
||||||
|
|
||||||
The agent's Markdown body is its system prompt (whitespace
|
|
||||||
stripped). The frontmatter declares the bottle to launch in and any
|
|
||||||
skills to mount. You can also include Claude Code subagent fields
|
|
||||||
(`name`, `description`, `model`, `color`, `memory`) in the
|
|
||||||
frontmatter — bot-bottle ignores them at launch but doesn't
|
|
||||||
reject them, so the same file can drop into `~/.claude/agents/` as a
|
|
||||||
Claude Code subagent.
|
|
||||||
|
|
||||||
An agent may also declare `git.user` (`name` / `email`). It overlays
|
|
||||||
the referenced bottle's `git.user` per-field — the agent's non-empty
|
|
||||||
fields win, the rest fall through to the bottle — so two agents can
|
|
||||||
share one bottle and still commit under distinct identities without
|
|
||||||
an identity-only bottle (PRD 0027). Only `git.user` is allowed at the
|
|
||||||
agent level; `git.remotes` stays bottle-only because it carries
|
|
||||||
credentials and host trust. The launch preflight and `cli.py info`
|
|
||||||
print the effective identity annotated `(agent)` / `(bottle)` so you
|
|
||||||
can see where each field came from. Git authorship is not a
|
|
||||||
credential — push auth is the bottle's remote key/token — so a
|
|
||||||
repo-shipped agent setting its own identity grants no access; treat
|
|
||||||
an agent identity as *claimed, not vouched*.
|
|
||||||
|
|
||||||
Unknown top-level frontmatter keys die at load with a "did you mean"
|
|
||||||
pointer; typos don't silently ghost into an empty config.
|
|
||||||
|
|
||||||
The YAML subset the frontmatter accepts is bounded (flat keys,
|
|
||||||
strings / ints / true-or-false bools / null / lists / one-level
|
|
||||||
nested dicts). Anchors, multi-line block scalars, tags, and
|
|
||||||
ambiguous bare strings (`yes` / `NO` / `2026-05-24` /
|
|
||||||
`0x...`) all die with a clear pointer at the spec — quote your
|
|
||||||
strings when in doubt. The full schema lives in
|
|
||||||
`bot_bottle/yaml_subset.py` (~450 lines, stdlib-only, no PyYAML).
|
|
||||||
|
|
||||||
Working examples live under `examples/`. Pipelock's design lives in
|
|
||||||
`docs/prds/0001-per-agent-egress-proxy-via-pipelock.md` and the
|
|
||||||
rationale in `docs/research/pipelock-assessment.md`. The trust
|
|
||||||
boundary rationale lives in `docs/prds/0011-per-file-md-manifest.md`.
|
|
||||||
|
|
||||||
## Auth: Claude OAuth token, not API key
|
|
||||||
|
|
||||||
Bottles that use `agent_provider.template: claude` authenticate
|
|
||||||
`claude` inside the container with the same Pro/Max subscription you
|
|
||||||
already use on the host, via a long-lived OAuth token. No
|
|
||||||
`ANTHROPIC_API_KEY` is needed.
|
|
||||||
|
|
||||||
**Why a token instead of mounting `~/.claude.json`:** on macOS, Claude
|
|
||||||
Code stores OAuth credentials in the encrypted Keychain, not in
|
|
||||||
`~/.claude.json`. Mounting that file into a Linux container does not
|
|
||||||
carry the credentials with it. Linux hosts keep credentials in
|
|
||||||
`~/.claude/.credentials.json`, but to keep the launcher portable
|
|
||||||
bot-bottle uses the env-var path on every host.
|
|
||||||
|
|
||||||
**One-time setup on the host:**
|
|
||||||
|
|
||||||
```sh
|
|
||||||
claude setup-token # browser login, prints a ~1-year OAuth token
|
|
||||||
```
|
|
||||||
|
|
||||||
Stash the token in your shell env (e.g. `~/.zshrc` or a secret manager)
|
|
||||||
as `BOT_BOTTLE_CLAUDE_OAUTH_TOKEN`:
|
|
||||||
|
|
||||||
```sh
|
|
||||||
export BOT_BOTTLE_CLAUDE_OAUTH_TOKEN="<token>"
|
|
||||||
```
|
|
||||||
|
|
||||||
The Claude bottle reaches the Anthropic API only through the cred-proxy
|
|
||||||
sidecar. To let `claude` authenticate, declare an egress route with
|
|
||||||
`role: claude_code_oauth` and
|
|
||||||
`token_ref: BOT_BOTTLE_CLAUDE_OAUTH_TOKEN`:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
egress:
|
|
||||||
routes:
|
|
||||||
- host: api.anthropic.com
|
|
||||||
role: claude_code_oauth
|
|
||||||
auth:
|
|
||||||
scheme: Bearer
|
|
||||||
token_ref: BOT_BOTTLE_CLAUDE_OAUTH_TOKEN
|
|
||||||
pipelock:
|
|
||||||
tls_passthrough: true
|
|
||||||
```
|
|
||||||
|
|
||||||
Routes that resolve to private or Tailscale addresses can opt into
|
|
||||||
pipelock's SSRF destination allowlist explicitly:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
egress:
|
egress:
|
||||||
routes:
|
routes:
|
||||||
- host: gitea.dideric.is
|
- host: gitea.dideric.is
|
||||||
@@ -486,38 +103,31 @@ egress:
|
|||||||
scheme: token
|
scheme: token
|
||||||
token_ref: BOT_BOTTLE_GITEA_TOKEN
|
token_ref: BOT_BOTTLE_GITEA_TOKEN
|
||||||
pipelock:
|
pipelock:
|
||||||
ssrf_ip_allowlist:
|
ssrf_ip_allowlist: [100.78.141.42/32]
|
||||||
- 100.78.141.42/32
|
---
|
||||||
```
|
|
||||||
|
|
||||||
At launch, `cli.py` reads `BOT_BOTTLE_CLAUDE_OAUTH_TOKEN` from the host
|
The `gitea-dev` bottle. Provider auth via the inherited Claude route;
|
||||||
env and forwards it into the cred-proxy container's environ — never
|
gitea over SSH for push, token over HTTPS for the API.
|
||||||
into the agent's. The agent receives `ANTHROPIC_BASE_URL` pointing at
|
````
|
||||||
`http://cred-proxy:9099/anthropic` and a non-secret placeholder for
|
|
||||||
`CLAUDE_CODE_OAUTH_TOKEN` (claude-code refuses to start without one;
|
|
||||||
the proxy strips and replaces the header on every request). `printenv`
|
|
||||||
inside the agent does not surface the real token, and the value is
|
|
||||||
never written to disk or placed on argv on the host.
|
|
||||||
|
|
||||||
A Claude bottle without a `claude_code_oauth` route has no path to the
|
**Agent** (`~/.bot-bottle/agents/gitea-helper.md`):
|
||||||
Anthropic API — there is no fallback that forwards the token directly
|
|
||||||
to the agent. Caveats: the token is bound to your subscription tier
|
````markdown
|
||||||
(Pro/Max/Team/Enterprise), it does not work with `claude --bare`
|
---
|
||||||
(which only reads `ANTHROPIC_API_KEY`), and if it leaks, regenerate
|
bottle: gitea-dev
|
||||||
via `claude setup-token` again. Reference:
|
skills:
|
||||||
<https://code.claude.com/docs/en/authentication>.
|
- init-prd
|
||||||
|
---
|
||||||
|
|
||||||
|
You help maintain Gitea-hosted projects.
|
||||||
|
````
|
||||||
|
|
||||||
|
More examples in `examples/`. Full design lives under `docs/prds/`; the trust-boundary rationale is in `docs/prds/0011-per-file-md-manifest.md`.
|
||||||
|
|
||||||
## Trademarks
|
## Trademarks
|
||||||
|
|
||||||
bot-bottle is an independent project and is not affiliated with,
|
bot-bottle is an independent project and is not affiliated with, endorsed by, or sponsored by Anthropic, PBC. "Claude" and "Claude Code" are trademarks of Anthropic, PBC; the project name uses "claude" descriptively to indicate that the tool runs Claude Code inside a sandbox.
|
||||||
endorsed by, or sponsored by Anthropic, PBC. "Claude" and "Claude
|
|
||||||
Code" are trademarks of Anthropic, PBC; the project name uses
|
|
||||||
"claude" descriptively to indicate that the tool runs Claude Code
|
|
||||||
inside a sandbox.
|
|
||||||
|
|
||||||
## License
|
## License
|
||||||
|
|
||||||
Copyright 2026 Eric Bauerfeld
|
Copyright 2026 Eric Bauerfeld. Licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) for the full text.
|
||||||
|
|
||||||
Licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE)
|
|
||||||
for the full text.
|
|
||||||
|
|||||||
@@ -0,0 +1,283 @@
|
|||||||
|
# PRD 0049: Named / Labelled Agents
|
||||||
|
|
||||||
|
- **Status:** Draft
|
||||||
|
- **Author:** didericis
|
||||||
|
- **Created:** 2026-06-03
|
||||||
|
- **Issue:** #171
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
At agent launch time, prompt the operator for a short human-readable label
|
||||||
|
(defaulting to the manifest agent key) and an optional color from the 16-color
|
||||||
|
ANSI palette. Store both in the bottle's `metadata.json`. Display the label —
|
||||||
|
rendered in the chosen color — in the dashboard's active-agents pane, replacing
|
||||||
|
the bare manifest key. Inject the label and color into the in-container
|
||||||
|
`claude.json` as `name` / `color` so Claude Code can surface them in its own
|
||||||
|
harness when upstream support lands.
|
||||||
|
|
||||||
|
## Problem
|
||||||
|
|
||||||
|
The dashboard's agents pane identifies each running instance by its manifest
|
||||||
|
agent key (e.g., `implementer`) plus a random slug suffix. When an operator
|
||||||
|
runs three `implementer` bottles simultaneously — one each for three different
|
||||||
|
repos — the pane shows:
|
||||||
|
|
||||||
|
```
|
||||||
|
[docker] a3f9 implementer started 14:02:11 [egress,pipelock]
|
||||||
|
[docker] b81c implementer started 14:03:45 [egress,pipelock]
|
||||||
|
[docker] d220 implementer started 14:05:01 [egress,pipelock]
|
||||||
|
```
|
||||||
|
|
||||||
|
There is no way to tell which bottle is working on which task without attaching
|
||||||
|
to each one in turn. The slug is opaque; the manifest key is shared. Operators
|
||||||
|
working a multi-bottle session resort to keeping a mental map of slug→task,
|
||||||
|
which breaks the moment they switch windows.
|
||||||
|
|
||||||
|
## Goals / Success Criteria
|
||||||
|
|
||||||
|
1. After the operator selects an agent name (dashboard picker or CLI argument),
|
||||||
|
they are prompted for a label. The prompt suggests the manifest key as the
|
||||||
|
default; pressing Enter (or providing no input) accepts it. The label may
|
||||||
|
contain any printable characters up to 64 bytes.
|
||||||
|
2. After the label prompt, the operator is optionally prompted for a color from
|
||||||
|
the 16-color ANSI palette (names: `black`, `red`, `green`, `yellow`, `blue`,
|
||||||
|
`magenta`, `cyan`, `white`, `bright-black`, `bright-red`, `bright-green`,
|
||||||
|
`bright-yellow`, `bright-blue`, `bright-magenta`, `bright-cyan`,
|
||||||
|
`bright-white`). Pressing Enter without a selection skips color entirely.
|
||||||
|
3. `label` and `color` are stored in `BottleMetadata` and written to the
|
||||||
|
bottle's `metadata.json`. Both fields default to `""` (empty / unset).
|
||||||
|
4. `ActiveAgent` carries `label` and `color`; `enumerate_active()` reads them
|
||||||
|
from `metadata.json`.
|
||||||
|
5. `_format_agent_row` uses the label when non-empty (falling back to
|
||||||
|
`agent_name`). If a non-empty color is set and the terminal supports it, the
|
||||||
|
label substring is rendered in that color.
|
||||||
|
6. `BottleSpec` carries `label` and `color`; the docker backend's `prepare`
|
||||||
|
step copies them into `BottleMetadata`.
|
||||||
|
7. `agent_provider.py` writes `label` → `"name"` and `color` → `"color"` into
|
||||||
|
the generated `claude.json`, alongside the existing fields. Fields are
|
||||||
|
omitted when empty.
|
||||||
|
8. The dashboard's `_new_agent_flow` (PRD 0020) includes the label+color step
|
||||||
|
between agent selection and the backend picker.
|
||||||
|
9. `cmd_start` (CLI) includes the label+color step after argument validation
|
||||||
|
and before prepare-with-preflight.
|
||||||
|
10. All existing unit tests stay green; no new tests are required for this
|
||||||
|
change (the label/color fields are thin plumbing with no branching logic
|
||||||
|
worth unit-testing beyond the already-tested metadata read/write path).
|
||||||
|
|
||||||
|
## Non-goals
|
||||||
|
|
||||||
|
- Showing the agent label inside the Claude Code TUI (status line, terminal
|
||||||
|
title, custom header). That requires upstream Claude Code / codex support.
|
||||||
|
Writing to `claude.json` is best-effort scaffolding for when that lands.
|
||||||
|
- Per-bottle color affecting anything outside the dashboard agents pane (e.g.,
|
||||||
|
proposal-pane highlights, log prefixes).
|
||||||
|
- Validating or constraining label content beyond the 64-byte printable cap.
|
||||||
|
- Persisting color-pair state across dashboard restarts (color pairs are
|
||||||
|
initialized fresh each session).
|
||||||
|
- Editing the label or color of an already-running bottle.
|
||||||
|
- Exposing label/color via `./cli.py list` (out of scope for v1; trivial to
|
||||||
|
add later since the field will be in metadata).
|
||||||
|
|
||||||
|
## Design
|
||||||
|
|
||||||
|
### Data flow
|
||||||
|
|
||||||
|
```
|
||||||
|
operator input
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
BottleSpec.label, BottleSpec.color
|
||||||
|
│
|
||||||
|
├─► docker/prepare.py → BottleMetadata.label / .color → metadata.json
|
||||||
|
│
|
||||||
|
└─► agent_provider.py → claude.json {"name": label, "color": color}
|
||||||
|
(omitted when empty)
|
||||||
|
|
||||||
|
dashboard refresh
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
enumerate_active() → read_metadata(slug) → ActiveAgent.label / .color
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
_format_agent_row → label (colored) in the row string
|
||||||
|
```
|
||||||
|
|
||||||
|
### BottleSpec changes
|
||||||
|
|
||||||
|
```python
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class BottleSpec:
|
||||||
|
manifest: Manifest
|
||||||
|
agent_name: str
|
||||||
|
copy_cwd: bool
|
||||||
|
user_cwd: str
|
||||||
|
identity: str = ""
|
||||||
|
label: str = "" # operator-chosen display name; defaults to agent_name at render time
|
||||||
|
color: str = "" # one of the 16 ANSI color names, or "" for terminal default
|
||||||
|
```
|
||||||
|
|
||||||
|
`label` and `color` default to `""` so all existing callers remain valid with
|
||||||
|
no changes.
|
||||||
|
|
||||||
|
### BottleMetadata changes
|
||||||
|
|
||||||
|
Add two new fields with backward-compatible defaults:
|
||||||
|
|
||||||
|
```python
|
||||||
|
@dataclass
|
||||||
|
class BottleMetadata:
|
||||||
|
identity: str
|
||||||
|
agent_name: str
|
||||||
|
cwd: str
|
||||||
|
copy_cwd: bool
|
||||||
|
started_at: str
|
||||||
|
compose_project: str
|
||||||
|
backend: str
|
||||||
|
label: str = ""
|
||||||
|
color: str = ""
|
||||||
|
```
|
||||||
|
|
||||||
|
`metadata.json` written by older bot-bottle versions won't have these keys;
|
||||||
|
`read_metadata` already uses `dict.get` with defaults, so existing slugs load
|
||||||
|
cleanly with `label=""`, `color=""`.
|
||||||
|
|
||||||
|
### ActiveAgent changes
|
||||||
|
|
||||||
|
```python
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class ActiveAgent:
|
||||||
|
backend_name: str
|
||||||
|
slug: str
|
||||||
|
agent_name: str
|
||||||
|
started_at: str
|
||||||
|
services: tuple[str, ...]
|
||||||
|
label: str = ""
|
||||||
|
color: str = ""
|
||||||
|
```
|
||||||
|
|
||||||
|
`enumerate_active()` copies `label` and `color` out of `BottleMetadata` when
|
||||||
|
constructing each `ActiveAgent`. The smolmachines backend gets the same
|
||||||
|
additions for symmetry; it reads from its own metadata path.
|
||||||
|
|
||||||
|
### Dashboard row rendering
|
||||||
|
|
||||||
|
`_format_agent_row` already falls through cleanly on missing fields. The
|
||||||
|
change is:
|
||||||
|
|
||||||
|
```python
|
||||||
|
display_name = a.label if a.label else a.agent_name
|
||||||
|
```
|
||||||
|
|
||||||
|
Color rendering uses the existing `_try_init_green()` pattern as a model.
|
||||||
|
A `_color_pair_for(color_name)` helper initialises a fresh curses color pair
|
||||||
|
for the requested named color and returns its attr (or 0 on failure). Each
|
||||||
|
unique color in the active agent list gets its own pair index. Color pairs are
|
||||||
|
allocated lazily and cached in a `dict[str, int]` that lives for the duration
|
||||||
|
of the dashboard session.
|
||||||
|
|
||||||
|
The 16 ANSI color name → curses constant mapping:
|
||||||
|
|
||||||
|
| Name | curses constant |
|
||||||
|
|------|----------------|
|
||||||
|
| `black` | `curses.COLOR_BLACK` |
|
||||||
|
| `red` | `curses.COLOR_RED` |
|
||||||
|
| `green` | `curses.COLOR_GREEN` |
|
||||||
|
| `yellow` | `curses.COLOR_YELLOW` |
|
||||||
|
| `blue` | `curses.COLOR_BLUE` |
|
||||||
|
| `magenta` | `curses.COLOR_MAGENTA` |
|
||||||
|
| `cyan` | `curses.COLOR_CYAN` |
|
||||||
|
| `white` | `curses.COLOR_WHITE` |
|
||||||
|
| `bright-*` | same constant + `curses.A_BOLD` |
|
||||||
|
|
||||||
|
Terminals that don't support color fall back to plain text (the helper returns
|
||||||
|
0, which ORed in is a no-op — same pattern as `_try_init_green`).
|
||||||
|
|
||||||
|
### Label + color prompt — dashboard
|
||||||
|
|
||||||
|
In `_new_agent_flow`, after `_picker_modal` returns a non-None name and before
|
||||||
|
`_backend_picker_modal`:
|
||||||
|
|
||||||
|
```python
|
||||||
|
label, color = _label_color_modal(stdscr, default_label=picked)
|
||||||
|
```
|
||||||
|
|
||||||
|
`_label_color_modal` uses `curses.endwin()` → text-mode prompts → restore
|
||||||
|
(the same drop-and-resume pattern as the existing editor flow and preflight
|
||||||
|
Y/N). Two sequential prompts:
|
||||||
|
|
||||||
|
```
|
||||||
|
bot-bottle: agent label [implementer]: <operator types>
|
||||||
|
bot-bottle: color (red/green/blue/… or Enter to skip): <operator types>
|
||||||
|
```
|
||||||
|
|
||||||
|
Invalid color names are silently ignored (treated as empty). The function
|
||||||
|
returns `(label, color)` — both strings, both possibly `""`.
|
||||||
|
|
||||||
|
### Label + color prompt — CLI
|
||||||
|
|
||||||
|
In `cmd_start`, after argument parsing and before `_launch_bottle`:
|
||||||
|
|
||||||
|
```python
|
||||||
|
label = _text_prompt_label(args.name)
|
||||||
|
color = _text_prompt_color()
|
||||||
|
```
|
||||||
|
|
||||||
|
`_text_prompt_label(default)` writes `"bot-bottle: agent label [{default}]: "`
|
||||||
|
to stderr and returns the stripped input (or `default` if blank).
|
||||||
|
`_text_prompt_color()` writes the color prompt and returns the stripped input
|
||||||
|
(or `""` if blank or invalid).
|
||||||
|
|
||||||
|
Both use `read_tty_line()` (already in `start.py`) for the read.
|
||||||
|
|
||||||
|
### Claude Code config injection
|
||||||
|
|
||||||
|
In `agent_provider.py`, where `claude_config.write_text(...)` is called,
|
||||||
|
expand the JSON dict conditionally:
|
||||||
|
|
||||||
|
```python
|
||||||
|
payload = {
|
||||||
|
"hasCompletedOnboarding": True,
|
||||||
|
"theme": "dark",
|
||||||
|
"bypassPermissionsModeAccepted": True,
|
||||||
|
"projects": claude_projects,
|
||||||
|
}
|
||||||
|
if spec.label:
|
||||||
|
payload["name"] = spec.label
|
||||||
|
if spec.color:
|
||||||
|
payload["color"] = spec.color
|
||||||
|
claude_config.write_text(json.dumps(payload, indent=2) + "\n")
|
||||||
|
```
|
||||||
|
|
||||||
|
`spec` here is the `AgentProvisionSpec` (or equivalent) that `agent_provider`
|
||||||
|
already receives; it needs `label` and `color` threaded in from `BottleSpec`
|
||||||
|
through whatever plan/provision object the provider operates on.
|
||||||
|
|
||||||
|
## Implementation chunks
|
||||||
|
|
||||||
|
Two PRs, each independently mergeable.
|
||||||
|
|
||||||
|
### Chunk 1 — schema + storage
|
||||||
|
|
||||||
|
- Add `label: str = ""` and `color: str = ""` to `BottleSpec`,
|
||||||
|
`BottleMetadata`, and `ActiveAgent`.
|
||||||
|
- `docker/prepare.py`: copy `spec.label` / `spec.color` into `BottleMetadata`.
|
||||||
|
- `docker/enumerate.py`: copy `metadata.label` / `metadata.color` into
|
||||||
|
`ActiveAgent`.
|
||||||
|
- `agent_provider.py` (or the plan object it reads): thread label/color through
|
||||||
|
to `claude.json` write.
|
||||||
|
- Smolmachines backend: parallel changes to metadata read/write and
|
||||||
|
`ActiveAgent` construction.
|
||||||
|
- No prompt changes; no UI changes. All existing behavior is identical.
|
||||||
|
|
||||||
|
### Chunk 2 — prompts + display
|
||||||
|
|
||||||
|
- `start.py`: add `_text_prompt_label` and `_text_prompt_color`; call them in
|
||||||
|
`cmd_start` before `_launch_bottle`; pass `label` / `color` into `BottleSpec`.
|
||||||
|
- `dashboard.py`: add `_label_color_modal` (drop-and-resume); call it in
|
||||||
|
`_new_agent_flow`; pass label/color into `BottleSpec`; add
|
||||||
|
`_color_pair_for` helper; update `_format_agent_row` to use `a.label` with
|
||||||
|
color rendering.
|
||||||
|
|
||||||
|
## Open questions
|
||||||
|
|
||||||
|
None.
|
||||||
@@ -0,0 +1,151 @@
|
|||||||
|
# Gitea Webhook Agent Dispatch
|
||||||
|
|
||||||
|
## Question
|
||||||
|
|
||||||
|
How should bot-bottle spawn and manage agents in response to Gitea PR events — and how do we reuse the same agent (with its full session context) across every event in a PR's lifecycle?
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
A lightweight webhook receiver maps Gitea PR events to `cli.py` invocations. Spawning is straightforward: the existing work on non-interactive run mode (see [host-dispatch-to-container-agents.md](host-dispatch-to-container-agents.md)) is the missing piece. Session continuity is harder: it requires tracking two identifiers per open PR — the **bottle identity** (bot-bottle's slug for the container state dir) and the **Claude session ID** (the UUID Claude writes to its JSONL transcript). The transcript snapshot mechanism already used by capability-block is the right foundation; it just needs a non-interactive path and a PR-keyed store.
|
||||||
|
|
||||||
|
## Gitea Webhook Events for PR Lifecycle
|
||||||
|
|
||||||
|
Gitea fires `X-Gitea-Event: pull_request` (with an `action` field) for most PR state changes. The payload always includes `pull_request.number`, which is the stable key for correlating events to a running agent.
|
||||||
|
|
||||||
|
| `X-Gitea-Event` value | Relevant `action` values | When it fires |
|
||||||
|
|---|---|---|
|
||||||
|
| `pull_request` | `opened`, `reopened`, `closed`, `synchronized` | PR created, closed, or pushed to |
|
||||||
|
| `pull_request_comment` | `created`, `edited` | Timeline comment posted |
|
||||||
|
| `pull_request_review_approved` | — | Review submitted with approval |
|
||||||
|
| `pull_request_review_rejected` | — | Review submitted requesting changes |
|
||||||
|
| `pull_request_review_comment` | — | Inline code review comment |
|
||||||
|
| `pull_request_sync` | — | New commits pushed to the PR branch |
|
||||||
|
|
||||||
|
`pull_request` with `action: synchronized` and `pull_request_sync` both fire on push; they carry the same information but are separate subscriptions in the webhook config UI. Subscribe to `pull_request` and `pull_request_review` (the umbrella) plus `pull_request_comment` to cover the full lifecycle.
|
||||||
|
|
||||||
|
The webhook receiver validates the `X-Gitea-Signature-256` HMAC header (SHA-256 of the raw body, keyed by the configured secret) before dispatching.
|
||||||
|
|
||||||
|
## Spawning an Agent From a Webhook
|
||||||
|
|
||||||
|
### What we need from bot-bottle
|
||||||
|
|
||||||
|
The current `cli.py start` is interactive — it prompts y/N and attaches a tty. A webhook handler needs a non-interactive mode that:
|
||||||
|
|
||||||
|
1. Starts the container for a named agent.
|
||||||
|
2. Runs `claude -p "<task>" --output-format json --dangerously-skip-permissions` inside it (no tty, no session picker).
|
||||||
|
3. Captures stdout as JSON, extracts `session_id`.
|
||||||
|
4. Blocks until Claude exits, then tears down.
|
||||||
|
|
||||||
|
The [host-dispatch-to-container-agents](host-dispatch-to-container-agents.md) research proposes `cli.py run <agent> <task>` for exactly this. That command is the prerequisite for everything below. It should return the Claude JSON output so callers can extract `session_id`.
|
||||||
|
|
||||||
|
### Webhook receiver sketch
|
||||||
|
|
||||||
|
The receiver is a small HTTP service (Flask, FastAPI, or a Go net/http handler) running alongside bot-bottle on the host. It:
|
||||||
|
|
||||||
|
1. Validates the HMAC signature.
|
||||||
|
2. Extracts `pull_request.number` and `X-Gitea-Event` / `action`.
|
||||||
|
3. Looks up whether a bottle already exists for this PR number.
|
||||||
|
4. Spawns or resumes accordingly (see next section).
|
||||||
|
5. Optionally posts a comment back to the PR via Gitea API once Claude finishes.
|
||||||
|
|
||||||
|
The receiver does not need to be async or queue-based for a single-repo bot, but should at minimum serialize events for the same PR number (a per-PR lock) to avoid two concurrent sessions clobbering each other's transcript.
|
||||||
|
|
||||||
|
## Reusing the Same Agent Across a PR
|
||||||
|
|
||||||
|
This is the harder problem. Two separate identities need to be tracked and connected:
|
||||||
|
|
||||||
|
### Identity 1: bottle identity (bot-bottle slug)
|
||||||
|
|
||||||
|
The slug is the per-bottle state directory name (`~/.bot-bottle/state/<slug>/`). It's what `cli.py resume <slug>` uses to relaunch a container and mount the preserved state — including the transcript snapshot. This already works for the capability-block flow.
|
||||||
|
|
||||||
|
### Identity 2: Claude session ID
|
||||||
|
|
||||||
|
Claude Code's `--output-format json` response includes a `session_id` UUID. Passing `--resume <session_id>` on a subsequent non-interactive run makes Claude continue from exactly that conversation, with full memory of prior tool calls. `--continue` (which maps to `resume_args` in `agent_provider.py`) only picks up the *most recent* session in the project directory — unsafe when multiple sessions may be running concurrently.
|
||||||
|
|
||||||
|
The session JSONL lives at `~/.claude/projects/<encoded-cwd>/<session_id>.jsonl` inside the container guest. The transcript snapshot (`snapshot_transcript(slug)` in `capability_apply.py`) copies all of `~/.claude` out of the container before teardown, so the JSONL is preserved in `~/.bot-bottle/state/<slug>/transcript/.claude/`. When the bottle is relaunched and the transcript remounted, `claude --resume <session_id>` can find the JSONL at the right path.
|
||||||
|
|
||||||
|
### Per-PR session registry
|
||||||
|
|
||||||
|
The receiver needs a small persistent map:
|
||||||
|
|
||||||
|
```
|
||||||
|
PR number → { bottle_identity: str, claude_session_id: str, agent_name: str }
|
||||||
|
```
|
||||||
|
|
||||||
|
The simplest implementation is a JSON file at `~/.bot-bottle/pr-sessions.json`, written after each successful first-run and updated with each resume. A sqlite database is better if concurrent multi-repo support is needed.
|
||||||
|
|
||||||
|
### Full lifecycle flow
|
||||||
|
|
||||||
|
```
|
||||||
|
PR opened
|
||||||
|
→ webhook: action=opened
|
||||||
|
→ no entry in pr-sessions.json
|
||||||
|
→ cli.py run <agent> "Review PR #N: <title>\n<diff URL>"
|
||||||
|
→ starts container, runs claude -p ... --output-format json
|
||||||
|
→ on success: captures session_id from JSON output
|
||||||
|
→ snapshot_transcript(slug)
|
||||||
|
→ tears down container
|
||||||
|
→ write pr-sessions.json: { pr: N, slug: <slug>, session_id: <uuid> }
|
||||||
|
|
||||||
|
PR gets new commit
|
||||||
|
→ webhook: action=synchronized OR pull_request_sync
|
||||||
|
→ look up pr-sessions.json: found slug + session_id
|
||||||
|
→ cli.py run-resume <slug> --claude-session <session_id> "New commits pushed. Review the diff."
|
||||||
|
→ relaunches container with transcript snapshot mounted
|
||||||
|
→ runs claude -p ... --resume <session_id> --output-format json
|
||||||
|
→ captures new session_id (same or rotated)
|
||||||
|
→ snapshot_transcript(slug) again
|
||||||
|
→ update pr-sessions.json with latest session_id
|
||||||
|
|
||||||
|
Comment @-mentions bot
|
||||||
|
→ webhook: pull_request_comment, action=created
|
||||||
|
→ extract comment body, check for bot mention
|
||||||
|
→ same resume flow as above with comment as the prompt
|
||||||
|
|
||||||
|
PR closed / merged
|
||||||
|
→ webhook: action=closed
|
||||||
|
→ cli.py cleanup <slug> (or equivalent)
|
||||||
|
→ remove from pr-sessions.json
|
||||||
|
```
|
||||||
|
|
||||||
|
### What needs to be built
|
||||||
|
|
||||||
|
| Piece | Status | Notes |
|
||||||
|
|---|---|---|
|
||||||
|
| `cli.py run <agent> <task>` | Missing | Non-interactive start; see host-dispatch research |
|
||||||
|
| `cli.py run-resume <slug> --claude-session <id> <task>` | Missing | Like `resume` but non-interactive, passes `--resume <id>` to claude |
|
||||||
|
| `snapshot_transcript` on clean exit | Exists (PRD 0012) | Already called from `start.py`'s session-end path |
|
||||||
|
| Transcript remount on resume | Exists | `bottle_state.py::transcript_snapshot_dir` → docker cp in on launch |
|
||||||
|
| PR session registry | Missing | Needs to be designed; `~/.bot-bottle/pr-sessions.json` is the simplest start |
|
||||||
|
| Webhook receiver service | Missing | New service; needs to be a declared bottle or run as a host process |
|
||||||
|
|
||||||
|
## Known Rough Edges
|
||||||
|
|
||||||
|
**Session ID is not available from within the session.** The ID is only in the `--output-format json` result, readable after the process exits. There is no env var or hook that exposes it mid-session ([upstream issue #44607](https://github.com/anthropics/claude-code/issues/44607)). For the webhook bot this is fine — the outer receiver reads it from the subprocess result.
|
||||||
|
|
||||||
|
**`--continue` vs `--resume <id>`:** The existing `resume_args = ("--continue",)` in `agent_provider.py` picks up the *most recent* session. For an interactive single-user resume this is fine. For a webhook bot that may have multiple open PRs, it is not safe — two PRs' transcripts would collide if they share a project directory encoding. Use `--resume <session_id>` explicitly.
|
||||||
|
|
||||||
|
**Project directory encoding.** Claude stores sessions keyed by the absolute cwd, encoded as a path. Inside the container the cwd is always `/home/node` or a subdir. As long as every run for the same PR uses the same cwd, `--resume <session_id>` will find the right JSONL. The cwd should be pinned per PR entry in the session registry.
|
||||||
|
|
||||||
|
**Concurrent events for the same PR.** If two webhooks arrive close together (e.g., push + CI comment), the receiver must serialize them. A per-PR asyncio lock or a simple file lock on the session registry entry is enough.
|
||||||
|
|
||||||
|
**Context window growth.** Each resume appends to the same session. A PR with many round trips will eventually hit the context limit. Mitigation options: start a fresh Claude session (new `cli.py run`) periodically and carry forward a summary; or rely on Claude's built-in compaction. The session registry could include a turn count to trigger rotation.
|
||||||
|
|
||||||
|
**Webhook delivery ordering.** Gitea does not guarantee ordered delivery or exactly-once delivery. The receiver should be idempotent (same PR event processed twice should not create two bottles) and should ignore events for closed PRs.
|
||||||
|
|
||||||
|
## Relationship to Existing Bot-Bottle Infrastructure
|
||||||
|
|
||||||
|
The transcript snapshot + bottle identity system (PRD 0012, `capability_apply.py`) was designed for the capability-block flow: an operator-triggered resume after a security event. The webhook flow is the same mechanism on a faster loop driven by Gitea events instead of operator action. The implementation delta is:
|
||||||
|
|
||||||
|
1. Non-interactive run mode (the `cli.py run` gap already identified in host-dispatch research).
|
||||||
|
2. Passing `--resume <session_id>` explicitly rather than `--continue`.
|
||||||
|
3. A PR-keyed registry to connect PR numbers to bottle identities and session IDs.
|
||||||
|
4. A webhook receiver to drive the loop.
|
||||||
|
|
||||||
|
These are additive changes that sit on top of the existing transcript preservation machinery without altering it.
|
||||||
|
|
||||||
|
## Recommendation
|
||||||
|
|
||||||
|
Start with the non-interactive run mode (`cli.py run`) since everything else depends on it. Once that exists, the webhook receiver and session registry are straightforward glue. The receiver should run as a host process (not inside a bottle) since it needs to call `cli.py` and manage the session registry file. Serialize per-PR to avoid concurrency bugs. Use `--resume <session_id>` (not `--continue`) for all resume paths.
|
||||||
|
|
||||||
|
The PR session registry is deliberately minimal to start — a JSON file is fine. If multi-repo or multi-agent scenarios appear, migrating to sqlite is a one-file change.
|
||||||
@@ -0,0 +1,278 @@
|
|||||||
|
# Local Ollama: Deployment Topology, Harness Selection, and Model Sizing
|
||||||
|
|
||||||
|
Research notes on running Ollama locally for a bot-bottle coding agent workflow.
|
||||||
|
Covers the native-vs-VM question, which harness integrates best with an agent loop,
|
||||||
|
and which models make sense on an RTX 3070 (8 GB VRAM / 30 GB RAM) machine.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. Deployment topology: native, container, or VM?
|
||||||
|
|
||||||
|
The core question is whether running Ollama in a VM significantly degrades inference
|
||||||
|
performance. The short answer: a full KVM/QEMU VM with GPU passthrough adds roughly
|
||||||
|
2–5% overhead, Docker on Linux adds roughly 1–2%, and LXC containers add sub-1%. None
|
||||||
|
of these are significant for interactive coding use.
|
||||||
|
|
||||||
|
### Native (bare metal)
|
||||||
|
|
||||||
|
Zero overhead, immediate GPU access, simplest setup. The right default for a solo
|
||||||
|
developer doing inference on their own workstation.
|
||||||
|
|
||||||
|
### Docker containers on Linux + NVIDIA
|
||||||
|
|
||||||
|
With `nvidia-container-toolkit` and `--gpus all`, containerized Ollama runs at
|
||||||
|
essentially native speed (~1–2% overhead on Linux). The dramatic exception is macOS,
|
||||||
|
where Docker Desktop runs a Linux VM with no access to Apple's Metal/GPU — inference
|
||||||
|
is 5–6× slower. On Linux/Windows with NVIDIA hardware, Docker is fine.
|
||||||
|
|
||||||
|
Common pitfall: if `docker exec ollama ollama ps` shows 0 GPU layers, the container
|
||||||
|
fell back to CPU. Usual causes: stale VRAM allocation, missing `nvidia-container-toolkit`,
|
||||||
|
or a host driver too old for the container's CUDA version.
|
||||||
|
|
||||||
|
### KVM/QEMU VM with full PCIe passthrough
|
||||||
|
|
||||||
|
Full GPU passthrough makes the GPU invisible to the host while the VM owns it. Overhead
|
||||||
|
from the IOMMU translation layer and virtualized PCIe bus is ~2–5%. This is viable if
|
||||||
|
you need VM-level isolation (snapshotting, migration, separate kernel). Setup complexity
|
||||||
|
is non-trivial: BIOS IOMMU, IOMMU group management, VFIO driver binding. Once configured
|
||||||
|
it is stable.
|
||||||
|
|
||||||
|
**Critical gotcha:** set the VM's CPU type to `host`. If left at the default
|
||||||
|
(`x86-64-v2-AES` / "QEMU Virtual CPU version 2.5+"), Ollama may silently disable GPU
|
||||||
|
support even when drivers appear correct.
|
||||||
|
|
||||||
|
### LXC containers (Proxmox et al.)
|
||||||
|
|
||||||
|
The sweet spot for isolation without overhead. Sub-1% performance difference from bare
|
||||||
|
metal because LXC shares the host kernel; GPU device files are bind-mounted into the
|
||||||
|
container. The tradeoff is weaker isolation (shared kernel) and the requirement that
|
||||||
|
host and container driver versions match. Not suitable if you need VM-level snapshots
|
||||||
|
or live migration.
|
||||||
|
|
||||||
|
### Summary
|
||||||
|
|
||||||
|
| Topology | GPU overhead | Isolation | Complexity |
|
||||||
|
|---|---|---|---|
|
||||||
|
| Native | 0% | None | Low |
|
||||||
|
| Docker (Linux) | ~1–2% | Process | Low |
|
||||||
|
| LXC | <1% | Namespace | Medium |
|
||||||
|
| KVM passthrough | 2–5% | Full VM | High |
|
||||||
|
| VM no passthrough | CPU-only | Full VM | Medium |
|
||||||
|
|
||||||
|
Running Ollama in a VM will **not** significantly slow inference as long as GPU passthrough
|
||||||
|
is configured. Without passthrough (software rendering / CPU fallback) performance
|
||||||
|
collapses — that is what the user is rightly worried about.
|
||||||
|
|
||||||
|
### Local vs. remote server
|
||||||
|
|
||||||
|
| Factor | Local machine | Remote server |
|
||||||
|
|---|---|---|
|
||||||
|
| Latency | Near-zero | Network round-trip; cumulative in agent loops |
|
||||||
|
| Cost | Zero after hardware | Per-token or subscription |
|
||||||
|
| Privacy | 100% on-device | Data leaves the machine |
|
||||||
|
| Model size ceiling | VRAM-limited | No hard limit (671B+ feasible) |
|
||||||
|
| Offline use | Yes | No |
|
||||||
|
| Concurrency under load | Sequential by default | Scales horizontally |
|
||||||
|
|
||||||
|
For agentic coding workflows making 20–50 tool calls per session, network latency
|
||||||
|
accumulates quickly. Local inference eliminates this. A practical hybrid pattern:
|
||||||
|
use the local GPU for routine coding loops; route only to a remote API for tasks
|
||||||
|
requiring a 70B+ model or very long context (>128K tokens).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 2. Harness selection
|
||||||
|
|
||||||
|
The landscape in 2026 has settled into three categories: IDE plugins, terminal agents,
|
||||||
|
and chat UIs.
|
||||||
|
|
||||||
|
### Continue.dev — recommended IDE plugin
|
||||||
|
|
||||||
|
Open-source VS Code / JetBrains / Zed / Vim extension. Routes autocomplete, chat, and
|
||||||
|
refactoring commands to any configured LLM backend (Ollama, cloud APIs). The recommended
|
||||||
|
setup uses two models: a small FIM-capable model for inline autocomplete (Qwen2.5-Coder 7B)
|
||||||
|
and a larger model for chat/edit. Handles inline completions, multi-file edits, and
|
||||||
|
codebase-aware chat. No API key, no data leaving the machine.
|
||||||
|
|
||||||
|
### Aider — recommended for git-native terminal workflows
|
||||||
|
|
||||||
|
Terminal-based coding agent. Builds a codebase map before editing, makes changes
|
||||||
|
directly, and auto-commits to git with readable messages. Every change is one
|
||||||
|
`git revert` away. Supports 100+ languages; connects to any Ollama-served model
|
||||||
|
via the OpenAI-compatible API. Best for terminal-first developers who want
|
||||||
|
version-controlled agent interactions. Does not do inline autocomplete.
|
||||||
|
|
||||||
|
### OpenCode — recommended for bot-bottle–style agent loops
|
||||||
|
|
||||||
|
Terminal-based coding agent with 15 built-in tools (bash execution, file read/write/edit,
|
||||||
|
grep, glob, web fetch, MCP support) and connections to 75+ model providers including
|
||||||
|
local Ollama models. This is the closest open-source equivalent to a Claude Code–style
|
||||||
|
plan → tool-call → execute → observe → loop. Native Ollama integration.
|
||||||
|
|
||||||
|
**Critical setup note:** Ollama defaults to a 4096-token context window, which is
|
||||||
|
completely insufficient for an agent loop carrying conversation history, tool schemas,
|
||||||
|
a system prompt, and code simultaneously. Configure at least 64K tokens explicitly
|
||||||
|
in the model's context settings.
|
||||||
|
|
||||||
|
### Cline — agentic VS Code assistant
|
||||||
|
|
||||||
|
VS Code extension that operates as an autonomous agent: plans, edits files, runs commands
|
||||||
|
in a loop, connects to Ollama's local endpoint. Compared to OpenCode it lives inside the
|
||||||
|
IDE rather than the terminal; compared to Continue.dev it is a full agent rather than a
|
||||||
|
plugin. Its system prompt overhead is higher (~7,000–10,000 tokens) than minimal harnesses.
|
||||||
|
|
||||||
|
### Open WebUI / Jan / LM Studio — chat UIs, not coding harnesses
|
||||||
|
|
||||||
|
These are browser or desktop chat interfaces useful for ad-hoc conversations (explaining
|
||||||
|
APIs, drafting documentation, exploring ideas) but without IDE integration, autocomplete,
|
||||||
|
or git integration. LM Studio offers the smoothest onboarding (visual model browser with
|
||||||
|
VRAM estimates). Jan is the most privacy-auditable (fully open-source, Apache 2.0, no
|
||||||
|
telemetry). Neither is a replacement for a coding harness.
|
||||||
|
|
||||||
|
### Harness comparison
|
||||||
|
|
||||||
|
| Harness | Type | Autocomplete | Agent loop | Ollama | Git integration |
|
||||||
|
|---|---|---|---|---|---|
|
||||||
|
| Continue.dev | IDE plugin | Yes (FIM) | Basic | Native | No |
|
||||||
|
| Aider | Terminal agent | No | Multi-turn | Via API | Auto-commit |
|
||||||
|
| OpenCode | Terminal agent | No | Full tools | Native | Via bash |
|
||||||
|
| Cline | IDE agent | No | Full tools | Via API | Via bash |
|
||||||
|
| Open WebUI | Chat UI | No | No | Native | No |
|
||||||
|
| Jan | Chat UI | No | No | Native | No |
|
||||||
|
|
||||||
|
For a bot-bottle workflow (an isolated sandbox running an agentic loop with tool access),
|
||||||
|
**OpenCode** is the closest open-source match. For an IDE-first developer who wants
|
||||||
|
autocomplete + chat, **Continue.dev + Qwen2.5-Coder 7B** is the recommended pair.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 3. Model selection: RTX 3070 (8 GB VRAM / 30 GB RAM)
|
||||||
|
|
||||||
|
### VRAM hard limits at Q4_K_M quantization
|
||||||
|
|
||||||
|
| Model size | Approx. VRAM (Q4_K_M) | Fits in 8 GB? | Tokens/sec (RTX 3070) |
|
||||||
|
|---|---|---|---|
|
||||||
|
| 3–4B | 2.5–3.5 GB | Yes, with headroom | 60–90 |
|
||||||
|
| 7–8B | 5–6 GB | Yes | 35–55 |
|
||||||
|
| 12–14B | 7.5–9 GB | Edge / RAM offload | 8–18 |
|
||||||
|
| 22B+ | 14+ GB | No | — |
|
||||||
|
|
||||||
|
The RTX 3070 has high memory bandwidth for its VRAM tier and consistently outperforms
|
||||||
|
the newer RTX 4060 Ti on token generation speed. Bandwidth matters more than raw compute
|
||||||
|
for inference.
|
||||||
|
|
||||||
|
### Does Gemma 4 exist?
|
||||||
|
|
||||||
|
Yes. Google released **Gemma 4** on 2 April 2026 (Apache 2.0). The family includes
|
||||||
|
E2B (2B), E4B (4B), a 26B MoE, and a 31B Dense. A 12B multimodal variant was announced
|
||||||
|
2026-06-04. The 31B scores 80.0% on LiveCodeBench v6 — a major jump from Gemma 3 27B
|
||||||
|
at 29.1%. However, only the E4B fits comfortably within 8 GB VRAM:
|
||||||
|
|
||||||
|
| Variant | VRAM (approx.) | Fits? |
|
||||||
|
|---|---|---|
|
||||||
|
| Gemma 4 E2B | ~2 GB | Yes |
|
||||||
|
| Gemma 4 E4B | ~5 GB | Yes |
|
||||||
|
| Gemma 4 12B | ~8–9 GB (Q4) | Edge |
|
||||||
|
| Gemma 4 26B MoE | 14–18 GB | No |
|
||||||
|
| Gemma 4 31B Dense | ~20 GB | No |
|
||||||
|
|
||||||
|
### Model-by-model evaluation
|
||||||
|
|
||||||
|
**Qwen2.5-Coder 7B — primary recommendation**
|
||||||
|
|
||||||
|
The strongest purpose-built coding model that fits fully within 8 GB VRAM. Leads
|
||||||
|
HumanEval among 7–8B-class models. Strong on Python, JavaScript, TypeScript. Has
|
||||||
|
FIM (fill-in-the-middle) support for inline autocomplete. 35–55 tok/sec on RTX 3070.
|
||||||
|
|
||||||
|
```
|
||||||
|
ollama pull qwen2.5-coder:7b
|
||||||
|
```
|
||||||
|
|
||||||
|
**Qwen2.5-Coder 14B — secondary, with RAM offloading**
|
||||||
|
|
||||||
|
At Q4_K_M this needs ~8.7 GB, just over the 8 GB limit. With 30 GB system RAM, Ollama
|
||||||
|
automatically offloads the overflow layers to CPU. Performance drops to ~8–18 tok/sec
|
||||||
|
versus 35–55 tok/sec for the 7B fully in VRAM. Quality is noticeably better for complex
|
||||||
|
multi-file reasoning. Viable for chat-based coding tasks where quality matters more than
|
||||||
|
speed; too slow for live autocomplete. Keep context window at 8K tokens to minimize
|
||||||
|
VRAM pressure during offloaded inference.
|
||||||
|
|
||||||
|
```
|
||||||
|
ollama pull qwen2.5-coder:14b
|
||||||
|
```
|
||||||
|
|
||||||
|
**Gemma 4 E4B (~5 GB VRAM)**
|
||||||
|
|
||||||
|
Fits comfortably with 3 GB to spare. Strong on reasoning, multimodal, and general-purpose
|
||||||
|
tasks. Less specialized for coding than Qwen2.5-Coder 7B. Good choice for one model that
|
||||||
|
covers coding + general reasoning + image analysis. The E4B outperforms Gemma 3 equivalents
|
||||||
|
significantly on coding benchmarks.
|
||||||
|
|
||||||
|
```
|
||||||
|
ollama pull gemma4:e4b
|
||||||
|
```
|
||||||
|
|
||||||
|
**Phi-4 Mini 3.8B (~3 GB VRAM)**
|
||||||
|
|
||||||
|
Best reasoning-per-VRAM model; leaves ~5 GB free for other applications. Strong on math,
|
||||||
|
logic, and structured output. Good for agentic sub-tasks requiring tight reasoning. Not the
|
||||||
|
strongest at raw code synthesis but excellent for reasoning-heavy parts of a coding loop.
|
||||||
|
Viable as the autocomplete model in a two-model Continue.dev setup.
|
||||||
|
|
||||||
|
```
|
||||||
|
ollama pull phi4-mini
|
||||||
|
```
|
||||||
|
|
||||||
|
**DeepSeek-R1 8B (~5–6 GB VRAM)**
|
||||||
|
|
||||||
|
Strong reasoning model for logic-heavy code (algorithms, correctness proofs). The full
|
||||||
|
DeepSeek-Coder-V2 (236B MoE) is impractical here — only the 8B distilled variants are
|
||||||
|
relevant. Outperforms Gemma 4 E4B on reasoning-heavy benchmarks; weaker on raw code
|
||||||
|
generation than Qwen2.5-Coder 7B.
|
||||||
|
|
||||||
|
**Codestral — not viable at 8 GB**
|
||||||
|
|
||||||
|
The top FIM autocomplete model on HumanEval-FIM benchmarks, but requires 12–16 GB VRAM
|
||||||
|
minimum. Not an option here. Worth revisiting if upgrading to a 12 GB+ card (RTX 4070
|
||||||
|
Super or newer).
|
||||||
|
|
||||||
|
### RAM offloading: does 30 GB help?
|
||||||
|
|
||||||
|
Yes, meaningfully. Ollama automatically splits layers between GPU and system RAM when
|
||||||
|
VRAM is exceeded. With 30 GB RAM, models up to ~14B at Q4_K_M run with partial offloading.
|
||||||
|
The tradeoff is a 2–5× throughput penalty (8–18 tok/sec vs 35–55 tok/sec). Acceptable
|
||||||
|
for batch tasks (reviewing a PR, generating an algorithm); too slow for live autocomplete.
|
||||||
|
|
||||||
|
### Recommended setup
|
||||||
|
|
||||||
|
**Autocomplete (fast, always-in-VRAM):** `qwen2.5-coder:7b`
|
||||||
|
- Configure in Continue.dev as the tab-completion model
|
||||||
|
- FIM-capable; 35–55 tok/sec; fits with 2–3 GB VRAM to spare
|
||||||
|
|
||||||
|
**Chat / agent loop (quality-first):** `qwen2.5-coder:14b` or `gemma4:e4b`
|
||||||
|
- 14B for strongest multi-file coding; expect 8–18 tok/sec with RAM offload
|
||||||
|
- Gemma 4 E4B if you want vision + general reasoning + coding in one model; ~60 tok/sec
|
||||||
|
|
||||||
|
**Two-model Continue.dev config (lower VRAM pressure):**
|
||||||
|
`phi4-mini` (autocomplete) + `qwen2.5-coder:7b` (chat) — both fit simultaneously with
|
||||||
|
~1–2 GB to spare, keeping the OS and IDE from contending for VRAM.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Sources
|
||||||
|
|
||||||
|
- [Ollama on Proxmox: GPU Passthrough for LXC and VM AI Workloads](https://linuxprofessional.ie/article.php?slug=ollama-proxmox-gpu-passthrough-lxc-vm)
|
||||||
|
- [Run Ollama with NVIDIA GPU in Proxmox VMs and LXC containers](https://www.virtualizationhowto.com/2025/05/run-ollama-with-nvidia-gpu-in-proxmox-vms-and-lxc-containers/)
|
||||||
|
- [Ollama Performance Tuning: Getting Maximum Speed from Local LLMs](https://dasroot.net/posts/2026/01/ollama-performance-tuning-gpu-acceleration-model-quantization/)
|
||||||
|
- [Pros and Cons: Containerized Ollama vs. Local Setup](https://alain-airom.medium.com/pros-and-cons-using-containerized-ollama-vs-local-setup-d9bdf225bbb5)
|
||||||
|
- [Best Local Coding Models Ranked: Every VRAM Tier (2026)](https://insiderllm.com/guides/best-local-coding-models-2026/)
|
||||||
|
- [Best Local LLMs for RTX 4060, RTX 3070, and RTX 5060](https://aiagentskit.com/blog/best-local-llms-rtx-4060-3070-5060/)
|
||||||
|
- [Best Local LLMs for 8GB VRAM: Real Hardware Benchmarks (2026)](https://localllm.in/blog/best-local-llms-8gb-vram-2025)
|
||||||
|
- [Self-Hosted AI Coding Agent: Ollama + Continue + Open WebUI Setup in 2026](https://www.web3aiblog.com/blog/self-hosted-ai-coding-agent-ollama-continue-2026)
|
||||||
|
- [Best Local-First AI Coding Tools 2026: 14 Compared](https://nimbalyst.com/blog/best-local-first-ai-coding-tools-2026/)
|
||||||
|
- [OpenCode + Ollama: Private Local AI Coding Agent Setup](https://lushbinary.com/blog/opencode-ollama-local-ai-coding-privacy-guide/)
|
||||||
|
- [Gemma 4: Google DeepMind](https://deepmind.google/models/gemma/gemma-4/)
|
||||||
|
- [Running Gemma 4 Locally: VRAM Requirements](https://knightli.com/en/2026/05/01/gemma-4-local-vram-quantization-table/)
|
||||||
|
- [Phi-4 Mini vs. Gemma 3 vs. Qwen 2.5: Best SLM for Coding Tasks in 2026](https://botmonster.com/ai/phi-4-mini-vs-gemma-3-vs-qwen-25-best-slm-coding-2026/)
|
||||||
|
- [Qwen2.5-Coder 14B VRAM Requirements Guide](https://willitrunai.com/blog/qwen-2-5-coder-14b-vram-requirements)
|
||||||
|
- [Comparing AI Harnesses: OpenCode, Ollama, LM Studio, Claude Code, Open WebUI, and VS Code](https://jace.pro/blog/comparing-ai-harnesses-opencode-ollama-lm-studio-claude-code-open-webui-and-vs-code/)
|
||||||
Reference in New Issue
Block a user