Out-of-band egress enforcement & cost-control plane (forced cutoff + remote dashboard) #251
Summary
Add an out-of-band egress enforcement & observability plane: a way to meter an agent's API usage, forcibly cut its egress when a limit/threshold is reached, and manage running agents from a remote dashboard to prevent cost overruns. This control must act on the agent without the agent's cooperation.
Motivation
I want two things bot-bottle can't currently do:
Why the supervise sidecar does not solve this
The existing supervise sidecar (PRD 0013) is entirely agent-initiated. Per `bot_bottle/supervise.py`: Every action starts with the agent voluntarily calling an MCP tool. A runaway or expensive agent (exactly the cost-overrun case) will never call `egress-block` on itself. So supervision is a collaborative recovery mechanism, not an enforcement mechanism. Making it mandatory (see #249) would not deliver forced cost-cutoff.
Two distinct planes
This requirement forces a distinction the current design blurs:
Proposed design
Build Plane A on the egress sidecar, which is already always-on and is the MITM proxy every agent's traffic flows through — so it is the natural place to both observe and enforce:
- api.anthropic.com). Count requests / parse usage so per-bottle consumption is known without agent involvement.
- `routes.yaml` to empty (+ reload), or isolate the bottle from the egress network. Triggered either automatically by a configured threshold, or on demand from the dashboard.
- `cli/supervise.py`'s cross-bottle view and the existing passive dashboard surface) rather than a per-bottle daemon.
Design constraint: the auto-cutoff must NOT be implemented as a proposal on the supervise queue. The trigger (usage threshold) and the action (route-drop) both live in the egress plane and execute without the agent in the loop.
Relationship to #249
#249 proposes making the supervise sidecar (Plane B) mandatory. This issue argues the property worth making unconditional is Plane A, not Plane B. Two coherent paths:
Unsupervised (headless/CI/ephemeral) agents remain first-class either way: they are still subject to the mandatory meter + kill switch; they simply lack the agent-facing proposal tools they couldn't use anyway.
Open questions
Agent providers should have an abstract “count_tokens” method that takes a request and returns the tokens it uses. By default it should use a good enough token estimation function. Ideally stdlib only, but it’s ok to use a python library we add to a set of python dependencies for the sidecar if needed for the fallback.
The built-in codex and claude endpoints should use the OpenAI and Anthropic endpoints for counting tokens, respectively.
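A minimal sketch of the overridable `count_tokens` idea with a stdlib-only fallback. The class name `AgentProvider`, the helper names, and the "about 4 characters per token" heuristic are all assumptions for illustration, not the existing provider interface. Built-in providers would override `count_tokens` to call their vendor's counting endpoint.

```python
import json


class AgentProvider:
    """Hypothetical base class; the real provider interface may differ."""

    name = "generic"

    def count_tokens(self, request_body: bytes) -> int:
        """Estimate the tokens a request will consume.

        Default: a crude stdlib-only heuristic. Subclasses for built-in
        providers would replace this with a call to the vendor endpoint.
        """
        return estimate_tokens(extract_text(request_body))


def extract_text(request_body: bytes) -> str:
    """Collect every string value found in a JSON request body."""
    try:
        payload = json.loads(request_body)
    except ValueError:
        # Not JSON (or not valid UTF-8): fall back to the raw text.
        return request_body.decode("utf-8", errors="replace")

    parts = []

    def walk(node):
        if isinstance(node, str):
            parts.append(node)
        elif isinstance(node, dict):
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(payload)
    return " ".join(parts)


def estimate_tokens(text: str) -> int:
    """Assumed heuristic: ceil(len(text) / 4); 0 for empty text."""
    if not text:
        return 0
    return -(-len(text) // 4)
```

The estimate is deliberately rough; it is only meant to be "good enough" when no provider-specific counter is available.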
Probably makes sense to have a “global” budget… this is something we eventually want to add to a control plane that can operate across hosts. It also might make sense to introduce sqlite at this point… I think for an initial MVP, we want to avoid the need for an external API and use a local sqlite instance. We’ll then want to move some of the state, auditing, and tracking to the sqlite db. The sqlite db should be host level, and we should probably wrap operations on it in an API so we can easily swap to a cloud service in the near future.
Before executing on this, evaluate the pros and cons of introducing sqlite and let me know if it makes sense to introduce sqlite now.
EDIT: we want the global token budgets (that should be higher priority), but we should also allow budgets per active agent and bottle as well… let’s keep the agent/bottle budgets so they override each other based on precedence (agent overrides bottle, which overrides parent bottle, etc.), and let’s also have a budget option when launching a new bottle.
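One way to read the override chain above, as a sketch only: the most specific budget that is set wins (agent, then bottle, then each parent bottle in turn). The `Bottle` dataclass and its fields are hypothetical, and the sketch assumes the global host budget is enforced separately as a host-wide cap instead of taking part in this lookup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bottle:
    """Hypothetical bottle record; field names are illustrative only."""
    name: str
    budget: Optional[int] = None        # token budget set on this bottle
    parent: Optional["Bottle"] = None   # parent bottle, if any


def effective_budget(agent_budget: Optional[int], bottle: Bottle) -> Optional[int]:
    """Return the most specific token budget, or None if none is set."""
    if agent_budget is not None:
        return agent_budget
    node: Optional[Bottle] = bottle
    while node is not None:
        if node.budget is not None:
            return node.budget
        node = node.parent
    return None
```

A budget passed at launch would simply populate `Bottle.budget` for the new bottle under this reading.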
Should be host level. Once this is in place, we should also probably move the supervisor UI to the same dashboard. To start I think we want one host level “dashboard” TUI, and we want any state needed to drive it to live in a host level sqlite db.
Good question: probably should add a customizable “cutoff policy” to bottles, with the following options:
Eventually yes (for web and mobile remote control, which will also be cross host), but to start let’s stick with a host-only TUI dashboard without remote control.
Also will want to introduce a “settings.yml” that lives in the root of the .bot-bottle folder and has a “budget” section for host level budgets. Budgets should be keyed by agent providers (same name as those available to the bottles) and use token counts for budgeting (not dollar values). Should also include the shutdown behavior.
Walked the codebase against these decisions. Responding to the threads, leading with the SQLite question you flagged as a gate.
SQLite — yes, introduce it now (scoped)
The dependency objection doesn't apply. `sqlite3` is in the Python stdlib, so it does not break the AGENTS.md "stdlib-first / no runtime pip deps" stance. It is in the same category as the hand-rolled `yaml_subset.py`, except here the stdlib already ships the whole thing. So the usual reason to defer ("don't add a dependency yet") is absent.
The actual problem you're describing is one SQLite is good at. A global token budget decremented concurrently by N egress sidecars (today `~/.bot-bottle/` already has `state/`, `audit/`, `queue/` written by multiple bottles in parallel) is a read-modify-write race. Doing that over JSON files means hand-rolled file locking; SQLite gives atomic transactions + WAL for free. And the per-agent/per-bottle precedence rollup (agent → bottle → parent) plus "sum across all bottles" is a `GROUP BY`, not an N-directory rescan-and-parse.
It's also the cleanest analogue for the eventual cloud swap. Your "wrap operations in an API so we can swap to a cloud service" maps directly onto a thin repository/DAO over SQLite → Postgres later. A JSON-file store is a worse rehearsal for a remote DB than SQL is.
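To make the read-modify-write point concrete, here is a rough sketch of recording usage against a shared budget inside one transaction. The table layout, column names, and function names are assumptions for illustration, not a proposed schema. Treating a provider with no row as unbudgeted is also an assumption.

```python
import sqlite3


def open_ledger(path: str) -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode, so
    # transactions are controlled explicitly with BEGIN/COMMIT below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=5000")  # wait up to 5 s on a locked db
    conn.execute(
        "CREATE TABLE IF NOT EXISTS budget ("
        " provider TEXT PRIMARY KEY,"
        " token_limit INTEGER NOT NULL,"
        " used INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def record_usage(conn: sqlite3.Connection, provider: str, tokens: int) -> bool:
    """Add tokens to the provider's total; return True while under the limit."""
    # BEGIN IMMEDIATE takes the write lock up front, so the update and the
    # read-back are one unit even with several writers on the same file.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "UPDATE budget SET used = used + ? WHERE provider = ?",
            (tokens, provider),
        )
        row = conn.execute(
            "SELECT used, token_limit FROM budget WHERE provider = ?",
            (provider,),
        ).fetchone()
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    if row is None:
        return True  # assumption: no row means no budget configured
    return row[0] < row[1]
```

A `False` return is the point where an egress-plane trigger could apply the configured cutoff policy.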
Costs (real but bounded):
- `schema_version` table + idempotent migrations on startup).
- `busy_timeout`. At host scale (handful of bottles) this is a non-issue.
Recommendation: add it now, but narrowly: only the new metering/budget/audit ledger goes in SQLite, behind a thin repo API at e.g. `~/.bot-bottle/bot-bottle.db`. Do not migrate existing per-bottle state (resume `metadata.json`, transcripts, Dockerfile overrides); those are per-identity blobs that files handle fine and that don't have the concurrency/aggregation problem. Migrating them now would be churn for no benefit.
count_tokens: split "gate" from "account"
One refinement worth nailing before building: there are two distinct needs, and the response body is strictly better for one of them.
- `count_tokens` endpoint is the only option. Good fit for the abstract method.
- `usage` (Anthropic `input_tokens`/`output_tokens`, OpenAI `usage`). The egress addon already has a `response(flow)` hook, so we can read the real number for free, no extra network call. Calling `count_tokens` for accounting would both be less accurate and add a metered egress call per request.
So I'd suggest: `count_tokens` (estimator, stdlib fallback) for the gate; parse response `usage` for the ledger. Caveat: agent traffic is mostly streaming SSE, so the response hook needs to tail the stream for the final usage event, which is worth scoping explicitly.
settings.yml: which .bot-bottle, and the parser constraint
Two notes:
- `.bot-bottle/` roots: the repo one (committed per-repo manifests) and the host one (`~/.bot-bottle/`, state/audit/queue). Host-level budgets belong in `~/.bot-bottle/settings.yml`, not the repo dir; otherwise budgets get committed per-repo. Assuming you mean the host one.
- `yaml_subset.py`, which is deliberately a bounded subset (no anchors, no multi-line block scalars). A flat `budget:` mapping of `provider: <int>` plus a `shutdown:` scalar fits fine; just keep it within that shape.
cutoff policy: primitives already exist
These map onto existing code: freeze = `bot_bottle/backend/freeze.py` (snapshot → resumable, already built), kill = backend teardown, cutoff = the route-drop on the egress plane. So the policy is mostly wiring an existing enum to existing primitives + the new trigger.
Per the AGENTS.md convention ("durable why lives in a PRD / research note / decision record, not the issue thread") I'll fold the SQLite verdict + the gate-vs-account split into a research note / PRD when we move to execute, rather than letting it live only here. Want me to draft that PRD next?
@didericis-claude yes, draft the prd in a new pr
Drafted the PRD in a new PR: #285 (branch `prd-egress-control-plane`).
It folds in every decision from this thread, leading with the SQLite verdict you gated on:
- `~/.bot-bottle/bot-bottle.db` behind a thin repository API; only the new metering/budget/enforcement-audit state goes in SQL, existing per-bottle flat-file state stays on disk.
- `schema_version` + idempotent migrations, WAL + `busy_timeout` called out as required.
- `usage` via the existing egress `response` hook (with SSE final-usage tailing scoped as work); `count_tokens` estimator/endpoint reserved for an optional pre-flight gate.
- `--budget` launch flag.
- `~/.bot-bottle/settings.yml` at the host root (not per-repo), within the `yaml_subset.py` subset.
- `cutoff`/`freeze`/`kill` mapped to existing primitives (egress route-drop, `freeze.py`, backend teardown); the trigger lives in the egress plane, never the supervise queue.
Open questions I couldn't resolve from the thread are carried in the PRD (Open questions section) rather than left here, chiefly SSE usage-tailing robustness and whether mid-request budget crossing needs the pre-flight gate for v1. Review there.
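For reference on the SSE usage-tailing question, a rough sketch of folding `usage` fields out of a stream of already-decoded SSE lines. The function name `tail_usage` is hypothetical, and the event shapes are an assumption based on Anthropic-style streaming, where `message_start` carries `message.usage` and later `message_delta` events carry a cumulative `usage.output_tokens`. Wiring this into the egress `response` hook and handling chunk boundaries are out of scope here.

```python
import json


def tail_usage(sse_lines):
    """Fold usage fields from SSE `data:` lines into final totals.

    Later values overwrite earlier ones, on the assumption that the
    counts reported later in the stream are cumulative.
    """
    usage = {"input_tokens": 0, "output_tokens": 0}
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip "event:" lines, comments, and blank separators
        data = line[len("data:"):].strip()
        try:
            event = json.loads(data)
        except ValueError:
            continue  # e.g. a non-JSON sentinel such as "[DONE]"
        if not isinstance(event, dict):
            continue
        # Usage may sit at the top level or nested under "message".
        found = event.get("usage")
        if not isinstance(found, dict):
            message = event.get("message")
            found = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(found, dict):
            continue
        for key in usage:
            if isinstance(found.get(key), int):
                usage[key] = found[key]
    return usage
```

Streams that never emit a usage event would leave both totals at zero, which is one of the robustness cases the PRD would need to decide how to treat.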