# ADR 0004: Risk-weighted coverage, not a single global target - **Status:** Accepted - **Date:** 2026-06-25 - **Deciders:** didericis ## Context bot-bottle is a security tool: it sandboxes agents, scans egress for secret exfiltration, strips credentials, and gates git pushes. A latent bug in that logic is expensive, so test coverage there genuinely matters. But the repo also contains code where coverage is a poor signal: - **Interactive entry-point shells** — `cli/init.py` (a `read_tty_line()` prompt loop) and `cli/tui.py` (a curses picker). Their bodies are I/O; a unit test has to fake the entire terminal conversation, so it inflates the number without asserting behaviour that would otherwise go unchecked. - **Subprocess / backend orchestration** — the docker / smolmachines / macos-container backends shell out to `docker`, `container`, `smolvm`. Mock-heavy unit tests here mostly re-assert the argv you already wrote (the test passes whether or not the real teardown works), while many of the missed *branches* are failure paths you cannot provoke against a real daemon on cue. Chasing a single global percentage (e.g. 90%) pushes the most test effort onto the least safety-relevant code — exactly backwards — and invites performative tests written to colour a line rather than to catch a regression (Goodhart's law). ## Decision Coverage is **risk-weighted**, measured over the **combined unit + integration** suites, with three rules: 1. **Critical modules target ≥ 90%.** The security/logic core — `egress_addon{,_core}.py`, `dlp_detectors.py`, `egress.py`, `manifest*.py`, `git_gate.py`, `git_http_backend.py`, `supervise.py`, `yaml_subset.py`, `bottle_state.py` — is Docker-independent and unit-testable, so it carries the high bar. We ratchet toward 90% as these modules are touched; new gaps in them are not acceptable. 2. **Subprocess/backend orchestration is covered by the integration suite, not omitted.** `scripts/coverage.sh` runs unit + integration under one coverage measurement so these modules are scored where they are actually exercised. They stay *visible* — hiding the code that tears down sandboxes and wires networks is the one place we will not omit. 3. **Interactive entry-point shells are omitted** (`.coveragerc`), with a rationale comment. This is the only sanctioned use of `omit` besides `tests/*`. The forward-looking guard is a **diff-coverage gate** (`scripts/diff_coverage.py`): new/changed executable lines on a branch must be ≥ 90% covered. This catches regressions where they are introduced without forcing a back-fill crusade through legacy glue. The gate skips lines in omitted files (there is no coverage data for them), so the omit list cannot launder *new* logic into the dark: anything that needs real testing must live outside the interactive shells to be scored at all. The **global percentage is informational**, not a CI gate — it would otherwise be hostage to the CI runner's Docker availability and to the omit list. ## Consequences - The number we report (`scripts/coverage.sh`) means "coverage of the code we consider testable, across both suites" — a dip is a real regression in code we control, not noise from added CLI glue. - No incentive to write mock-the-mock tests for orchestration to defend a global figure. - The omit list needs governance: an entry must be a genuinely interactive shell, justified in the `.coveragerc` comment and here. `cli/init.py` and `cli/tui.py` qualify; backend orchestration does not. - CI must run the integration suite under coverage to score the orchestration modules; where the runner lacks Docker those tests skip and their modules read low — accepted, because the *enforced* gates (critical-module standard + diff coverage) are Docker-independent. - "We're at N%" is now a curated figure; outsiders should read the policy, not just the badge. ## Links - PRs #290 (cover the egress adapter), and the coverage-policy PR that introduces this record. - `.coveragerc`, `scripts/coverage.sh`, `scripts/diff_coverage.py`. - `scripts/critical-modules.txt` — the single source of truth for the core-module list; read by both `scripts/coverage.sh` and the `update-badges.yml` "core coverage" badge so they cannot drift. - The README carries a `core coverage` badge (auto-updated from that list) — the headline number, distinct from the informational global `coverage` badge.