Risk-weighted coverage policy + diff-coverage gate (ADR 0004) #294

Open
didericis-claude wants to merge 1 commit from cover-global-90 into cover-egress-addon-adapter
Collaborator

Stacked on #290 (base = cover-egress-addon-adapter).

Implements ADR 0004 — risk-weighted coverage: instead of chasing one global percentage (which pushes the most test effort onto the least safety-relevant code and invites performative line-chasing), measure what matters.

What this does

  • Omit only genuinely-interactive shells. cli/init.py (a read_tty_line() prompt loop) joins cli/tui.py in .coveragerc, with a rationale comment. Subprocess/backend orchestration is deliberately NOT omitted — that code tears down sandboxes and wires networks, so it stays visible and is scored via the integration suite instead.
  • Count integration coverage. scripts/coverage.sh runs unit + integration under one measurement (the policy's yardstick) and can print the critical security/logic core (--critical).
  • Enforce a diff-coverage gate. scripts/diff_coverage.py is stdlib-only (no diff-cover dependency, per the repo's low-dep rule): new/changed executable lines must be ≥90% covered. This is the enforced regression guard; the global number is informational.
  • CI gains a coverage job: combined report + the diff gate (degrades gracefully when the runner lacks Docker — integration skips, the diff gate is Docker-independent).
  • Unit-test cli/__init__ dispatch — it's exit-code logic, not I/O, so it earns tests rather than an omit (9 tests).
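For reviewers who want the shape of the diff-coverage check, here is a minimal stdlib-only sketch of the idea. It is not the contents of scripts/diff_coverage.py. The function names `added_lines` and `diff_coverage` are hypothetical, and the sketch assumes the input is a unified diff (for example from `git diff`) plus a coverage.py JSON report whose file paths are repo-relative and match the diff's paths.

```python
import re
from typing import Dict, List, Set, Tuple

# Unified-diff hunk header, e.g. "@@ -10,4 +12,6 @@"; an omitted count means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_lines(diff_text: str) -> Dict[str, Set[int]]:
    """Map each file in a unified diff to the new-side line numbers it adds."""
    added: Dict[str, Set[int]] = {}
    path = None
    old_left = new_left = 0  # lines still expected in the current hunk
    new_no = 0               # current line number on the new side
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:  # inside a hunk body
            if line.startswith("\\"):     # "\ No newline at end of file"
                continue
            if line.startswith("+"):
                if path is not None:
                    added[path].add(new_no)
                new_no += 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            else:                         # context line
                old_left -= 1
                new_left -= 1
                new_no += 1
            continue
        if line.startswith("+++ "):
            target = line[4:].split("\t")[0].strip()
            if target == "/dev/null":     # file deleted: nothing to cover
                path = None
            else:
                path = target[2:] if target.startswith("b/") else target
                added.setdefault(path, set())
            continue
        match = HUNK_RE.match(line)
        if match:
            old_left = int(match.group(1) or "1")
            new_no = int(match.group(2))
            new_left = int(match.group(3) or "1")
    return added


def diff_coverage(
    added: Dict[str, Set[int]], report: dict
) -> Tuple[float, Dict[str, List[int]]]:
    """Return (percent covered, uncovered lines per file) for changed lines.

    `report` is the parsed coverage.py JSON report. Only lines the report
    lists as executed or missing count as executable, so comments and blank
    lines in the diff do not affect the percentage.
    """
    files = report.get("files", {})
    covered = total = 0
    uncovered: Dict[str, List[int]] = {}
    for path, lines in added.items():
        data = files.get(path)
        if data is None:  # not measured (omitted, or not a Python file)
            continue
        executed = set(data.get("executed_lines", []))
        missing = set(data.get("missing_lines", []))
        executable = lines & (executed | missing)
        missed = sorted(executable & missing)
        total += len(executable)
        covered += len(executable) - len(missed)
        if missed:
            uncovered[path] = missed
    percent = 100.0 if total == 0 else 100.0 * covered / total
    return percent, uncovered
```

A gate built on this would exit non-zero when the returned percentage falls below 90 and print the uncovered lines per file. Treating a diff with no executable changes as 100% is a design choice of this sketch, not something the PR states.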

Numbers

| Measurement | Coverage |
|---|---|
| Unit only (old badge) | 79% |
| Combined unit+integration (new yardstick) | 83% |
| Critical security/logic core (aggregate) | 87% |

Per-module ratcheting of the critical core toward 90% (egress_addon 76→, git_gate 80→, yaml_subset 83→, …) is the ongoing work this policy frames — see ADR 0004.
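To make the difference between the aggregate figure and the per-module targets concrete, here is a small sketch. It assumes "aggregate" means statement-weighted coverage across the critical modules, which this PR does not spell out. The function name `critical_report` and the module names and counts in the usage note are hypothetical, not measurements from this repository.

```python
from typing import Dict, List, Tuple

TARGET = 90.0  # the per-module target named in the policy


def percent(covered: int, statements: int) -> float:
    """Coverage percentage; an empty module counts as fully covered."""
    return 100.0 if statements == 0 else 100.0 * covered / statements


def critical_report(
    counts: Dict[str, Tuple[int, int]], target: float = TARGET
) -> Tuple[float, List[Tuple[str, float]]]:
    """Summarise the critical core.

    `counts` maps module name -> (covered statements, total statements).
    Returns the statement-weighted aggregate and the modules still below
    `target`, sorted by name.
    """
    total_covered = sum(covered for covered, _ in counts.values())
    total_statements = sum(statements for _, statements in counts.values())
    below = sorted(
        (name, percent(covered, statements))
        for name, (covered, statements) in counts.items()
        if percent(covered, statements) < target
    )
    return percent(total_covered, total_statements), below
```

With hypothetical counts such as `{"mod_a": (76, 100), "mod_b": (190, 200)}`, the aggregate is pulled up by the larger, better-covered module while `mod_a` still shows as below target. That is why the aggregate alone cannot stand in for the per-module ratchet.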

Full unit suite (1329 tests) passes; pyright clean; pylint 10.00 on new files.

didericis-claude added 1 commit 2026-06-25 21:29:29 -04:00
ci(coverage): risk-weighted coverage policy + diff-coverage gate
lint / lint (push) Successful in 1m52s
test / unit (pull_request) Successful in 46s
test / integration (pull_request) Successful in 16s
test / coverage (pull_request) Successful in 1m2s
632ab002ed
Adopt ADR 0004: stop chasing a single global coverage number and
measure what matters instead.

- Omit the genuinely-interactive `cli/init.py` shell (read_tty_line
  prompt loops) alongside the existing `cli/tui.py`, with a rationale
  comment in .coveragerc. Subprocess/backend orchestration is NOT
  omitted — it stays visible and is scored via the integration suite.
- scripts/coverage.sh runs unit + integration under one coverage
  measurement (the policy's yardstick) and can report the critical
  security/logic core held to the >=90% target.
- scripts/diff_coverage.py is a stdlib-only gate (no diff-cover dep):
  new/changed executable lines must be >=90% covered. This is the
  enforced regression guard; the global number is informational.
- CI gains a `coverage` job: combined report + the diff-coverage gate.
- Unit-test `cli/__init__.py` dispatch/exit-code mapping (it's logic,
  not I/O, so it earns tests rather than an omit).

Combined unit+integration coverage now reports 83% global / 87% across
the critical modules; per-module ratcheting toward 90% is the ongoing
work this policy frames.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
This pull request can be merged automatically.

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin cover-global-90:cover-global-90
git checkout cover-global-90