manifest.py imported the extends/loader resolvers, while those resolvers
needed ManifestBottle back from manifest.py — a true bidirectional cycle
papered over with in-function imports and TYPE_CHECKING guards (not clear
dependency inversion).
Extract ManifestBottle into a new leaf module manifest_bottle.py that depends
only on the other leaf modules (manifest_util/agent/egress/git/schema).
manifest.py re-exports ManifestBottle, so `from .manifest import ManifestBottle`
callers are unaffected. With the cycle gone:
- manifest_extends and manifest_loader import ManifestBottle from
manifest_bottle and their other deps from the real source modules, all at
top level (TYPE_CHECKING block removed).
- manifest.py imports the extends/loader/schema/yaml_subset/log helpers at
module top; all per-function lazy imports in the cluster are removed.
No behavior change; full unit suite green, pyright clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
- dlp_detectors._closest_pair: replace the O(n*m) cross product with an
O(n log n) sort + O(n) two-pointer merge, and early-out once a pair
falls within the proximity threshold. The inputs are attacker-controlled
response-body matches past the body-size cap, so the quadratic form was a
latent DoS. Extract _match_gap to share the span-gap calc with the caller.
- dlp_detectors._compute_encoded_variants: back the memo with a bounded
functools.lru_cache instead of an unbounded module dict, so a long-lived
proxy seeing rotating secrets evicts rather than growing without limit.
- supervise_server: extract the duplicated routes.yaml inputSchema into
_proposal_input_schema()/_ROUTES_YAML_DESCRIPTION so the egress-allow and
egress-block tools can't drift.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
The pyright "0 errors" and pylint "9.93/10" badges were static,
hand-synced shields that duplicated state the `lint` CI job already
enforces — a maintenance tax that could silently drift from reality.
Remove both badges from the README and strip the corresponding steps
(pylint/pyright runs, sed rewrites, commit-message lines, and the
`.pylintrc`/`pyrightconfig.json` path triggers) from the badge-update
workflow. Lint/type enforcement in CI is unchanged; only the published
badges go away. Coverage and core-coverage badges stay.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
`ManifestIndex.load_for_agent` was a ~100-line method branching across
the eager (from_json_obj) and lazy (from disk) resolution modes, with
the git-user merge tail duplicated in both branches. Split into
`_load_for_agent_eager` / `_load_for_agent_lazy` behind a small
dispatcher and extract the shared tail into
`_manifest_with_merged_git_user`. No behavior change.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
Two per-request hot-path costs in the egress DLP scanner:
- `_encoded_variants` derived the full variant set (gzip + nine
encodings) for every provisioned secret on every redaction and
known-secret scan — once per host, path, header, and body. Cache it
per distinct secret; callers still get a fresh list so they can't
corrupt the shared cached tuple.
- `_find_partial_window` searched the text once per secret n-gram,
giving O(len(secret) * len(text)). Build the secret's n-gram set once
and sweep the text a single time: O(len(text)), no coverage loss.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
update-badges.yml triggered on **.py / .pylintrc / pyrightconfig.json /
.coveragerc but not scripts/critical-modules.txt, so editing the core
module list alone wouldn't refresh the `core coverage` badge until the
next .py change. Add it to the push paths.
Closes#305
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
git_gate.py (699 LOC) mixed three responsibilities. Split into:
- git_gate_render.py — pure host-side rendering: the gate constants,
GitGateUpstream, gitconfig/known-hosts rendering, and the entrypoint /
pre-receive / access-hook script builders.
- git_gate_provision.py — the gitea deploy-key lifecycle
(_provision_dynamic_key / revoke / _resolve_identity_file).
- git_gate.py — the GitGate ABC + GitGatePlan, now 169 LOC, re-exporting
all moved names (see __all__) so the 19 importers are unchanged.
Host-side only (not flat-bundled), so no sidecar import shim. The one
test that patched the internal `_provision_dynamic_key` lookup is
repointed to its new module (public API unchanged). The two new modules
are added to scripts/critical-modules.txt so the decompose doesn't move
security code out of the measured core — critical aggregate stays 95%
(git_gate 100%, render 100%, provision 97%).
Closes#303
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
The eager from_json_obj path is unit-tested; the lazy resolve()/
from_md_dirs path was only hit by the integration suite, so a critical
module relied on Docker for branch coverage. Add tmp-dir tests driving:
all_agent_names with a cwd overlay, load_for_agent on unknown and
malformed-frontmatter agent files, and require_agent's names-only
file-existence checks (home + cwd).
manifest.py: 86% -> 99%. The one remaining line is the OSError branch on
an unreadable agent file (not reliably triggerable cross-environment).
Closes#304
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
The unit suite could write to and flock the real ~/.bot-bottle: state,
queue, and audit dirs all derive from supervise.bot_bottle_root() ->
Path.home(). A test taking a flock on the real audit log blocks
indefinitely when a live bottle's supervise sidecar holds that lock
(observed: a `coverage run` hung at 0% CPU), and unisolated tests
otherwise pollute the developer's home dir.
Point HOME at a throwaway temp dir for the whole tests/unit package
(restored + cleaned at exit). Tests that set their own HOME now restore
to the isolated dir, not the real one; tests that patch bot_bottle_root
directly are unaffected.
Closes#302
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
Surface the metric ADR 0004 says matters — the critical security/logic
core, currently 95% — as a README badge, distinct from the
informational global `coverage` badge.
- scripts/critical-modules.txt: single source of truth for the core
module list. scripts/coverage.sh now reads it (instead of a hardcoded
string) and update-badges.yml reads the same file, so the badge and
the `critical` report cannot drift.
- update-badges.yml: a `core coverage` step reuses the unit-coverage
data (every core module is unit-tested, so unit-only is accurate for
it) and sed-updates the new badge, like the existing ones.
- README: `core coverage 95%` badge linking to ADR 0004 so a reader can
find out what "core" means.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
The token-pattern detector had 15 near-identical test methods across
`TestScanTokenPatterns` and `TestScanTokenPatternsExtended`, each
scanning a body carrying one synthetic token and asserting the reason
names the credential type.
Collapse them into a single `_TOKEN_PATTERN_CASES` table driven by
`subTest`, so adding a new token shape is a one-line row. Each case now
also asserts block severity (previously only the AWS case did).
`TestScanTokenPatternsExtended` is removed; its rows live in the table.
The non-matrix cases (clean text, location, context, reason) stay as
explicit methods. No production code change.
Closes#289
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
The interactive multiselect loop nested key dispatch up to six indent
levels deep — the worst offender being the space-bar toggle
(while > if focus > elif key > if filtered > if/else membership) and
the long order-mode elif chain inside the focus branch.
Extract two behaviour-identical helpers:
- `_toggle_membership(items, item)` collapses the add/remove if/else,
pulling the space branch back to four levels.
- `_handle_order_key(key, selected, order_cursor)` moves the entire
order-focus dispatch out of the loop, returning the new cursor.
No control-flow or key-binding changes; the loop's early returns and
focus toggling are untouched. (git_gate.py's deep-looking lines named
in the issue are multiline call-argument continuations already under
four levels of control nesting, so no change was warranted there.)
Closes#288
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
`egress_addon_core.py` mixed the per-route `dlp:` block parser
(`_parse_detectors` plus the detector-name and `outbound_on_match`
constants) in with the request-time scan/decision flow. Move that
config-parsing layer into a new stdlib-only `egress_dlp_config.py` as
`parse_dlp_block`, so the decision path in the core module reads
top-to-bottom without scrolling past config plumbing.
The constants and parser are re-exported from `egress_addon_core`
(and listed in `__all__`) so existing `from egress_addon_core import
ON_MATCH_*` / `OUTBOUND_DETECTOR_NAMES` callers are unchanged. The new
module ships flat into the sidecar bundle (Dockerfile.sidecars) and
uses the same flat/package import shim as its siblings. Pure refactor;
behavior and wire format unchanged.
Closes#287
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
Sixth per-module ratchet under ADR 0004. Cover the queue/audit
malformed-input and fallback branches:
- path helpers (bot_bottle_root, queue_dir_for_slug,
_id_from_proposal_filename non-match)
- read_proposal / read_response reject non-object JSON
- list_pending_proposals skips unreadable/non-dict/incomplete
proposals and ones with a response already present
- wait_for_response tolerates a malformed or incomplete response file
and then times out at the deadline
- read_audit_entries returns [] for a missing log and skips blank /
non-JSON / non-dict / missing-field lines
- the fcntl flock helpers swallow OSError on a bad fd
supervise.py: 89% -> 99%. The one remaining line is an unreachable
`continue` (glob already guarantees the .proposal.json suffix).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
Fourth per-module ratchet under ADR 0004. Cover the pure
`git_gate_render_gitconfig` renderer (empty entries, insteadOf URL,
scheme override, RemoteKey ssh alias with/without non-default port,
newline-injection rejection) and the dynamic gitea deploy-key
lifecycle with the forge provisioner mocked:
- `_provision_dynamic_key`: writes key + key-id files, strips `.git`
from owner/repo, builds the proposal title; missing token raises.
- `revoke_git_gate_provisioned_keys`: revokes a gitea key when the
id-file is present, skips static-provider entries and missing
id-files, raises on a missing token.
bot_bottle/git_gate.py: 70% -> 99% (unit only). Two remaining partial
branches are inner conditionals on the alias/owner-repo paths.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
Third per-module ratchet under ADR 0004. Add a parsing/serialization
suite for the egress engine's core:
- route validation rejections: payload/route shape, host, auth pairing,
git block, every matches sub-field (paths/methods/headers type +
regex-compile + unknown-key), and the dlp block (detector type/name,
outbound_on_match, unknown key)
- a full valid route round-trips; detectors:false disables
- parse_config log-level validation + load_config invalid-YAML
- route_to_yaml_dict: minimal/auth/git/dlp/matches with default-omission
- evaluate_matches: exact/prefix/regex paths, method filter, exact +
regex header matching (match and non-match)
egress_addon_core.py: 84% -> 99%. The two remaining missed statements
are defensive guards (an unreachable separator-return and a
no-matching-path-type fallthrough).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
First per-module ratchet under ADR 0004. Extend the adapter flow suite
to cover the remaining behavioural gaps:
- inbound response DLP: injection block (403), warn (logged, forwarded),
and LOG_FULL response logging
- WebSocket inbound (server->client) scanning: injection kills the
connection; warn does not; no-websocket is a no-op
- redaction scrubs the token in a header and the request path, not just
the body
- supervise queue-write OSError fails closed (403)
- _token_allow_timeout_from_env: unset/valid/non-numeric/non-positive
- SIGHUP handler reloads routes; a reload failure keeps the last good
config
- LOG_FULL logs the forwarded request
egress_addon.py: 76% -> 94%. The remaining misses are the low-value
edges (no-SIGHUP platform, hostname-redaction-fails-closed) called out
in the egress adapter PR.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
Adopt ADR 0004: stop chasing a single global coverage number and
measure what matters instead.
- Omit the genuinely-interactive `cli/init.py` shell (read_tty_line
prompt loops) alongside the existing `cli/tui.py`, with a rationale
comment in .coveragerc. Subprocess/backend orchestration is NOT
omitted — it stays visible and is scored via the integration suite.
- scripts/coverage.sh runs unit + integration under one coverage
measurement (the policy's yardstick) and can report the critical
security/logic core held to the >=90% target.
- scripts/diff_coverage.py is a stdlib-only gate (no diff-cover dep):
new/changed executable lines must be >=90% covered. This is the
enforced regression guard; the global number is informational.
- CI gains a `coverage` job: combined report + the diff-coverage gate.
- Unit-test `cli/__init__.py` dispatch/exit-code mapping (it's logic,
not I/O, so it earns tests rather than an omit).
Combined unit+integration coverage now reports 83% global / 87% across
the critical modules; per-module ratcheting toward 90% is the ongoing
work this policy frames.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
The mitmproxy adapter `egress_addon.py` was omitted from coverage
because it can't import on the host (mitmproxy is sidecar-only) and
only its log-redaction helpers were exercised. Add a request/response
flow suite that stubs mitmproxy and drives the adapter glue:
introspection, allowlist enforcement, auth strip+inject, git
push/fetch blocking, the outbound-DLP block/redact/supervise policy
branches (including the operator approval round-trip), inbound
response scanning, and WebSocket frame scanning.
Removes the `bot_bottle/egress_addon.py` omit from `.coveragerc`;
the adapter now reports ~76% covered.
Closes#286
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
Three UX improvements requested in #270 review:
- filter_multiselect: Space toggles selection, Enter confirms (was both)
- bottle picker: bottles with extends chains show ancestry labels
(e.g. 'claude-dev <- bot-bottle-dev <- dev') for at-a-glance lineage
- preflight: replaces key-value summary with YAML of the resolved manifest
Tab switches focus to the selected-order panel; K/J shift the
highlighted item up/down; Space/Enter removes it. The filter list dims
while the order panel is active. Help line updates per focus mode.
- `bottle:` in agent frontmatter is now optional; agents without it
are portable and require bottles to be selected at launch.
- Adds `filter_multiselect` to `tui.py`: multi-select picker with
ordered selection list, Space/Enter to toggle, Ctrl-D to confirm.
- `ManifestIndex` gains `all_bottle_names` and `load_for_agent` accepts
`bottle_names: tuple[str, ...]` to merge bottles in order at runtime.
- `merge_bottles_runtime` in `manifest_extends.py` applies the same
field-merge rules as `extends:` to pre-resolved bottle objects.
- `BottleSpec` gains `bottle_names`; `_validate` and `write_launch_metadata`
thread it through so `resume` replays the same bottle configuration.
- `cmd_start` shows the bottle multiselect after agent selection,
pre-populated from the agent's `bottle:` field when present.
- Existing agents with `bottle:` declared continue to work unchanged.
The update-badges workflow only refreshed pylint and pyright. Add a
coverage step that runs the unit suite under coverage.py, extracts the
TOTAL percentage, and updates a new coverage badge in the README.
Also trigger the workflow on .coveragerc changes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
Allow extends: to accept a list of bottle names in addition to a plain
string. Parents are resolved independently and folded left-to-right
into a single combined parent before the child is merged on top, so
orthogonal concerns (base env, networking, agent provider) can live in
separate bottles without forcing a linear chain.
Merge rules for the parent fold: env dict-merge with later winning on
collision; git-gate.user per-field overlay; git-gate.repos union by
name with later winning per-field on same name; egress.routes
concatenated; all scalar fields (supervise, agent_provider, egress.log)
use last-wins. The existing child-wins-over-all-parents rule is
unchanged. Cycle detection, diamond deduplication, and missing/invalid
parent errors all work across multi-parent graphs.
Closes#268