`cli.py start` was interactive-only: TUI selectors (agent / bottle /
name+color) plus a y/N preflight, then a blocking PTY attached to the
controlling terminal. That shape can't be driven by an orchestrator
(Paseo), CI, or webhook dispatch, and made spinning up a known
agent+bottle more friction than necessary.
Add a `--headless` path on `start`:
- agent / bottles / label / color come from flags + manifest defaults;
no TUI selectors, no y/N (auto-confirmed via a new `assume_yes` param
threaded into the shared `_launch_bottle` core).
- `--bottle` (repeatable) defaults to the agent's own `bottle:` when
omitted; `--label` defaults to the agent name and auto-uniquifies on
slug collision (orchestrators fire-and-forget many bottles);
`--color` defaults to none.
- the agent still execs on inherited stdio/PTY, so whatever allocates
the PTY drives the live session — only the launch chrome went
non-interactive.
- `--headless --dry-run` previews the resolved plan without launching.
Prerequisite for orchestrator integration, webhook dispatch, and remote
spin-up. New unit coverage in tests/unit/test_cli_start_headless.py
(11 tests); start/cli/launch sweep green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
Skill names become host/guest path segments interpolated into the
`bottle.exec` shell strings in each contrib provider's provision_skills.
They were validated only as strings, so a name with shell metacharacters
or path traversal could reach the command.
Layer two defenses:
- Primary: reject any skill name that isn't kebab-case
([a-z][a-z0-9-]*) at manifest load, reusing the convention already
enforced on bottle/agent filenames (new is_valid_entity_name helper
in manifest_schema). Fails loud and early, protecting every consumer
of the name — not just the exec call sites.
- Failsafe: shlex.quote the interpolated skills_dir / dst paths in the
claude, codex, and pi providers, so a future unvalidated field can't
inject shell metacharacters even if it bypasses the load-time check.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
- dlp_detectors._closest_pair: replace the O(n*m) cross product with an
O(n log n) sort + O(n) two-pointer merge, and early-out once a pair
falls within the proximity threshold. The inputs are attacker-controlled
response-body matches past the body-size cap, so the quadratic form was a
latent DoS. Extract _match_gap to share the span-gap calc with the caller.
- dlp_detectors._compute_encoded_variants: back the memo with a bounded
functools.lru_cache instead of an unbounded module dict, so a long-lived
proxy seeing rotating secrets evicts rather than growing without limit.
- supervise_server: extract the duplicated routes.yaml inputSchema into
_proposal_input_schema()/_ROUTES_YAML_DESCRIPTION so the egress-allow and
egress-block tools can't drift.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
Two per-request hot-path costs in the egress DLP scanner:
- `_encoded_variants` derived the full variant set (gzip + nine
encodings) for every provisioned secret on every redaction and
known-secret scan — once per host, path, header, and body. Cache it
per distinct secret; callers still get a fresh list so they can't
corrupt the shared cached tuple.
- `_find_partial_window` searched the text once per secret n-gram,
giving O(len(secret) * len(text)). Build the secret's n-gram set once
and sweep the text a single time: O(len(text)), no coverage loss.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
git_gate.py (699 LOC) mixed three responsibilities. Split into:
- git_gate_render.py — pure host-side rendering: the gate constants,
GitGateUpstream, gitconfig/known-hosts rendering, and the entrypoint /
pre-receive / access-hook script builders.
- git_gate_provision.py — the gitea deploy-key lifecycle
(_provision_dynamic_key / revoke / _resolve_identity_file).
- git_gate.py — the GitGate ABC + GitGatePlan, now 169 LOC, re-exporting
all moved names (see __all__) so the 19 importers are unchanged.
Host-side only (not flat-bundled), so no sidecar import shim. The one
test that patched the internal `_provision_dynamic_key` lookup is
repointed to its new module (public API unchanged). The two new modules
are added to scripts/critical-modules.txt so the decompose doesn't move
security code out of the measured core — critical aggregate stays 95%
(git_gate 100%, render 100%, provision 97%).
Closes#303
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
The eager from_json_obj path is unit-tested; the lazy resolve()/
from_md_dirs path was only hit by the integration suite, so a critical
module relied on Docker for branch coverage. Add tmp-dir tests driving:
all_agent_names with a cwd overlay, load_for_agent on unknown and
malformed-frontmatter agent files, and require_agent's names-only
file-existence checks (home + cwd).
manifest.py: 86% -> 99%. The one remaining line is the OSError branch on
an unreadable agent file (not reliably triggerable cross-environment).
Closes#304
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
The unit suite could write to and flock the real ~/.bot-bottle: state,
queue, and audit dirs all derive from supervise.bot_bottle_root() ->
Path.home(). A test taking a flock on the real audit log blocks
indefinitely when a live bottle's supervise sidecar holds that lock
(observed: a `coverage run` hung at 0% CPU), and unisolated tests
otherwise pollute the developer's home dir.
Point HOME at a throwaway temp dir for the whole tests/unit package
(restored + cleaned at exit). Tests that set their own HOME now restore
to the isolated dir, not the real one; tests that patch bot_bottle_root
directly are unaffected.
Closes#302
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
The token-pattern detector had 15 near-identical test methods across
`TestScanTokenPatterns` and `TestScanTokenPatternsExtended`, each
scanning a body carrying one synthetic token and asserting the reason
names the credential type.
Collapse them into a single `_TOKEN_PATTERN_CASES` table driven by
`subTest`, so adding a new token shape is a one-line row. Each case now
also asserts block severity (previously only the AWS case did).
`TestScanTokenPatternsExtended` is removed; its rows live in the table.
The non-matrix cases (clean text, location, context, reason) stay as
explicit methods. No production code change.
Closes#289
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
Sixth per-module ratchet under ADR 0004. Cover the queue/audit
malformed-input and fallback branches:
- path helpers (bot_bottle_root, queue_dir_for_slug,
_id_from_proposal_filename non-match)
- read_proposal / read_response reject non-object JSON
- list_pending_proposals skips unreadable/non-dict/incomplete
proposals and ones with a response already present
- wait_for_response tolerates a malformed or incomplete response file
and then times out at the deadline
- read_audit_entries returns [] for a missing log and skips blank /
non-JSON / non-dict / missing-field lines
- the fcntl flock helpers swallow OSError on a bad fd
supervise.py: 89% -> 99%. The one remaining line is an unreachable
`continue` (glob already guarantees the .proposal.json suffix).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
Fourth per-module ratchet under ADR 0004. Cover the pure
`git_gate_render_gitconfig` renderer (empty entries, insteadOf URL,
scheme override, RemoteKey ssh alias with/without non-default port,
newline-injection rejection) and the dynamic gitea deploy-key
lifecycle with the forge provisioner mocked:
- `_provision_dynamic_key`: writes key + key-id files, strips `.git`
from owner/repo, builds the proposal title; missing token raises.
- `revoke_git_gate_provisioned_keys`: revokes a gitea key when the
id-file is present, skips static-provider entries and missing
id-files, raises on a missing token.
bot_bottle/git_gate.py: 70% -> 99% (unit only). Two remaining partial
branches are inner conditionals on the alias/owner-repo paths.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
Third per-module ratchet under ADR 0004. Add a parsing/serialization
suite for the egress engine's core:
- route validation rejections: payload/route shape, host, auth pairing,
git block, every matches sub-field (paths/methods/headers type +
regex-compile + unknown-key), and the dlp block (detector type/name,
outbound_on_match, unknown key)
- a full valid route round-trips; detectors:false disables
- parse_config log-level validation + load_config invalid-YAML
- route_to_yaml_dict: minimal/auth/git/dlp/matches with default-omission
- evaluate_matches: exact/prefix/regex paths, method filter, exact +
regex header matching (match and non-match)
egress_addon_core.py: 84% -> 99%. The two remaining missed statements
are defensive guards (an unreachable separator-return and a
no-matching-path-type fallthrough).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
First per-module ratchet under ADR 0004. Extend the adapter flow suite
to cover the remaining behavioural gaps:
- inbound response DLP: injection block (403), warn (logged, forwarded),
and LOG_FULL response logging
- WebSocket inbound (server->client) scanning: injection kills the
connection; warn does not; no-websocket is a no-op
- redaction scrubs the token in a header and the request path, not just
the body
- supervise queue-write OSError fails closed (403)
- _token_allow_timeout_from_env: unset/valid/non-numeric/non-positive
- SIGHUP handler reloads routes; a reload failure keeps the last good
config
- LOG_FULL logs the forwarded request
egress_addon.py: 76% -> 94%. The remaining misses are the low-value
edges (no-SIGHUP platform, hostname-redaction-fails-closed) called out
in the egress adapter PR.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
Adopt ADR 0004: stop chasing a single global coverage number and
measure what matters instead.
- Omit the genuinely-interactive `cli/init.py` shell (read_tty_line
prompt loops) alongside the existing `cli/tui.py`, with a rationale
comment in .coveragerc. Subprocess/backend orchestration is NOT
omitted — it stays visible and is scored via the integration suite.
- scripts/coverage.sh runs unit + integration under one coverage
measurement (the policy's yardstick) and can report the critical
security/logic core held to the >=90% target.
- scripts/diff_coverage.py is a stdlib-only gate (no diff-cover dep):
new/changed executable lines must be >=90% covered. This is the
enforced regression guard; the global number is informational.
- CI gains a `coverage` job: combined report + the diff-coverage gate.
- Unit-test `cli/__init__.py` dispatch/exit-code mapping (it's logic,
not I/O, so it earns tests rather than an omit).
Combined unit+integration coverage now reports 83% global / 87% across
the critical modules; per-module ratcheting toward 90% is the ongoing
work this policy frames.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
The mitmproxy adapter `egress_addon.py` was omitted from coverage
because it can't import on the host (mitmproxy is sidecar-only) and
only its log-redaction helpers were exercised. Add a request/response
flow suite that stubs mitmproxy and drives the adapter glue:
introspection, allowlist enforcement, auth strip+inject, git
push/fetch blocking, the outbound-DLP block/redact/supervise policy
branches (including the operator approval round-trip), inbound
response scanning, and WebSocket frame scanning.
Removes the `bot_bottle/egress_addon.py` omit from `.coveragerc`;
the adapter now reports ~76% covered.
Closes#286
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NkwFXLFff9PYPy4wgVBJp9
Three UX improvements requested in #270 review:
- filter_multiselect: Space toggles selection, Enter confirms (was both)
- bottle picker: bottles with extends chains show ancestry labels
(e.g. 'claude-dev <- bot-bottle-dev <- dev') for at-a-glance lineage
- preflight: replaces key-value summary with YAML of the resolved manifest
Tab switches focus to the selected-order panel; K/J shift the
highlighted item up/down; Space/Enter removes it. The filter list dims
while the order panel is active. Help line updates per focus mode.
- `bottle:` in agent frontmatter is now optional; agents without it
are portable and require bottles to be selected at launch.
- Adds `filter_multiselect` to `tui.py`: multi-select picker with
ordered selection list, Space/Enter to toggle, Ctrl-D to confirm.
- `ManifestIndex` gains `all_bottle_names` and `load_for_agent` accepts
`bottle_names: tuple[str, ...]` to merge bottles in order at runtime.
- `merge_bottles_runtime` in `manifest_extends.py` applies the same
field-merge rules as `extends:` to pre-resolved bottle objects.
- `BottleSpec` gains `bottle_names`; `_validate` and `write_launch_metadata`
thread it through so `resume` replays the same bottle configuration.
- `cmd_start` shows the bottle multiselect after agent selection,
pre-populated from the agent's `bottle:` field when present.
- Existing agents with `bottle:` declared continue to work unchanged.
Allow extends: to accept a list of bottle names in addition to a plain
string. Parents are resolved independently and folded left-to-right
into a single combined parent before the child is merged on top, so
orthogonal concerns (base env, networking, agent provider) can live in
separate bottles without forcing a linear chain.
Merge rules for the parent fold: env dict-merge with later winning on
collision; git-gate.user per-field overlay; git-gate.repos union by
name with later winning per-field on same name; egress.routes
concatenated; all scalar fields (supervise, agent_provider, egress.log)
use last-wins. The existing child-wins-over-all-parents rule is
unchanged. Cycle detection, diamond deduplication, and missing/invalid
parent errors all work across multi-parent graphs.
Closes#268
Closes#258.
`egress_render_routes` and `_render_match_entry` now pass all manifest
strings (host, auth_scheme, token_env, path/header values) through
`_yaml_str_escape` before interpolating into double-quoted YAML scalars,
preventing stray `"` or newlines from corrupting routes.yaml.
`git_gate_render_gitconfig` now calls `_gitconfig_validate_value` on
each Upstream value (and the derived alias) before writing the
`insteadOf` line, rejecting any value containing a newline that would
inject arbitrary gitconfig keys.
The constant now covers the daemon path, the HTTP backend access-hook,
and the git http-backend CGI subprocess, so 'daemon' in the name was
too narrow. Updated the comment to list all three current uses.
Closes#255. Without timeouts, a hung upstream during the access-hook
or git http-backend CGI call (git_http_backend.py) and a stalled Gitea
API during deploy-key provisioning (contrib/gitea/deploy_key_provisioner.py)
could wedge a sidecar indefinitely. Adds GIT_HTTP_BACKEND_TIMEOUT_SECS
(30s) to both subprocess.run calls in the HTTP backend, mirroring the
existing GIT_GATE_DAEMON_TIMEOUT_SECS on the daemon path. Adds
_API_TIMEOUT_SECS (30s) and _KEYGEN_TIMEOUT_SECS (10s) to the Gitea
provisioner's urlopen and ssh-keygen calls. Tests verify the timeout
values are forwarded in all four call sites.
Introduce _RpcClientError and _RpcInternalError as distinct subclasses
of _RpcError so the dispatcher can handle bad requests and server-side
faults differently — returning client errors verbatim and logging
internal faults with their cause before replying ERR_INTERNAL.
Wrap write_proposal and archive_proposal IO with _RpcInternalError
so OS failures surface through the typed path instead of the bare
Exception fallback. All existing raise _RpcError(...) call sites
converted to _RpcClientError.
Closes#253
An empty or non-numeric Status: header from git http-backend raised
ValueError/IndexError that escaped the handler thread. Wrap the parse
in a try/except and fall back to HTTP 500 instead.
Closes#254
`container build` resolves -f relative to the current working directory,
not the build context, so builds failed from any cwd other than the repo
root. Anchor a relative Dockerfile to the context before passing it.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
_log_request and _log_response wrote headers and bodies to stderr verbatim.
_log_request also included the sidecar-injected upstream Authorization value,
exposing live bearer tokens on every allowed request under LOG_FULL.
Apply redact_tokens to all header values and bodies in both log functions;
exclude the authorization header from _log_request entirely since its value
is always a live sidecar-injected credential by the time _log_request runs.
Closes#257
EgressPlan gains a `canary: str` field (default "") populated in Egress.prepare()
using secrets.token_urlsafe(32). Each launched bottle:
- sidecar receives EGRESS_TOKEN_CANARY=<value> (literal env entry, scanned by
existing known-secrets detector without any detector code changes)
- agent receives BOT_BOTTLE_CANARY=<value> (visible fake secret that signals
exfiltration with zero false positives if it appears in outbound traffic)
Docker compose and macos-container backends updated; smolmachines shares docker
compose and so picks this up automatically. Unit tests cover canary uniqueness,
detection via scan_known_secrets, and EgressPlan backward-compat default.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- _alnum_projection(): strip non-alphanumeric chars for separator-injection detection
- scan_known_secrets() gains two extra passes per secret after exact-variant matching:
alnum-projection exact match (catches hyphens/spaces between secret chars) and a
sliding-window partial-match scan (catches chunked substrings ≥ PARTIAL_MATCH_MIN_LEN)
- scan_known_secrets() accepts sensitive_prefixes param (default ("EGRESS_TOKEN_",))
so redact_tokens and call-sites can extend the scanned env-var prefix set
- scan_entropy() warn-only detector flagging windows with Shannon entropy ≥ 5.5 bits/char
- "entropy" added to OUTBOUND_DETECTOR_NAMES; scan_outbound opts it in only when
explicitly listed in dlp.outbound_detectors (never part of the default "all" set)
- scan_outbound reads BOT_BOTTLE_SENSITIVE_PREFIXES from environ to extend
scan_known_secrets beyond EGRESS_TOKEN_* without schema changes
- Binary bodies decoded via latin-1 fallback (bijective byte↔codepoint) instead
of utf-8 errors=replace, preserving ASCII secret strings in binary payloads
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>