Quality evaluation: main repository scorecard #154

New Issue

2026-06-02T14:42:12-04:00

didericis-codex commented

2026-06-02 14:42:12 -04:00

1. Chain of Thought Analysis

Scope evaluated: bot-bottle on main, primarily Python. I inspected the core manifest model, backend abstraction, Docker/smolmachines launch paths, git-gate, egress/pipelock security logic, and representative tests. Unit suite result: 863 tests passed.
Architecture: strong but not exemplary. BottleBackend, BottlePlan, Bottle, and backend-specific subclasses in bot_bottle/backend/__init__.py give the repo a coherent backend abstraction. Egress and pipelock planning are separated into pure host-side plan builders and backend lifecycle code. The main weakness is mixed-domain manifest/git responsibility: GitEntry still carries repository identity, upstream parsing, SSH identity file, known-host trust, and alias rewrite fields in bot_bottle/manifest.py. That is modular, but the abstraction leaks across manifest parsing, git-gate rendering, compose mounts, and backend validation.
Readability: above pass. The project uses expressive names, high-context docstrings, dataclasses, and clear PRD comments. Good examples are egress_resolve_token_values in bot_bottle/egress.py and decide in bot_bottle/egress_addon_core.py. The penalty is size and accreted complexity: bot_bottle/cli/dashboard.py is 2103 lines, bot_bottle/manifest.py is 1036 lines, and several modules encode historical PRD migrations inline.
Resilience: strong fail-closed behavior in security surfaces. Manifest parsing rejects malformed fields with contextual ManifestErrors, egress blocks unset tokens, and git-gate refuses fetch/push when known-host files are absent. The downgrade is broad exception swallowing in Docker teardown: bot_bottle/backend/docker/launch.py catches BaseException and discards it, which can hide cleanup failures and resource leaks.
Testability: high. There are 79 Python test files and 863 unit tests passing. Many important behaviors are pure and directly tested: route parsing/decision logic, pipelock YAML rendering, manifest validation, compose rendering, and backend parity. The remaining gap is side-effect-heavy launch/provisioning: subprocess, Docker, smolvm, and generated shell scripts are mocked or rendered-tested, but not fully covered through realistic integration for every failure mode.
SecOps: good but not exemplary. The repo is security-oriented: secrets avoid dataclass storage in EgressPlan.token_env_map, token values are injected through process env rather than compose files, pipelock blocks request bodies and scans all headers, and egress blocks smart-HTTP git pushes. The serious issue is shell rendering in git-gate: bot_bottle/git_gate.py single-quotes manifest-controlled u.name and u.upstream_url, but GitEntry.Name is only checked for non-empty in bot_bottle/manifest.py. A quote in a manifest value can break the generated shell script.

2. Normalized Score Report

{
  "evaluation_metadata": {
    "target_scope": "bot-bottle repository on main",
    "primary_language": "Python"
  },
  "metrics": {
    "architecture_and_modularity": 4,
    "readability_and_maintainability": 4,
    "error_handling_and_resilience": 4,
    "testability_and_mocking": 4,
    "security_and_performance": 3
  },
  "composite_score": 3.8
}

3. Atomic Refactoring Playbook

High Priority (To lift Score 1/2 to 3):
- No dimension is currently at Score 1/2. The repo is above the minimum quality gate.
Medium Priority (To lift Score 3 to 4/5):
- Harden git-gate shell rendering in bot_bottle/git_gate.py. Replace manual single-quote assumptions with shlex.quote and add manifest validation for safe repo names.
- Replace silent teardown swallowing in bot_bottle/backend/docker/launch.py with structured warning capture.
- Split bot_bottle/manifest.py into manifest_git.py, manifest_egress.py, and manifest_agent.py, keeping Manifest.from_json_obj as the facade.
- Extract dashboard state/update logic from bot_bottle/cli/dashboard.py into a small model module, for example dashboard_state.py, and leave the CLI file mostly as rendering/input orchestration.
- Add regression tests for shell escaping in git_gate_render_entrypoint: malicious Name and Upstream values containing ', whitespace, and shell metacharacters should either be rejected by manifest validation or rendered safely with shlex.quote.

### 1. Chain of Thought Analysis 1. Scope evaluated: `bot-bottle` on `main`, primarily Python. I inspected the core manifest model, backend abstraction, Docker/smolmachines launch paths, git-gate, egress/pipelock security logic, and representative tests. Unit suite result: `863 tests` passed. 2. Architecture: strong but not exemplary. `BottleBackend`, `BottlePlan`, `Bottle`, and backend-specific subclasses in `bot_bottle/backend/__init__.py` give the repo a coherent backend abstraction. Egress and pipelock planning are separated into pure host-side plan builders and backend lifecycle code. The main weakness is mixed-domain manifest/git responsibility: `GitEntry` still carries repository identity, upstream parsing, SSH identity file, known-host trust, and alias rewrite fields in `bot_bottle/manifest.py`. That is modular, but the abstraction leaks across manifest parsing, git-gate rendering, compose mounts, and backend validation. 3. Readability: above pass. The project uses expressive names, high-context docstrings, dataclasses, and clear PRD comments. Good examples are `egress_resolve_token_values` in `bot_bottle/egress.py` and `decide` in `bot_bottle/egress_addon_core.py`. The penalty is size and accreted complexity: `bot_bottle/cli/dashboard.py` is 2103 lines, `bot_bottle/manifest.py` is 1036 lines, and several modules encode historical PRD migrations inline. 4. Resilience: strong fail-closed behavior in security surfaces. Manifest parsing rejects malformed fields with contextual `ManifestError`s, egress blocks unset tokens, and git-gate refuses fetch/push when known-host files are absent. The downgrade is broad exception swallowing in Docker teardown: `bot_bottle/backend/docker/launch.py` catches `BaseException` and discards it, which can hide cleanup failures and resource leaks. 5. Testability: high. There are 79 Python test files and 863 unit tests passing. Many important behaviors are pure and directly tested: route parsing/decision logic, pipelock YAML rendering, manifest validation, compose rendering, and backend parity. The remaining gap is side-effect-heavy launch/provisioning: subprocess, Docker, smolvm, and generated shell scripts are mocked or rendered-tested, but not fully covered through realistic integration for every failure mode. 6. SecOps: good but not exemplary. The repo is security-oriented: secrets avoid dataclass storage in `EgressPlan.token_env_map`, token values are injected through process env rather than compose files, pipelock blocks request bodies and scans all headers, and egress blocks smart-HTTP git pushes. The serious issue is shell rendering in git-gate: `bot_bottle/git_gate.py` single-quotes manifest-controlled `u.name` and `u.upstream_url`, but `GitEntry.Name` is only checked for non-empty in `bot_bottle/manifest.py`. A quote in a manifest value can break the generated shell script. ### 2. Normalized Score Report ```json { "evaluation_metadata": { "target_scope": "bot-bottle repository on main", "primary_language": "Python" }, "metrics": { "architecture_and_modularity": 4, "readability_and_maintainability": 4, "error_handling_and_resilience": 4, "testability_and_mocking": 4, "security_and_performance": 3 }, "composite_score": 3.8 } ``` ### 3. Atomic Refactoring Playbook * **High Priority (To lift Score 1/2 to 3):** - [x] No dimension is currently at Score 1/2. The repo is above the minimum quality gate. * **Medium Priority (To lift Score 3 to 4/5):** - [x] Harden git-gate shell rendering in `bot_bottle/git_gate.py`. Replace manual single-quote assumptions with `shlex.quote` and add manifest validation for safe repo names. - [x] Replace silent teardown swallowing in `bot_bottle/backend/docker/launch.py` with structured warning capture. - [x] Split `bot_bottle/manifest.py` into `manifest_git.py`, `manifest_egress.py`, and `manifest_agent.py`, keeping `Manifest.from_json_obj` as the facade. - [x] Extract dashboard state/update logic from `bot_bottle/cli/dashboard.py` into a small model module, for example `dashboard_state.py`, and leave the CLI file mostly as rendering/input orchestration. - [x] Add regression tests for shell escaping in `git_gate_render_entrypoint`: malicious `Name` and `Upstream` values containing `'`, whitespace, and shell metacharacters should either be rejected by manifest validation or rendered safely with `shlex.quote`.

didericis commented

2026-06-02 14:45:23 -04:00

The mixed-domain manifest/git responsibility is intentional: we want to ensure that all git repositories MUST pass through git-gate. I agree the manifest shape is a bit strange/confusing. Would simply changing the key name to git-gate and making the key values more bespoke so it's clear this is an abstraction above normal git/ssh features improve that issue?

The mixed-domain manifest/git responsibility is intentional: we want to ensure that all git repositories MUST pass through git-gate. I agree the manifest shape is a bit strange/confusing. Would simply changing the key name to `git-gate` and making the key values more bespoke so it's clear this is an abstraction above normal git/ssh features improve that issue?

didericis-claude commented

2026-06-02 22:27:00 -04:00

The intentional routing is the right call — the abstraction boundary being blurry in the manifest is the real issue, not the underlying architecture.

On the rename: yes, git-gate would be clearer than git. Right now a reader seeing git.remotes in a manifest could reasonably think it maps to standard git remote config; a git-gate key makes explicit that this is a specialized abstraction with its own lifecycle, not a pass-through to git.

For "more bespoke" values, the main leverage point is IdentityFile: that is a credential path the gate manages, not something a manifest author should need to know about. If git-gate owns credential provisioning entirely, IdentityFile could be dropped from the manifest schema and injected at prepare-time instead — that would tighten the abstraction further and remove one more footgun for manifest authors.

One orthogonal thing worth addressing regardless of the rename: the shell escaping gap flagged in the evaluation. git_gate_render_entrypoint() passes name and upstream_url into a POSIX shell script; if a manifest declares a repo name containing a single quote, the generated script will misparse. shlex.quote() on both values before interpolation closes that, and a regression test with a pathological name (e.g. o'reilly) would pin it against regressions.

The intentional routing is the right call — the abstraction boundary being blurry in the manifest is the real issue, not the underlying architecture. On the rename: yes, `git-gate` would be clearer than `git`. Right now a reader seeing `git.remotes` in a manifest could reasonably think it maps to standard git remote config; a `git-gate` key makes explicit that this is a specialized abstraction with its own lifecycle, not a pass-through to git. For "more bespoke" values, the main leverage point is `IdentityFile`: that is a credential path the gate manages, not something a manifest author should need to know about. If git-gate owns credential provisioning entirely, `IdentityFile` could be dropped from the manifest schema and injected at prepare-time instead — that would tighten the abstraction further and remove one more footgun for manifest authors. One orthogonal thing worth addressing regardless of the rename: the shell escaping gap flagged in the evaluation. `git_gate_render_entrypoint()` passes `name` and `upstream_url` into a POSIX shell script; if a manifest declares a repo name containing a single quote, the generated script will misparse. `shlex.quote()` on both values before interpolation closes that, and a regression test with a pathological name (e.g. `o'reilly`) would pin it against regressions.

didericis-claude referenced this issue

2026-06-02 22:30:42 -04:00

Harden git_gate.py shell rendering with shlex.quote and name validation #155

didericis-claude referenced this issue

2026-06-02 22:30:42 -04:00

Replace silent BaseException swallowing in Docker teardown with structured warning capture #156

didericis-claude referenced this issue

2026-06-02 22:30:43 -04:00

Refactor manifest.py into domain-specific modules #157

didericis-claude referenced this issue

2026-06-02 22:30:43 -04:00

Extract dashboard state logic into a separate model module #158

didericis-claude referenced this issue

2026-06-02 22:30:43 -04:00

Add regression tests for shell escaping with malicious Name/Upstream values #159

didericis-claude commented

2026-06-03 12:12:32 -04:00

Quality Eval — `main` (composite 4.6 / 5)

Dimension	Score
Architecture & Modularity	4
Readability & Maintainability	5
Error Handling & Resilience	4
Testability & Mocking	5
Security & Performance	5

Findings

Architecture (4) — Strong layering: abstract dataclasses/ABCs in bot_bottle/{egress,pipelock,git_gate,supervise,deploy_key_provisioner}.py, concrete impls under bot_bottle/backend/docker/*, manifest split across manifest_{agent,egress,git,extends,schema,loader,util}.py. Clean prepare → launch separation (e.g. backend/docker/compose.py is a pure renderer above its lifecycle helpers). Contrib pattern (deploy_key_provisioner.py:36–52) lazy-imports per-provider modules. Marks deducted because bot_bottle/cli/dashboard.py is 1741 lines mixing curses rendering, key dispatch, and multiple operator-edit flows; the recent dashboard_model.py extraction (f0ca4e3) is right-direction but incomplete — view code still calls approve() directly (dashboard.py:1610).
Readability (5) — Module docstrings universally lead with PRD references and explain why (pipelock.py:78–99 on BIP-39 seed-phrase rationale; sidecar_init.py:57–68 on EGRESS_TOKEN_* stripping; git_gate.py:183–192 on chmod || true). Self-documenting names, consistent from __future__ import annotations, only one TODO/FIXME across the entire package (cli/start.py:153).
Resilience (4) — Custom Die (log.py:21–31) carries the message so curses callers can re-surface it after endwin(). Structured JSON-RPC error codes in supervise_server.py:60–66. Gitea provisioner returns typed HTTP/URL errors with body context (gitea/deploy_key_provisioner.py:66–75) and treats 404-on-delete as success. Marks deducted: a handful of except OSError: pass cleanup paths (dashboard.py:322) and three broad except Exception are pragmatic but unstructured; log.py is print()-to-stderr with no levels — fine for a CLI, limits the anchor.
Testability (5) — Strong test pyramid: unit (parsers, provisioners, planning, manifest, dashboard model, capability-apply, smolmachines), integration (pipelock allow/block, sidecar bundle bringup, sandbox escape), canaries. ABCs (Egress, GitGate, DeployKeyProvisioner) and frozen-dataclass plans make mocking trivial. Pure renderers (bottle_plan_to_compose, yaml_subset.parse_yaml_subset, pipelock_effective_allowlist) are deterministic. Callable injection in prepare_with_preflight (cli/start.py:77–104) lets curses and stderr/stdin call sites share a launch core.
SecOps (5) — This is a security project and the rigor shows.
- No shell=True anywhere; no hardcoded credentials.
- Token values never enter dataclasses (egress.py:95–99) — only env-var names are carried (token_env_map).
- Bundle PID-1 supervisor strips EGRESS_TOKEN_* from non-egress daemons to prevent pipelock from DLP-403ing egress's own injected Authorization (sidecar_init.py:57–80).
- Proposals/responses are atomically written 0o600 (supervise.py:249–256).
- Deploy keys are generated in tempfile.TemporaryDirectory() (gitea/deploy_key_provisioner.py:32–45).
- pre-receive gitleaks-scans only the new commit range, not full ancestry (git_gate.py:253–270) — both safe and performant.
- SSH push uses StrictHostKeyChecking=yes -o IdentitiesOnly=yes -o BatchMode=yes (git_gate.py:281).
- Hand-rolled yaml_subset deliberately avoids YAML coercion gotchas (Norway, octal dates).
- supervise_server enforces a Content-Length ceiling before reading the body (supervise_server.py:612–614).

Refactoring playbook (lift 3 → 4/5)

Split bot_bottle/cli/dashboard.py (1741 lines) — continue the dashboard_model.py extraction: move pure rendering (_draw_picker_modal, _preflight_modal, _backend_picker_modal, _detail_view, the frame painter near :1500) into dashboard_view.py; move orchestration flows (_operator_edit_flow, _stop_bottle_flow, new-agent flow) into dashboard_flows.py. cmd_dashboard becomes a thin dispatch loop and approve() stops being called from inside view code.
Promote logging to a structured facade in bot_bottle/log.py — add level + event + key/value fields (still stdlib logging). The three broad except Exception sites (dashboard.py:1096,1330; supervise_server.py:630) gain traceback IDs so a curses crash leaves something diagnostic in the audit log.
Centralize "best-effort cleanup" — try: os.unlink(...); except OSError: pass recurs (e.g. dashboard.py:320–323). A small unlink_if_exists(path) helper (or contextlib.suppress(OSError) at the call site) removes noise and gives one place to add an info()/metric later.
Narrow gitea/deploy_key_provisioner.py:117–121 — _read_error_body's bare-Exception catch is fine in intent but tighten to (OSError, UnicodeDecodeError) (the only things exc.read().decode(...) can plausibly raise) so the surface is explicit.
Extract per-daemon contributions in backend/docker/compose.py:143–261 — _sidecar_bundle_service walks four sub-plans and mutates parallel env/volumes accumulators. A _pipelock_contribution(plan) -> (env, volumes, aliases)-style helper per daemon shortens the function, makes each contribution independently unit-testable, and lets the next sidecar slot in without touching the aggregator.

— Generated by /quality-eval (Staff-engineer rubric, 5 dims, scored against main @ 83463f1).

## Quality Eval — `main` (composite **4.6 / 5**) | Dimension | Score | | :-- | :-- | | Architecture & Modularity | 4 | | Readability & Maintainability | 5 | | Error Handling & Resilience | 4 | | Testability & Mocking | 5 | | Security & Performance | 5 | ### Findings - **Architecture (4)** — Strong layering: abstract dataclasses/ABCs in `bot_bottle/{egress,pipelock,git_gate,supervise,deploy_key_provisioner}.py`, concrete impls under `bot_bottle/backend/docker/*`, manifest split across `manifest_{agent,egress,git,extends,schema,loader,util}.py`. Clean `prepare → launch` separation (e.g. `backend/docker/compose.py` is a pure renderer above its lifecycle helpers). Contrib pattern (`deploy_key_provisioner.py:36–52`) lazy-imports per-provider modules. **Marks deducted** because `bot_bottle/cli/dashboard.py` is 1741 lines mixing curses rendering, key dispatch, and multiple operator-edit flows; the recent `dashboard_model.py` extraction (`f0ca4e3`) is right-direction but incomplete — view code still calls `approve()` directly (`dashboard.py:1610`). - **Readability (5)** — Module docstrings universally lead with PRD references and explain *why* (`pipelock.py:78–99` on BIP-39 seed-phrase rationale; `sidecar_init.py:57–68` on `EGRESS_TOKEN_*` stripping; `git_gate.py:183–192` on `chmod || true`). Self-documenting names, consistent `from __future__ import annotations`, only one TODO/FIXME across the entire package (`cli/start.py:153`). - **Resilience (4)** — Custom `Die` (`log.py:21–31`) carries the message so curses callers can re-surface it after `endwin()`. Structured JSON-RPC error codes in `supervise_server.py:60–66`. Gitea provisioner returns typed HTTP/URL errors with body context (`gitea/deploy_key_provisioner.py:66–75`) and treats 404-on-delete as success. **Marks deducted**: a handful of `except OSError: pass` cleanup paths (`dashboard.py:322`) and three broad `except Exception` are pragmatic but unstructured; `log.py` is `print()`-to-stderr with no levels — fine for a CLI, limits the anchor. - **Testability (5)** — Strong test pyramid: unit (parsers, provisioners, planning, manifest, dashboard model, capability-apply, smolmachines), integration (pipelock allow/block, sidecar bundle bringup, sandbox escape), canaries. ABCs (`Egress`, `GitGate`, `DeployKeyProvisioner`) and frozen-dataclass plans make mocking trivial. Pure renderers (`bottle_plan_to_compose`, `yaml_subset.parse_yaml_subset`, `pipelock_effective_allowlist`) are deterministic. Callable injection in `prepare_with_preflight` (`cli/start.py:77–104`) lets curses and stderr/stdin call sites share a launch core. - **SecOps (5)** — This is a security project and the rigor shows. - No `shell=True` anywhere; no hardcoded credentials. - Token values never enter dataclasses (`egress.py:95–99`) — only env-var *names* are carried (`token_env_map`). - Bundle PID-1 supervisor strips `EGRESS_TOKEN_*` from non-egress daemons to prevent pipelock from DLP-403ing egress's own injected `Authorization` (`sidecar_init.py:57–80`). - Proposals/responses are atomically written 0o600 (`supervise.py:249–256`). - Deploy keys are generated in `tempfile.TemporaryDirectory()` (`gitea/deploy_key_provisioner.py:32–45`). - `pre-receive` gitleaks-scans only the *new* commit range, not full ancestry (`git_gate.py:253–270`) — both safe and performant. - SSH push uses `StrictHostKeyChecking=yes -o IdentitiesOnly=yes -o BatchMode=yes` (`git_gate.py:281`). - Hand-rolled `yaml_subset` deliberately avoids YAML coercion gotchas (Norway, octal dates). - `supervise_server` enforces a `Content-Length` ceiling before reading the body (`supervise_server.py:612–614`). ### Refactoring playbook (lift 3 → 4/5) - [ ] **Split `bot_bottle/cli/dashboard.py` (1741 lines)** — continue the `dashboard_model.py` extraction: move pure rendering (`_draw_picker_modal`, `_preflight_modal`, `_backend_picker_modal`, `_detail_view`, the frame painter near `:1500`) into `dashboard_view.py`; move orchestration flows (`_operator_edit_flow`, `_stop_bottle_flow`, new-agent flow) into `dashboard_flows.py`. `cmd_dashboard` becomes a thin dispatch loop and `approve()` stops being called from inside view code. - [ ] **Promote logging to a structured facade in `bot_bottle/log.py`** — add `level` + `event` + key/value fields (still stdlib `logging`). The three broad `except Exception` sites (`dashboard.py:1096,1330`; `supervise_server.py:630`) gain traceback IDs so a curses crash leaves something diagnostic in the audit log. - [ ] **Centralize "best-effort cleanup"** — `try: os.unlink(...); except OSError: pass` recurs (e.g. `dashboard.py:320–323`). A small `unlink_if_exists(path)` helper (or `contextlib.suppress(OSError)` at the call site) removes noise and gives one place to add an `info()`/metric later. - [ ] **Narrow `gitea/deploy_key_provisioner.py:117–121`** — `_read_error_body`'s bare-`Exception` catch is fine in intent but tighten to `(OSError, UnicodeDecodeError)` (the only things `exc.read().decode(...)` can plausibly raise) so the surface is explicit. - [ ] **Extract per-daemon contributions in `backend/docker/compose.py:143–261`** — `_sidecar_bundle_service` walks four sub-plans and mutates parallel `env`/`volumes` accumulators. A `_pipelock_contribution(plan) -> (env, volumes, aliases)`-style helper per daemon shortens the function, makes each contribution independently unit-testable, and lets the next sidecar slot in without touching the aggregator. — Generated by `/quality-eval` (Staff-engineer rubric, 5 dims, scored against `main` @ `83463f1`).

didericis closed this issue

2026-06-03 12:14:19 -04:00

didericis added the Kind/Testing label 2026-06-04 21:12:35 -04:00

Sign in to join this conversation.

3 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: didericis/bot-bottle#154

Quality evaluation: main repository scorecard #154

1. Chain of Thought Analysis

2. Normalized Score Report

3. Atomic Refactoring Playbook

Quality Eval — main (composite 4.6 / 5)

Findings

Refactoring playbook (lift 3 → 4/5)

Quality Eval — `main` (composite 4.6 / 5)