From 6315456a5910af1516eedd5208062cf764ebef77 Mon Sep 17 00:00:00 2001 From: codex Date: Tue, 2 Jun 2026 07:23:04 +0000 Subject: [PATCH] docs(prd): add manifest schema boundaries --- docs/prds/0033-manifest-schema-boundaries.md | 169 +++++++++++++++++++ 1 file changed, 169 insertions(+) create mode 100644 docs/prds/0033-manifest-schema-boundaries.md diff --git a/docs/prds/0033-manifest-schema-boundaries.md b/docs/prds/0033-manifest-schema-boundaries.md new file mode 100644 index 0000000..c9b7ab5 --- /dev/null +++ b/docs/prds/0033-manifest-schema-boundaries.md @@ -0,0 +1,169 @@ +# PRD 0033: Manifest Schema Boundaries + +- **Status:** Draft +- **Author:** didericis-codex +- **Created:** 2026-06-02 +- **Issue:** #117 + +## Summary + +Split the manifest loader's schema validation, filesystem loading, `extends:` +resolution, and compatibility passthrough policy into named internal boundaries +without changing the public manifest format. The goal is to make +`bot_bottle/manifest.py` cheaper to extend and review while preserving the +strict validation behavior that keeps manifest mistakes visible. + +## Problem + +`bot_bottle/manifest.py` has become a broad schema surface. It owns dataclass +models, per-field validators, per-section unknown-key policy, Markdown +frontmatter loading, two-pass bottle inheritance, merge semantics, and +effective agent-to-bottle overlays in one file. The logic is deterministic and +well covered, but the number of concerns makes schema changes expensive: +reviewers have to re-derive loader behavior, parse-time validation, and +post-parse composition rules together. + +One specific coupling is especially easy to miss: agent Markdown files are +allowed to double as Claude Code subagent files, so the manifest parser accepts +and ignores Claude Code frontmatter fields such as `name`, `description`, +`model`, `color`, and `memory`. That compatibility rule is encoded as a +passthrough allowlist alongside bot-bottle's own agent schema. If Claude Code +adds a frontmatter field and users start sharing files between +`~/.claude/agents/` and `.bot-bottle/agents/`, bot-bottle raises +`ManifestError` until the local passthrough policy is updated. + +The current shape is workable, but it creates unnecessary risk for future +manifest features. A new field can accidentally mix parsing, inheritance, and +compatibility concerns in the same edit, or update one entry path +(`from_json_obj`) without matching the Markdown path (`from_md_dirs`). + +## Goals / Success Criteria + +- Preserve the existing public manifest schema and runtime behavior. +- Keep `Manifest`, `Bottle`, `Agent`, `GitEntry`, `GitUser`, `AgentProvider`, + `EgressRoute`, `EgressConfig`, and `PipelockRoutePolicy` import-compatible + from `bot_bottle.manifest`. +- Move Markdown file discovery and frontmatter loading behind a small internal + loader boundary with tests that show `$HOME` bottles, `$HOME` agents, `$CWD` + agent overrides, and ignored `$CWD` bottles still behave as before. +- Move bottle `extends:` resolution and merge rules behind a named internal + resolver boundary with tests for inheritance, replacement, cycle detection, + missing parents, and per-field `git.user` overlays. +- Centralize top-level allowed-key policy for bottle and agent schemas so + unknown-key errors remain strict and the allowed set is visible in one place + per schema. +- Make Claude Code passthrough fields a named compatibility policy with focused + tests that distinguish accepted passthrough keys from bot-bottle schema keys + and true typos. +- Keep both entry points, `Manifest.from_json_obj` and + `Manifest.from_md_dirs`, covered by tests for shared validation and shared + inheritance behavior. + +## Non-goals + +- No manifest format changes. +- No migration away from Markdown frontmatter or the stdlib-only YAML subset + parser. +- No dependency on Pydantic, PyYAML, JSON Schema, or another schema framework. +- No relaxation of strict unknown-key validation for bot-bottle fields. +- No provider-specific workspace, auth, launch, or egress changes. +- No user-facing CLI behavior changes. + +## Scope + +In scope: + +- Internal module organization for manifest loading and composition. +- Validator helpers or schema-policy helpers that reduce duplicated + unknown-key and type-checking logic. +- Focused regression tests around the two existing load paths. +- Documentation comments that clarify compatibility policy where it is encoded. + +Out of scope: + +- Renaming public dataclass fields or changing their capitalization. +- Reworking callers outside the manifest boundary except for import updates + that are mechanically required by an internal split. +- Adding new manifest fields. +- Changing how `bot-bottle.json` legacy-file errors are reported. + +## Design + +Keep `bot_bottle.manifest` as the public facade. Existing imports should +continue to work from that module, even if implementation moves into internal +modules such as: + +- `bot_bottle/manifest_model.py` for dataclasses and field-level parsing. +- `bot_bottle/manifest_loader.py` for filesystem layout, Markdown + frontmatter loading, stale legacy-file checks, and `$CWD` override rules. +- `bot_bottle/manifest_extends.py` for raw-bottle inheritance, cycle checks, + and merge semantics. +- `bot_bottle/manifest_schema.py` for allowed-key sets, passthrough policy, + and small validation helpers. + +The exact filenames are not required. The required boundary is conceptual: +raw input loading, schema validation, bottle inheritance, and effective +agent-to-bottle overlays should be separable when reading and testing the code. + +`Manifest.from_json_obj` should continue to accept a raw JSON-like dict and +feed the same raw bottle resolver used by Markdown loading. `Manifest.from_md_dirs` +should perform only filesystem discovery and Markdown parsing before passing +the same raw sections into the same validator/composer path. That shared path +prevents a future schema field from working in one entry point but not the +other. + +Claude Code passthrough fields should be represented as an explicit +compatibility allowlist, named as such, and documented near the agent schema +policy. The parser should still ignore those fields after validation. Tests +should cover every passthrough field currently accepted and at least one +unknown field that remains an error. + +The `extends:` resolver should remain raw-dict based until after inheritance is +resolved. Merge rules stay unchanged: + +- scalar fields use child value when present. +- `env` merges by key with child values winning. +- `git.remotes` merges by upstream host, with child entries replacing duplicate + hosts and explicit empty maps clearing inherited remotes. +- `git.user` overlays per field. +- `egress` remains full-replace when declared by the child. +- cycles, missing parents, and self-reference remain `ManifestError`s. + +## Implementation Chunks + +1. Add focused characterization tests for agent allowed keys, Claude Code + passthrough fields, and parity between `from_json_obj` and Markdown loading. +2. Extract allowed-key and compatibility policy helpers while keeping + `bot_bottle.manifest` as the import surface. +3. Extract raw Markdown loading into a loader boundary and rerun existing + PRD 0011 tests unchanged. +4. Extract bottle inheritance and merge rules into a resolver boundary and + rerun existing PRD 0025 tests unchanged. +5. Trim `bot_bottle.manifest` to the public facade and model composition, + leaving compatibility imports for existing callers. + +Each chunk should be mergeable on its own and should keep the test suite green. + +## Testing Strategy + +Run the existing manifest-focused unit tests after each chunk: + +- `tests/unit/test_manifest_md_load.py` +- `tests/unit/test_manifest_extends.py` +- `tests/unit/test_manifest_git.py` +- `tests/unit/test_manifest_git_user.py` +- `tests/unit/test_manifest_agent_git_user.py` +- `tests/unit/test_manifest_egress.py` +- `tests/unit/test_manifest_runtime.py` + +Add new tests only where they lock down boundary behavior not already covered, +especially compatibility passthrough and entry-point parity. + +## Open Questions + +- Should the Claude Code passthrough allowlist intentionally track a documented + upstream schema, or should bot-bottle keep a narrow local allowlist and update + it only when users need a new shared-file field? +- Should the public facade continue exposing every helper that tests currently + import from `bot_bottle.manifest`, or should tests move to public behavior + only during this cleanup?