docs(prd): add manifest schema boundaries
2026-06-02 07:23:04 +00:00
parent a81f0ffa49
commit 6315456a59
# PRD 0033: Manifest Schema Boundaries
- **Status:** Draft
- **Author:** didericis-codex
- **Created:** 2026-06-02
- **Issue:** #117
## Summary
Split the manifest loader's schema validation, filesystem loading, `extends:`
resolution, and compatibility passthrough policy into named internal boundaries
without changing the public manifest format. The goal is to make
`bot_bottle/manifest.py` cheaper to extend and review while preserving the
strict validation behavior that keeps manifest mistakes visible.
## Problem
`bot_bottle/manifest.py` has become a broad schema surface. It owns dataclass
models, per-field validators, per-section unknown-key policy, Markdown
frontmatter loading, two-pass bottle inheritance, merge semantics, and
effective agent-to-bottle overlays in one file. The logic is deterministic and
well covered, but the number of concerns makes schema changes expensive:
reviewers have to re-derive loader behavior, parse-time validation, and
post-parse composition rules together.

One specific coupling is especially easy to miss: agent Markdown files are
allowed to double as Claude Code subagent files, so the manifest parser accepts
and ignores Claude Code frontmatter fields such as `name`, `description`,
`model`, `color`, and `memory`. That compatibility rule is encoded as a
passthrough allowlist alongside bot-bottle's own agent schema. If Claude Code
adds a new frontmatter field and a user sets it in a file shared between
`~/.claude/agents/` and `.bot-bottle/agents/`, bot-bottle rejects that file
with `ManifestError` until the local passthrough policy is updated.

The current shape is workable, but it creates unnecessary risk for future
manifest features. A new field can accidentally mix parsing, inheritance, and
compatibility concerns in the same edit, or change one entry path
(`from_json_obj`) without the matching change to the Markdown path
(`from_md_dirs`).
## Goals / Success Criteria
- Preserve the existing public manifest schema and runtime behavior.
- Keep `Manifest`, `Bottle`, `Agent`, `GitEntry`, `GitUser`, `AgentProvider`,
`EgressRoute`, `EgressConfig`, and `PipelockRoutePolicy` import-compatible
from `bot_bottle.manifest`.
- Move Markdown file discovery and frontmatter loading behind a small internal
loader boundary with tests that show `$HOME` bottles, `$HOME` agents, `$CWD`
agent overrides, and ignored `$CWD` bottles still behave as before.
- Move bottle `extends:` resolution and merge rules behind a named internal
resolver boundary with tests for inheritance, replacement, cycle detection,
missing parents, and per-field `git.user` overlays.
- Centralize top-level allowed-key policy for bottle and agent schemas so
unknown-key errors remain strict and the allowed set is visible in one place
per schema.
- Make Claude Code passthrough fields a named compatibility policy with focused
tests that distinguish accepted passthrough keys from bot-bottle schema keys
and true typos.
- Keep both entry points, `Manifest.from_json_obj` and
`Manifest.from_md_dirs`, covered by tests for shared validation and shared
inheritance behavior.
## Non-goals
- No manifest format changes.
- No migration away from Markdown frontmatter or the stdlib-only YAML subset
parser.
- No dependency on Pydantic, PyYAML, JSON Schema, or another schema framework.
- No relaxation of strict unknown-key validation for bot-bottle fields.
- No provider-specific workspace, auth, launch, or egress changes.
- No user-facing CLI behavior changes.
## Scope
In scope:
- Internal module organization for manifest loading and composition.
- Validator helpers or schema-policy helpers that reduce duplicated
unknown-key and type-checking logic.
- Focused regression tests around the two existing load paths.
- Documentation comments that clarify compatibility policy where it is encoded.

Out of scope:
- Renaming public dataclass fields or changing their capitalization.
- Reworking callers outside the manifest boundary except for import updates
that are mechanically required by an internal split.
- Adding new manifest fields.
- Changing how `bot-bottle.json` legacy-file errors are reported.
## Design
Keep `bot_bottle.manifest` as the public facade. Existing imports should
continue to work from that module, even if implementation moves into internal
modules such as:
- `bot_bottle/manifest_model.py` for dataclasses and field-level parsing.
- `bot_bottle/manifest_loader.py` for filesystem layout, Markdown
frontmatter loading, stale legacy-file checks, and `$CWD` override rules.
- `bot_bottle/manifest_extends.py` for raw-bottle inheritance, cycle checks,
and merge semantics.
- `bot_bottle/manifest_schema.py` for allowed-key sets, passthrough policy,
and small validation helpers.

The exact filenames are not required. The required boundary is conceptual:
raw input loading, schema validation, bottle inheritance, and effective
agent-to-bottle overlays should be separable when reading and testing the code.

`Manifest.from_json_obj` should continue to accept a raw JSON-like dict and
feed the same raw bottle resolver used by Markdown loading. `Manifest.from_md_dirs`
should perform only filesystem discovery and Markdown parsing before passing
the same raw sections into the same validator/composer path. That shared path
prevents a future schema field from working in one entry point but not the
other.
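A minimal sketch of that shared path, assuming both entry points can be reduced to one hypothetical `RawSections` value before any validation runs. The JSON shape (`{"bottles": ..., "agents": ...}`), `raw_from_md_layers`, `from_md_layers`, and `_compose` are illustrative names, not the current code, and Markdown parsing is assumed to have already produced one frontmatter dict per file, keyed by file stem. The layering encodes the `$HOME`/`$CWD` rules listed under Goals, assuming a `$CWD` agent replaces the `$HOME` agent of the same name as a whole.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping

# Hypothetical alias: section name -> raw (unvalidated) dict for that entry.
Section = dict[str, dict[str, Any]]


@dataclass
class RawSections:
    """Unvalidated bottle and agent dicts keyed by name (hypothetical type)."""

    bottles: Section = field(default_factory=dict)
    agents: Section = field(default_factory=dict)


def raw_from_json_obj(obj: Mapping[str, Any]) -> RawSections:
    # Assumed JSON shape: {"bottles": {name: {...}}, "agents": {name: {...}}}.
    return RawSections(
        bottles={name: dict(body) for name, body in obj.get("bottles", {}).items()},
        agents={name: dict(body) for name, body in obj.get("agents", {}).items()},
    )


def raw_from_md_layers(
    home_bottles: Section,
    home_agents: Section,
    cwd_agents: Section,
    cwd_bottles: Section,
) -> RawSections:
    """Apply the $HOME/$CWD layering to already-parsed frontmatter dicts."""
    # $CWD bottles are ignored; the parameter only mirrors the directory layout.
    del cwd_bottles
    # A $CWD agent replaces the $HOME agent with the same name (assumed whole-file).
    agents = {**home_agents, **cwd_agents}
    return RawSections(
        bottles={name: dict(body) for name, body in home_bottles.items()},
        agents={name: dict(body) for name, body in agents.items()},
    )


def _compose(raw: RawSections) -> RawSections:
    # Stand-in for the single downstream path (key policy, extends resolution,
    # agent overlays). It is the identity here to keep the sketch small.
    return raw


def from_json_obj(obj: Mapping[str, Any]) -> RawSections:
    return _compose(raw_from_json_obj(obj))


def from_md_layers(
    home_bottles: Section,
    home_agents: Section,
    cwd_agents: Section,
    cwd_bottles: Section,
) -> RawSections:
    return _compose(
        raw_from_md_layers(home_bottles, home_agents, cwd_agents, cwd_bottles)
    )
```

Because both entry points converge on an equal `RawSections` value before `_compose`, a parity test can compare the two directly, and a new schema field only has to be handled once downstream.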
Claude Code passthrough fields should be represented as an explicit
compatibility allowlist, named as such, and documented near the agent schema
policy. The parser should still ignore those fields after validation. Tests
should cover every passthrough field currently accepted and at least one
unknown field that remains an error.
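The allowed-key and passthrough policy could look roughly like the following. It is a sketch under stated assumptions: `check_keys`, `check_agent_keys`, `check_bottle_keys`, and the three key sets are hypothetical names; the schema sets are partial and illustrative rather than bot-bottle's real field lists; the passthrough set holds only the fields named in this PRD; and giving bottles no passthrough is an assumption. `ManifestError` is a local stand-in so the block runs on its own.

```python
from typing import Any, Mapping


class ManifestError(Exception):
    """Local stand-in for bot_bottle's ManifestError."""


# Hypothetical, partial schema key sets, for illustration only.
BOTTLE_SCHEMA_KEYS = frozenset({"extends", "env", "git", "egress"})
AGENT_SCHEMA_KEYS = frozenset({"bottle", "git"})

# Claude Code subagent frontmatter that is accepted and then ignored so one
# Markdown file can serve both tools. Only the fields named in this PRD.
CLAUDE_CODE_PASSTHROUGH_KEYS = frozenset(
    {"name", "description", "model", "color", "memory"}
)


def check_keys(
    raw: Mapping[str, Any],
    *,
    allowed: frozenset[str],
    where: str,
    passthrough: frozenset[str] = frozenset(),
) -> dict[str, Any]:
    """Reject unknown keys, then drop passthrough keys before parsing."""
    unknown = sorted(set(raw) - allowed - passthrough)
    if unknown:
        raise ManifestError(f"{where}: unknown key(s): {', '.join(unknown)}")
    # Keep only bot-bottle schema keys; passthrough keys were accepted as
    # known but are not part of the model.
    return {key: value for key, value in raw.items() if key in allowed}


def check_agent_keys(raw: Mapping[str, Any], where: str) -> dict[str, Any]:
    return check_keys(
        raw,
        allowed=AGENT_SCHEMA_KEYS,
        where=where,
        passthrough=CLAUDE_CODE_PASSTHROUGH_KEYS,
    )


def check_bottle_keys(raw: Mapping[str, Any], where: str) -> dict[str, Any]:
    # Assumed: bottles get no passthrough, so Claude Code fields are rejected.
    return check_keys(raw, allowed=BOTTLE_SCHEMA_KEYS, where=where)
```

Keeping the two sets separate is what lets a focused test distinguish the three cases: a schema key survives, a passthrough key is accepted and then dropped, and a typo such as `descripton` still raises.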
The `extends:` resolver should keep operating on raw dicts, with dataclasses
constructed only after inheritance is resolved. Merge rules stay unchanged:
- scalar fields use the child's value when present.
- `env` merges by key with child values winning.
- `git.remotes` merges by upstream host, with child entries replacing duplicate
hosts and explicit empty maps clearing inherited remotes.
- `git.user` overlays per field.
- `egress` remains full-replace when declared by the child.
- cycles, missing parents, and self-reference remain `ManifestError`s.
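These merge rules can be expressed as a pure function over raw dicts, which keeps inheritance testable before any dataclass exists. This is a minimal sketch, not the current resolver: `resolve_bottles`, `merge_bottle`, and `merge_git` are hypothetical names, raw bottles are assumed to be plain dicts keyed by field name, `git.remotes` is assumed to be a map keyed by upstream host, and `ManifestError` is a local stand-in.

```python
from typing import Any


class ManifestError(Exception):
    """Local stand-in for bot_bottle's ManifestError."""


RawBottle = dict[str, Any]


def merge_git(parent: dict[str, Any], child: dict[str, Any]) -> dict[str, Any]:
    merged = dict(parent)
    for key, value in child.items():
        if key == "remotes":
            # An explicit empty map clears inherited remotes; otherwise child
            # entries replace parent entries for the same upstream host.
            merged["remotes"] = (
                {} if not value else {**parent.get("remotes", {}), **value}
            )
        elif key == "user":
            # git.user overlays per field.
            merged["user"] = {**parent.get("user", {}), **value}
        else:
            merged[key] = value
    return merged


def merge_bottle(parent: RawBottle, child: RawBottle) -> RawBottle:
    merged = dict(parent)
    for key, value in child.items():
        if key == "extends":
            continue
        if key == "env":
            # env merges by key, child values win.
            merged["env"] = {**parent.get("env", {}), **value}
        elif key == "git":
            merged["git"] = merge_git(parent.get("git", {}), value)
        else:
            # Scalars and `egress` take the child's value wholesale.
            merged[key] = value
    return merged


def resolve_bottles(raw: dict[str, RawBottle]) -> dict[str, RawBottle]:
    resolved: dict[str, RawBottle] = {}

    def resolve(name: str, chain: tuple[str, ...]) -> RawBottle:
        if name in resolved:
            return resolved[name]
        if name in chain:
            raise ManifestError(f"extends cycle: {' -> '.join((*chain, name))}")
        bottle = raw[name]
        parent_name = bottle.get("extends")
        if parent_name is None:
            result = {k: v for k, v in bottle.items() if k != "extends"}
        elif parent_name == name:
            raise ManifestError(f"bottle {name!r} extends itself")
        elif parent_name not in raw:
            raise ManifestError(
                f"bottle {name!r} extends unknown bottle {parent_name!r}"
            )
        else:
            result = merge_bottle(resolve(parent_name, (*chain, name)), bottle)
        resolved[name] = result
        return result

    # Resolve every bottle, preserving the input order in the result.
    return {name: resolve(name, ()) for name in raw}
```

Resolving each parent before its child and memoizing the result means every bottle is merged once, while the `chain` tuple of in-progress names is what turns a revisit into a cycle error.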
## Implementation Chunks
1. Add focused characterization tests for agent allowed keys, Claude Code
passthrough fields, and parity between `from_json_obj` and Markdown loading.
2. Extract allowed-key and compatibility policy helpers while keeping
`bot_bottle.manifest` as the import surface.
3. Extract raw Markdown loading into a loader boundary and rerun existing
PRD 0011 tests unchanged.
4. Extract bottle inheritance and merge rules into a resolver boundary and
rerun existing PRD 0025 tests unchanged.
5. Trim `bot_bottle.manifest` to the public facade and model composition,
leaving compatibility imports for existing callers.

Each chunk should be mergeable on its own and should keep the test suite green.
## Testing Strategy
Run the existing manifest-focused unit tests after each chunk:
- `tests/unit/test_manifest_md_load.py`
- `tests/unit/test_manifest_extends.py`
- `tests/unit/test_manifest_git.py`
- `tests/unit/test_manifest_git_user.py`
- `tests/unit/test_manifest_agent_git_user.py`
- `tests/unit/test_manifest_egress.py`
- `tests/unit/test_manifest_runtime.py`

Add new tests only where they lock down boundary behavior not already covered,
especially compatibility passthrough and entry-point parity.
## Open Questions
- Should the Claude Code passthrough allowlist intentionally track a documented
upstream schema, or should bot-bottle keep a narrow local allowlist and update
it only when users need a new shared-file field?
- Should the public facade continue exposing every helper that tests currently
import from `bot_bottle.manifest`, or should tests move to public behavior
only during this cleanup?