From 6315456a5910af1516eedd5208062cf764ebef77 Mon Sep 17 00:00:00 2001
From: codex <eric+codex@dideric.is>
Date: Tue, 2 Jun 2026 07:23:04 +0000
Subject: [PATCH] docs(prd): add manifest schema boundaries

---
 docs/prds/0033-manifest-schema-boundaries.md | 169 +++++++++++++++++++
 1 file changed, 169 insertions(+)
 create mode 100644 docs/prds/0033-manifest-schema-boundaries.md

diff --git a/docs/prds/0033-manifest-schema-boundaries.md b/docs/prds/0033-manifest-schema-boundaries.md
new file mode 100644
index 0000000..c9b7ab5
--- /dev/null
+++ b/docs/prds/0033-manifest-schema-boundaries.md
@@ -0,0 +1,169 @@
+# PRD 0033: Manifest Schema Boundaries
+
+- **Status:** Draft
+- **Author:** didericis-codex
+- **Created:** 2026-06-02
+- **Issue:** #117
+
+## Summary
+
+Split the manifest loader's schema validation, filesystem loading, `extends:`
+resolution, and compatibility passthrough policy into named internal boundaries
+without changing the public manifest format. The goal is to make
+`bot_bottle/manifest.py` cheaper to extend and review while preserving the
+strict validation behavior that keeps manifest mistakes visible.
+
+## Problem
+
+`bot_bottle/manifest.py` has become a broad schema surface. It owns dataclass
+models, per-field validators, per-section unknown-key policy, Markdown
+frontmatter loading, two-pass bottle inheritance, merge semantics, and
+effective agent-to-bottle overlays in one file. The logic is deterministic and
+well covered, but the number of concerns makes schema changes expensive:
+reviewers have to re-derive loader behavior, parse-time validation, and
+post-parse composition rules together.
+
+One specific coupling is especially easy to miss: agent Markdown files are
+allowed to double as Claude Code subagent files, so the manifest parser accepts
+and ignores Claude Code frontmatter fields such as `name`, `description`,
+`model`, `color`, and `memory`. That compatibility rule is encoded as a
+passthrough allowlist alongside bot-bottle's own agent schema. If Claude Code
+adds a frontmatter field and users start sharing files between
+`~/.claude/agents/` and `.bot-bottle/agents/`, bot-bottle raises
+`ManifestError` until the local passthrough policy is updated.
+
+The current shape is workable, but it creates unnecessary risk for future
+manifest features. A new field can accidentally mix parsing, inheritance, and
+compatibility concerns in the same edit, or update one entry path
+(`from_json_obj`) without matching the Markdown path (`from_md_dirs`).
+
+## Goals / Success Criteria
+
+- Preserve the existing public manifest schema and runtime behavior.
+- Keep `Manifest`, `Bottle`, `Agent`, `GitEntry`, `GitUser`, `AgentProvider`,
+  `EgressRoute`, `EgressConfig`, and `PipelockRoutePolicy` import-compatible
+  from `bot_bottle.manifest`.
+- Move Markdown file discovery and frontmatter loading behind a small internal
+  loader boundary with tests that show `$HOME` bottles, `$HOME` agents, `$CWD`
+  agent overrides, and ignored `$CWD` bottles still behave as before.
+- Move bottle `extends:` resolution and merge rules behind a named internal
+  resolver boundary with tests for inheritance, replacement, cycle detection,
+  missing parents, and per-field `git.user` overlays.
+- Centralize top-level allowed-key policy for bottle and agent schemas so
+  unknown-key errors remain strict and the allowed set is visible in one place
+  per schema.
+- Make Claude Code passthrough fields a named compatibility policy with focused
+  tests that distinguish accepted passthrough keys from bot-bottle schema keys
+  and true typos.
+- Keep both entry points, `Manifest.from_json_obj` and
+  `Manifest.from_md_dirs`, covered by tests for shared validation and shared
+  inheritance behavior.
+
+## Non-goals
+
+- No manifest format changes.
+- No migration away from Markdown frontmatter or the stdlib-only YAML subset
+  parser.
+- No dependency on Pydantic, PyYAML, JSON Schema, or another schema framework.
+- No relaxation of strict unknown-key validation for bot-bottle fields.
+- No provider-specific workspace, auth, launch, or egress changes.
+- No user-facing CLI behavior changes.
+
+## Scope
+
+In scope:
+
+- Internal module organization for manifest loading and composition.
+- Validator helpers or schema-policy helpers that reduce duplicated
+  unknown-key and type-checking logic.
+- Focused regression tests around the two existing load paths.
+- Documentation comments that clarify compatibility policy where it is encoded.
+
+Out of scope:
+
+- Renaming public dataclass fields or changing their capitalization.
+- Reworking callers outside the manifest boundary except for import updates
+  that are mechanically required by an internal split.
+- Adding new manifest fields.
+- Changing how `bot-bottle.json` legacy-file errors are reported.
+
+## Design
+
+Keep `bot_bottle.manifest` as the public facade. Existing imports should
+continue to work from that module, even if implementation moves into internal
+modules such as:
+
+- `bot_bottle/manifest_model.py` for dataclasses and field-level parsing.
+- `bot_bottle/manifest_loader.py` for filesystem layout, Markdown
+  frontmatter loading, stale legacy-file checks, and `$CWD` override rules.
+- `bot_bottle/manifest_extends.py` for raw-bottle inheritance, cycle checks,
+  and merge semantics.
+- `bot_bottle/manifest_schema.py` for allowed-key sets, passthrough policy,
+  and small validation helpers.
+
+The exact filenames are not required. The required boundary is conceptual:
+raw input loading, schema validation, bottle inheritance, and effective
+agent-to-bottle overlays should be separable when reading and testing the code.
+
+`Manifest.from_json_obj` should continue to accept a raw JSON-like dict and
+feed the same raw bottle resolver used by Markdown loading. `Manifest.from_md_dirs`
+should perform only filesystem discovery and Markdown parsing before passing
+the same raw sections into the same validator/composer path. That shared path
+prevents a future schema field from working in one entry point but not the
+other.
+
+Claude Code passthrough fields should be represented as an explicit
+compatibility allowlist, named as such, and documented near the agent schema
+policy. The parser should still ignore those fields after validation. Tests
+should cover every passthrough field currently accepted and at least one
+unknown field that remains an error.
+
+The `extends:` resolver should remain raw-dict based until after inheritance is
+resolved. Merge rules stay unchanged:
+
+- scalar fields use child value when present.
+- `env` merges by key with child values winning.
+- `git.remotes` merges by upstream host, with child entries replacing duplicate
+  hosts and explicit empty maps clearing inherited remotes.
+- `git.user` overlays per field.
+- `egress` remains full-replace when declared by the child.
+- cycles, missing parents, and self-reference remain `ManifestError`s.
+
+## Implementation Chunks
+
+1. Add focused characterization tests for agent allowed keys, Claude Code
+   passthrough fields, and parity between `from_json_obj` and Markdown loading.
+2. Extract allowed-key and compatibility policy helpers while keeping
+   `bot_bottle.manifest` as the import surface.
+3. Extract raw Markdown loading into a loader boundary and rerun existing
+   PRD 0011 tests unchanged.
+4. Extract bottle inheritance and merge rules into a resolver boundary and
+   rerun existing PRD 0025 tests unchanged.
+5. Trim `bot_bottle.manifest` to the public facade and model composition,
+   leaving compatibility imports for existing callers.
+
+Each chunk should be mergeable on its own and should keep the test suite green.
+
+## Testing Strategy
+
+Run the existing manifest-focused unit tests after each chunk:
+
+- `tests/unit/test_manifest_md_load.py`
+- `tests/unit/test_manifest_extends.py`
+- `tests/unit/test_manifest_git.py`
+- `tests/unit/test_manifest_git_user.py`
+- `tests/unit/test_manifest_agent_git_user.py`
+- `tests/unit/test_manifest_egress.py`
+- `tests/unit/test_manifest_runtime.py`
+
+Add new tests only where they lock down boundary behavior not already covered,
+especially compatibility passthrough and entry-point parity.
+
+## Open Questions
+
+- Should the Claude Code passthrough allowlist intentionally track a documented
+  upstream schema, or should bot-bottle keep a narrow local allowlist and update
+  it only when users need a new shared-file field?
+- Should the public facade continue exposing every helper that tests currently
+  import from `bot_bottle.manifest`, or should tests move to public behavior
+  only during this cleanup?