
PRD 0033: Manifest Schema Boundaries

  • Status: Active
  • Author: didericis-codex
  • Created: 2026-06-02
  • Issue: #125

Summary

Split the manifest loader's schema validation, filesystem loading, extends: resolution, and compatibility passthrough policy into named internal boundaries without changing the public manifest format. The goal is to make bot_bottle/manifest.py cheaper to extend and review while preserving the strict validation behavior that keeps manifest mistakes visible.

Problem

bot_bottle/manifest.py has become a broad schema surface. It owns dataclass models, per-field validators, per-section unknown-key policy, Markdown frontmatter loading, two-pass bottle inheritance, merge semantics, and effective agent-to-bottle overlays in one file. The logic is deterministic and well covered, but the number of concerns makes schema changes expensive: reviewers have to re-derive loader behavior, parse-time validation, and post-parse composition rules together.

One specific coupling is especially easy to miss: agent Markdown files are allowed to double as Claude Code subagent files, so the manifest parser accepts and ignores Claude Code frontmatter fields such as name, description, model, color, and memory. That compatibility rule is encoded as a passthrough allowlist alongside bot-bottle's own agent schema. If Claude Code adds a frontmatter field and a file shared between ~/.claude/agents/ and .bot-bottle/agents/ uses it, bot-bottle raises ManifestError on that file until the local passthrough policy is updated.

The current shape is workable, but it creates unnecessary risk for future manifest features. A new field can accidentally mix parsing, inheritance, and compatibility concerns in the same edit, or update one entry path (from_json_obj) without matching the Markdown path (from_md_dirs).

Goals / Success Criteria

  • Preserve the existing public manifest schema and runtime behavior.
  • Keep Manifest, Bottle, Agent, GitEntry, GitUser, AgentProvider, EgressRoute, EgressConfig, and PipelockRoutePolicy import-compatible from bot_bottle.manifest.
  • Move Markdown file discovery and frontmatter loading behind a small internal loader boundary with tests that show $HOME bottles, $HOME agents, $CWD agent overrides, and ignored $CWD bottles still behave as before.
  • Move bottle extends: resolution and merge rules behind a named internal resolver boundary with tests for inheritance, replacement, cycle detection, missing parents, and per-field git.user overlays.
  • Centralize top-level allowed-key policy for bottle and agent schemas so unknown-key errors remain strict and the allowed set is visible in one place per schema.
  • Make Claude Code passthrough fields a named compatibility policy with focused tests that distinguish accepted passthrough keys from bot-bottle schema keys and true typos.
  • Keep both entry points, Manifest.from_json_obj and Manifest.from_md_dirs, covered by tests for shared validation and shared inheritance behavior.

Non-goals

  • No manifest format changes.
  • No migration away from Markdown frontmatter or the stdlib-only YAML subset parser.
  • No dependency on Pydantic, PyYAML, JSON Schema, or another schema framework.
  • No relaxation of strict unknown-key validation for bot-bottle fields.
  • No provider-specific workspace, auth, launch, or egress changes.
  • No user-facing CLI behavior changes.

Scope

In scope:

  • Internal module organization for manifest loading and composition.
  • Validator helpers or schema-policy helpers that reduce duplicated unknown-key and type-checking logic.
  • Focused regression tests around the two existing load paths.
  • Documentation comments that clarify compatibility policy where it is encoded.

Out of scope:

  • Renaming public dataclass fields or changing their capitalization.
  • Reworking callers outside the manifest boundary except for import updates that are mechanically required by an internal split.
  • Adding new manifest fields.
  • Changing how bot-bottle.json legacy-file errors are reported.

Design

Keep bot_bottle.manifest as the public facade. Existing imports should continue to work from that module, even if implementation moves into internal modules such as:

  • bot_bottle/manifest_model.py for dataclasses and field-level parsing.
  • bot_bottle/manifest_loader.py for filesystem layout, Markdown frontmatter loading, stale legacy-file checks, and $CWD override rules.
  • bot_bottle/manifest_extends.py for raw-bottle inheritance, cycle checks, and merge semantics.
  • bot_bottle/manifest_schema.py for allowed-key sets, passthrough policy, and small validation helpers.

These filenames are illustrative, not required. The boundary that matters is conceptual: raw input loading, schema validation, bottle inheritance, and effective agent-to-bottle overlays should be separable when reading and testing the code.
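
As a rough illustration of the loader boundary's layout rules, the sketch below resolves which Markdown files would be read. It is not the existing implementation: the discover function, the CONFIG_DIR constant, and the bottles/ subdirectory name are assumptions made for the example (the PRD only names .bot-bottle/agents/). It shows $HOME bottles, $HOME agents, $CWD agent overrides, and ignored $CWD bottles.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: both roots use the same directory name for their manifest files.
CONFIG_DIR = ".bot-bottle"


def _md_files(root: Path, kind: str) -> dict[str, Path]:
    """Map file stem -> path for each *.md file under <root>/.bot-bottle/<kind>/."""
    directory = root / CONFIG_DIR / kind
    if not directory.is_dir():
        return {}
    return {path.stem: path for path in sorted(directory.glob("*.md"))}


def discover(home: Path, cwd: Path) -> tuple[dict[str, Path], dict[str, Path]]:
    """Return (bottles, agents) after applying the layout rules (hypothetical helper)."""
    # Bottles come from $HOME only; any bottles under $CWD are never read.
    bottles = _md_files(home, "bottles")
    # Agents start from $HOME, then a same-named $CWD agent replaces the $HOME one.
    agents = _md_files(home, "agents")
    agents.update(_md_files(cwd, "agents"))
    return bottles, agents
```

Because the helper returns only paths, frontmatter parsing and validation can be tested separately from the discovery rules.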

Manifest.from_json_obj should continue to accept a raw JSON-like dict and feed the same raw bottle resolver used by Markdown loading. Manifest.from_md_dirs should perform only filesystem discovery and Markdown parsing before passing the same raw sections into the same validator/composer path. That shared path prevents a future schema field from working in one entry point but not the other.
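
A minimal sketch of that shared path, under stated assumptions: compose, from_md_texts, and parse_frontmatter are hypothetical stand-ins, the top-level "bottles" and "agents" keys of the JSON-like dict are assumed, and the toy frontmatter parser handles only flat key: value lines (it is not the stdlib-only YAML subset parser). The point is only that both entry points delegate to one function, so a check added there applies to both.

```python
from __future__ import annotations

from typing import Any


class ManifestError(ValueError):
    """Stand-in for bot-bottle's ManifestError."""


def parse_frontmatter(text: str) -> dict[str, str]:
    """Toy parser: flat `key: value` lines between two `---` fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ManifestError("missing frontmatter fence")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ManifestError(f"malformed frontmatter line: {line!r}")
        fields[key.strip()] = value.strip()
    raise ManifestError("unterminated frontmatter")


def compose(raw_bottles: dict[str, dict[str, Any]],
            raw_agents: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Shared path: checks live here so both entry points inherit them."""
    for name, bottle in raw_bottles.items():
        parent = bottle.get("extends")
        if parent is not None and parent not in raw_bottles:
            raise ManifestError(f"bottle {name!r} extends missing parent {parent!r}")
    return {"bottles": dict(raw_bottles), "agents": dict(raw_agents)}


def from_json_obj(obj: dict[str, Any]) -> dict[str, Any]:
    """JSON-like entry point: no parsing needed, delegate straight away."""
    return compose(obj.get("bottles", {}), obj.get("agents", {}))


def from_md_texts(bottle_texts: dict[str, str],
                  agent_texts: dict[str, str]) -> dict[str, Any]:
    """Stand-in for from_md_dirs after file discovery: parse, then delegate."""
    return compose(
        {name: parse_frontmatter(text) for name, text in bottle_texts.items()},
        {name: parse_frontmatter(text) for name, text in agent_texts.items()},
    )
```

A parity test can then feed equivalent input through both functions and assert equal results and equal errors.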

Claude Code passthrough fields should be represented as an explicit compatibility allowlist, named as such, and documented near the agent schema policy. The parser should still ignore those fields after validation. Tests should cover every passthrough field currently accepted and at least one unknown field that remains an error.
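
One possible shape for that named policy is sketched below. The passthrough set lists only the five fields this PRD names, which may not be the full set the parser accepts today. AGENT_SCHEMA_KEYS is a placeholder, since the real agent schema keys are not listed here, and split_agent_keys is a hypothetical helper name.

```python
from __future__ import annotations

from typing import Any


class ManifestError(ValueError):
    """Stand-in for bot-bottle's ManifestError."""


# Compatibility policy: Claude Code subagent frontmatter that bot-bottle
# accepts and ignores so one file can serve both tools.
CLAUDE_CODE_PASSTHROUGH_KEYS = frozenset(
    {"name", "description", "model", "color", "memory"}
)

# Placeholder only: the real bot-bottle agent keys are not listed in this PRD.
AGENT_SCHEMA_KEYS = frozenset({"bottle", "git"})


def split_agent_keys(raw: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (schema_fields, ignored_passthrough); raise on any other key."""
    unknown = sorted(set(raw) - AGENT_SCHEMA_KEYS - CLAUDE_CODE_PASSTHROUGH_KEYS)
    if unknown:
        raise ManifestError(f"agent: unknown key(s): {', '.join(unknown)}")
    schema = {k: v for k, v in raw.items() if k in AGENT_SCHEMA_KEYS}
    ignored = {k: v for k, v in raw.items() if k in CLAUDE_CODE_PASSTHROUGH_KEYS}
    return schema, ignored
```

Keeping the two sets separate lets a test iterate over every passthrough key and still assert that a typo such as modle is rejected.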

The extends: resolver should remain raw-dict based until after inheritance is resolved. Merge rules stay unchanged:

  • Scalar fields use the child value when present.
  • env merges by key, with child values winning.
  • git.remotes merges by upstream host, with child entries replacing duplicate hosts and an explicit empty map clearing inherited remotes.
  • git.user overlays per field.
  • egress remains full-replace when declared by the child.
  • Cycles, missing parents, and self-reference continue to raise ManifestError.
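
The merge rules above can be sketched as a raw-dict resolver. This is an illustration under assumptions, not the current code: resolve_bottle and merge_bottle are hypothetical names, git.remotes is modeled as a dict keyed by upstream host, and nested values are shared rather than deep-copied.

```python
from __future__ import annotations

from typing import Any

RawBottle = dict[str, Any]


class ManifestError(ValueError):
    """Stand-in for bot-bottle's ManifestError."""


def _merge_git(parent: dict[str, Any], child: dict[str, Any]) -> dict[str, Any]:
    merged = dict(parent)
    for key, value in child.items():
        if key == "remotes":
            # Merge by host; an explicit empty map clears inherited remotes.
            merged["remotes"] = {} if value == {} else {**parent.get("remotes", {}), **value}
        elif key == "user":
            # Per-field overlay: the child overrides only the fields it sets.
            merged["user"] = {**parent.get("user", {}), **value}
        else:
            merged[key] = value
    return merged


def merge_bottle(parent: RawBottle, child: RawBottle) -> RawBottle:
    """Apply child over an already-resolved parent."""
    merged = dict(parent)
    for key, value in child.items():
        if key == "extends":
            continue  # consumed by the resolver, not part of the result
        if key == "env":
            merged["env"] = {**parent.get("env", {}), **value}
        elif key == "git":
            merged["git"] = _merge_git(parent.get("git", {}), value)
        else:
            merged[key] = value  # scalars and egress: child replaces parent
    return merged


def resolve_bottle(name: str, raw: dict[str, RawBottle],
                   _stack: tuple[str, ...] = ()) -> RawBottle:
    """Resolve `extends:` on raw dicts, before any dataclass is built."""
    if name not in raw:
        raise ManifestError(f"bottle {name!r} not found")
    if name in _stack:
        # Covers self-reference and longer cycles alike.
        raise ManifestError("extends cycle: " + " -> ".join((*_stack, name)))
    bottle = raw[name]
    parent_name = bottle.get("extends")
    if parent_name is None:
        return dict(bottle)
    if parent_name not in raw:
        raise ManifestError(f"bottle {name!r} extends missing parent {parent_name!r}")
    parent = resolve_bottle(parent_name, raw, (*_stack, name))
    return merge_bottle(parent, bottle)
```

For example, a child that sets only git.user.name keeps the parent's other git.user fields, while a child that declares egress discards the parent's egress block entirely.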

Implementation Chunks

  1. Add focused characterization tests for agent allowed keys, Claude Code passthrough fields, and parity between from_json_obj and Markdown loading.
  2. Extract allowed-key and compatibility policy helpers while keeping bot_bottle.manifest as the import surface.
  3. Extract raw Markdown loading into a loader boundary and rerun existing PRD 0011 tests unchanged.
  4. Extract bottle inheritance and merge rules into a resolver boundary and rerun existing PRD 0025 tests unchanged.
  5. Trim bot_bottle.manifest to the public facade and model composition, leaving compatibility imports for existing callers.

Each chunk should be mergeable on its own and should keep the test suite green.

Testing Strategy

Run the existing manifest-focused unit tests after each chunk:

  • tests/unit/test_manifest_md_load.py
  • tests/unit/test_manifest_extends.py
  • tests/unit/test_manifest_git.py
  • tests/unit/test_manifest_git_user.py
  • tests/unit/test_manifest_agent_git_user.py
  • tests/unit/test_manifest_egress.py
  • tests/unit/test_manifest_runtime.py

Add new tests only where they lock down boundary behavior not already covered, especially compatibility passthrough and entry-point parity.

Open Questions

  • Should the Claude Code passthrough allowlist intentionally track a documented upstream schema, or should bot-bottle keep a narrow local allowlist and update it only when users need a new shared-file field?
  • Should the public facade continue exposing every helper that tests currently import from bot_bottle.manifest, or should tests move to public behavior only during this cleanup?