
# Manifest format and grouping

Two open questions for bot-bottle's manifest layer after PRD 0011:

1. **Grouping.** Keep bottles and agents in the same manifest file
   (today's shape), or split them — one file per bottle and one
   file per agent.
2. **Format.** Stay on JSON, switch to YAML, or move to a Markdown
   spec with YAML frontmatter. The Markdown option splits into two
   sub-flavors: reuse Claude Code's existing subagent format with
   bottle-specific extensions, or invent a bot-bottle-owned
   Markdown spec used for both agents and bottles.

The trust boundary from PRD 0011 — bottle infrastructure lives in
`$HOME`, agents may live in `$CWD` — is largely orthogonal to both
axes. But the choice of grouping and format changes how naturally
that boundary expresses on disk, and how comfortable the manifest
will be once a user has 5+ bottles and 10+ agents.

## Why this matters

Current shape: one JSON file at `$HOME/bot-bottle.json` (and
optionally `$CWD/bot-bottle.json` for cwd-defined agents). After
PRD 0011, the home file owns bottles + home agents; the cwd file is
agents-only.

The single-file shape works fine for the project's first 1-2
bottles. Real friction starts when:

- A user has 5-10 bottles for different projects, each carrying
  several `cred_proxy.routes` and a few `bottle.git` entries — the
  home file becomes hundreds of lines of nested JSON.
- Multiple humans share a `$HOME` manifest pattern (dotfiles repo,
  shared workstation, CI machine baseline) and want to compose
  pieces — JSON doesn't merge cleanly outside of the resolver.
- Per-agent prompts grow long. JSON forces them onto a single
  escaped line; multi-paragraph prompts become unreadable.
- Documentation (why does this bottle exist? what's the threat
  model for these credentials?) has nowhere natural to live in a
  JSON file; you end up with a sibling README that drifts from the
  config.

JSON's strengths (stable parser, machine-readable, stdlib-only) are
real and shouldn't be thrown away lightly. The question is whether
the inflection point has been reached.

## Axis 1 — grouping

### Option A: one file for both (current)

`$HOME/bot-bottle.json` contains `bottles:` and `agents:`. Cwd
file (optional) contains `agents:` only.

**Pros**

- Zero new lifecycle. One file to discover, edit, version, diff.
- The trust boundary has exactly one enforcement point (the
  resolver), so there is no on-disk layout to keep in sync with it.
- Atomic edits: changing a bottle and the agents that reference it
  is one commit, one save.

**Cons**

- Scales linearly with bottles + agents. A user with 8 bottles and
  12 agents hits a ~600-line file even with terse formatting.
- Diff conflicts: two changes to unrelated agents touch the same
  file. Codeowners-style ownership doesn't apply cleanly.
- Discovery gets harder past a certain size: finding one agent
  means scrolling or parsing the whole file rather than globbing
  filenames.
- The trust boundary is invisible on disk — a reader can't tell at
  a glance which entries are home-trusted vs cwd-supplied; they
  have to know the resolver's rules.

### Option B: file per thing

Bottles live as `$HOME/.bot-bottle/bottles/<name>.<ext>`. Agents
live as `$HOME/.bot-bottle/agents/<name>.<ext>` (home agents)
and `$CWD/.bot-bottle/agents/<name>.<ext>` (cwd agents). The
resolver globs each directory.
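The discovery step for this layout is small. The following is a minimal sketch, assuming the `$HOME/.bot-bottle/` and `$CWD/.bot-bottle/` paths above; `load_dir`, `discover`, and the `parse` callable are hypothetical names, and `parse` stands in for whichever format Axis 2 settles on (JSON by default here).

```python
import json
from pathlib import Path
from typing import Callable, Dict

Parser = Callable[[str], dict]  # hypothetical: text of one file -> parsed dict


def load_dir(directory: Path, ext: str, parse: Parser) -> Dict[str, dict]:
    """Parse every `<name>.<ext>` file in `directory`, keyed by `<name>`."""
    if not directory.is_dir():
        return {}  # a missing directory simply contributes nothing
    return {
        path.stem: parse(path.read_text(encoding="utf-8"))
        for path in sorted(directory.glob(f"*.{ext}"))
    }


def discover(home: Path, cwd: Path, ext: str = "json",
             parse: Parser = json.loads) -> Dict[str, Dict[str, dict]]:
    home_root = home / ".bot-bottle"
    cwd_root = cwd / ".bot-bottle"
    return {
        # Bottles are read from $HOME only; nothing under $CWD is globbed for them.
        "bottles": load_dir(home_root / "bottles", ext, parse),
        "home_agents": load_dir(home_root / "agents", ext, parse),
        # $CWD contributes agents and nothing else.
        "cwd_agents": load_dir(cwd_root / "agents", ext, parse),
    }
```

Usage would be `discover(Path.home(), Path.cwd())`. The sketch never reads `$CWD/.bot-bottle/bottles/` even if someone creates it; that omission is the entire enforcement mechanism.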

**Pros**

- Scales to any number of bottles and agents without any single
  file growing.
- Trust boundary expresses on disk: `$HOME/.bot-bottle/bottles/`
  is the only place bottles can come from, and `$CWD/.bot-bottle/`
  can only contribute agents. Enforcement reduces to which
  directories the resolver reads; the file paths are the policy.
- Aligns with Claude Code's existing model: each subagent already
  lives as `~/.claude/agents/<name>.md`. Claude Code users will
  recognize the directory shape.
- Per-file ownership / codeowners / diff workflows just work.
- Per-agent prompts grow without affecting other files.
- Documentation per bottle/agent can live in the file itself
  (e.g. comments, Markdown body).

**Cons**

- More lifecycle: creating, renaming, deleting agents/bottles
  becomes file ops (mkdir, mv, rm) instead of editing one file.
  Power users prefer that; new users may not.
- Discovery requires `ls`, not "grep one file." Tooling helps
  (e.g. `./cli.py list`) but the manifest is no longer a single
  artifact to email or ship.
- Atomicity: swapping a bottle name across agents touches
  multiple files. Git handles this fine; a one-shot text editor
  flow loses something.
- Backwards compatibility: existing users have one JSON file.
  Migration tool needed.

### Interaction with trust boundary

Option A keeps the resolver-enforced boundary from PRD 0011 as the
only enforcement.
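For contrast, this is roughly what resolver-only enforcement looks like under Option A. It is a sketch, not the current bot-bottle resolver: `merge_manifests`, `TrustBoundaryError`, and the rule that a cwd agent may not replace a home agent of the same name are illustrative assumptions. The one rule taken from the text is that the cwd file is agents-only.

```python
from typing import Optional


class TrustBoundaryError(ValueError):
    """The cwd manifest tried to supply something only $HOME may supply."""


def merge_manifests(home: dict, cwd: Optional[dict]) -> dict:
    bottles = dict(home.get("bottles", {}))
    agents = dict(home.get("agents", {}))
    if cwd is None:
        return {"bottles": bottles, "agents": agents}
    # Parse-and-reject: after PRD 0011 the cwd file may only carry agents.
    extra = set(cwd) - {"agents"}
    if extra:
        raise TrustBoundaryError(f"cwd manifest may only contain 'agents', got {sorted(extra)}")
    for name, agent in cwd.get("agents", {}).items():
        # Assumption: a cwd agent may not silently replace a home agent.
        if name in agents:
            raise TrustBoundaryError(f"cwd agent {name!r} collides with a home agent")
        agents[name] = agent
    return {"bottles": bottles, "agents": agents}
```

Every rule here lives in code the reader has to find and read; nothing about the file on disk hints at it.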

Option B can express the boundary purely as filesystem layout:
`$HOME/.bot-bottle/bottles/` is privileged, and `$CWD/.bot-bottle/`
only has an `agents/` subdirectory. The resolver becomes "glob the
dirs, parse each file, validate the cross-references." That is
strictly cleaner than the current parse-and-reject logic, and more
obvious to a reader auditing the security posture.
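Once the directories decide where bottles come from, cross-reference validation is the only semantic check left. A minimal sketch, assuming each parsed agent is a dict whose `bottle` key names a file stem under `$HOME/.bot-bottle/bottles/`, and assuming every agent must name a bottle; `validate_refs` is a hypothetical name.

```python
from typing import Dict, List


def validate_refs(bottles: Dict[str, dict], agents: Dict[str, dict]) -> List[str]:
    """Return one error message per agent whose `bottle` is missing or unknown."""
    errors = []
    for name, agent in sorted(agents.items()):
        bottle = agent.get("bottle")
        if bottle is None:
            errors.append(f"agent {name!r} does not name a bottle")
        elif bottle not in bottles:
            # `bottles` can only have come from $HOME, so a hit here is trusted.
            errors.append(f"agent {name!r} references unknown bottle {bottle!r}")
    return errors
```

Returning a list rather than raising on the first problem lets a `list`-style command report every broken agent at once.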

## Axis 2 — format

### Option 1: stay on JSON

What we have today. The trust-boundary change in PRD 0011
preserves this format.

**Pros**

- Zero migration cost.
- Stdlib parser; no new dependency. The project's AGENTS.md sets
  "low dependencies by default" as a guideline.
- Stable, predictable parse semantics. No type-coercion gotchas.
- Tooling everywhere — IDE support, linters, jq.

**Cons**

- No native comments. (JSONC, JSON5, `_comment` fields are all
  workarounds.)
- Multi-line strings become escaped one-liners. Agent prompts
  longer than a sentence become unreadable.
- Trailing commas are an error. Hand-editing punishes small typos.
- Verbose: every key and every string value is double-quoted, and
  nested structures add a level of braces and indentation.
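The hand-editing cons are easy to reproduce with the stdlib parser. The prompt text is made up for illustration; `rejects` is a hypothetical helper.

```python
import json

# A two-paragraph agent prompt (illustrative text).
prompt = "You are a feature-implementation agent.\n\nRead the PRD before editing code."

# Serialized, the newlines become `\n` escapes and the prompt is one long line.
encoded = json.dumps({"prompt": prompt})
print(encoded)


def rejects(text: str) -> bool:
    """True if the stdlib JSON parser refuses `text`."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return True
    return False


print(rejects('{"bottle": "dev",}'))                         # trailing comma: True
print(rejects('{"bottle": "dev"} // why dev?'))              # comment: True
print(rejects('{"_comment": "why dev?", "bottle": "dev"}'))  # workaround field: False
```

The `_comment` workaround parses, but nothing stops it from drifting out of date, and the resolver has to know to ignore it.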

### Option 2: full YAML

`$HOME/bot-bottle.yaml` (or `.yml`). Parser pulls in PyYAML (or
ruamel.yaml).

**Pros**

- Comments, multi-line strings (block scalars), anchors for repeated
  blocks (e.g. shared egress allowlists across bottles).
- Common config language for ops tooling (Kubernetes,
  GitHub/Gitea Actions, Docker Compose, pipelock's own config).
- Less syntactic noise than JSON for nested data.

**Cons**

- **New runtime dependency.** The project today uses zero
  third-party Python packages for production code; YAML parsing
  pulls in PyYAML. (AGENTS.md: "Python, stdlib-first; low-deps by default.")
- YAML's footguns: indentation sensitivity, the Norway problem
  (`country: NO` → boolean `False` under YAML 1.1 rules, which
  PyYAML follows), and implicit type coercion that has bitten
  non-trivial production projects.
- Strict validation is harder: YAML parsers accept many spellings
  of the same value and coerce types silently, so the schema check
  has to catch mistakes that JSON's grammar rejects outright.
- No native escape hatch for executable content / templating, but
  users will reach for one (Jinja, Helm-style) and then we're in
  yaml-as-template-language territory.

### Option 3: reuse Claude Code's subagent spec (Markdown + YAML frontmatter), with bot-bottle extensions

Claude Code already stores subagents at `~/.claude/agents/<name>.md`
with YAML frontmatter and a Markdown body. Frontmatter today
carries fields like `name`, `description`, `model`, `color`,
`memory`; the body is the system prompt. Adding fields like
`bottle: dev` and a `bot_bottle:` sub-block to the same
frontmatter would make each bot-bottle agent a drop-in addition
to Claude Code's agent directory.

```markdown
---
name: implementer
description: Implements features against PRDs in this repo
model: opus
bottle: dev
bot_bottle:
  skills: [init-prd]
---

You are a feature-implementation agent running inside an
ephemeral bot-bottle sandbox. The host has copied the user's
project into /home/node/workspace...
```

Bottles don't fit Claude Code's agent schema — they're
infrastructure, not behavior. Either:

- (3a) Bottles stay JSON / YAML; only agents adopt the
  MD+frontmatter format. Mixed-format manifest.
- (3b) Bottles adopt MD+frontmatter too, using a bot-bottle-only
  schema. Then we're really doing option 4 for bottles + option 3
  for agents: one format, two schemas, one parser.

**Pros**

- Existing Claude Code users already know this format and have a
  directory full of these files. The mental model is "an agent is
  a Markdown file."
- Each agent's prompt lives naturally as Markdown body — long
  prompts read well, can use headings/lists/code blocks.
- File-per-thing falls out automatically (one MD per agent).
- Claude Code may eventually consume bot-bottle's agent files
  directly, doubling their utility.

**Cons**

- **Coupling to Claude Code's spec.** Anthropic owns that schema;
  field names and semantics can change. Today's `model` /
  `description` / `memory` are stable, but tomorrow's may not be.
  Our `bottle:` / `bot_bottle:` extensions could collide with
  future official fields.
- The agent file's frontmatter starts to carry two unrelated
  schemas: Claude Code's (model, description) and ours (bottle,
  skills, ...). One file, two owners.
- Bottles still need a format choice (3a vs 3b above) — we don't
  escape that decision.
- Parsing MD+frontmatter is more work than JSON. Either pull in a
  frontmatter library (python-frontmatter) or hand-parse the
  `---` block and feed it to PyYAML. Either way it is a new
  dependency, unless we also write our own YAML-subset parser.
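The "hand-parse the `---` block" half is small and stdlib-only. A sketch that splits a file into frontmatter text and Markdown body; `split_frontmatter` is a hypothetical helper, and the delimiter rules (opening `---` on the first line, closing `---` on its own line) are assumptions modelled on the common convention rather than a spec. Turning the frontmatter text into a dict still needs PyYAML (`yaml.safe_load`) or a subset parser.

```python
from typing import Tuple


def split_frontmatter(text: str) -> Tuple[str, str]:
    """Split `---`-delimited frontmatter from a Markdown body.

    Returns (frontmatter_text, body). Raises ValueError if the file does
    not open with a `---` line or the block is never closed.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        raise ValueError("file does not start with a '---' frontmatter line")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            frontmatter = "".join(lines[1:i])
            # Drop the blank line(s) conventionally left after the closing `---`.
            body = "".join(lines[i + 1:]).lstrip("\r\n")
            return frontmatter, body
    raise ValueError("frontmatter block is never closed with '---'")
```

The body comes back verbatim, so the prompt's headings, lists, and code fences reach the agent untouched.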

### Option 4: invent a bot-bottle MD spec, used for both agents and bottles

```markdown
---
# $HOME/.bot-bottle/agents/implementer.md
bottle: dev
skills: [init-prd]
---

You are a feature-implementation agent running inside an
ephemeral bot-bottle sandbox...
```

```markdown
---
# $HOME/.bot-bottle/bottles/dev.md
cred_proxy:
  routes:
    - path: /anthropic/
      upstream: https://api.anthropic.com
      auth_scheme: Bearer
      token_ref: BOT_BOTTLE_CLAUDE_OAUTH_TOKEN
      role: anthropic-base-url
egress:
  allowlist: [example.com]
---

The dev bottle. Backs my work on personal projects: Anthropic
OAuth, the gitea instance at gitea.dideric.is, and an npm token
for publishing scoped packages.
```

**Pros**

- Single format, two directory layouts. One parser, one mental
  model.
- Bottle files get an MD body that's a natural home for
  documentation (why does this bottle exist? what tokens does it
  hold? who owns the keys?).
- Not coupled to Claude Code's schema; we own the spec.
- Trust boundary on disk: `$HOME/.bot-bottle/bottles/` is the
  only place bottles can come from; `$CWD/.bot-bottle/agents/`
  is the only thing cwd contributes.
- Agent files in this spec are *almost* compatible with Claude
  Code's subagent format. If we keep the `name` / `description`
  conventions, the same files can drop into `~/.claude/agents/`
  with no friction — best of both worlds without the formal
  coupling.

**Cons**

- Invents a format. Users learn one more thing (small thing — MD
  with frontmatter is widely understood).
- Bottle file bodies have no built-in use case beyond
  documentation; users may leave them empty, which looks weird
  ("why is this file partly Markdown?").
- Still requires a YAML parser for the frontmatter, so the
  dependency cost is the same as option 3.

## Synthesis

Combining axes:

|                  | JSON (1)          | YAML (2)          | MD reuse (3)                   | MD new (4)                     |
|------------------|-------------------|-------------------|--------------------------------|--------------------------------|
| **Grouped (A)**  | today             | YAML monolith     | not natural: MD wants per-file | not natural: MD wants per-file |
| **Per-file (B)** | dir of JSON files | dir of YAML files | best fit                       | best fit                       |

Per-file × MD-with-frontmatter is the natural shape on both axes —
the format wants to live one-per-file (the "MD doc with metadata"
pattern doesn't lend itself to monoliths), and the file-per-thing
grouping fits how users iterate on agents (write a prompt, save,
launch).

Between option 3 (reuse CC spec) and option 4 (new spec): the
appealing middle ground is "bot-bottle agents follow the CC
subagent shape closely (name / description / model + bottle and
skills extensions) so they drop into `~/.claude/agents/` as a
side effect, while bottles use the same MD+frontmatter shape but
with bot-bottle's own schema and live in a dedicated directory."
This:

- gives agents both a bot-bottle launch story AND a Claude Code
  invocation story from the same file;
- keeps bottles entirely under our schema (no Anthropic dependency
  for the security-load-bearing config);
- uses one parser, one body-format, two directories.

Cost of moving:

- New runtime dependency: a YAML parser. PyYAML is the safest
  choice if we accept the dependency; the alternative is a
  hand-written frontmatter shim.
- Resolver rewrite: glob two directories, parse each file, validate
  cross-references. Roughly the same complexity as today's JSON
  merge once the boundary check is in place.
- Migration tool: a one-shot script that splits today's JSON into
  per-file MD docs. Five minutes of work for the tool, five
  minutes of work for the user.
- Docs: README's manifest section gets rewritten. Worth doing
  alongside the move.
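A sketch of the one-shot migration, assuming today's JSON shape is `{"bottles": {name: {...}}, "agents": {name: {...}}}` and that an agent's prompt sits under a `prompt` key; both are assumptions, and the real field names may differ. It stays stdlib-only by writing each frontmatter value with `json.dumps`, relying on simple JSON values also being valid YAML flow syntax. `to_markdown` and `migrate` are hypothetical names.

```python
import json
from pathlib import Path
from typing import List


def to_markdown(fields: dict, body: str = "") -> str:
    """Render one per-file doc: `---` frontmatter, then an optional Markdown body."""
    # One line per top-level key; the value is written as JSON, which YAML
    # parsers read as flow syntax. Keys are assumed to be plain identifiers.
    header = "".join(f"{key}: {json.dumps(value)}\n" for key, value in fields.items())
    doc = f"---\n{header}---\n"
    return f"{doc}\n{body.strip()}\n" if body.strip() else doc


def migrate(manifest_path: Path, out_root: Path) -> List[Path]:
    """Split a grouped JSON manifest into out_root/{bottles,agents}/<name>.md."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    written: List[Path] = []
    for kind in ("bottles", "agents"):
        entries = manifest.get(kind, {})
        if not entries:
            continue  # e.g. a cwd manifest never gains a bottles/ directory
        target = out_root / kind
        target.mkdir(parents=True, exist_ok=True)
        for name, spec in entries.items():
            fields = dict(spec)
            body = ""
            if kind == "agents":
                # Assumption: the prompt lives under "prompt" and becomes the MD body.
                body = fields.pop("prompt", "")
                fields = {"name": name, **fields}  # keep Claude Code's `name` convention
            path = target / f"{name}.md"
            path.write_text(to_markdown(fields, body), encoding="utf-8")
            written.append(path)
    return written
```

As written it overwrites existing `.md` files of the same name, so a shipped `./cli.py migrate-manifest` would want a dry-run mode or a refusal to clobber.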

## Recommendation

Per-file MD with frontmatter (option B × option 4 with the option-3
agent compatibility). The format change clears the way for the
per-file grouping (which is the bigger UX win), and the per-file
shape is what makes the trust boundary self-documenting on disk.

The dependency cost (PyYAML) is the main thing that needs an
explicit yes from the user — bot-bottle today has zero
third-party Python deps for production code, and adopting one
crosses a clean architectural line. If "low deps" stays a hard
constraint, the alternative is to hand-parse the frontmatter block
and feed it to a minimal YAML subset parser. The frontmatter
bot-bottle needs is plain strings, lists, and nested mappings
(including lists of mappings such as `cred_proxy.routes`): no
anchors, no multi-line block scalars, no implicit type coercion.

If we don't want to commit to the move yet, the next-cheapest
option is keeping JSON but splitting into per-file (option B ×
option 1): `$HOME/.bot-bottle/bottles/<name>.json` +
`$HOME/.bot-bottle/agents/<name>.json`. Most of the scaling
wins; none of the body-prose or dependency story.

## Open questions

- **Does Claude Code object to extra frontmatter fields?** Test:
  drop a file with `bottle:` in `~/.claude/agents/` and see if CC
  warns / ignores / breaks. If it warns, we'd want a different
  field name (e.g. `bot-bottle-bottle`) or a namespaced block.
- **Migration story.** Is the project willing to ship a one-shot
  `./cli.py migrate-manifest` command that does the JSON → MD
  conversion? Or do users just rewrite by hand from the new docs?
- **Bottle file body content.** If most bottle .md files have an
  empty body, is the MD-with-frontmatter format still warranted?
  An alternative is YAML for bottles only (no body, but with
  comments) and MD+frontmatter for agents.
- **Dotfiles vs not.** `$HOME/.bot-bottle/` or
  `$HOME/bot-bottle/`? The hidden dotfile shape matches dev
  conventions (`.config/`, `.ssh/`); the visible shape signals
  "this is a real thing you own."
- **PyYAML hard dep, or minimal subset parser?** Trade-off between
  "honest about the dependency" and "stay stdlib-only."