bot-bottle/docs/research/manifest-format-and-grouping.md
didericis-codex 18e3b62b72
docs: rename CLAUDE.md to AGENTS.md and rebrand provider-agnostic
Delete CLAUDE.md in favor of AGENTS.md as the orientation doc, rebrand
the project from Codex-bottle to provider-agnostic bot-bottle, and
repoint every CLAUDE.md reference across PRDs, research notes, the
implementer agent example, and the yaml_subset comment.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-28 20:36:47 -04:00

# Manifest format and grouping
Two open questions for bot-bottle's manifest layer after PRD 0011:
1. **Grouping.** Keep bottles and agents in the same manifest file
(today's shape), or split them — one file per bottle and one
file per agent.
2. **Format.** Stay on JSON, switch to YAML, or move to a Markdown
spec with YAML frontmatter. The Markdown option splits into two
sub-flavors: reuse Claude Code's existing subagent format with
bottle-specific extensions, or invent a bot-bottle-owned
Markdown spec used for both agents and bottles.
The trust boundary from PRD 0011 — bottle infrastructure lives in
`$HOME`, agents may live in `$CWD` — is largely orthogonal to both
axes. But the choice of grouping and format changes how naturally
that boundary expresses on disk, and how comfortable the manifest
will be once a user has 5+ bottles and 10+ agents.
## Why this matters
Current shape: one JSON file at `$HOME/bot-bottle.json` (and
optionally `$CWD/bot-bottle.json` for cwd-defined agents). After
PRD 0011, the home file owns bottles + home agents; the cwd file is
agents-only.
The single-file shape works fine for the project's first 1-2
bottles. Real friction starts when:
- A user has 5-10 bottles for different projects, each carrying
several `cred_proxy.routes` and a few `bottle.git` entries — the
home file becomes hundreds of lines of nested JSON.
- Multiple humans share a `$HOME` manifest pattern (dotfiles repo,
shared workstation, CI machine baseline) and want to compose
pieces — JSON doesn't merge cleanly outside of the resolver.
- Per-agent prompts grow long. JSON forces them onto a single
escaped line; multi-paragraph prompts become unreadable.
- Documentation (why does this bottle exist? what's the threat
model for these credentials?) has nowhere natural to live in a
JSON file; you end up with a sibling README that drifts from the
config.
JSON's strengths (stable parser, machine-readable, stdlib-only) are
real and shouldn't be thrown away lightly. The question is whether
the inflection point has been reached.
## Axis 1 — grouping
### Option A: one file for both (current)
`$HOME/bot-bottle.json` contains `bottles:` and `agents:`. Cwd
file (optional) contains `agents:` only.
**Pros**
- Zero new lifecycle. One file to discover, edit, version, diff.
- Trust boundary lives entirely in the resolver — the on-disk
shape doesn't enforce or surface it.
- Atomic edits: changing a bottle and the agents that reference it
is one commit, one save.
**Cons**
- Scales linearly with bottles + agents. A user with 8 bottles and
12 agents hits a ~600-line file even with terse formatting.
- Diff conflicts: two changes to unrelated agents touch the same
file. Codeowners-style ownership doesn't apply cleanly.
- Discovery gets harder beyond a point: finding one agent means
loading the whole file into a JSON parser rather than globbing
filenames.
- The trust boundary is invisible on disk — a reader can't tell at
a glance which entries are home-trusted vs cwd-supplied; they
have to know the resolver's rules.
### Option B: file per thing
Bottles live as `$HOME/.bot-bottle/bottles/<name>.<ext>`. Agents
live as `$HOME/.bot-bottle/agents/<name>.<ext>` (home agents)
and `$CWD/.bot-bottle/agents/<name>.<ext>` (cwd agents). The
resolver globs each directory.
**Pros**
- Scales to N bottles + N agents without any single file growing.
- Trust boundary expresses on disk: `$HOME/.bot-bottle/bottles/`
is the only place bottles can come from. `$CWD/.bot-bottle/`
can only contribute agents. No resolver logic needed to enforce
it — the file paths are the enforcement.
- Aligns with Claude Code's existing model: each subagent already
lives as `~/.claude/agents/<name>.md`. Claude Code users will
recognize the directory shape.
- Per-file ownership / codeowners / diff workflows just work.
- Per-agent prompts grow without affecting other files.
- Documentation per bottle/agent can live in the file itself
(e.g. comments, Markdown body).
**Cons**
- More lifecycle: creating, renaming, deleting agents/bottles
becomes file ops (mkdir, mv, rm) instead of editing one file.
Power users prefer that; new users may not.
- Discovery requires `ls`, not "grep one file." Tooling helps
(e.g. `./cli.py list`) but the manifest is no longer a single
artifact to email or ship.
- Atomicity: swapping a bottle name across agents touches
multiple files. Git handles this fine; a one-shot text editor
flow loses something.
- Backwards compatibility: existing users have one JSON file.
Migration tool needed.
### Interaction with trust boundary
Option A keeps the resolver-enforced boundary from PRD 0011 as the
only enforcement.
Option B can express the boundary purely as filesystem layout:
`<home>/bottles/` is privileged; `<cwd>/` directory only has an
`agents/` subdirectory. The resolver becomes "glob the dirs, parse
each file, validate the cross-references." Strictly cleaner than
the current parse-and-reject logic, and more obvious to a reader
auditing the security posture.
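A minimal sketch of the "glob the dirs, validate the cross-references" resolver shape, assuming the Option B layout above. The function names (`names_in`, `discover`, `check_references`) are hypothetical, and the sketch keeps home and cwd agents in separate maps rather than guessing at a precedence rule.

```python
from pathlib import Path


def names_in(directory: Path) -> dict[str, Path]:
    """Map file stem -> path for every regular file in `directory`."""
    if not directory.is_dir():
        return {}
    return {p.stem: p for p in sorted(directory.iterdir()) if p.is_file()}


def discover(home_root: Path, cwd_root: Path) -> dict[str, dict[str, Path]]:
    """Collect manifest files; bottles are only ever read from home_root."""
    return {
        "bottles": names_in(home_root / "bottles"),
        "home_agents": names_in(home_root / "agents"),
        # cwd contributes agents only: cwd_root / "bottles" is never read.
        "cwd_agents": names_in(cwd_root / "agents"),
    }


def check_references(agent_bottles: dict[str, str], bottles: dict[str, Path]) -> list[str]:
    """Return the agents whose `bottle` field names no known bottle."""
    return sorted(a for a, b in agent_bottles.items() if b not in bottles)
```

The boundary holds because the cwd `bottles/` path is simply never visited; there is no parse-and-reject step to get wrong.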
## Axis 2 — format
### Option 1: stay on JSON
What we have today. The trust-boundary change in PRD 0011
preserves this format.
**Pros**
- Zero migration cost.
- Stdlib parser; no new dependency. The project's AGENTS.md sets
"low dependencies by default" as a guideline.
- Stable, predictable parse semantics. No type-coercion gotchas.
- Tooling everywhere — IDE support, linters, jq.
**Cons**
- No native comments. (JSONC, JSON5, `_comment` fields are all
workarounds.)
- Multi-line strings become escaped one-liners. Agent prompts
longer than a sentence become unreadable.
- Trailing commas are an error. Hand-editing punishes small typos.
- Verbose: every key + value gets quotes; nested structures grow
indent.
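Two of these cons are easy to reproduce with the stdlib parser. The snippet below is only an illustration; the prompt text and the `parses` helper are made up for the example.

```python
import json

prompt = "You are an implementer.\n\nWork only inside the workspace.\n"
encoded = json.dumps({"prompt": prompt}, indent=2)
# The multi-paragraph prompt is stored on one line with escaped newlines.
print(encoded)


def parses(text: str) -> bool:
    """Return True if `text` is valid JSON for the stdlib parser."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(parses('{"bottle": "dev"}'))   # valid
print(parses('{"bottle": "dev",}'))  # trailing comma is rejected
```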
### Option 2: full YAML
`$HOME/bot-bottle.yaml` (or `.yml`). Parser pulls in PyYAML (or
ruamel.yaml).
**Pros**
- Comments, multi-line strings (block scalars), anchors for repeated
blocks (e.g. shared egress allowlists across bottles).
- Common config language for ops tooling (Kubernetes,
GitHub/Gitea Actions, Docker Compose, pipelock's own config).
- Less syntactic noise than JSON for nested data.
**Cons**
- **New runtime dependency.** The project today uses zero
third-party Python packages for production code; YAML parsing
pulls in PyYAML. (AGENTS.md: "Python, stdlib-first; low-deps by default.")
- YAML's footguns: indentation sensitivity, the Norway problem
(`country: NO` → boolean False), and implicit type coercion that
has surprised non-trivial production projects.
- YAML documents are harder to validate strictly against a schema:
parsers are forgiving where JSON is strict.
- No native escape hatch for executable content / templating, but
users will reach for one (Jinja, Helm-style) and then we're in
yaml-as-template-language territory.
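To make the Norway problem concrete without installing anything, here is a stdlib-only approximation of the YAML 1.1 boolean rule that PyYAML's default loaders apply to unquoted scalars. `resolve_plain_scalar` is a hypothetical helper written for this note, not a PyYAML function, and it models booleans only.

```python
import re

# Unquoted scalars that YAML 1.1 loaders treat as booleans.
YAML11_BOOL = re.compile(
    r"^(?:yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)
TRUTHY = {"yes", "true", "on"}


def resolve_plain_scalar(text: str):
    """Approximate YAML 1.1 typing of an unquoted scalar (booleans only)."""
    if YAML11_BOOL.match(text):
        return text.lower() in TRUTHY
    return text


for raw in ("NO", "off", "dev", "Norway"):
    print(f"{raw!r} -> {resolve_plain_scalar(raw)!r}")
```

A bottle named `on` or an allowlist entry spelled `no` would silently change type under these rules, which is the argument for quoting such values or for a parser that never coerces.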
### Option 3: reuse Claude Code's subagent spec (Markdown + YAML frontmatter), with bot-bottle extensions
Claude Code already stores subagents at `~/.claude/agents/<name>.md`
with YAML frontmatter and a Markdown body. Frontmatter today
carries fields like `name`, `description`, `model`, `color`,
`memory`; the body is the system prompt. Adding fields like
`bottle: dev` and a `bot_bottle:` sub-block to the same
frontmatter would make each bot-bottle agent a drop-in addition
to Claude Code's agent directory.
```markdown
---
name: implementer
description: Implements features against PRDs in this repo
model: opus
bottle: dev
bot_bottle:
  skills: [init-prd]
---
You are a feature-implementation agent running inside an
ephemeral bot-bottle sandbox. The host has copied the user's
project into /home/node/workspace...
```
Bottles don't fit Claude Code's agent schema — they're
infrastructure, not behavior. Either:
- (3a) Bottles stay JSON / YAML; only agents adopt the
MD+frontmatter format. Mixed-format manifest.
- (3b) Bottles adopt MD+frontmatter too, using a bot-bottle-only
schema. Then we're really doing option 4 for bottles + option 3
for agents. Two schemas, but one file format and one parser.
**Pros**
- Existing Claude Code users already know this format and have a
directory full of these files. The mental model is "an agent is
a Markdown file."
- Each agent's prompt lives naturally as Markdown body — long
prompts read well, can use headings/lists/code blocks.
- File-per-thing falls out automatically (one MD per agent).
- Claude Code may eventually consume bot-bottle's agent files
directly, doubling their utility.
**Cons**
- **Coupling to Claude Code's spec.** Anthropic owns that schema;
field names and semantics can change. Today's `model` /
`description` / `memory` are stable, but tomorrow's may not be.
Our `bottle:` / `bot_bottle:` extensions could collide with
future official fields.
- The agent file's frontmatter starts to carry two unrelated
schemas: Claude Code's (model, description) and ours (bottle,
skills, ...). One file, two owners.
- Bottles still need a format choice (3a vs 3b above) — we don't
escape that decision.
- Parsing MD+frontmatter is more work than JSON. Either pull a
frontmatter library (python-frontmatter) or hand-parse the
`---` block and feed it to PyYAML. Either way, a new dep.
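The hand-parse half of that trade-off is small. A minimal sketch, assuming the frontmatter is delimited by lines containing only `---` and that a file without an opening delimiter is all body; `split_frontmatter` is a hypothetical name.

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Split a Markdown document into (frontmatter, body).

    Returns ("", text) when the document does not open with a `---` line.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything between the delimiters is frontmatter; the rest is body.
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    raise ValueError("frontmatter opened with '---' but never closed")
```

The returned frontmatter string still needs a YAML parser of some kind, so this removes the frontmatter-library dependency but not the YAML question.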
### Option 4: invent a bot-bottle MD spec, used for both agents and bottles
```markdown
---
# $HOME/.bot-bottle/agents/implementer.md
bottle: dev
skills: [init-prd]
---
You are a feature-implementation agent running inside an
ephemeral bot-bottle sandbox...
```
```markdown
---
# $HOME/.bot-bottle/bottles/dev.md
cred_proxy:
  routes:
    - path: /anthropic/
      upstream: https://api.anthropic.com
      auth_scheme: Bearer
      token_ref: BOT_BOTTLE_CLAUDE_OAUTH_TOKEN
      role: anthropic-base-url
egress:
  allowlist: [example.com]
---
The dev bottle. Backs my work on personal projects: Anthropic
OAuth, the gitea instance at gitea.dideric.is, and an npm token
for publishing scoped packages.
```
**Pros**
- Single format, two directory layouts. One parser, one mental
model.
- Bottle files get an MD body that's a natural home for
documentation (why does this bottle exist? what tokens does it
hold? who owns the keys?).
- Not coupled to Claude Code's schema; we own the spec.
- Trust boundary on disk: `$HOME/.bot-bottle/bottles/` is the
only place bottles can come from; `$CWD/.bot-bottle/agents/`
is the only thing cwd contributes.
- Agent files in this spec are *almost* compatible with Claude
Code's subagent format. If we keep the `name` / `description`
conventions, the same files can drop into `~/.claude/agents/`
with no friction — best of both worlds without the formal
coupling.
**Cons**
- Invents a format. Users learn one more thing (small thing — MD
with frontmatter is widely understood).
- Bottle file bodies have no built-in use case beyond
documentation; users may leave them empty, which looks weird
("why is this file partly Markdown?").
- Still requires a YAML parser for the frontmatter, so the
dependency cost is the same as option 3.
## Synthesis
Combining axes:
| | JSON | YAML | MD reuse (3) | MD new (4) |
|--------------|--------------------------|--------------------------|--------------------------|--------------------------|
| **Grouped (A)** | today | yaml monolith | not natural — MD wants per-file | not natural |
| **Per-file (B)** | dir of JSON files | dir of YAML files | best fit | best fit |
Per-file × MD-with-frontmatter is the natural shape on both axes —
the format wants to live one-per-file (the "MD doc with metadata"
pattern doesn't lend itself to monoliths), and the file-per-thing
grouping fits how users iterate on agents (write a prompt, save,
launch).
Between option 3 (reuse CC spec) and option 4 (new spec): the
appealing middle ground is "bot-bottle agents follow the CC
subagent shape closely (name / description / model + bottle and
skills extensions) so they drop into `~/.claude/agents/` as a
side effect, while bottles use the same MD+frontmatter shape but
with bot-bottle's own schema and live in a dedicated directory."
This:
- gives agents both a bot-bottle launch story AND a Claude Code
invocation story from the same file;
- keeps bottles entirely under our schema (no Anthropic dependency
for the security-load-bearing config);
- uses one parser, one body-format, two directories.
Cost of moving:
- New runtime dep: a YAML parser. PyYAML is the safest choice if we
accept the dependency; the alternative is a hand-written shim that
parses only the frontmatter subset we use.
- Resolver rewrite: glob two directories, parse each file, validate
cross-references. Roughly the same complexity as today's JSON
merge once the boundary check is in place.
- Migration tool: a one-shot script that splits today's JSON into
per-file MD docs. Five minutes of work for the tool, five
minutes of work for the user.
- Docs: README's manifest section gets rewritten. Worth doing
alongside the move.
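A rough sketch of what the one-shot migration could look like. It assumes today's manifest maps names to entries under top-level `bottles` and `agents` keys, that agents keep their prompt under a `prompt` key, and that names are safe to use as filenames; none of that is confirmed here, and `render_doc` / `migrate` are hypothetical names.

```python
import json
from pathlib import Path


def render_doc(fields: dict, body: str = "") -> str:
    """Render one Markdown document with frontmatter.

    Values are written as JSON, which full YAML parsers generally accept
    as flow syntax; a stricter subset parser would need its own writer.
    """
    lines = ["---"]
    lines += [f"{key}: {json.dumps(value)}" for key, value in fields.items()]
    lines.append("---")
    text = "\n".join(lines) + "\n"
    return text + (body.rstrip("\n") + "\n" if body else "")


def migrate(manifest: dict, out_root: Path) -> list[Path]:
    """Write one .md file per bottle and per agent under out_root."""
    written = []
    for kind in ("bottles", "agents"):
        target = out_root / kind
        target.mkdir(parents=True, exist_ok=True)
        for name, entry in manifest.get(kind, {}).items():
            fields = dict(entry)
            # Assumption: the agent prompt becomes the Markdown body.
            body = fields.pop("prompt", "") if kind == "agents" else ""
            path = target / f"{name}.md"
            path.write_text(render_doc(fields, body), encoding="utf-8")
            written.append(path)
    return written
```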
## Recommendation
Per-file MD with frontmatter (option B × option 4 with the option-3
agent compatibility). The format change clears the way for the
per-file grouping (which is the bigger UX win), and the per-file
shape is what makes the trust boundary self-documenting on disk.
The dependency cost (PyYAML) is the main thing that needs an
explicit yes from the user — bot-bottle today has zero
third-party Python deps for production code, and adopting one
crosses a clean architectural line. If "low deps" stays a hard
constraint, the alternative is to hand-parse the frontmatter block
and feed it to a minimal YAML subset parser (the keys
bot-bottle uses are all flat string/list/dict — no anchors, no
multi-line block scalars, no implicit type coercion).
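For a sense of scale, a subset parser for the flat agent-file shape fits in a couple of dozen lines. This is a sketch under stated assumptions, not a YAML implementation: it handles only top-level `key: value` lines and inline `[a, b]` lists, keeps every scalar as a string (quotes included), and rejects indented lines outright, so nested blocks like `cred_proxy.routes` would need more work. `parse_flat_frontmatter` is a hypothetical name.

```python
def parse_flat_frontmatter(block: str) -> dict[str, object]:
    """Parse `key: value` lines; every scalar stays a string (no coercion)."""
    result: dict[str, object] = {}
    for number, raw in enumerate(block.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if raw[0] in " \t":
            raise ValueError(f"line {number}: nested blocks are not supported")
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not key:
            raise ValueError(f"line {number}: expected 'key: value'")
        if key in result:
            raise ValueError(f"line {number}: duplicate key {key!r}")
        if value.startswith("[") and value.endswith("]"):
            inner = value[1:-1].strip()
            result[key] = [item.strip() for item in inner.split(",")] if inner else []
        else:
            result[key] = value
    return result
```

Failing loudly on anything outside the subset is deliberate: for security-relevant config, an unsupported construct should be an error rather than a silently different value.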
If we don't want to commit to the move yet, the next-cheapest
option is keeping JSON but splitting into per-file (option B ×
option 1): `$HOME/.bot-bottle/bottles/<name>.json` +
`$HOME/.bot-bottle/agents/<name>.json`. Most of the scaling
wins; none of the body-prose or dependency story.
## Open questions
- **Does Claude Code object to extra frontmatter fields?** Test:
drop a file with `bottle:` in `~/.claude/agents/` and see if CC
warns / ignores / breaks. If it warns, we'd want a different
field name (e.g. `bot-bottle-bottle`) or a namespaced block.
- **Migration story.** Is the project willing to ship a one-shot
`./cli.py migrate-manifest` command that does the JSON → MD
conversion? Or do users just rewrite by hand from the new docs?
- **Bottle file body content.** If most bottle .md files have an
empty body, is the MD-with-frontmatter format still warranted?
An alternative is YAML for bottles only (no body, but with
comments) and MD+frontmatter for agents.
- **Dotfiles vs not.** `$HOME/.bot-bottle/` or
`$HOME/bot-bottle/`? The hidden dotfile shape matches dev
conventions (`.config/`, `.ssh/`); the visible shape signals
"this is a real thing you own."
- **PyYAML hard dep, or minimal subset parser?** Trade-off between
"honest about the dependency" and "stay stdlib-only."