Research: manifest format + grouping options #16

Merged
didericis merged 1 commits from manifest-format-research into main 2026-05-24 21:31:46 -04:00
@@ -0,0 +1,378 @@
# Manifest format and grouping
Two open questions for claude-bottle's manifest layer after PRD 0011:
1. **Grouping.** Keep bottles and agents in the same manifest file
(today's shape), or split them — one file per bottle and one
file per agent.
2. **Format.** Stay on JSON, switch to YAML, or move to a Markdown
spec with YAML frontmatter. The Markdown option splits into two
sub-flavors: reuse Claude Code's existing subagent format with
bottle-specific extensions, or invent a claude-bottle-owned
Markdown spec used for both agents and bottles.
The trust boundary from PRD 0011 (bottle infrastructure lives in
`$HOME`, agents may live in `$CWD`) is largely orthogonal to both
axes. But the choice of grouping and format changes how naturally
that boundary is expressed on disk, and how comfortable the manifest
will be once a user has 5+ bottles and 10+ agents.
## Why this matters
Current shape: one JSON file at `$HOME/claude-bottle.json` (and
optionally `$CWD/claude-bottle.json` for cwd-defined agents). After
PRD 0011, the home file owns bottles + home agents; the cwd file is
agents-only.
The single-file shape works fine for the project's first 1-2
bottles. Real friction starts when:
- A user has 5-10 bottles for different projects, each carrying
several `cred_proxy.routes` and a few `bottle.git` entries — the
home file becomes hundreds of lines of nested JSON.
- Multiple humans share a `$HOME` manifest pattern (dotfiles repo,
shared workstation, CI machine baseline) and want to compose
pieces — JSON doesn't merge cleanly outside of the resolver.
- Per-agent prompts grow long. JSON forces them onto a single
escaped line; multi-paragraph prompts become unreadable.
- Documentation (why does this bottle exist? what's the threat
model for these credentials?) has nowhere natural to live in a
JSON file; you end up with a sibling README that drifts from the
config.
JSON's strengths (stable parser, machine-readable, stdlib-only) are
real and shouldn't be thrown away lightly. The question is whether
the inflection point has been reached.
## Axis 1 — grouping
### Option A: one file for both (current)
`$HOME/claude-bottle.json` contains `bottles:` and `agents:`. Cwd
file (optional) contains `agents:` only.
**Pros**
- Zero new lifecycle. One file to discover, edit, version, diff.
- Trust boundary lives entirely in the resolver — the on-disk
shape doesn't enforce or surface it.
- Atomic edits: changing a bottle and the agents that reference it
is one commit, one save.
**Cons**
- Scales linearly with bottles + agents. A user with 8 bottles and
12 agents hits a ~600-line file even with terse formatting.
- Diff conflicts: two changes to unrelated agents touch the same
file. Codeowners-style ownership doesn't apply cleanly.
- Discovery gets harder beyond a point: finding one agent means
loading the whole file into a JSON parser rather than globbing
filenames.
- The trust boundary is invisible on disk — a reader can't tell at
a glance which entries are home-trusted vs cwd-supplied; they
have to know the resolver's rules.
### Option B: file per thing
Bottles live as `$HOME/.claude-bottle/bottles/<name>.<ext>`. Agents
live as `$HOME/.claude-bottle/agents/<name>.<ext>` (home agents)
and `$CWD/.claude-bottle/agents/<name>.<ext>` (cwd agents). The
resolver globs each directory.
**Pros**
- Scales to N bottles + N agents without any single file growing.
- Trust boundary is expressed on disk: `$HOME/.claude-bottle/bottles/`
is the only place bottles can come from. `$CWD/.claude-bottle/`
can only contribute agents. The resolver needs no parse-and-reject
logic to enforce it; it only globs the directories it trusts, so
the file paths are the enforcement.
- Aligns with Claude Code's existing model: each subagent already
lives as `~/.claude/agents/<name>.md`. Claude Code users will
recognize the directory shape.
- Per-file ownership / codeowners / diff workflows just work.
- Per-agent prompts grow without affecting other files.
- Documentation per bottle/agent can live in the file itself
(e.g. comments, Markdown body).
**Cons**
- More lifecycle: creating, renaming, deleting agents/bottles
becomes file ops (mkdir, mv, rm) instead of editing one file.
Power users prefer that; new users may not.
- Discovery requires `ls`, not "grep one file." Tooling helps
(e.g. `./cli.py list`) but the manifest is no longer a single
artifact to email or ship.
- Atomicity: renaming a bottle that several agents reference
touches multiple files. Git handles this fine; a find-and-replace
in a single editor buffer no longer does.
- Backwards compatibility: existing users have one JSON file.
Migration tool needed.
### Interaction with trust boundary
Option A keeps the resolver-enforced boundary from PRD 0011 as the
only enforcement.
Option B can express the boundary purely as filesystem layout:
`<home>/bottles/` is privileged; the `<cwd>/` directory contributes
only an `agents/` subdirectory. The resolver becomes "glob the dirs,
parse each file, validate the cross-references." That is strictly
cleaner than the current parse-and-reject logic, and more obvious to
a reader auditing the security posture.
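A minimal sketch of the "glob the dirs" step, assuming the Option B layout described above and a `.md` extension. The function name `discover` and the returned keys are hypothetical, not part of claude-bottle today. It shows how the boundary could fall out of which directories get read at all:

```python
import tempfile
from pathlib import Path


def discover(home: Path, cwd: Path, ext: str = ".md") -> dict:
    """List manifest names per directory; the path decides what a file may define."""

    def names(directory: Path) -> list:
        # A missing directory simply contributes nothing.
        if not directory.is_dir():
            return []
        return sorted(p.stem for p in directory.glob(f"*{ext}"))

    home_root = home / ".claude-bottle"
    cwd_root = cwd / ".claude-bottle"
    return {
        "bottles": names(home_root / "bottles"),    # home is the only bottle source
        "home_agents": names(home_root / "agents"),
        "cwd_agents": names(cwd_root / "agents"),   # cwd/bottles/ is never globbed
    }


def demo() -> dict:
    """Build a throwaway tree, including a cwd bottle that must be ignored."""
    with tempfile.TemporaryDirectory() as tmp:
        home, cwd = Path(tmp, "home"), Path(tmp, "project")
        for path in (
            home / ".claude-bottle/bottles/dev.md",
            home / ".claude-bottle/agents/implementer.md",
            cwd / ".claude-bottle/agents/reviewer.md",
            cwd / ".claude-bottle/bottles/sneaky.md",
        ):
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("---\n---\n", encoding="utf-8")
        return discover(home, cwd)


found = demo()
print(found)
```

In this sketch the `sneaky.md` bottle under the project directory never shows up, because no code path looks there. Parsing each file and validating that every agent's `bottle:` names a discovered bottle would be separate steps.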
## Axis 2 — format
### Option 1: stay on JSON
What we have today. The trust-boundary change in PRD 0011
preserves this format.
**Pros**
- Zero migration cost.
- Stdlib parser; no new dependency. The project's CLAUDE.md sets
"low dependencies by default" as a guideline.
- Stable, predictable parse semantics. No type-coercion gotchas.
- Tooling everywhere — IDE support, linters, jq.
**Cons**
- No native comments. (JSONC, JSON5, `_comment` fields are all
workarounds.)
- Multi-line strings become escaped one-liners. Agent prompts
longer than a sentence become unreadable.
- Trailing commas are an error. Hand-editing punishes small typos.
- Verbose: every key and every string value is quoted; nested
structures add indentation.
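Two of the cons above are easy to reproduce with the stdlib `json` module alone. The prompt text here is made up for illustration:

```python
import json

# A three-paragraph prompt, as an agent author would want to write it.
prompt = (
    "You are a feature-implementation agent.\n"
    "\n"
    "Follow the PRD.\n"
    "- Write tests first."
)

# Serialized, the whole prompt collapses onto one line with escaped newlines.
encoded = json.dumps({"prompt": prompt})
print(encoded)

# A trailing comma, a common hand-editing slip, is rejected outright.
try:
    json.loads('{"bottle": "dev",}')
    trailing_comma_ok = True
except json.JSONDecodeError as exc:
    trailing_comma_ok = False
    print("rejected:", exc.msg)
```

The round trip is lossless (`json.loads(encoded)` returns the original prompt), so the cost is purely in readability and hand-editing, not in correctness.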
### Option 2: full YAML
`$HOME/claude-bottle.yaml` (or `.yml`). Parser pulls in PyYAML (or
ruamel.yaml).
**Pros**
- Comments, multi-line strings (block scalars), anchors for repeated
blocks (e.g. shared egress allowlists across bottles).
- Common config language for ops tooling (Kubernetes,
GitHub/Gitea Actions, Docker Compose, pipelock's own config).
- Less syntactic noise than JSON for nested data.
**Cons**
- **New runtime dependency.** The project today uses zero
third-party Python packages for production code; YAML parsing
pulls in PyYAML. (CLAUDE.md: "bash-first, low-deps by default.")
- YAML's footguns: indentation sensitivity, the Norway problem
(`country: NO` → boolean False under YAML 1.1, which PyYAML
implements), and implicit type coercion that has surprised
non-trivial production projects.
- YAML documents are harder to validate strictly: parsers are
forgiving where JSON is strict.
- No native escape hatch for executable content / templating, but
users will reach for one (Jinja, Helm-style) and then we're in
YAML-as-template-language territory.
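To make the coercion footgun concrete without pulling in a YAML library, here is a stdlib-only sketch that flags unquoted values a YAML 1.1 parser would load as booleans. The pattern is, to the best of my knowledge, the implicit boolean pattern used by PyYAML's default resolver; treat it as an assumption and check it against whichever parser is chosen. The helper name `risky_plain_scalars` is hypothetical:

```python
import re

# Assumed to mirror PyYAML's implicit bool resolver (YAML 1.1 semantics).
_YAML11_BOOL = re.compile(
    r"^(?:yes|Yes|YES|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)


def risky_plain_scalars(values):
    """Return the values that would need quoting to stay strings."""
    return [v for v in values if _YAML11_BOOL.match(v)]


flagged = risky_plain_scalars(
    ["NO", "dev", "on", "Bearer", "off", "example.com"]
)
print(flagged)
```

A check like this could run in a manifest linter: anything it flags either gets quoted (`country: "NO"`) or is a sign that the field really is meant to be a boolean.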
### Option 3: reuse Claude Code's subagent spec (Markdown + YAML frontmatter), with claude-bottle extensions
Claude Code already stores subagents at `~/.claude/agents/<name>.md`
with YAML frontmatter and a Markdown body. Frontmatter today
carries fields like `name`, `description`, `model`, `color`,
`memory`; the body is the system prompt. Adding fields like
`bottle: dev` and a `claude_bottle:` sub-block to the same
frontmatter would make each claude-bottle agent a drop-in addition
to Claude Code's agent directory.
```markdown
---
name: implementer
description: Implements features against PRDs in this repo
model: opus
bottle: dev
claude_bottle:
skills: [init-prd]
---
You are a feature-implementation agent running inside an
ephemeral claude-bottle sandbox. The host has copied the user's
project into /home/node/workspace...
```
Bottles don't fit Claude Code's agent schema — they're
infrastructure, not behavior. Either:
- (3a) Bottles stay JSON / YAML; only agents adopt the
MD+frontmatter format. Mixed-format manifest.
- (3b) Bottles adopt MD+frontmatter too, using a claude-bottle-only
schema. Then we're really doing option 4 for bottles + option 3
for agents. Two schemas, but one format and one parser.
**Pros**
- Existing Claude Code users already know this format and have a
directory full of these files. The mental model is "an agent is
a Markdown file."
- Each agent's prompt lives naturally as Markdown body — long
prompts read well, can use headings/lists/code blocks.
- File-per-thing falls out automatically (one MD per agent).
- Claude Code may eventually consume claude-bottle's agent files
directly, doubling their utility.
**Cons**
- **Coupling to Claude Code's spec.** Anthropic owns that schema;
field names and semantics can change. Today's `model` /
`description` / `memory` are stable, but tomorrow's may not be.
Our `bottle:` / `claude_bottle:` extensions could collide with
future official fields.
- The agent file's frontmatter starts to carry two unrelated
schemas: Claude Code's (model, description) and ours (bottle,
skills, ...). One file, two owners.
- Bottles still need a format choice (3a vs 3b above) — we don't
escape that decision.
- Parsing MD+frontmatter is more work than JSON. Either pull a
frontmatter library (python-frontmatter) or hand-parse the
`---` block and feed it to PyYAML. Either way, a new dep.
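The "hand-parse the `---` block" half of that last point is small. A sketch, assuming the file starts with a `---` line and the frontmatter ends at the next `---` line; the function name `split_frontmatter` is hypothetical, and the YAML text it returns still has to go to some parser:

```python
def split_frontmatter(text: str):
    """Split a Markdown document into (frontmatter_text, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' line")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).strip("\n")
            return frontmatter, body
    raise ValueError("missing closing '---' line")


DOC = """---
name: implementer
bottle: dev
---
You are a feature-implementation agent.
"""

frontmatter, body = split_frontmatter(DOC)
print(frontmatter)
print(body)
```

Only the first closing `---` ends the frontmatter, so a Markdown horizontal rule later in the prompt body is left alone.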
### Option 4: invent a claude-bottle MD spec, used for both agents and bottles
```markdown
---
# $HOME/.claude-bottle/agents/implementer.md
bottle: dev
skills: [init-prd]
---
You are a feature-implementation agent running inside an
ephemeral claude-bottle sandbox...
```
```markdown
---
# $HOME/.claude-bottle/bottles/dev.md
cred_proxy:
routes:
- path: /anthropic/
upstream: https://api.anthropic.com
auth_scheme: Bearer
token_ref: CLAUDE_BOTTLE_OAUTH_TOKEN
role: anthropic-base-url
egress:
allowlist: [example.com]
---
The dev bottle. Backs my work on personal projects: Anthropic
OAuth, the gitea instance at gitea.dideric.is, and an npm token
for publishing scoped packages.
```
**Pros**
- Single format, two directory layouts. One parser, one mental
model.
- Bottle files get an MD body that's a natural home for
documentation (why does this bottle exist? what tokens does it
hold? who owns the keys?).
- Not coupled to Claude Code's schema; we own the spec.
- Trust boundary on disk: `$HOME/.claude-bottle/bottles/` is the
only place bottles can come from; `$CWD/.claude-bottle/agents/`
is the only thing cwd contributes.
- Agent files in this spec are *almost* compatible with Claude
Code's subagent format. If we keep the `name` / `description`
conventions, the same files can drop into `~/.claude/agents/`
with no friction — best of both worlds without the formal
coupling.
**Cons**
- Invents a format. Users learn one more thing (small thing — MD
with frontmatter is widely understood).
- Bottle file bodies have no built-in use case beyond
documentation; users may leave them empty, which looks weird
("why is this file partly Markdown?").
- Still requires a YAML parser for the frontmatter, so the
dependency cost is the same as option 3.
## Synthesis
Combining axes:
| | JSON | YAML | MD reuse (3) | MD new (4) |
|--------------|--------------------------|--------------------------|--------------------------|--------------------------|
| **Grouped (A)** | today | YAML monolith | not natural; MD wants per-file | not natural |
| **Per-file (B)** | dir of JSON files | dir of YAML files | best fit | best fit |
Per-file × MD-with-frontmatter is the natural shape on both axes —
the format wants to live one-per-file (the "MD doc with metadata"
pattern doesn't lend itself to monoliths), and the file-per-thing
grouping fits how users iterate on agents (write a prompt, save,
launch).
Between option 3 (reuse CC spec) and option 4 (new spec): the
appealing middle ground is "claude-bottle agents follow the CC
subagent shape closely (name / description / model + bottle and
skills extensions) so they drop into `~/.claude/agents/` as a
side effect, while bottles use the same MD+frontmatter shape but
with claude-bottle's own schema and live in a dedicated directory."
This:
- gives agents both a claude-bottle launch story AND a Claude Code
invocation story from the same file;
- keeps bottles entirely under our schema (no Anthropic dependency
for the security-load-bearing config);
- uses one parser, one body-format, two directories.
Cost of moving:
- New parsing requirement: a YAML parser, either PyYAML (a new
runtime dep) or a hand-parse-the-frontmatter shim (no dep, but
code we maintain). PyYAML is the safest choice if we accept the
dependency.
- Resolver rewrite: glob two directories, parse each file, validate
cross-references. Roughly the same complexity as today's JSON
merge once the boundary check is in place.
- Migration tool: a one-shot script that splits today's JSON into
per-file MD docs. Five minutes of work for the tool, five
minutes of work for the user.
- Docs: README's manifest section gets rewritten. Worth doing
alongside the move.
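A rough sketch of the one-shot migration mentioned in the list above. It rests on assumptions the source does not confirm: that `bottles` and `agents` are JSON objects keyed by name, and that an agent's prompt lives in a `prompt` field. The names `to_markdown` and `migrate` are hypothetical. Frontmatter values are written JSON-encoded, which should also be valid YAML flow syntax and avoids needing a YAML emitter:

```python
import json
import re
import tempfile
from pathlib import Path


def to_markdown(fields: dict, body: str = "") -> str:
    """Render frontmatter with JSON-encoded values, then the body."""
    lines = ["---"]
    lines += [f"{key}: {json.dumps(value)}" for key, value in fields.items()]
    lines.append("---")
    if body.strip():
        lines.append(body.rstrip())
    return "\n".join(lines) + "\n"


def migrate(manifest: dict, root: Path) -> list:
    """Write one .md file per bottle and per agent under root."""
    written = []
    for kind in ("bottles", "agents"):
        directory = root / kind
        directory.mkdir(parents=True, exist_ok=True)
        for name, entry in manifest.get(kind, {}).items():
            # Names become filenames, so refuse anything path-like.
            if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
                raise ValueError(f"unsafe name: {name!r}")
            fields = dict(entry)
            # Assumed field: the agent prompt moves into the Markdown body.
            body = fields.pop("prompt", "") if kind == "agents" else ""
            path = directory / f"{name}.md"
            path.write_text(to_markdown(fields, body), encoding="utf-8")
            written.append(path)
    return written


SAMPLE = {
    "bottles": {"dev": {"egress": {"allowlist": ["example.com"]}}},
    "agents": {
        "implementer": {
            "bottle": "dev",
            "skills": ["init-prd"],
            "prompt": "You are a feature-implementation agent.\n\nFollow the PRD.",
        }
    },
}

with tempfile.TemporaryDirectory() as tmp:
    written_names = [p.name for p in migrate(SAMPLE, Path(tmp))]
    agent_text = Path(tmp, "agents", "implementer.md").read_text(encoding="utf-8")
    bottle_text = Path(tmp, "bottles", "dev.md").read_text(encoding="utf-8")

print(agent_text)
```

A real tool would also need to decide what to do with an existing target directory and with the cwd manifest, which this sketch ignores.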
## Recommendation
Per-file MD with frontmatter (option B × option 4 with the option-3
agent compatibility). The format change clears the way for the
per-file grouping (which is the bigger UX win), and the per-file
shape is what makes the trust boundary self-documenting on disk.
The dependency cost (PyYAML) is the main thing that needs an
explicit yes from the user — claude-bottle today has zero
third-party Python deps for production code, and adopting one
crosses a clean architectural line. If "low deps" stays a hard
constraint, the alternative is to hand-parse the frontmatter block
and feed it to a minimal YAML subset parser (the values
claude-bottle uses are all plain strings, lists, and dicts, nested
only a few levels deep: no anchors, no multi-line block scalars, no
implicit type coercion).
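To gauge how big a "minimal YAML subset parser" would be, here is a sketch covering only what the examples in this document use: nested mappings by indentation, `- ` block lists, inline `[a, b]` lists, and whole-line `#` comments. Every scalar stays a string. All names (`parse_subset` and the helpers) are hypothetical, and the subset is an assumption about what claude-bottle needs; inline comments, quoted commas, nested inline lists, anchors, and block scalars are deliberately out of scope.

```python
import re

_KEY = re.compile(r"^([A-Za-z_][\w-]*):(?:\s+(.*))?$")


def _scalar(text):
    """Strings and inline lists only: no bool/int/null coercion."""
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        inner = text[1:-1].strip()
        return [_scalar(part) for part in inner.split(",")] if inner else []
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1]
    return text


def _block(lines, i, indent):
    if lines[i][1].startswith("- "):
        return _sequence(lines, i, indent)
    return _mapping(lines, i, indent)


def _sequence(lines, i, indent):
    items = []
    while i < len(lines) and lines[i][0] == indent and lines[i][1].startswith("- "):
        rest = lines[i][1][2:].strip()
        if _KEY.match(rest):
            # "- key: value" opens a mapping whose keys sit two columns in.
            lines[i] = (indent + 2, rest)
            item, i = _mapping(lines, i, indent + 2)
        else:
            item, i = _scalar(rest), i + 1
        items.append(item)
    return items, i


def _mapping(lines, i, indent):
    result = {}
    while i < len(lines) and lines[i][0] == indent:
        match = _KEY.match(lines[i][1])
        if not match:
            raise ValueError(f"expected 'key: value', got {lines[i][1]!r}")
        key, rest = match.group(1), match.group(2)
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        i += 1
        if rest:
            result[key] = _scalar(rest)
        elif i < len(lines) and lines[i][0] > indent:
            result[key], i = _block(lines, i, lines[i][0])
        else:
            result[key] = ""  # key with no value and no nested block
    return result, i


def parse_subset(text):
    """Parse the frontmatter subset; raise ValueError on anything else."""
    lines = []
    for raw in text.splitlines():
        content = raw.strip()
        if not content or content.startswith("#"):
            continue
        indent = len(raw) - len(raw.lstrip())
        if "\t" in raw[:indent]:
            raise ValueError("tabs are not allowed in indentation")
        lines.append((indent, content))
    if not lines:
        return {}
    result, i = _mapping(lines, 0, lines[0][0])
    if i != len(lines):
        raise ValueError(f"unexpected indentation near {lines[i][1]!r}")
    return result


SAMPLE = """\
# $HOME/.claude-bottle/bottles/dev.md
cred_proxy:
  routes:
    - path: /anthropic/
      upstream: https://api.anthropic.com
      auth_scheme: Bearer
egress:
  allowlist: [example.com]
country: NO
"""

parsed = parse_subset(SAMPLE)
print(parsed)
```

At roughly seventy lines the shim is tractable, but it is also code that has to be tested and documented as "not YAML, only a subset", which is the real cost to weigh against taking PyYAML.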
If we don't want to commit to the move yet, the next-cheapest
option is keeping JSON but splitting into per-file (option B ×
option 1): `$HOME/.claude-bottle/bottles/<name>.json` +
`$HOME/.claude-bottle/agents/<name>.json`. That captures most of the
scaling wins, gives up the Markdown body for prose, and adds no
dependency.
## Open questions
- **Does Claude Code object to extra frontmatter fields?** Test:
drop a file with `bottle:` in `~/.claude/agents/` and see if CC
warns / ignores / breaks. If it warns, we'd want a different
field name (e.g. `claude-bottle-bottle`) or a namespaced block.
- **Migration story.** Is the project willing to ship a one-shot
`./cli.py migrate-manifest` command that does the JSON → MD
conversion? Or do users just rewrite by hand from the new docs?
- **Bottle file body content.** If most bottle .md files have an
empty body, is the MD-with-frontmatter format still warranted?
An alternative is YAML for bottles only (no body, but with
comments) and MD+frontmatter for agents.
- **Dotfiles vs not.** `$HOME/.claude-bottle/` or
`$HOME/claude-bottle/`? The hidden dotfile shape matches dev
conventions (`.config/`, `.ssh/`); the visible shape signals
"this is a real thing you own."
- **PyYAML hard dep, or minimal subset parser?** Trade-off between
"honest about the dependency" and "stay stdlib-only."