From 5c3c60cff48a4aa9d4ba49a0d6e094abc8967109 Mon Sep 17 00:00:00 2001
From: didericis
Date: Mon, 25 May 2026 03:05:55 -0400
Subject: [PATCH] docs(prd-0012): explain why the MCP server is a sidecar, not in-container

Captures the rationale for placing the MCP server outside the agent container. The bottle wall doesn't strictly require it (the operator TUI is the actual gate), but pattern consistency, audit metadata trust, connection lifecycle, future enforcement headroom, and pipelock cleanliness all argue for sidecar placement.

Co-Authored-By: Claude Opus 4.7
---
 docs/prds/0012-stuck-agent-recovery-flow.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/docs/prds/0012-stuck-agent-recovery-flow.md b/docs/prds/0012-stuck-agent-recovery-flow.md
index 9162f63..2409efc 100644
--- a/docs/prds/0012-stuck-agent-recovery-flow.md
+++ b/docs/prds/0012-stuck-agent-recovery-flow.md
@@ -57,6 +57,20 @@ Three named categories, each with its own MCP tool. Ordered by remediation cost:
 
 The three tools are dispatched by name, so the operator's TUI knows which remediation engine to wire to which proposal. The agent must choose the right tool for what failed: a 403 from a credentialed request is a `cred-proxy-block`; a connection refused at the egress is a `pipelock-block`; a "command not found" or missing-skill error is a `capability-block`.
 
+### Why the MCP server is a sidecar, not in-container
+
+The MCP server could in principle run inside the agent container. It doesn't, for reasons that are individually soft but together argue for the sidecar shape.
+
+The security boundary is the operator approving in the TUI plus the host-side config write, neither of which depends on where the MCP server runs. Even if a compromised agent forged a proposal end-to-end, a human still has to approve before cred-proxy, pipelock, or the Dockerfile change. So in-container isn't a hard violation of the bottle wall.
+
+It's still the wrong placement for five reasons:
+
+1. **Pattern consistency.** cred-proxy and pipelock are sidecars because boundary services don't live in the agent's process space. An in-container MCP server would be the first exception, and the next reader has to ask why this one is different.
+2. **Audit metadata trust.** Sidecar-generated timestamps and bottle metadata are trusted; in-container metadata is whatever the agent wrote. Same outcome (the operator is still the gate) but weaker provenance.
+3. **Connection lifecycle.** A sidecar holds the tool-call connection independently of the agent process — agent crash or restart doesn't orphan a pending operator response.
+4. **Future enforcement headroom.** If the MCP server ever needs to *enforce* something (rate limits, dedup, schema-strict rejection), it has to be a trusted process. Building it in-container now means re-architecting later.
+5. **Pipelock cleanliness.** Sidecar-on-internal-network is the same egress shape pipelock already permits. In-container would need a loopback exception in the allowlist.
+
 ### New services / components
 
 - **MCP sidecar.** New per-bottle container on the bottle's internal network. Exposes the three tools (`cred-proxy-block`, `pipelock-block`, `capability-block`) to the agent. On a tool call: validates the proposed file syntactically (valid JSON for `routes.json`, parseable Dockerfile, etc.), persists the proposal to a host-mounted queue, and holds the tool-call connection open until the supervisor acts. On a response from the supervisor, returns `{status, notes}` to the agent. Whether this lives as its own container or as a mode of cred-proxy is an Open question; v1 plan is its own container to keep cred-proxy focused on credentials.
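
The tool-call flow described in the MCP sidecar bullet (validate, persist, hold the connection, return `{status, notes}`) might look roughly like the sketch below. It is an illustration under stated assumptions, not the PRD's implementation. Only the three tool names and the `{status, notes}` reply shape come from the document. `SidecarSketch`, `Proposal`, the method names, the in-memory dict standing in for the host-mounted queue, the `"approved"` status value, and the `routes.json` payload are all hypothetical. Dockerfile validation is omitted.

```python
import json
import queue
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Tool names are from the PRD; everything else in this sketch is hypothetical.
TOOLS = ("cred-proxy-block", "pipelock-block", "capability-block")


@dataclass
class Proposal:
    tool: str
    filename: str
    content: str
    # Stamped by the sidecar, never taken from the agent's payload
    # (the "audit metadata trust" argument).
    proposal_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    received_at: float = field(default_factory=time.time)


class SidecarSketch:
    def __init__(self) -> None:
        # In-memory stand-in for the host-mounted queue (an assumption).
        self._pending: Dict[str, Tuple[Proposal, queue.Queue]] = {}

    def submit(self, tool: str, filename: str, content: str) -> Proposal:
        """Validate the proposal syntactically, then persist it."""
        if tool not in TOOLS:
            raise ValueError(f"unknown tool: {tool}")
        if filename.endswith(".json"):
            # json.JSONDecodeError subclasses ValueError, so bad JSON is rejected here.
            json.loads(content)
        proposal = Proposal(tool, filename, content)
        self._pending[proposal.proposal_id] = (proposal, queue.Queue(maxsize=1))
        return proposal

    def respond(self, proposal_id: str, status: str, notes: str) -> None:
        """Supervisor side: record the operator's decision."""
        _, reply = self._pending[proposal_id]
        reply.put({"status": status, "notes": notes})

    def wait(self, proposal_id: str, timeout: Optional[float] = None) -> dict:
        """Hold the tool call open until the supervisor responds.

        Raises queue.Empty if no response arrives within `timeout` seconds.
        """
        _, reply = self._pending[proposal_id]
        result = reply.get(timeout=timeout)
        del self._pending[proposal_id]
        return result


if __name__ == "__main__":
    sidecar = SidecarSketch()
    p = sidecar.submit("cred-proxy-block", "routes.json", '{"routes": []}')
    sidecar.respond(p.proposal_id, "approved", "route added")
    print(sidecar.wait(p.proposal_id, timeout=1.0))
```

Because the pending entry lives in the sidecar and not in the agent process, an agent crash between `submit` and `wait` leaves the proposal and any operator response intact, which is the connection-lifecycle argument from the patch.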