PRD: Claude forward_host_credentials #326

Open
didericis-claude wants to merge 8 commits from claude-forward-host-credentials into main
Collaborator

Closes #325.

PRD: docs/prds/prd-new-claude-forward-host-credentials.md

Summary

  • Adds forward_host_credentials: true support for agent_provider.template: claude, mirroring the existing Codex flow (PRD 0029).
  • New bot_bottle/contrib/claude/claude_auth.py reads the host Claude OAuth session key from ~/.claude.json (oauthAccount.sessionKey) and validates presence and expiry.
  • Session key goes only into provisioned_env (egress sidecar environment) — agent receives only CLAUDE_CODE_OAUTH_TOKEN=egress-placeholder.
  • Adds CLAUDE_HOST_CREDENTIAL_TOKEN_REF constant to egress.py and CLAUDE_HOST_CREDENTIAL_HOSTS to agent_provider.py.
  • Manifest validation updated: forward_host_credentials now accepted for claude and codex; setting both auth_token and forward_host_credentials on the same bottle is rejected.
  • 1666 unit tests pass.
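The manifest rule in the summary can be sketched as follows. This is a minimal illustration, not the code in manifest_agent.py: `validate_agent_provider`, `ManifestError`, and the assumption that `auth_token` sits in the same mapping as `forward_host_credentials` are all hypothetical.

```python
from typing import Any, Mapping

# Templates that accept forward_host_credentials, per the summary above.
FORWARDING_TEMPLATES = frozenset({"claude", "codex"})


class ManifestError(ValueError):
    """Hypothetical error type for a rejected agent_provider block."""


def validate_agent_provider(provider: Mapping[str, Any]) -> None:
    """Raise ManifestError if the credential settings are inconsistent."""
    if not provider.get("forward_host_credentials", False):
        return  # nothing to check when forwarding is off
    template = provider.get("template")
    if template not in FORWARDING_TEMPLATES:
        raise ManifestError(
            f"forward_host_credentials is not supported for template {template!r}"
        )
    # Assumption: auth_token lives alongside forward_host_credentials.
    if provider.get("auth_token"):
        raise ManifestError(
            "auth_token and forward_host_credentials are mutually exclusive"
        )
```

Checking the template first means an unsupported template is reported before the mutual-exclusion error; the real validator may order these differently.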
didericis-claude added 6 commits 2026-07-01 17:15:29 -04:00
Moves the orchestrator into bot_bottle/orchestrator/ so one install gets
everything. Entry point is now `python -m bot_bottle.orchestrator run`.

- Add bot_bottle/orchestrator/ with all 14 modules (verbatim move; internal
  imports were already relative, so no changes inside orchestrator modules)
- Rewrite bootstrap.py: remove the lazy bot_bottle import guard, use direct
  relative imports from ..contrib.*
- Add bot_bottle/contrib/forge/base.py: ScopedForge (read-anywhere / write-scoped)
- Add bot_bottle/contrib/gitea/client.py: GiteaClient + GiteaForge (urllib.request only)
- Add bot_bottle/contrib/gitea/forge_state.py: ForgeState + SqliteForgeStateStore
- Add tests/unit/orchestrator/ (82 tests: 63 migrated + 19 new for contrib modules)

Closes #321
docs: mark fold-orchestrator PRD as Active
lint / lint (push) Successful in 2m4s
test / unit (pull_request) Successful in 56s
test / integration (pull_request) Successful in 22s
test / coverage (pull_request) Failing after 1m7s
df1f0e8f70
test: add coverage for orchestrator + gitea client (diff gate 77% → 98%)
lint / lint (push) Failing after 2m5s
test / unit (pull_request) Successful in 53s
test / integration (pull_request) Successful in 24s
test / coverage (pull_request) Successful in 1m12s
57290da1e8
Three new unit test modules:
- tests/unit/test_contrib_gitea_client.py — GiteaClient (urllib mocked)
  and GiteaForge delegation
- tests/unit/orchestrator/test_main.py — __main__ run/status commands
- tests/unit/orchestrator/test_bootstrap.py — _token, BotBottleStateStore,
  _to_forge_state/_to_record, make_forge, make_sidecar, build

Augments to existing suites:
- test_events: non-"created" comment action ignored
- test_lifecycle: _iso_now callable, untracked-issue comment ignored,
  untracked-PR closed ignored (covers _find_by_pr return-None path)
- test_runner: destroy command, _default_run via subprocess mock
- test_sidecar: _jsonable dataclass/list branches, OpLog.read on missing
  file, drain_done_events on corrupted file, socket _Handler invalid-JSON
  and empty-line paths, serve() with pre-existing socket path
- test_watchdog: _loop body covered by patching _TICK_SECS to 0.01s
- test_webhook: unknown GET path returns 404

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fix: resolve pylint/pyright issues in new test files
lint / lint (push) Successful in 2m7s
test / unit (pull_request) Successful in 57s
test / integration (pull_request) Successful in 17s
test / coverage (pull_request) Successful in 1m4s
71699b3ecd
- test_contrib_gitea_client: remove unused Any import, fix _mock_response
  to use return_value instead of lambda (unknown lambda type), narrow
  HTTPError hdrs type, add type annotations to fake_urlopen helpers,
  suppress protected-access for _request tests
- test_bootstrap: annotate **kw as **kw: object, use dict literal,
  unpack server_address via index to avoid tuple type mismatch
- test_main: remove unused MagicMock import
- test_watchdog: guard store.get() result before accessing .status

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat(claude): add forward_host_credentials support
lint / lint (push) Successful in 2m19s
test / unit (pull_request) Successful in 1m2s
test / integration (pull_request) Successful in 22s
test / coverage (pull_request) Successful in 1m14s
f0d27863c2
Reads the host's Claude OAuth session key from ~/.claude.json at launch
and forwards it only to the egress sidecar (never to the agent), placing
a placeholder CLAUDE_CODE_OAUTH_TOKEN in the agent env so Claude Code
starts without seeing the real credential.

Mirrors the existing Codex forward_host_credentials flow (PRD 0029).
Adds claude_auth.py to extract and validate the sessionKey, a
CLAUDE_HOST_CREDENTIAL_TOKEN_REF constant in egress.py, and updates
manifest_agent.py to allow the flag for both 'codex' and 'claude'
templates. Also adds a mutual-exclusion check that rejects setting
both auth_token and forward_host_credentials together.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
didericis added 1 commit 2026-07-01 17:33:17 -04:00
fix(claude): read credentials from ~/.claude/.credentials.json
lint / lint (push) Successful in 2m10s
test / unit (pull_request) Successful in 53s
test / integration (pull_request) Successful in 16s
test / coverage (pull_request) Successful in 1m8s
bdd352570b
The actual OAuth token is in ~/.claude/.credentials.json under
claudeAiOauth.accessToken, not in ~/.claude.json.
~/.claude.json holds only UI state and profile metadata (oauthAccount
has no token fields). expiresAt in the credentials file is milliseconds,
not seconds.

Discovered after testing against Claude Code 2.1.198.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
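A minimal sketch of the lookup this commit describes, assuming the file is a JSON object with a `claudeAiOauth` key holding `accessToken` and a millisecond `expiresAt`. The function name `access_token_from_credentials` and the `CredentialError` type are hypothetical, not the names used in claude_auth.py.

```python
import json
import time
from typing import Optional


class CredentialError(RuntimeError):
    """Hypothetical error for a missing or expired host credential."""


def access_token_from_credentials(raw: str, now_s: Optional[float] = None) -> str:
    """Extract claudeAiOauth.accessToken from the credentials JSON text."""
    data = json.loads(raw)
    oauth = data.get("claudeAiOauth") if isinstance(data, dict) else None
    if not isinstance(oauth, dict) or not oauth.get("accessToken"):
        raise CredentialError("claudeAiOauth.accessToken is missing")
    # expiresAt is in milliseconds, so convert the clock before comparing.
    now_ms = (time.time() if now_s is None else now_s) * 1000
    expires_at_ms = oauth.get("expiresAt")
    if expires_at_ms is not None and expires_at_ms <= now_ms:
        raise CredentialError("host Claude credential has expired")
    return oauth["accessToken"]
```

Comparing against seconds instead of milliseconds would make every token look valid for decades, which is the failure mode the unit note in the commit message guards against.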
didericis added 1 commit 2026-07-01 17:47:05 -04:00
fix(claude): fall back to macOS Keychain for credentials
lint / lint (push) Failing after 2m10s
test / unit (pull_request) Successful in 57s
test / integration (pull_request) Successful in 22s
test / coverage (pull_request) Successful in 1m6s
7278ee1157
On macOS, Claude Code stores credentials in the Keychain under
service "Claude Code-credentials" rather than in a file. When
~/.claude/.credentials.json is absent, shell out to:
  security find-generic-password -s "Claude Code-credentials" -w
and parse the result as the same JSON schema.

~/.claude.json holds only profile/UI metadata (oauthAccount has
no token fields). expiresAt in the credentials is milliseconds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
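The file-then-Keychain order can be sketched like this. The `security` invocation is the one quoted in the commit message; `load_raw_credentials` and the injectable `run` parameter are hypothetical names added so the fallback can be exercised without a real Keychain.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence

# Command quoted in the commit message above.
KEYCHAIN_CMD = (
    "security", "find-generic-password", "-s", "Claude Code-credentials", "-w",
)


def _run(cmd: Sequence[str]) -> str:
    """Run a command and return its stdout, raising on a non-zero exit."""
    result = subprocess.run(list(cmd), check=True, capture_output=True, text=True)
    return result.stdout


def load_raw_credentials(
    path: Path, run: Callable[[Sequence[str]], str] = _run
) -> Optional[str]:
    """Return the credentials JSON text from the file, else from the Keychain."""
    if path.is_file():
        return path.read_text(encoding="utf-8")
    try:
        return run(KEYCHAIN_CMD).strip()
    except (OSError, subprocess.CalledProcessError):
        # No `security` binary (non-macOS) or no matching Keychain item.
        return None
```

Returning `None` rather than raising leaves it to the caller to decide how to report "no host credential found".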
Author
Collaborator

--remote-control diagnosis (Agent 4, live)

Diagnosed why a bottle started with forward_host_credentials + --remote-control never reaches the Claude mobile/desktop app. Verified against the running bot-bottle-agent-4 (claude image) and its egress sidecar.

Root cause: forward_host_credentials forwards a bare access token, not the login session — an auth-mode mismatch, not a missing entitlement.

What checks out

  • startup_args forwarded — live cmdline of PID 1070: claude --dangerously-skip-permissions --remote-control --append-system-prompt-file …
  • Bearer injection works — egress swaps the real host token on api.anthropic.com; GET /api/claude_code/policy_limits → 200 OK proves the credential authenticates.
  • Bridge egress route present: bridge.claudeusercontent.com is one of the 5 loaded routes in /etc/egress/routes.yaml (allowlisted, DLP off). Not the problem.

What's actually wrong

claude_auth.py::claude_host_access_token() extracts only claudeAiOauth.accessToken and discards refreshToken / expiresAt / scopes / subscriptionType / rateLimitTier. agent_provider.py (~181–199) then sets CLAUDE_CODE_OAUTH_TOKEN="egress-placeholder" in the guest, has egress swap the real bearer on api.anthropic.com only, and never writes ~/.claude/.credentials.json into the guest (confirmed absent).

So the guest runs in CLAUDE_CODE_OAUTH_TOKEN mode — the headless/CI claude setup-token path — a bare bearer with no session envelope. Live egress log:

GET https://api.anthropic.com/api/claude_code/settings       << 404 Not Found
GET https://api.anthropic.com/api/claude_code/policy_limits  << 200 OK

/api/claude_code/settings 404s for bare-token mode, so Claude aborts before opening the bridge WebSocket (0 server connect bridge.claudeusercontent.com, 0 live :443 connections). On the host, Claude reads the full claudeAiOauth object (session) → settings returns 200 → remote control works. Same credential, different mode.

Two structural reasons the current design can't do remote control

  1. Mode, not value — a bare bearer isn't a claudeAiOauth login session with scopes/subscription, which is what the settings endpoint gates on.
  2. Bridge auth isn't on api.anthropic.com — the remote-control WebSocket goes to bridge.claudeusercontent.com and carries its own session-derived auth. The egress bearer-swap (anthropic-only) can never authenticate the bridge, so the real credential has to live inside the guest.
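To make the second point concrete, here is a toy model of a host-scoped bearer swap. It is not the egress sidecar's code: `rewrite_headers`, the exact `Authorization: Bearer` shape, and the `SWAP_HOSTS` set are assumptions for illustration.

```python
from typing import Dict
from urllib.parse import urlsplit

PLACEHOLDER = "egress-placeholder"  # value the guest sees
SWAP_HOSTS = frozenset({"api.anthropic.com"})  # assumed swap scope


def rewrite_headers(url: str, headers: Dict[str, str], real_token: str) -> Dict[str, str]:
    """Swap the placeholder bearer for the real token on allowed hosts only."""
    out = dict(headers)
    host = urlsplit(url).hostname
    if host in SWAP_HOSTS and out.get("Authorization") == f"Bearer {PLACEHOLDER}":
        out["Authorization"] = f"Bearer {real_token}"
    return out
```

A request to bridge.claudeusercontent.com passes through unchanged, so anything the guest sends there can only be derived from what the guest itself holds, which is the placeholder.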

Fix direction (with tradeoff)

Reconstitute the host login session in the guest: write the full claudeAiOauth object to guest ~/.claude/.credentials.json (or the guest keychain), and stop forcing CLAUDE_CODE_OAUTH_TOKEN (the env var likely overrides the session file).

Tradeoff: this reverses the deliberate "keep the token out of the guest, inject at egress" custody posture — the real access and refresh tokens would then sit on the guest filesystem. Remote control is fundamentally incompatible with egress-only custody, because the bridge WS auth originates in the guest.

Owner

@didericis-claude Different potential solution: could provide a --remote option when starting a bot-bottle that works generically across different agent TUIs and forwards the session to a mobile client.

This could be something very basic: a view that renders the piped ANSI output read-only, plus a text box whose contents we pipe into the bottle. It could also be another paid incentive, since it lets hacker types who like self-hosting interact more easily and securely with TUIs that run on their own hardware (including experimental models that may not have a great mobile experience yet). Thoughts?

Author
Collaborator

I think this is the better direction, and the diagnosis above is actually the argument for it. Vendor-native remote-control forces two things we don't want: the real credential has to live in the guest, and the session has to route through the vendor's bridge (bridge.claudeusercontent.com for Claude, and whatever Codex/others use). A bot-bottle-native --remote sidesteps both — the agent session never leaves the box, the credential stays under our custody model, and it works for any TUI including experimental models with no mobile story. That's a cleaner custody story than the native path, not just a broader one.

A few design notes from thinking it through:

Input model is the one fork that matters. A read-only ANSI render + a line-buffered text box is enough to submit prompts, but TUIs like Claude/Codex need real key events for the interactive bits — approval prompts (y/n), arrow-key menus, Ctrl-C, ESC. If the mobile client can only push a line of text, you can drive the happy path but not approve a tool call or cancel a run, which is exactly what you'd want a phone for. I'd go straight to a full interactive terminal (xterm.js in a webview forwards keystrokes) — it's barely more work than the read-only version and avoids painting ourselves into a corner. Read-only can be a mode, not the ceiling.

Attach without disrupting the agent. Run the agent under a PTY broker from launch (tmux-style, or a small custom pty multiplexer) so the mirror can attach/detach and survive mobile reconnects without restarting the agent. This also gives us scrollback replay on connect for free.
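A rough sketch of the attach/detach and replay behaviour, leaving out the PTY itself. `TerminalMirror` is a hypothetical name; in a real broker `feed` would be called with bytes read from the agent's PTY master, and each sink would write to a connected client.

```python
from collections import deque
from typing import Callable, Deque, Dict

Sink = Callable[[bytes], None]


class TerminalMirror:
    """Fan PTY output out to attached clients and replay scrollback on attach."""

    def __init__(self, max_chunks: int = 1000) -> None:
        # Bounded history: the oldest chunks fall off once max_chunks is reached.
        self._scrollback: Deque[bytes] = deque(maxlen=max_chunks)
        self._sinks: Dict[str, Sink] = {}

    def feed(self, data: bytes) -> None:
        """Record a chunk of agent output and forward it to every client."""
        self._scrollback.append(data)
        for sink in list(self._sinks.values()):
            sink(data)

    def attach(self, client_id: str, sink: Sink) -> None:
        """Replay buffered output to the new client, then start live delivery."""
        for chunk in self._scrollback:
            sink(chunk)
        self._sinks[client_id] = sink

    def detach(self, client_id: str) -> None:
        """Stop delivering to a client; the agent keeps running regardless."""
        self._sinks.pop(client_id, None)
```

Replaying raw chunks only reproduces the current screen if the buffer still holds the last full redraw, so a real broker would probably keep emulator state (as tmux does) rather than a plain byte history.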

Transport is the hard part, and it's the paid wedge. Bottles have no inbound path — everything is outbound through egress. So --remote needs an outbound-initiated reverse tunnel to a rendezvous the phone can reach. That relay is exactly the hosted (or self-hostable) control-plane piece: it's what the vendor bridge does for them, and owning it is the natural thing to charge for. It can likely ride the channel the supervise sidecar already opens rather than being a whole new plane.

Monetization: agreed, and it fits the existing framing. This is the control plane (phone-control), distinct from custody/vault. The self-hosting-hacker angle is real — "drive any TUI agent running on my own hardware, from my phone, securely, including models with no app" is a sharp pitch, and it's differentiated precisely because it's vendor-agnostic and custody-preserving.

Two things I'd want to pin down before building:

  1. Interactive terminal vs. structured control? Full xterm mirror is universal but dumb about semantics. Do we also want a structured layer (parse approval prompts → render native mobile buttons) for the common agents, with raw terminal as the fallback? That's more work but a much better phone UX.
  2. Relay trust boundary. The session stream can contain secrets on screen. Is the relay a dumb E2E-encrypted pipe (bottle↔phone, relay can't read), or does it terminate TLS and get to see/scan the stream? The custody story is a lot stronger if it's the former.

My lean: full interactive xterm mirror over a dumb E2E relay as the MVP, structured/native controls layered on later for the top agents.

Owner

@didericis-claude I'm thinking full xterm as a fallback, with a dumb read-only interface as the primary UX.

I think we can for the most part get away with the following:

  1. A simple text input
  2. Arrow keys (above the text input box)

I think the interactive bits are almost always solvable via arrow keys. As a fallback, the client can switch to a full terminal/interactive mode where keystrokes get piped to the terminal. I don't really want to maintain more structured controls that require parsing more of the agent input/output, since we'd end up competing with harness providers and any mobile solutions specific to that harness. Our selling point should be general applicability.

RE trust boundary: yes, it should be E2E encrypted.

Author
Collaborator

Agreed on dropping structured controls — that's the right call for the reason you gave. The moment we parse agent semantics we're on the harness vendor's turf, we break on their next release, and we've traded the one thing that's actually defensible (works with anything that speaks a PTY) for a UX race we'd lose. Generality is the moat; keep it dumb.

One architectural sharpening on "read-only primary, xterm fallback": I don't think those are two renderers — I think it's one renderer, two input layers, and that's simpler than it sounds.

The reason: agent TUIs (Claude, Codex) aren't line-oriented output — they're full-screen ANSI apps with cursor addressing, redraws, spinners, boxes, a status line that rewrites itself. A truly "dumb" text render that just appends the output stream will show escape-sequence garbage and redraw artifacts for these. To display them correctly at all you need a terminal emulator (xterm.js). So:

  • Render: always xterm.js. Not optional — it's what makes the output legible.
  • Input (primary): a curated affordance set — the text box plus a row of keys above it. Each button just emits the key bytes: up/down = \x1b[A/\x1b[B, enter = \r, and I'd add esc (\x1b), ctrl-c (\x03), and tab to your arrows — approvals and cancels need those, and that set covers virtually every Claude/Codex interactive prompt.
  • Input (fallback): a toggle that just exposes the full keyboard instead of the button set.
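The key bar amounts to a small lookup table. In this sketch the up/down, enter, esc, ctrl-c and tab bytes come from the list above; left/right use the standard ANSI cursor sequences and are an addition, and the function names are hypothetical.

```python
# Bytes emitted by each key-bar button.
KEY_BYTES = {
    "up": b"\x1b[A",
    "down": b"\x1b[B",
    "right": b"\x1b[C",  # standard ANSI, not listed in the thread
    "left": b"\x1b[D",   # standard ANSI, not listed in the thread
    "enter": b"\r",
    "esc": b"\x1b",
    "ctrl-c": b"\x03",
    "tab": b"\t",
}


def encode_key(name: str) -> bytes:
    """Return the bytes for a key-bar button, rejecting unknown names."""
    try:
        return KEY_BYTES[name]
    except KeyError:
        raise ValueError(f"unknown key: {name!r}") from None


def encode_text(text: str, submit: bool = True) -> bytes:
    """Encode the text box contents, optionally followed by enter."""
    return text.encode("utf-8") + (KEY_BYTES["enter"] if submit else b"")
```

A raw-keyboard fallback would bypass the table and forward whatever bytes the on-screen keyboard produces, so both input layers end up writing bytes to the same channel.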

The nice part: primary and fallback send the same thing over the wire — key bytes on the E2E channel. "Full terminal mode" isn't a second code path, it's the same pipe with a different on-screen keyboard. One renderer, one wire protocol, one thing to maintain. Still 100% general, zero output parsing.

So I'd frame it as: one xterm mirror + a curated key bar (text + arrows + enter/esc/ctrl-c/tab), with a "raw keyboard" escape hatch. Read-only is then just "don't render the input bar" — a view permission, not a separate mode.

On E2E: 👍. That makes the relay a dumb encrypted pipe that never holds plaintext, which is the strong version of the custody story. The one real crypto design question it raises is pairing — how the phone and the bottle agree on the shared key without the relay ever seeing it. Cleanest is a pairing handshake at launch (QR or short code shown by the bottle/dashboard) that establishes the key out-of-band; the relay only ever routes ciphertext between paired endpoints. Worth deciding early since it shapes the transport.

Owner

> The one real crypto design question it raises is pairing — how the phone and the bottle agree on the shared key without the relay ever seeing it. Cleanest is a pairing handshake at launch (QR or short code shown by the bottle/dashboard) that establishes the key out-of-band; the relay only ever routes ciphertext between paired endpoints. Worth deciding early since it shapes the transport.

Can't we just do public key exchange? Holepunch provides primitives for establishing e2e encrypted tunnels, which is one of the reasons I wanted to use their stack.

Author
Collaborator

Yeah — and if we're on the Holepunch stack that's strictly better than a relay we operate, so let me walk back the "relay routes ciphertext" framing. With HyperDHT the transport already is the answer: peers are Ed25519 keypairs, connect(remotePublicKey) gives you a Noise-encrypted stream, and the DHT only coordinates the holepunch — when a direct P2P path can't be punched it falls back to routing through DHT nodes, but that path is still Noise E2E, so no intermediary ever sees plaintext. That satisfies the custody requirement by construction, and there's no server chokepoint for us to run.

The thing I'd flag is that public-key exchange doesn't remove the pairing step — it's the safe form of it. Two sub-problems remain, and they're both good news:

  1. Key distribution. The phone still has to learn the bottle's public key through a trusted channel. But now that's a public key, so a QR/short code showing it is fine to display — no secret ever hits a screen. Much better than the shared-secret version I described.
  2. Authorization, not just encryption. Noise gets you a confidential channel, but by default anyone who learns the bottle's pubkey can open one. The bottle needs to allowlist the specific device pubkey and reject the rest — HyperDHT exposes exactly this (the server-side firewall / accept-by-remote-key hook). So pairing = mutual pubkey pinning: bottle→phone via the code, phone→bottle into the allowlist.
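The bottle-side half of that pinning can be sketched as a plain allowlist. This is Python for illustration only and is not the HyperDHT firewall API; `DeviceAllowlist` and its methods are hypothetical. The one fixed fact used is that an Ed25519 public key is 32 bytes.

```python
from typing import Iterable, Set

ED25519_PUBLIC_KEY_LEN = 32


class DeviceAllowlist:
    """Public keys of the devices a bottle will accept connections from."""

    def __init__(self, keys: Iterable[bytes] = ()) -> None:
        self._keys: Set[bytes] = set()
        for key in keys:
            self.pair(key)

    def pair(self, device_public_key: bytes) -> None:
        """Pin a device key, e.g. after the pairing code has been exchanged."""
        if len(device_public_key) != ED25519_PUBLIC_KEY_LEN:
            raise ValueError("expected a 32-byte Ed25519 public key")
        self._keys.add(bytes(device_public_key))

    def revoke(self, device_public_key: bytes) -> None:
        """Remove a device; later connection attempts are rejected."""
        self._keys.discard(bytes(device_public_key))

    def accepts(self, remote_public_key: bytes) -> bool:
        """Decide whether an incoming connection should be allowed."""
        return bytes(remote_public_key) in self._keys
```

An empty allowlist rejects everyone, so a bottle that has never been paired is unreachable by default.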

And the nice part: Holepunch already ships a primitive for this — blind-pairing / invite codes, the same mechanism Keet uses to add a device to a room without a central server. So we probably don't hand-roll the handshake at all; the invite-code → key-exchange → allowlist flow is off the shelf. That's a good chunk of the reason to be on their stack, agreed.

One architectural consequence worth naming: pure P2P means there's no hosted pipe for us to meter, which kills the "charge for the relay" idea I floated earlier — but that's fine, arguably good. "We literally cannot see or route your session" is a stronger custody pitch than owning the wire, and it pushes the paid wedge back where it belongs: the control plane (device management, pairing UX, fleet/multi-bottle, the app itself), not the bytes.

Owner

> Authorization, not just encryption. Noise gets you a confidential channel, but by default anyone who learns the bottle's pubkey can open one. The bottle needs to allowlist the specific device pubkey and reject the rest; HyperDHT exposes exactly this (the server-side firewall / accept-by-remote-key hook). So pairing = mutual pubkey pinning: bottle→phone via the code, phone→bottle into the allowlist.

Re: authorization, we could gate that through the paid control plane. The firewall would just ask the trusted relay from the paid subscription service whether a given public key is authorized. That would also do two things: 1) the user doesn't need to be physically next to the host to pair, so they can launch remotely; 2) it provides an audit trail of authorized pairs without revealing what gets passed through.

Author
Collaborator

This is the right move, and it fixes the gap I left in my last comment. I said pure P2P leaves the control plane with no data-plane role — but you've found its actual role: it's the authorization authority, not the pipe. The session stays E2E P2P (control plane never sees a byte of it), while the control plane governs who is allowed to connect. That's the control-vs-custody split applied cleanly — control plane authorizes, custody stays local — and both of your benefits fall out of it: remote pairing (authorize a device from the dashboard without being at the host) and a metadata-only audit trail (who paired with what, when — never the contents).

One sharpening on the mechanism, because there's a failure mode to avoid: don't make the firewall a synchronous online oracle that phones home on every incoming connection. If it does, remote access dies whenever the control plane is unreachable, the subscription lapses, or the service blips. That couples a P2P feature to a hard SaaS dependency.

Better: the control plane issues a signed, short-lived capability rather than answering per-connect. Pairing = the control plane signs a grant `{device_pubkey, bottle_id, exp}` with a key the bottle has pinned. The device presents that grant in the Noise handshake payload; the firewall verifies the signature offline against the pinned control-plane pubkey. That gets you all four properties at once:

  • Remote pairing — control plane issues the grant without the host present.
  • Audit — issuance is logged centrally (metadata only).
  • Offline resilience — the bottle verifies signatures with no live call; a control-plane blip doesn't lock you out.
  • Revocation — short TTL is the default lever; add an online revocation-check opportunistically when the control plane is reachable, so you fail-closed-with-lease rather than fail-brittle.
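
To make the offline check concrete, here is a minimal sketch of the grant validation the firewall could run. It is an illustration under stated assumptions, not the bot-bottle implementation: `Grant`, `canonical_bytes`, and `grant_allows` are hypothetical names, the JSON encoding is just one possible canonical form, and the signature check is injected as a `verify` callable because the Python standard library has no Ed25519 primitive (a real build would wrap one from a crypto library).

```python
import json
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical verifier shape: (pubkey, message, signature) -> bool.
# Assumption: a real deployment would wrap an Ed25519 verify call here.
Verifier = Callable[[bytes, bytes, bytes], bool]


@dataclass(frozen=True)
class Grant:
    device_pubkey: str  # device the control plane authorized
    bottle_id: str      # bottle the grant is scoped to
    exp: float          # expiry, Unix seconds (short TTL)


def canonical_bytes(grant: Grant) -> bytes:
    """Deterministic encoding so signer and verifier see identical bytes."""
    payload = {
        "device_pubkey": grant.device_pubkey,
        "bottle_id": grant.bottle_id,
        "exp": grant.exp,
    }
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def grant_allows(
    grant: Grant,
    signature: bytes,
    *,
    remote_pubkey: str,
    bottle_id: str,
    pinned_cp_pubkey: bytes,
    verify: Verifier,
    now: Optional[float] = None,
) -> bool:
    """Return True only if the grant is authentic, correctly scoped, and unexpired.

    No network call is made: everything is checked against locally pinned state.
    """
    now = time.time() if now is None else now
    # 1. Signature must verify against the pinned control-plane key.
    if not verify(pinned_cp_pubkey, canonical_bytes(grant), signature):
        return False
    # 2. Grant must name this bottle and the peer that is actually connecting.
    if grant.bottle_id != bottle_id or grant.device_pubkey != remote_pubkey:
        return False
    # 3. Short TTL is the default revocation lever.
    return now < grant.exp
```

Binding the grant to the connecting peer's key (step 2) matters: without it, a grant leaked from one device could be replayed by another.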

And keep the local allowlist as the base tier underneath it: the firewall consults local pinned keys first, then a valid signed grant. That way the free / fully-self-hosted path still works offline with no subscription (pair by pasting a pubkey), and the control-plane authority is a paid upgrade that adds remote pairing + fleet + audit — which lines up with the single-run-free / cross-run-and-fleet-paid split.
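
The two-tier ordering can be sketched as a small decision function. This is a hedged illustration: `make_firewall` and `GrantCheck` are hypothetical names, and the grant check is passed in as an opaque callable. The sketch returns `True` to accept a peer; the real transport hook may use the opposite convention, so check the HyperDHT documentation before wiring it in.

```python
from typing import Callable, Iterable, Optional

# Hypothetical shape: (remote_pubkey, presented_grant) -> bool.
# Assumption: the callable does the signature, scope, and expiry checks.
GrantCheck = Callable[[str, bytes], bool]


def make_firewall(
    local_allowlist: Iterable[str],
    grant_check: Optional[GrantCheck] = None,
) -> Callable[..., bool]:
    """Build an accept/reject decision: local pinned keys first, then a signed grant.

    With grant_check=None this is the free, fully self-hosted tier:
    only locally pinned device keys are accepted.
    """
    pinned = frozenset(local_allowlist)

    def allow(remote_pubkey: str, presented_grant: Optional[bytes] = None) -> bool:
        # Base tier: keys pinned on the host always work, even offline.
        if remote_pubkey in pinned:
            return True
        # Paid tier: fall back to a control-plane grant, if one was presented.
        if grant_check is not None and presented_grant is not None:
            return grant_check(remote_pubkey, presented_grant)
        # Default deny.
        return False

    return allow
```

Because the local allowlist is consulted first, a lapsed subscription or an unreachable control plane never locks out a device that was paired directly on the host.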

Net: control plane as the signing/audit authority over a data plane it can't read. That's the defensible version of the wedge — you're selling authorization and provenance, not bytes.

Some checks are pending
lint / lint (push) Failing after 2m10s
test / unit (pull_request) Successful in 57s
test / integration (pull_request) Successful in 22s
test / coverage (pull_request) Successful in 1m6s