
didericis-claude aab450f2f7 feat(dlp): fragmentation resistance, entropy detector, broadened known-value scan

- _alnum_projection(): strip non-alphanumeric chars for separator-injection detection
- scan_known_secrets() gains two extra passes per secret after exact-variant matching:
  alnum-projection exact match (catches hyphens/spaces between secret chars) and a
  sliding-window partial-match scan (catches chunked substrings ≥ PARTIAL_MATCH_MIN_LEN)
- scan_known_secrets() accepts sensitive_prefixes param (default ("EGRESS_TOKEN_",))
  so redact_tokens and call-sites can extend the scanned env-var prefix set
- scan_entropy() warn-only detector flagging windows with Shannon entropy ≥ 5.5 bits/char
- "entropy" added to OUTBOUND_DETECTOR_NAMES; scan_outbound opts it in only when
  explicitly listed in dlp.outbound_detectors (never part of the default "all" set)
- scan_outbound reads BOT_BOTTLE_SENSITIVE_PREFIXES from environ to extend
  scan_known_secrets beyond EGRESS_TOKEN_* without schema changes
- Binary bodies decoded via latin-1 fallback (bijective byte↔codepoint) instead
  of utf-8 errors=replace, preserving ASCII secret strings in binary payloads

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
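
The ideas this commit message describes can be illustrated with a rough sketch. It is not the repository's code: the function names below are hypothetical stand-ins for `_alnum_projection()`, the partial-match pass of `scan_known_secrets()`, and `scan_entropy()`, and the value of `PARTIAL_MATCH_MIN_LEN` is an assumption, since the commit names the constant without stating it.

```python
import math
from collections import Counter

# Assumed value for illustration only; the commit does not state it.
PARTIAL_MATCH_MIN_LEN = 12
# Threshold in bits per character, taken from the commit message.
ENTROPY_THRESHOLD = 5.5


def alnum_projection(text: str) -> str:
    """Drop every non-alphanumeric character (defeats separator injection)."""
    return "".join(ch for ch in text if ch.isalnum())


def contains_fragment(body: str, secret: str,
                      min_len: int = PARTIAL_MATCH_MIN_LEN) -> bool:
    """True if any min_len-long window of the secret occurs in the projected body."""
    projected_body = alnum_projection(body)
    projected_secret = alnum_projection(secret)
    if len(projected_secret) < min_len:
        # Secret shorter than one window: fall back to a whole-value match.
        return projected_secret != "" and projected_secret in projected_body
    return any(
        projected_secret[i:i + min_len] in projected_body
        for i in range(len(projected_secret) - min_len + 1)
    )


def shannon_entropy(window: str) -> float:
    """Shannon entropy of the window's character distribution, in bits per char."""
    if not window:
        return 0.0
    total = len(window)
    counts = Counter(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def decode_body(raw: bytes) -> str:
    """Decode as UTF-8, falling back to latin-1 so no byte is lost or replaced."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps each byte to exactly one code point, so ASCII secrets
        # embedded in a binary payload survive unchanged.
        return raw.decode("latin-1")
```

A window would be flagged by the entropy pass when `shannon_entropy(window) >= ENTROPY_THRESHOLD`; how windows are sized and stepped is not stated in the commit and is left out here.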

2026-06-25 02:31:21 +00:00

.codex/skills/quality-eval

chore(skills): add quality evaluation skill

2026-06-02 18:42:48 +00:00

.gitea/workflows

ci: enforce pylint threshold

2026-06-10 06:30:03 +00:00

.githooks

…

bot_bottle

feat(dlp): fragmentation resistance, entropy detector, broadened known-value scan

2026-06-25 02:31:21 +00:00

docs

docs: draft PRD prd-new for strengthen-outbound-exfil-detection

2026-06-25 02:22:20 +00:00

examples

docs: correct stale role field and claude provider auth example

2026-06-23 17:53:18 -04:00

scripts

refactor: move agent Dockerfiles into their contrib directories

2026-06-08 23:05:14 -04:00

tests

feat(dlp): fragmentation resistance, entropy detector, broadened known-value scan

2026-06-25 02:31:21 +00:00

.dockerignore

…

.gitignore

…

.pylintrc

fix: resolve pylint and pyright linting issues

2026-06-04 12:40:36 -04:00

AGENTS.md

fix(macos-container): make backend the macos default

2026-06-10 22:25:00 -04:00

cli.py

…

Dockerfile.sidecars

feat(egress): implement PRD 0053 — DLP addon with Gateway API matches

2026-06-05 19:53:23 +00:00

LICENSE

…

pyrightconfig.json

fix: resolve all remaining 179 test file type errors with type: ignore

2026-06-04 11:30:51 -04:00

README.md

Default agent-provider routes to the redact on-match policy

2026-06-24 20:40:36 -04:00

requirements-dev.txt

ci: add dev requirements file and update workflow

2026-06-03 23:07:59 -04:00

README.md

bot-bottle

Problem: A developer wants to run a coding agent unsupervised, but does not want a prompt-injected or misbehaving agent wrecking their environment or exfiltrating sensitive data.

Solution: Ephemeral, per-agent "bottles" that the agent cannot modify. Each bottle scans all traffic for data exfiltration and limits capabilities and egress to only what the agent needs.

Features

Per-bottle egress allowlist — TLS-bumped HTTP/HTTPS chokepoint with a per-manifest host allowlist; per-route path/method/header matches filtering; outbound DLP scanning for known tokens and secrets, inbound DLP scanning for prompt-injection attempts; DoH and arbitrary hosts blocked by default.
Per-route token-match policy — each egress route picks what happens when the outbound DLP catches a token via dlp.outbound_on_match: supervise (default) holds the request and surfaces it in ./cli.py supervise for approval (an approved value is remembered for the life of the proxy); redact scrubs the value and forwards; block is a hard 403. Cuts false-positive friction without weakening default-deny.
Tokens the agent never sees — host secrets live in a sidecar; the agent dials http://sidecar:9099/<path> and the proxy strips inbound Authorization and injects the real token before forwarding. printenv in the agent shows proxy URLs only.
Gitleaks-scanned push (git-gate) — bottle.git remotes route through a per-bottle git daemon that gitleaks-scans incoming refs pre-receive and forwards clean refs upstream over SSH. The agent never holds the upstream credential.
Manifest-scoped skills + secrets — each bottle declares its skills, env, git identity, remotes, and egress routes; unknown keys die at load.
Trust boundary at $HOME — bottles (credentials, egress, remotes) live only under ~/.bot-bottle/bottles/. Repos may ship agents but not bottles, so a cloned repo can't redirect an env var to an attacker host.
Composable bottles (extends:) — keep provider/runtime policy in one base bottle (e.g. claude.md) and overlay task bottles on top.
Parallel, isolated bottles — each bottle runs in its own backend-owned isolation boundary; bottles don't share state or talk to each other.
Provider templates (Claude, Codex) — Dockerfile.claude / Dockerfile.codex, or a bottle-supplied Dockerfile. Claude auth via long-lived OAuth token; Codex via opt-in host device-auth forwarding.
gVisor auto-detect — on Linux hosts where runsc is registered with Docker, every bottle launches under it for a userspace syscall barrier; no manifest config required.
Apple Container backend (macOS default when available) — runs the agent and sidecar bundle with Apple's container CLI, using a host-only agent network plus a separate sidecar egress network.
Smolmachines backend — runs the agent in a libkrun micro-VM while the sidecar bundle stays in Docker. TSI and smolmachines DNS filtering close the raw DNS exfiltration gap that exists in the legacy Docker backend.
Legacy Docker backend — still available for examples, CI, and hosts without Apple Container via BOT_BOTTLE_BACKEND=docker or --backend=docker.
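
As an illustration of the "Tokens the agent never sees" feature above, here is a minimal sketch of the strip-and-inject step. The function name `inject_auth` and its signature are hypothetical; only the behaviour (drop the agent's `Authorization` header, add the real one built from a host env var) comes from the feature description.

```python
from typing import Mapping


def inject_auth(headers: Mapping[str, str], scheme: str, token_ref: str,
                environ: Mapping[str, str]) -> dict[str, str]:
    """Return a copy of headers with the agent's Authorization replaced.

    scheme is "Bearer" or "token"; token_ref names the host env var that
    holds the real secret. Both names mirror the manifest fields.
    """
    if scheme not in ("Bearer", "token"):
        raise ValueError(f"unsupported auth scheme: {scheme!r}")
    # Raises KeyError if the host never exported the secret.
    token = environ[token_ref]
    # Header names are case-insensitive, so compare in lower case.
    cleaned = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    cleaned["Authorization"] = f"{scheme} {token}"
    return cleaned
```

Because the lookup happens on the proxy side against the host environment, nothing the agent sends can influence which secret is attached.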

Architecture

On the default macOS Apple Container backend, a bottle is an agent container on a host-only internal network plus a sidecar bundle attached to both that internal network and a NAT egress network. The agent gets HTTP(S)_PROXY and CA bundle env vars pointing at the sidecar's internal-network IP, so HTTP/HTTPS traffic flows through the sidecar instead of direct egress. bottle.git / git-gate is intentionally deferred on this backend until a safe Apple Container key-delivery path exists.

On the smolmachines backend, a bottle is an agent micro-VM plus a Docker sidecar bundle for egress, git-gate, and supervise. The VM reaches the sidecars through a per-bottle loopback alias allowed by TSI; smolmachines handles DNS filtering below the guest OS.

On the legacy Docker backend, the same logical bottle is two containers per agent: an agent container and a sidecars container. They share a per-agent Docker --internal network; the agent has no default route off-box.

The Docker topology looks like this:

                            host  ( ./cli.py )
                                  │
                          starts  │  stops
                                  ▼
   ┌─────────────────────────── bottle ──────────────────────────────────┐
   │                                                                     │
   │   ┌──────────────────┐                   ┌──────────────────────┐   │
   │   │ agent image      │   HTTP(S) proxy   │ egress image         │   │
   │   │ (claude-code,    │ ─────────────────►│ (mitmproxy; TLS bump │   │  HTTPS to
   │   │  codex, etc)     │                   │  DLP scan, path      │───┼──►  allowlisted
   │   │                  │                   │  matching, auth      │   │     hosts
   │   │ environ: proxy   │                   │  injection)          │   │
   │   │ URLs only, no    │                   └──────────────────────┘   │
   │   │ real tokens      │                                              │
   │   │                  │    git proxy     ┌────────────────┐          │  SSH push/fetch
   │   │                  │ ────────────────►│ git-gate image │──────────┼──►  to bottle.git
   │   │                  │                  │ (gitleaks +    │          │      upstreams
   │   └──────────────────┘                  │  git daemon)   │          │     (direct — not
   │                                         └────────────────┘          │      via egress)
   │                                                                     │
   │   agent on internal network (no default route); egress and          │
   │   git-gate straddle internal + egress networks.                     │
   │   egress is the single HTTP/HTTPS chokepoint — all agent HTTP/HTTPS │
   │   traffic flows through it. git-gate's SSH egress is direct         │
   │   because egress is HTTP-only.                                      │
   └─────────────────────────────────────────────────────────────────────┘

When the agent exits, cli.py tears down every sidecar and both networks; nothing about a bottle persists between runs.

Quickstart

On compatible macOS hosts, the default backend requires Apple's container CLI and does not require Docker. The smolmachines backend requires Docker on the host for the sidecar bundle plus smolvm. The legacy Docker backend requires Docker. Claude bottles also need a long-lived Claude Code OAuth token (claude setup-token) exported as BOT_BOTTLE_CLAUDE_OAUTH_TOKEN.

Use BOT_BOTTLE_BACKEND=docker ./cli.py start <agent> on hosts where Apple Container is not installed and Docker is the desired backend.

./cli.py start <agent>   # builds the image on first run, drops you into claude

Manifest

Bottles and agents are Markdown files with YAML frontmatter under ~/.bot-bottle/. The Markdown body is the system prompt. Bottles live in ~/.bot-bottle/bottles/; agents may also be shipped by a repo at <repo>/.bot-bottle/agents/<name>.md.

Bottle (~/.bot-bottle/bottles/gitea-dev.md):

---
extends: claude          # inherit the Claude provider boundary

env:
  GIT_AUTHOR_NAME: didericis

git:
  user:
    name: "Eric Bauerfeld"
    email: "eric+claude@dideric.is"
  remotes:
    gitea.dideric.is:
      Name: bot-bottle
      Upstream: ssh://git@gitea.dideric.is:30009/didericis/bot-bottle.git
      IdentityFile: /Users/didericis/.ssh/id_ed25519_gitea
      KnownHostKey: ssh-ed25519 AAAA...

egress:
  routes:
    - host: gitea.dideric.is
      auth:
        scheme: token        # Bearer | token
        token_ref: BOT_BOTTLE_GITEA_TOKEN
      matches:               # optional — restrict to specific paths/methods/headers
        - paths:
            - {type: prefix, value: /api/v1/}
          methods: [GET, POST, PATCH, DELETE]
      dlp:                   # optional — per-route detector overrides (default: all on)
        outbound_detectors: [token_patterns, known_secrets]
        inbound_detectors: false   # disable response scanning for this host
---

The `gitea-dev` bottle. Provider auth via the inherited Claude route;
gitea over SSH for push, token over HTTPS for the API.

Agent (~/.bot-bottle/agents/gitea-helper.md):

---
bottle: gitea-dev
skills:
  - init-prd
---

You help maintain Gitea-hosted projects.
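
Both files above share one shape: YAML frontmatter between `---` fences, followed by a Markdown body that becomes the system prompt. A minimal sketch of separating the two, assuming the fences sit on their own lines (the helper name `split_frontmatter` is hypothetical and this is not the project's loader):

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Split a manifest into (yaml_frontmatter, markdown_body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("manifest must start with a '---' frontmatter fence")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            # Everything after the closing fence is the system prompt.
            body = "\n".join(lines[i + 1:]).strip()
            return frontmatter, body
    raise ValueError("unterminated frontmatter: missing closing '---'")
```

The frontmatter string could then be handed to a YAML parser (for example PyYAML's `yaml.safe_load`) and checked against the set of known keys before anything is launched.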

Egress route fields:

| Field | Required | Description |
| --- | --- | --- |
| `host` | yes | Hostname to allowlist. One entry per host. |
| `role` | no | Reserved for future use. The key is recognised but any value is currently rejected at load. Provider auth routes (e.g. Claude's `api.anthropic.com`) are injected automatically from `agent_provider.auth_token`, not via `role`. |
| `auth.scheme` | when `auth` present | `Bearer` or `token`. Injected by the proxy; the agent never sees the value. |
| `auth.token_ref` | when `auth` present | Env-var name holding the secret on the host. |
| `matches` | no | Array of `{paths, methods, headers}` filters. A request must match at least one entry (if any are given) to be forwarded. |
| `matches[].paths` | no | Array of `{type, value}`. `type` is `prefix` (default), `exact`, or `regex`. |
| `matches[].methods` | no | Array of HTTP method strings, e.g. `[GET, POST]`. |
| `matches[].headers` | no | Array of `{name, value, type}`. `type` is `exact` (default) or `regex`. |
| `dlp` | no | Per-route DLP overrides. Omit to use defaults (all detectors on). |
| `dlp.outbound_detectors` | no | `false` disables outbound scanning; list restricts to named detectors (`token_patterns`, `known_secrets`). |
| `dlp.inbound_detectors` | no | `false` disables inbound scanning; list restricts to named detectors (`naive_injection_detection`). |
| `dlp.outbound_on_match` | no | What to do when an outbound token is detected: `supervise` (default for manifest routes; holds for operator approval), `redact` (scrub the value and forward), or `block` (hard 403). Agent-provider routes (e.g. `api.anthropic.com`) default to `redact`. |
| `git.fetch` | no | `true` permits smart HTTP clone/fetch (`git-upload-pack`) for this host. Push (`git-receive-pack`) remains blocked. |
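
The `matches` semantics in the table can be sketched as follows. This is an illustration under stated assumptions, not the proxy's implementation: it assumes `paths` and `methods` within one entry must both pass, that `regex` paths are tested with `re.search`, and it leaves out `headers` for brevity. The function names are hypothetical.

```python
import re
from typing import Any


def path_matches(rule: dict[str, str], path: str) -> bool:
    """Apply one {type, value} path rule; type defaults to prefix."""
    kind = rule.get("type", "prefix")
    value = rule["value"]
    if kind == "prefix":
        return path.startswith(value)
    if kind == "exact":
        return path == value
    if kind == "regex":
        # Assumption: unanchored search; the real proxy may anchor the pattern.
        return re.search(value, path) is not None
    raise ValueError(f"unknown path match type: {kind!r}")


def request_allowed(matches: list[dict[str, Any]], method: str, path: str) -> bool:
    """A request passes if no filters are given or at least one entry matches."""
    if not matches:
        return True
    for entry in matches:
        paths = entry.get("paths")
        methods = entry.get("methods")
        if paths and not any(path_matches(rule, path) for rule in paths):
            continue
        if methods and method.upper() not in methods:
            continue
        return True
    return False
```

With the `gitea-dev` route shown earlier, `GET /api/v1/repos` would pass under this sketch, while `PUT /api/v1/repos` and `GET /login` would not.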

When an outbound DLP detector matches a token, the route's dlp.outbound_on_match policy decides what happens. Under the default supervise, the proxy queues an egress-token-allow proposal for the operator's ./cli.py supervise TUI and holds the request open until it is answered (or EGRESS_TOKEN_ALLOW_TIMEOUT_SECONDS, default 300s, elapses — after which it fails closed). The operator never sees the raw token, only the host, method, path, and a redacted snippet; approving adds the value to an in-memory safelist for the life of the egress proxy. Under redact, the matched value is scrubbed from the body, headers, and path and the request is forwarded (failing closed if a match lands somewhere unredactable, like the hostname). Under block it stays a hard 403. Structural blocks (CRLF injection) and not-in-allowlist host blocks are always hard 403s regardless of policy.
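
The decision flow in the paragraph above can be summarised in a short sketch. The function names and the outcome labels (`forward`, `forward_redacted`, `hold`, `deny`) are hypothetical, and two points are assumptions rather than documented behaviour: an operator rejection is treated the same as a timeout, and `deny` stands for whatever fail-closed response the proxy actually returns.

```python
from typing import Optional


def default_policy(is_agent_provider_route: bool) -> str:
    """Agent-provider routes default to redact; manifest routes to supervise."""
    return "redact" if is_agent_provider_route else "supervise"


def decide(policy: str, *, host_allowed: bool = True,
           structural_block: bool = False,
           operator_approved: Optional[bool] = None,
           redactable: bool = True) -> str:
    """Return "forward", "forward_redacted", "hold", or "deny" for a DLP match.

    operator_approved is None while the proposal is still pending; the caller
    is expected to pass False once the approval timeout elapses.
    """
    # Structural blocks and non-allowlisted hosts ignore the route policy.
    if structural_block or not host_allowed:
        return "deny"
    if policy == "block":
        return "deny"
    if policy == "redact":
        # Fail closed if the match sits somewhere that cannot be scrubbed.
        return "forward_redacted" if redactable else "deny"
    if policy == "supervise":
        if operator_approved is None:
            return "hold"
        return "forward" if operator_approved else "deny"
    raise ValueError(f"unknown outbound_on_match policy: {policy!r}")
```

Keeping the structural and allowlist checks ahead of the policy switch mirrors the rule that those blocks apply regardless of `dlp.outbound_on_match`.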

More examples in examples/. Full design lives under docs/prds/; the trust-boundary rationale is in docs/prds/0011-per-file-md-manifest.md.

Trademarks

bot-bottle is an independent project and is not affiliated with, endorsed by, or sponsored by Anthropic, PBC. "Claude" and "Claude Code" are trademarks of Anthropic, PBC; the project name uses "claude" descriptively to indicate that the tool runs Claude Code inside a sandbox.

License

Description

Lightweight, self-hosted sandbox for AI coding agents that protects against prompt-injected or misbehaving agents: all egress traffic is TLS-inspected and secret-scanned, and credentials are injected at the proxy so the agent never sees them. No third-party platform in the loop, no trust required.

Readme Apache-2.0 30 MiB