Fourth and final step of PRD 0005. Two new end-to-end tests that
exercise the full chain agent -> mitmproxy(bump) -> addon ->
pipelock -> upstream and pin the two paths the addon implements.
- test_mitmproxy_blocks_secret_https_post: HTTPS variant of the
existing test_pipelock_blocks_secret_post. Posts a credential
pattern in the body over HTTPS through the bottle. mitmproxy
bumps the CONNECT (the agent trusts the per-bottle ephemeral CA
installed by provision_ca), the addon forwards the decrypted
request to pipelock, pipelock returns 403 with the known
`blocked: ...` body shape, and the addon short-circuits the
flow with status=403 + X-Pipelock-Bridge: block. The two-axis
assertion (status + header) proves the addon-mediated path is
what produced the block, not some other layer.
- test_mitmproxy_allows_normal_https: hits raw.githubusercontent.com
(a baked-in allowlist host) over HTTPS through the bottle.
Verifies the addon's allow path: mitmproxy bumps, addon
forwards to pipelock for the scan, pipelock allows, mitmproxy
proceeds to the real upstream, response comes back through. The
absence of X-Pipelock-Bridge on the response is the signal that
the addon didn't short-circuit. Body length sanity-checks that
the response is real upstream content, not a synthesized stub.
Both probes are stdlib-only Node (http.request CONNECT + tls.connect
on the tunneled socket) — pulling in undici as a dep would be the
clean way to do HTTPS-through-proxy but is out of scope.
The earlier integration tests still pass with mitmproxy in path:
their assertions hold under the new topology, though their semantic
coverage shifts (e.g. test_pipelock_allow_node now exercises
mitmproxy's CONNECT-200 path rather than pipelock's host allowlist
on CONNECT). Updating those tests is a follow-up.
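
The two-axis check used by the block test can be sketched as a small
predicate. This is an illustration only, not the test's actual code:
`blocked_by_addon` and `BRIDGE_HEADER` are hypothetical names, and the
only facts taken from this message are the 403 status and the
X-Pipelock-Bridge: block header.

```python
from typing import Mapping

# Header name taken from the commit text, lowercased for comparison.
BRIDGE_HEADER = "x-pipelock-bridge"


def blocked_by_addon(status: int, headers: Mapping[str, str]) -> bool:
    """True only when both axes agree: 403 status AND the bridge header."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return status == 403 and lowered.get(BRIDGE_HEADER) == "block"
```

A 403 without the header (some other layer refused the request) and a
200 with the header both fail the predicate, which is the point of
asserting on two axes.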
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Third step of PRD 0005. The preflight now surfaces the TLS-
intercept layer so the operator sees it before agreeing to launch.
- Text output: one new line under the egress summary —
"tls intercept : mitmproxy (per-bottle ephemeral CA, generated
at launch)".
- JSON output (--format=json contract): new
egress.mitm: { enabled: true, ca_fingerprint: null } block.
Fingerprint is always null at dry-run because the CA only
exists after the sidecar starts; real launches print it as a
stderr log line from provision_ca.
- Pin the new shape in the dry-run integration test.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Second step of PRD 0005. The mitmproxy sidecar from the previous
commit now actually runs alongside pipelock when a bottle launches.
- BottleBackend gains a non-abstract provision_ca with a default
no-op so non-Docker backends aren't forced to implement TLS
interception. provision() orchestrates ca → prompt → skills → ssh
→ git; CA goes first so trust is set up before anything else runs
inside the agent.
- DockerBottlePlan gains `mitmproxy_plan: MitmproxyProxyPlan`. The
prepare step builds it alongside the existing pipelock plan; no
new manifest schema or host-side scratch files.
- DockerBottleBackend grows self._mitm, threads it through prepare
and launch. Mirror of the existing self._proxy pattern.
- launch.py brings the mitmproxy sidecar up between pipelock and
the agent container, passing pipelock's service-name URL via
env. ExitStack callback handles teardown in reverse order.
- The agent's HTTPS_PROXY / HTTP_PROXY now point at mitmproxy (not
pipelock directly). Three new -e flags inject the CA trust trio
(NODE_EXTRA_CA_CERTS / SSL_CERT_FILE / REQUESTS_CA_BUNDLE) at
docker run time; Docker propagates those into docker exec so the
claude process sees them without per-exec threading.
- New provisioner backend/docker/provision/ca.py extracts the CA
cert from the running mitmproxy sidecar, copies it into the agent
at /usr/local/share/ca-certificates/claude-bottle-mitm.crt, runs
update-ca-certificates, and emits a stderr line with the SHA-256
fingerprint (stdlib ssl + hashlib; no subprocess).
Cleanup needs no change — `docker ps --filter name=^claude-bottle-`
already catches the new claude-bottle-mitm-<slug> containers.
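
A minimal sketch of the stdlib-only fingerprint step described above,
assuming the cert arrives as PEM text. The function name and the
colon-separated uppercase output format are assumptions, not what
ca.py necessarily prints.

```python
import hashlib
import ssl


def ca_fingerprint_sha256(pem_text: str) -> str:
    """Return the SHA-256 fingerprint of a PEM cert as colon-separated hex."""
    # The fingerprint is computed over the DER bytes, not the PEM text.
    der = ssl.PEM_cert_to_DER_cert(pem_text)
    digest = hashlib.sha256(der).hexdigest().upper()
    # Group the hex digest into byte pairs: "AB:CD:...".
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

Note that ssl.PEM_cert_to_DER_cert only strips the PEM armor and
base64-decodes; it does not validate that the payload is a real
certificate.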
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First step of PRD 0005. Three new source files plus one unit test
for the mitmproxy-in-front-of-pipelock topology; wiring into the
bottle launch comes in the next commit.
- claude_bottle/mitmproxy/__init__.py: abstract MitmproxyProxy
base + MitmproxyProxyPlan. Mirrors the PipelockProxy shape
(prepare / start / stop) and adds extract_ca_cert for the CA
cert hand-off into the agent.
- claude_bottle/mitmproxy/addon.py: the vendored Python addon
mitmproxy loads inside the sidecar. Forwards each decrypted
request to pipelock as a plain HTTP forward-proxy call,
inspects the response, and short-circuits the flow with 403 on
a pipelock block (status=403 + body starts with `blocked: `,
pinned empirically against pipelock 2.3.0 in the impl spike).
Self-contained — no claude_bottle imports — so it loads in a
sidecar that doesn't have claude_bottle on its path.
- claude_bottle/backend/docker/mitmproxy.py: DockerMitmproxyProxy
with create / cp / network connect / start lifecycle. Pinned
to mitmproxy/mitmproxy@sha256:00b77b5d… (multi-arch manifest
for v12.2.3).
- tests/unit/test_mitmproxy_verdict.py: pins the verdict
fingerprint so a pipelock-side body shape change breaks loudly.
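
The verdict fingerprint amounts to a two-condition check. A sketch
under the assumptions stated in this message (status=403 and a body
starting with `blocked: `); `is_pipelock_block` and `BLOCK_PREFIX`
are hypothetical names, not necessarily those used in addon.py.

```python
# Body prefix pinned against pipelock 2.3.0 per the commit text.
BLOCK_PREFIX = b"blocked: "


def is_pipelock_block(status_code: int, body: bytes) -> bool:
    """True when a pipelock response looks like a block verdict."""
    # Both conditions must hold: a bare 403 from another layer, or a 200
    # whose body happens to start with the prefix, is not a block.
    return status_code == 403 and body.startswith(BLOCK_PREFIX)
```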
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Re-grounds the design after walking the eight original open
questions interactively. Two structural changes:
- Topology A → A'. A spike confirmed mitmproxy's `upstream` mode
re-wraps decrypted flows in a new CONNECT to the upstream proxy,
which would have left pipelock seeing only ciphertext (the very
gap this PRD set out to close). The fix is to run mitmproxy in
`regular` mode and ship a vendored Python addon that forwards
each decrypted request to pipelock as a plain HTTP forward-proxy
call. Pipelock is unchanged.
- mitmproxy owns CA generation. The research note's preference
for a host-side openssl / cryptography CA turned out to be
unnecessary — mitmproxy generates a fresh CA on startup; the
public cert is `docker cp`'d into the agent. No new host-side
crypto deps. Dry-run can't render a fingerprint (CA doesn't
exist yet); launches print it once to stderr.
Other Q3–Q8 resolutions folded in: Debian-base `update-ca-certificates`
confirmed, mitmproxy 12 verified to speak h2 on both halves,
selective-bump deferred to v2, response-body and MCP scanning
deferred to v2, domain-fronting deferred to v2.
Open questions rewritten — what remains is addon-implementation
specifics (pipelock 403-body fingerprint, env-var inheritance
through docker exec, addon test fixtures).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Captures the design for putting a mitmproxy sidecar in front of
pipelock on the egress path so pipelock's body / header / MCP
scanners see plaintext for the HTTPS hosts in the default allowlist.
Implements Topology A from docs/research/tls-mitm-for-pipelock.md
with a per-bottle ephemeral CA, no manifest schema change in v1,
and selective-bumping deferred until a pinning host appears.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The no-side-effects assertion calls `docker network ls` and
`docker ps -a` to verify the dry run created nothing. Inside the
Gitea Actions job container, those exit non-zero against the
host-mounted docker socket — the same act_runner topology issue
that already excludes other integration tests from CI (see
docs/ci.md). The failure was silently swallowed under the default
check=False; the recent style sweep that added check=True surfaced
it.
Gate the docker-enumerating check on GITEA_ACTIONS so the JSON
contract — the more useful part of the test — keeps running on CI.
Consolidate the two count helpers into one that surfaces stderr in
the failure message instead of raising a context-free
CalledProcessError, so the next docker surprise is debuggable.
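
One possible shape for the consolidated count helper, shown as a
sketch: `count_output_lines` is a hypothetical name, and the
GITEA_ACTIONS gate is not shown. It runs the command with check=False
and raises an error that carries stderr, rather than letting
CalledProcessError surface without context.

```python
import subprocess
from typing import Sequence


def count_output_lines(argv: Sequence[str]) -> int:
    """Run argv and count non-empty stdout lines; fail with stderr attached."""
    result = subprocess.run(
        list(argv), capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        # Include the command, exit code, and stderr so the failure is
        # debuggable from the test report alone.
        raise AssertionError(
            f"{' '.join(argv)} exited {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return sum(1 for line in result.stdout.splitlines() if line.strip())
```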
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Survey of TLS-MITM tools (mitmproxy, Squid+ssl_bump, Go libraries) and
five candidate topologies for adding TLS termination to the egress path
so pipelock's DLP, subdomain-entropy, and MCP scanners can fire on
plaintext bodies. Recommends mitmproxy in front of pipelock for v1
with a per-bottle ephemeral CA.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds bottle.egress.dlp_action ("block" | "warn", default block) and
wires it into pipelock as request_body_scanning.action. Pipelock's
own default is "warn", which previously meant claude-bottle detected
credential patterns in outbound bodies but forwarded the request
anyway.
The matching integration test posts a manifest env var shaped like
a GitHub PAT to api.anthropic.com via plain HTTP forward proxy so
pipelock can see the body. Pipelock answers 403 from its body-scan
layer instead of forwarding to the upstream.
Behavior change: bottles without an explicit egress.dlp_action now
block on body-scan hits. Set egress.dlp_action: "warn" to restore
the prior detect-only behavior.
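
The default-to-block mapping can be sketched as below. The function
name and the nested dict layout are assumptions for illustration; only
the key names dlp_action and request_body_scanning.action and the two
allowed values come from this message.

```python
ALLOWED_DLP_ACTIONS = ("block", "warn")


def body_scan_config(egress: dict) -> dict:
    """Map a bottle's egress settings to a pipelock config fragment."""
    # A missing dlp_action means "block", overriding pipelock's own
    # default of "warn".
    action = egress.get("dlp_action", "block")
    if action not in ALLOWED_DLP_ACTIONS:
        raise ValueError(f"egress.dlp_action must be one of {ALLOWED_DLP_ACTIONS}")
    return {"request_body_scanning": {"action": action}}
```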
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bottle.exec(script) -> ExecResult runs a POSIX shell script inside a
running bottle and returns captured stdout/stderr/returncode. The
Docker impl pipes the script via stdin to `docker exec -i ... sh -s`
so the source never crosses argv.
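
A sketch of the stdin-piping technique, with the shell command left
as a parameter so it can be tried without Docker. `run_script` is a
hypothetical name; the ExecResult fields follow the description above
but the real dataclass may differ.

```python
import subprocess
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ExecResult:
    stdout: str
    stderr: str
    returncode: int


def run_script(shell_argv: Sequence[str], script: str) -> ExecResult:
    """Feed `script` to a shell on stdin so its source never appears in argv."""
    proc = subprocess.run(
        list(shell_argv),
        input=script,  # the script travels over stdin, not the command line
        capture_output=True,
        text=True,
        check=False,  # the caller inspects returncode itself
    )
    return ExecResult(proc.stdout, proc.stderr, proc.returncode)
```

For the Docker case, shell_argv would be something like
["docker", "exec", "-i", container, "sh", "-s"]; locally,
["sh", "-s"] behaves the same way.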
Two integration tests exercise it end-to-end through the pipelock
sidecar: a Node request to a non-allowlisted host (example.com)
returns 403 from pipelock; a Node CONNECT to an allowlisted host
(raw.githubusercontent.com) is tunneled with 200 Connection
Established. The 200/403 split on each verb is decided by pipelock
itself, isolating the allowlist decision from whatever the remote
might return.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The helper is a thin subprocess wrapper over `container_exists` +
`docker rm -f`, so it belongs alongside the other docker primitives
in util.py rather than as a private helper in launch.py.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Move the resolution, bring-up, and orphan-cleanup logic out of
backend.py into three topic-named modules. DockerBottleBackend becomes
a thin façade that wires the per-instance pipelock proxy and the
provision orchestrator into the free functions.
backend.py drops from ~360 to ~70 lines and each topic now reads
end-to-end in one place. Mirrors the existing provision/ split.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Make BottleBackend.prepare a template method that runs a cross-backend
_validate step (agent exists, named skills present on host, SSH
IdentityFiles resolve) and then delegates to a subclass-implemented
_resolve_plan for backend-specific resolution.
A future backend that overrides _resolve_plan can no longer forget to
validate skills or SSH keys; the validation runs unconditionally via
prepare. Backends with additional preconditions can override _validate
and chain via super().
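
The template-method shape described above, as a reduced sketch. The
class names, the dict-based spec, and the specific checks are
placeholders; only prepare / _validate / _resolve_plan mirror the
names in this message.

```python
from abc import ABC, abstractmethod


class BackendSketch(ABC):
    def prepare(self, spec: dict) -> dict:
        # Template method: validation always runs before resolution.
        self._validate(spec)
        return self._resolve_plan(spec)

    def _validate(self, spec: dict) -> None:
        # Cross-backend precondition (stand-in for agent/skills/SSH checks).
        if not spec.get("agent"):
            raise ValueError("agent is required")

    @abstractmethod
    def _resolve_plan(self, spec: dict) -> dict:
        """Backend-specific resolution, implemented by subclasses."""


class DemoBackend(BackendSketch):
    def _validate(self, spec: dict) -> None:
        super()._validate(spec)  # chain to the shared checks first
        if "image" not in spec:
            raise ValueError("image is required")

    def _resolve_plan(self, spec: dict) -> dict:
        return {"agent": spec["agent"], "image": spec["image"]}
```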
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Stops the proxy from being shared across instances through a class
attribute; it is now constructed in __init__ alongside its owning
backend.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The test overrides HOME to isolate the manifest under test from the
dev's real ~/claude-bottle.json. On Docker Desktop that override
also breaks docker CLI endpoint resolution, since the active context
is read from $HOME/.docker/config.json and the per-user socket lives
under $HOME/.docker/run/docker.sock. Forward the parent's resolved
endpoint via DOCKER_HOST so the subprocess reaches the same daemon
regardless of $HOME.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ResolvedEnv.forwarded now carries name->value pairs instead of names
whose values had been side-loaded into os.environ. The Docker backend
collects the dict (plus the renamed OAuth token) and passes it via
subprocess.run(env=...) so docker run -e NAME forwards by-name from
the child's environment, not the parent's.
Values are excluded from the dataclass repr (forwarded on ResolvedEnv,
forwarded_env on DockerBottlePlan) so accidental logging cannot leak
secret or interpolated values.
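
Excluding a field from the generated repr is a one-argument change
with dataclasses. A minimal sketch, assuming a simplified stand-in
for ResolvedEnv (the real class has its own fields):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ResolvedEnvSketch:
    # Non-secret NAME=VALUE literals may appear in repr.
    literals: dict[str, str] = field(default_factory=dict)
    # Forwarded values can be secrets: keep them out of repr().
    forwarded: dict[str, str] = field(default_factory=dict, repr=False)
```

With this, repr(ResolvedEnvSketch(forwarded={"TOKEN": "s3cret"}))
shows only the literals field, while env.forwarded still holds the
value for the code that needs it.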
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Silences pylint W1510 / ruff PLW1510 across the codebase. The choice
at each site reflects existing intent:
- check=True where the caller implicitly trusts success (docker ps /
network ls returning stdout, docker build, exec chown/chmod inside
provisioners).
- check=False where the caller inspects .returncode (race-retry on
docker run, pipelock sidecar lifecycle, network plumbing, exec_claude
propagating the session's exit code, best-effort cleanup paths).
No behavior change; check= defaults to False so the False sites are
semantically identical.
Adds pyrightconfig.json (strict, Python 3.11) covering cli.py,
claude_bottle/, and tests/. Fixes the 49 strict-mode errors:
- Type DockerBottle.teardown as Callable[[], None].
- ResolvedEnv default_factory uses parameterized list[str] / dict[str, str].
- Erase BottleBackend generics at the registry boundary
(BottleBackend[Any, Any]) since selection is runtime-driven and
callers use the unparameterized interface.
- DockerBottleBackend.launch returns Generator[DockerBottle, None, None];
@contextmanager now flags Iterator returns as deprecated.
- Sidestep cli.list submodule shadowing builtins.list in main()'s argv
annotation via an aliased re-import in cli/__init__.py.
- Cast cfg[...] results in test_pipelock_yaml at the dict[str, object]
boundary.
- Annotate write_fixture's fn parameter and _manifest_with_runtime's
return type.
DockerBottlePlan.print and .to_dict each pulled the same agent /
bottle / env_names / ssh_hosts / prompt-first-line out of the spec
before formatting. Extract a private _view() helper that returns a
small frozen _PlanView dataclass with those derived fields; both
methods consume it. Removes the duplicated derivation and the risk
that one renderer drifts from the other (the OAuth-name append in
particular existed twice).
Previously prepare wrote two on-disk artifacts that launch consumed:
agent.env (NAME=VALUE) and docker-args (paired -e\nNAME\n lines), with
launch parsing the second back into argv. Docker requires the literals
file on disk for --env-file, but the args-file round-trip was a pure
serialize/deserialize trip with hand-rolled line pairing logic.
Drop docker-args entirely. Pass forwarded names as a structured
tuple[str, ...] field on DockerBottlePlan; launch iterates it directly
to extend docker_args. _write_env_files becomes _write_env_file (only
the literals file remains).
Both prepare-time probing and launch-time race-retry generated the
same `<base>, <base>-2, ..., <base>-N` sequence with their own copies
of the suffix arithmetic and the 99-cap. Extract the candidate stream
into docker/util.container_name_candidates and have both call sites
walk it; each keeps its own predicate (probe vs. retry).
Also bumps the cap into a named constant (MAX_CONTAINER_SUFFIX) so
the two error messages can't drift.
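
A sketch of what the shared candidate stream might look like. The
function and constant names come from this message; the body is an
assumption, in particular that the cap of 99 is inclusive.

```python
from typing import Iterator

# Assumed inclusive upper bound on the numeric suffix.
MAX_CONTAINER_SUFFIX = 99


def container_name_candidates(base: str) -> Iterator[str]:
    """Yield <base>, <base>-2, ..., <base>-MAX_CONTAINER_SUFFIX in order."""
    yield base
    for n in range(2, MAX_CONTAINER_SUFFIX + 1):
        yield f"{base}-{n}"
```

Each call site then walks the stream with its own predicate, for
example next(n for n in container_name_candidates(base) if free(n)),
where free is whatever probe or retry check that site uses.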
Previously _run_agent_container set os.environ["CLAUDE_CODE_OAUTH_TOKEN"]
deep inside the launch path and added a one-off `-e` pair to docker_args,
which was the only env var to bypass the resolved.forwarded flow used
for everything else.
Move the os.environ mutation + name registration into prepare, right
after resolve_env, so the OAuth token rides the same forwarded-by-name
mechanism as secrets and interpolated entries. _run_agent_container
loses the special case entirely.
Parameterize BottleBackend over PlanT (bound to BottlePlan) and
CleanupT (bound to BottleCleanupPlan). DockerBottleBackend declares
itself BottleBackend[DockerBottlePlan, DockerBottleCleanupPlan], which
narrows every method's plan parameter to the concrete type and lets
the six `assert isinstance(plan, DockerBottlePlan)` lines on
launch/cleanup/provision_* go away.
The dict in get_bottle_backend keeps its unparameterized
BottleBackend element type so it can hold heterogeneous backend
specializations.
Replace the manual state-dict + per-resource branching teardown in
DockerBottleBackend.launch with an ExitStack: each resource registers
its own cleanup callback at the moment it's created, and stack.close()
unwinds in LIFO order. The previous form had to hand-coordinate four
nullable slots and re-check existence for the container; ExitStack
encodes the same semantics declaratively.
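
The LIFO unwinding can be seen in a toy version. The resource names
and the log list are illustrative only; the real launch registers
docker cleanup calls instead.

```python
from contextlib import ExitStack


def launch_sketch(log: list[str]) -> None:
    """Create three fake resources; ExitStack removes them in reverse."""
    with ExitStack() as stack:
        for resource in ("network", "pipelock", "agent"):
            log.append(f"create {resource}")
            # Register cleanup at the moment the resource exists, so a
            # failure later in the loop still tears down what was made.
            stack.callback(log.append, f"remove {resource}")
```

Calling launch_sketch on an empty list records the three creates in
order, followed by the removes for agent, pipelock, and network.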
The smoke test now drives the production prepare/start path, which
calls network_create_internal. Under Gitea act_runner the docker
socket mount topology makes `docker network create --internal` fail
(or be invisible across the host/job-container boundary) — the same
limitation that test_orphan_cleanup.test_create_and_remove already
skips for. Match that skip here so CI goes green; the test still
runs in environments with a direct docker daemon.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PipelockProxy.prepare now accepts (bottle, slug, stage_dir) and derives
the yaml_path itself, so callers don't need to know the filename.
DockerBottleBackend.prepare_proxy becomes a one-line wrapper whose only
caller already has bottle and slug in scope, so it's inlined and
deleted.
The four lower-level helpers (pipelock_bottle_allowlist,
pipelock_bottle_ssh_hostnames, pipelock_bottle_ssh_ip_cidrs,
pipelock_bottle_ssh_trusted_domains) are one-line filters; testing
each in isolation duplicates coverage that pipelock_effective_allowlist
already provides end-to-end. The /32 CIDR suffix is the only behavior
beyond filtering, so it keeps a tiny dedicated test.
Drops the misplaced test_rejects_non_string_entry — that's manifest
validation, not allowlist resolution. Belongs in a manifest-validation
test file (which doesn't exist yet); leaving for a separate PR rather
than adding a one-branch sample here.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Move the --format=json-requires-dry-run check out of the integration
suite (it doesn't need Docker — argparse fails before any backend
runs) and tighten the assertion: previously asserted only that exit
code was nonzero, so any unrelated breakage (manifest resolution
failure, bad agent name, etc.) silently passed. Now asserts stderr
contains the actual flag-conflict message.
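
A sketch of a parse-time flag-conflict check of this kind. The parser
layout and the error wording are assumptions, not the CLI's actual
text; argparse's parser.error prints to stderr and exits with
status 2.

```python
import argparse


def parse_start_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="start")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--format", choices=("text", "json"), default="text")
    args = parser.parse_args(argv)
    # Reject the combination before any backend work starts.
    if args.format == "json" and not args.dry_run:
        parser.error("--format=json requires --dry-run")
    return args
```

Asserting on the stderr message, rather than on a nonzero exit code
alone, distinguishes this failure from any other reason the command
might exit early.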
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Compares smolmachines against the six subsystems in
agent-vm-isolation.md. smolmachines replaces the microVM runtime,
network attachment (libkrun TSI with built-in DNS-over-vsock filter),
vsock control plane, and Python lifecycle wrapper. Pipelock stays;
disk-image story shifts to OCI + writable overlay. Recommends adopting
smolmachines as the macOS VM backend after smoke-testing TSI
passthrough to a host-side pipelock.
Transcript-style notes on running an agent in a hardware-isolated
microVM on macOS. Covers Virtualization.framework / vfkit / libkrun
choices, hardware-isolation guarantees, driving VMs from Python
(subprocess or PyObjC), pipelock as the egress proxy, vsock for the
control channel, and egress enforcement via
VZFileHandleNetworkDeviceAttachment + gvisor-tap-vsock.
The old smoke test hand-rolled the docker create/cp/start sequence in
parallel with what DockerPipelockProxy.start already does, so any
divergence in production code wouldn't trip it. Rewritten to call
.prepare and .start directly and probe /health from a sibling curl
container on the same internal network — same access topology the
agent container uses in production.
In-network probing means the test no longer depends on a published
port, so it can run under act_runner (where host-loopback port
publishing isn't reachable from the job container).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
BottlePlan gains a to_dict method (abstract on the base, implemented
on DockerBottlePlan) returning a JSON-serializable view of the resolved
plan. `cli.py start --dry-run --format=json` prints it to stdout and
exits zero. --format=json without --dry-run is rejected — emitting JSON
during a real launch would race the y/N prompt.
The dry-run integration test now parses the JSON and asserts on
structured fields (agent, bottle, runtime, hosts sorted+deduped, etc.)
instead of regex-matching the human-readable preflight stdout. That
kills the magic-"8 hosts allowed" coupling — adding a new baked
default doesn't break the test.
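
Why structured assertions are sturdier than regex-matching can be
shown with a reduced example. `plan_to_dict` and its three fields are
hypothetical; the real to_dict returns more.

```python
import json


def plan_to_dict(agent: str, bottle: str, hosts: list[str]) -> dict:
    """Return a JSON-serializable view with hosts sorted and deduplicated."""
    return {"agent": agent, "bottle": bottle, "hosts": sorted(set(hosts))}


# A test parses the JSON and checks membership and ordering instead of
# matching a count embedded in human-readable text.
payload = json.loads(json.dumps(plan_to_dict(
    "dev", "default", ["b.example", "a.example", "b.example"]
)))
```

Here payload["hosts"] is ["a.example", "b.example"], and a test can
assert that a given host is present without caring how many other
defaults exist.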
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Split pipelock config building from YAML rendering: pipelock_build_config
returns a dict, pipelock_render_yaml serializes it, and _build_pipelock_yaml
chains the two onto disk. Unchanged behavior — pipelock loads the same YAML.
The yaml test now asserts on the structured config dict, which is
robust to cosmetic YAML changes (key order, quoting). The two checks
that only make sense on the rendered output — file mode 0600 and
no-secret-leakage — stay against the on-disk content.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace the hand-maintained INTEGRATION_NAMES classifier (and the
bespoke run_tests.py around it) with a directory-driven split:
tests/unit/ unit tests, always run
tests/integration/ Docker-dependent, skip cleanly without Docker
tests/canaries/ upstream-regression checks, opt-in via
CLAUDE_BOTTLE_RUN_CANARIES=1
The pinned-pipelock-image check moves to the canary suite — it tests
upstream packaging, not our code, so it shouldn't gate every dev push.
A scheduled canaries.yml workflow runs it weekly.
The manifest-runtime tests collapse the four assertRaises cases for
distinct 'runtime' values into one subTest loop and drop the
error-message-wording assertions; the contract is "any value is
rejected", not "the error literally contains 'auto-detect'".
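
The collapsed subTest loop might look like the following.
`validate_manifest` and the sample values are stand-ins for the
project's real validator and cases.

```python
import unittest


def validate_manifest(manifest: dict) -> None:
    # Stand-in validator: any 'runtime' key is rejected, whatever its value.
    if "runtime" in manifest:
        raise ValueError("runtime is not a supported manifest key")


class RuntimeRejected(unittest.TestCase):
    def test_any_runtime_value_is_rejected(self) -> None:
        for value in ("docker", "podman", "auto", ""):
            # subTest reports each failing value separately.
            with self.subTest(runtime=value):
                with self.assertRaises(ValueError):
                    validate_manifest({"runtime": value})
```

The loop asserts only that an error is raised, not what it says,
which matches the stated contract.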
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Empty commit to seed the branch so a PR can be opened against main.
Actual test refactor work will follow.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Compares claude-bottle to endo-familiar, litterbox, agent-safehouse,
matchlock, tilde.run, boxlite, microsandbox, and smolmachines. Covers
isolation primitive, locality, agent integration, network policy, and
maturity, and notes three borrowable ideas (per-use SSH confirmation,
in-flight secret injection, microVM backend) that fit the current
bash-first / local-Docker stance.
Renames the file and rewrites the body around what actually shipped:
class-based BottleBackend ABC (not a free create_docker_bottle
function), the two-phase prepare/launch split, the backend/docker/
subpackage layout, env.py reshaped into a backend-neutral ResolvedEnv,
and PipelockProxy split between top-level and backend/docker/.
resolve_env_into(...) becomes resolve_env(manifest, agent) -> ResolvedEnv
(forwarded names + literals). The docker backend now owns env-file /
argv serialization and the --env-file newline check. Also drops stray
Docker references from manifest.py, pipelock.py, util.py, and trims
the duplicated command list from cli.py's docstring (usage() in
claude_bottle/cli/__init__.py is now the only listing).
Module name aligns with the others (manifest, pipelock, network,
log) — nouns/noun-y, not verb phrases. The function name now reads
naturally at the call site: resolve_env_into(manifest, agent,
env_file, args_file).
New file claude_bottle/backend/util.py for cross-backend host-side
helpers:
host_skill_dir(name) — resolves $HOME/.claude/skills/<name>
docker/util.py gains:
docker_exec_root(container, argv) — `docker exec -u 0` wrapper used
by SSH provisioning
DockerBottleBackend drops the two methods that wrapped these
(`_host_skill_dir`, `_docker_exec_root`) — they had no instance state
and just lived on the class for organizational reasons. Call sites
now use the imported functions directly.
Matches the allowlist-resolution helpers' shape: the caller resolves
the bottle once and passes it in. Signature drops from
(manifest, bottle_name, slug, yaml_path) to (bottle, slug, yaml_path).
DockerBottleBackend.prepare_proxy uses manifest.bottle_for(agent_name)
to get the bottle directly. Tests pass fixture.bottles[name].
prepare's docstring also explains what `slug` is: the lowercased,
hyphen-normalized agent identifier used as the suffix in every
per-agent resource name (agent container, pipelock container, the
internal/egress networks). It's stored on the plan so start can
derive the sidecar's container name.
Top-level pipelock.py drops the Manifest import — no longer used.