Spike: Apple Container networking for macos-container backend #230

Closed
opened 2026-06-10 15:12:40 -04:00 by didericis-codex · 6 comments
Collaborator

Summary

Verify whether Apple's container CLI can support bot-bottle's sidecar network enforcement model for the experimental macos-container backend. This is the next gating question from PR #229 / the macOS Container backend PRD.

Context

PR #229 adds the backend scaffold and intentionally keeps launch disabled until the networking shape is proven. A runnable backend must not let the agent bypass the egress sidecar through direct network access.

References:

  • PR #229: https://gitea.dideric.is/didericis/bot-bottle/pulls/229
  • Remaining-work comment: https://gitea.dideric.is/didericis/bot-bottle/pulls/229#issuecomment-1986
  • PRD: docs/prds/prd-new-macos-container-backend.md
  • Issue #220 smolvm port exposure comment: https://gitea.dideric.is/didericis/bot-bottle/issues/220#issuecomment-1980

Questions to answer

  1. Does container network create --internal prevent outbound internet access from attached containers?
  2. Can container run attach one container to multiple networks, either through repeated --network flags or an equivalent command?
  3. Can the sidecar bundle sit on both an internal agent network and an egress-capable network while the agent sits only on the internal network?
  4. Can Apple Container provide stable network aliases or another service-discovery mechanism equivalent to Docker Compose aliases (egress, git-gate, supervise)?
  5. Can a published sidecar port bound to a per-bottle loopback alias be reached from another Apple Container guest, and can reachability be constrained to that alias?
  6. What JSON/structured output is available from container list, container inspect, container image inspect, and container network inspect for robust enumeration and cleanup?
  7. Are labels supported on containers and networks in enough places to replace prefix-only discovery?

Suggested validation commands

  • Create an internal network and a normal network.
  • Start a sidecar-like container with a tiny HTTP server and published ports.
  • Start an agent-like container attached only to the internal network.
  • Confirm agent -> sidecar works.
  • Confirm agent -> internet fails unless routed through the sidecar.
  • Confirm host -> published sidecar port works through the intended loopback alias.
  • Capture and save container inspect / container network inspect JSON samples.

Acceptance criteria

  • Document the tested macOS version and Apple Container version.
  • Record exact commands and observed output.
  • State whether the Docker-style two-network sidecar topology is viable.
  • If not viable, recommend the next-best enforcement shape or explicitly block enabling launch.
  • Add or update repo docs/PRD notes with the result.
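The setup steps above could be scripted so that every command run is recorded verbatim. The sketch below builds argv lists using only the CLI surface named in this issue: `container network create --internal`, repeated `--network`, `--dns` and the inspect commands. It also uses `--name`/`--detach` on `container run`. The network names, image names and the `bb-spike-230` prefix are hypothetical placeholders. The network order and `--dns` choice follow the rerun results reported later in this thread.

```python
"""Hypothetical driver for the spike's setup steps (names are placeholders).

Uses only CLI surface mentioned in this issue: `container network create
--internal`, repeated `--network` and `--dns` on `container run`, and the
inspect commands, plus `--name`/`--detach` on `container run`.
"""
import shlex
import subprocess
from typing import List

Argv = List[str]


def setup_commands(prefix: str, sidecar_image: str, agent_image: str) -> List[Argv]:
    internal = f"{prefix}-internal"  # host-only network the agent lives on
    egress = f"{prefix}-egress"      # NAT network only the sidecar joins
    return [
        ["container", "network", "create", "--internal", internal],
        ["container", "network", "create", egress],
        # Dual-homed sidecar: egress listed first so it supplies the default route.
        ["container", "run", "--detach", "--name", f"{prefix}-sidecar",
         "--network", egress, "--network", internal,
         "--dns", "1.1.1.1", sidecar_image],
        # Agent joins only the internal network.
        ["container", "run", "--detach", "--name", f"{prefix}-agent",
         "--network", internal, agent_image],
    ]


def capture_commands(prefix: str) -> List[Argv]:
    """Commands whose JSON output should be saved as samples."""
    return [
        ["container", "inspect", f"{prefix}-sidecar"],
        ["container", "inspect", f"{prefix}-agent"],
        ["container", "network", "inspect", f"{prefix}-internal"],
        ["container", "network", "inspect", f"{prefix}-egress"],
    ]


def run(cmds: List[Argv]) -> None:
    for cmd in cmds:
        print("$", shlex.join(cmd))  # echo the exact command for the write-up
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(setup_commands("bb-spike-230", "sidecar:dev", "agent:dev"))
    run(capture_commands("bb-spike-230"))
```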
didericis added the Kind/Testing label 2026-06-10 15:13:41 -04:00
Author
Collaborator

I could not run the Apple Container networking spike on this host yet. This machine runs macOS 15.6.1, but the Apple Container networking features the spike needs require macOS 26, and the container CLI is not installed here either. We need to update macOS and install and start Apple Container before we can test the questions in this issue.

Author
Collaborator

Reran the Apple Container networking spike on macOS 26.5.1 arm64 with Apple Container 1.0.0.

Result: the core two-network sidecar shape is viable, with caveats:

  • container network create --internal produces a hostOnly network; an agent attached only there could reach the sidecar by internal IP but could not ping 1.1.1.1 or fetch https://example.com.
  • Repeated --network flags work. The sidecar got both NAT and host-only interfaces.
  • Network order matters: the first network becomes the default route, so the sidecar must attach to the NAT network first and the agent/internal network second.
  • The NAT gateway routed packets, but DNS via the gateway failed in this run. Explicit sidecar DNS (--dns 1.1.1.1 in the spike) made https://example.com work.
  • Container-name service discovery did not work by default from the agent. The agent could reach the sidecar by its host-only IP, but not by the name bb-spike-230-sidecar.
  • Host loopback publish works and is constrained to the bound alias (127.0.0.31:18080 worked; 127.0.0.1:18080 failed), but guests could not reach that host loopback-published port through the host-only gateway. Guest-to-sidecar should use the internal network IP.
  • JSON/structured output is available from list/inspect/network/image/system commands and includes labels, networks, published ports, and per-network IPs.
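The inspect output includes per-network IPs and labels, so the backend could discover the sidecar's internal IP and do label-based cleanup from JSON instead of name prefixes. The sketch below assumes an illustrative JSON shape; it was not copied from the Apple Container 1.0.0 samples, so the key names (`networks`, `network`, `address`, `configuration`, `id`, `labels`) and the label key `bot-bottle.bottle` are assumptions to check against the saved output.

```python
"""Pull a container's per-network IP and label-matched IDs out of inspect/list JSON.

ASSUMPTION: the JSON shape below is illustrative, not copied from Apple
Container 1.0.0; check it against the saved samples before relying on it.
  [{"configuration": {"id": ..., "labels": {...}},
    "networks": [{"network": ..., "address": "<ip>/<prefix>"}]}]
"""
import ipaddress
import json
from typing import Any, Dict, List


def _containers(raw: str) -> List[Dict[str, Any]]:
    data = json.loads(raw)
    return data if isinstance(data, list) else [data]


def ip_on_network(inspect_json: str, network: str) -> str:
    """Return the first container IP attached to `network`, without the prefix length."""
    for container in _containers(inspect_json):
        for attachment in container.get("networks", []):
            if attachment.get("network") == network:
                return str(ipaddress.ip_interface(attachment["address"]).ip)
    raise LookupError(f"no attachment to network {network!r}")


def ids_with_label(list_json: str, key: str, value: str) -> List[str]:
    """IDs of containers whose labels contain key=value (cleanup without name prefixes)."""
    return [
        c["configuration"]["id"]
        for c in _containers(list_json)
        if c.get("configuration", {}).get("labels", {}).get(key) == value
    ]
```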

I updated docs/research/apple-container-networking-spike.md with commands, observed output, answers, and recommendation. Recommendation: proceed with a narrow macos-container launch prototype, but do not assume Docker Compose aliases; inspect the sidecar internal IP and pass concrete endpoints to the agent.
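Without aliases, "pass concrete endpoints to the agent" amounts to building URLs from the sidecar's internal IP. This is a minimal sketch. Only HTTP_PROXY/HTTPS_PROXY come from this thread. The port numbers and the BOT_BOTTLE_*_URL variable names are hypothetical placeholders.

```python
"""Build concrete agent endpoints from the sidecar's internal IP (no DNS aliases).

ASSUMPTIONS: the port numbers and the BOT_BOTTLE_*_URL variable names are
hypothetical placeholders; only HTTP_PROXY/HTTPS_PROXY come from this thread.
"""
import ipaddress
from typing import Dict, Mapping


def agent_env(sidecar_ip: str, ports: Mapping[str, int]) -> Dict[str, str]:
    ip = ipaddress.ip_address(sidecar_ip)  # ValueError for names like "bb-spike-230-sidecar"
    host = f"[{ip}]" if ip.version == 6 else str(ip)
    proxy = f"http://{host}:{ports['egress']}"
    env = {
        # Both spellings, since clients differ on which one they read.
        "HTTP_PROXY": proxy, "http_proxy": proxy,
        "HTTPS_PROXY": proxy, "https_proxy": proxy,
        # Reach the sidecar's own services directly, not through the proxy.
        "NO_PROXY": str(ip), "no_proxy": str(ip),
    }
    for service in ("git-gate", "supervise"):
        if service in ports:
            name = "BOT_BOTTLE_" + service.upper().replace("-", "_") + "_URL"
            env[name] = f"http://{host}:{ports[service]}"
    return env
```

Rejecting hostnames up front is deliberate. The spike showed that container-name resolution does not work from the agent, so a name here would fail later and less clearly.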

Owner

@didericis-codex follow-up: I'd like you to spike whether we can route traffic through the egress sidecar so we don't need to declare an explicit proxy that the agent has to respect. For example, given this bottle config:

egress:
  log: 2
  routes:
    - host: pypi.org
      dlp:
        inbound_detectors: false
    - host: files.pythonhosted.org
      dlp:
        inbound_detectors: false
    - host: 100.78.141.42

When the agent makes a request to pypi.org, I'd like something to point it at the egress sidecar proxy without setting HTTP_PROXY.
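One way to "point" pypi.org at the sidecar without HTTP_PROXY is DNS. The agent's resolver answers each route's hostname with the sidecar's IP. The sketch below turns the routes above into dnsmasq `address=/host/ip` lines. It assumes the sidecar accepts ports 80/443 on that IP and recovers the real host from SNI or the Host header. This only steers name resolution: a client that connects by IP bypasses it, so the internal-only network remains the actual enforcement. IP-literal routes such as 100.78.141.42 never go through DNS and are returned separately.

```python
"""Turn egress routes into dnsmasq overrides that resolve each host to the sidecar.

A sketch of DNS-based steering only: it assumes the sidecar accepts 80/443
on `sidecar_ip` and recovers the real host from SNI/Host. IP-literal routes
never hit DNS, so they are returned separately and need another mechanism.
"""
import ipaddress
from typing import Any, Dict, List, Tuple

# The bottle config above as a dict (avoids depending on a YAML parser).
BOTTLE = {
    "egress": {
        "log": 2,
        "routes": [
            {"host": "pypi.org", "dlp": {"inbound_detectors": False}},
            {"host": "files.pythonhosted.org", "dlp": {"inbound_detectors": False}},
            {"host": "100.78.141.42"},
        ],
    }
}


def dnsmasq_overrides(config: Dict[str, Any], sidecar_ip: str) -> Tuple[List[str], List[str]]:
    lines: List[str] = []
    ip_routes: List[str] = []
    for route in config["egress"]["routes"]:
        host = route["host"]
        try:
            ipaddress.ip_address(host)
        except ValueError:
            # dnsmasq's address=/domain/ip also answers for subdomains of domain.
            lines.append(f"address=/{host}/{sidecar_ip}")
        else:
            ip_routes.append(host)
    return lines, ip_routes
```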

Author
Collaborator

Yes, this is the right next spike.

The previous run proved the basic isolation shape: an internal-only agent could not egress, a dual-homed sidecar could egress, and the agent could reach the sidecar by internal IP. It did not prove transparent routing. Today the viable path still requires the agent/client to be pointed at the sidecar explicitly.

I’ll spike whether we can make normal direct traffic, e.g. pypi.org:443, flow through the sidecar without HTTP_PROXY/HTTPS_PROXY. The concrete questions I’d test:

  • Can the sidecar become the agent network’s default gateway?
  • Can Apple Container support the sidecar doing forwarding/NAT/transparent proxying, and what caps/custom image are required?
  • If default-gateway routing is not possible, can DNS or host-level pf/vmnet rules safely redirect the traffic to the sidecar?
  • Can we preserve the existing route policy/DLP semantics at the sidecar once traffic is transparent?
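For the last question, the main change is where the host comes from. With an explicit proxy the CONNECT target names it. With transparent interception the sidecar only has TLS SNI or the HTTP Host header, plus the original destination IP. The sketch below assumes exact, case-insensitive host matching, which may not match bot-bottle's real route semantics. It falls back to the destination IP because RFC 6066 does not allow IP literals in SNI. SNI/Host are client-controlled, so which address the sidecar actually dials upstream is a design point worth checking separately.

```python
"""Match a transparently intercepted connection to an egress route.

ASSUMPTIONS: exact, case-insensitive host matching (bot-bottle's real
matching rules may differ), `host` comes from TLS SNI or the HTTP Host
header, and `dst_ip` is the original destination recovered by the proxy.
"""
from typing import Any, Dict, List, Optional

Route = Dict[str, Any]


def _normalize(host: str) -> str:
    host = host.strip().lower().rstrip(".")
    if host.count(":") == 1:  # "name:port"; IPv6 literals contain several colons
        host = host.split(":", 1)[0]
    return host


def match_route(routes: List[Route], host: Optional[str], dst_ip: str) -> Optional[Route]:
    """Return the route for this connection, or None to deny it."""
    candidates = [_normalize(host)] if host else []
    # No SNI is sent for IP literals, so also try the original destination IP.
    candidates.append(_normalize(dst_ip))
    for want in candidates:
        for route in routes:
            if _normalize(route["host"]) == want:
                return route
    return None
```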

I would treat this as a separate enforcement spike. The prior spike does not establish this, and I would not enable launch on the assumption that transparent routing works.

Author
Collaborator

Completed the transparent egress spike and committed the write-up on main.

Commit: 529d0bc docs: add Apple Container transparent egress spike
Doc: docs/research/apple-container-transparent-egress-spike.md

Blunt result: transparent egress is feasible, but not production-ready until we solve agent route setup without leaving CAP_NET_ADMIN in the agent runtime.

What worked:

  • A dual-homed sidecar with CAP_NET_ADMIN, IP forwarding, and iptables masquerade can route normal direct agent traffic out to the internet without HTTP_PROXY/HTTPS_PROXY.
  • mitmdump --mode transparent with PREROUTING redirects on the sidecar can intercept normal direct TCP 80/443 traffic.
  • Direct HTTP to example.com was logged by mitmproxy.
  • Direct HTTPS to pypi.org reached mitmproxy. It failed normal certificate verification until the client trusted (or skipped verifying) the mitm CA, which is consistent with our existing sidecar CA model.
  • Running DNS on the sidecar is acceptable and works: I tested dnsmasq bound to the sidecar host-only IP, pointed the agent resolver at it, and direct HTTP/HTTPS still hit mitmproxy.
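The sidecar-side pieces that worked can be sketched as one start-up plan:
- IP forwarding
- masquerade on the NAT interface
- PREROUTING REDIRECT of agent TCP 80/443 to `mitmdump --mode transparent`
- dnsmasq bound to the sidecar's host-only IP

This is a sketch, not the committed spike scripts. The interface names, the `lan_ip` value and port 8080 are assumptions, and the sidecar is assumed to have CAP_NET_ADMIN. The upstream resolver 1.1.1.1 mirrors the `--dns 1.1.1.1` workaround from the earlier run.

```python
"""Hypothetical sidecar start-up plan for transparent egress.

ASSUMPTIONS: the NAT-network interface is `wan_if`, the internal host-only
interface is `lan_if` with address `lan_ip`, mitmproxy listens on
`proxy_port`, and the sidecar runs with CAP_NET_ADMIN.
"""
import subprocess
from typing import List

Argv = List[str]


def sidecar_transparent_plan(wan_if: str, lan_if: str, lan_ip: str,
                             proxy_port: int = 8080,
                             upstream_dns: str = "1.1.1.1") -> List[Argv]:
    plan: List[Argv] = [
        # Let the kernel forward packets arriving from the agent network.
        ["sysctl", "-w", "net.ipv4.ip_forward=1"],
        # Source-NAT forwarded traffic leaving through the NAT network.
        ["iptables", "-t", "nat", "-A", "POSTROUTING", "-o", wan_if, "-j", "MASQUERADE"],
    ]
    # Hand agent-originated web traffic to mitmproxy. Matching -i lan_if in
    # PREROUTING leaves the sidecar's own upstream connections (OUTPUT) alone.
    for port in (80, 443):
        plan.append(["iptables", "-t", "nat", "-A", "PREROUTING", "-i", lan_if,
                     "-p", "tcp", "--dport", str(port),
                     "-j", "REDIRECT", "--to-ports", str(proxy_port)])
    # Resolver for the agent, bound only to the internal address.
    plan.append(["dnsmasq", "--keep-in-foreground", f"--listen-address={lan_ip}",
                 "--bind-interfaces", "--no-resolv", f"--server={upstream_dns}"])
    plan.append(["mitmdump", "--mode", "transparent", "--showhost",
                 "--listen-port", str(proxy_port)])
    return plan


def start(plan: List[Argv]) -> List[subprocess.Popen]:
    procs = []
    for cmd in plan:
        if cmd[0] in ("dnsmasq", "mitmdump"):
            procs.append(subprocess.Popen(cmd))  # long-running services
        else:
            subprocess.run(cmd, check=True)      # one-shot setup commands
    return procs


if __name__ == "__main__":
    services = start(sidecar_transparent_plan("eth0", "eth1", "192.168.128.2"))
    services[-1].wait()  # stay in the foreground with mitmdump
```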

What did not work / remains the blocker:

  • Apple Container 1.0.0 does not expose a documented per-network gateway option.
  • An ordinary agent cannot replace its default route: ip route replace default via <sidecar> fails with Operation not permitted.
  • The successful route-through-sidecar tests required agent CAP_NET_ADMIN, which I would not accept as the final security posture.

Recommended next implementation spike: start the agent through a tiny backend-owned root init that sets route + DNS to the sidecar, drops network-admin capability / switches to the normal agent user, then execs the agent. Alternative paths are finding a real Apple Container gateway setting or doing host/vmnet-level redirection. For v1, keeping explicit proxy env vars is still the simpler safe path.
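The recommended root init could look roughly like the sketch below. It is a hypothetical program, not existing bot-bottle code. It runs as root with CAP_NET_ADMIN, sets the default route and resolver to the sidecar, and then execs the agent through util-linux `setpriv`. At that point the agent runs as an unprivileged user with:
- no inheritable capabilities
- an empty bounding set
- no_new_privs set

As a result, CAP_NET_ADMIN cannot be regained by the agent or anything it launches. It assumes `ip` and `setpriv` are in the agent image, that /etc/resolv.conf is writable, and that the sidecar IP is on the agent's subnet.

```python
"""Hypothetical backend-owned init for the agent container (not existing bot-bottle code).

Starts as root with CAP_NET_ADMIN, points the default route and resolver at
the sidecar, then execs the agent through util-linux `setpriv` as an
unprivileged user with no inheritable caps, an empty bounding set, and
no_new_privs. ASSUMPTIONS: `ip` and `setpriv` exist in the agent image,
/etc/resolv.conf is writable, and the sidecar IP is on the agent's subnet.
"""
import ipaddress
import os
import subprocess
import sys
from typing import List, Sequence, Tuple

Argv = List[str]


def init_plan(sidecar_ip: str, uid: int, gid: int,
              agent_argv: Sequence[str]) -> Tuple[List[Argv], str, Argv]:
    ip = str(ipaddress.ip_address(sidecar_ip))  # reject hostnames; aliases don't resolve
    route_cmds = [["ip", "route", "replace", "default", "via", ip]]
    resolv_conf = f"nameserver {ip}\n"  # the sidecar's DNS server
    exec_argv = ["setpriv", f"--reuid={uid}", f"--regid={gid}", "--clear-groups",
                 "--inh-caps=-all", "--bounding-set=-all", "--no-new-privs",
                 "--", *agent_argv]
    return route_cmds, resolv_conf, exec_argv


def main(args: Sequence[str]) -> None:
    # Hypothetical usage: agent-init SIDECAR_IP UID GID AGENT_CMD [ARGS...]
    sidecar_ip, uid, gid, *agent_argv = args
    route_cmds, resolv_conf, exec_argv = init_plan(sidecar_ip, int(uid), int(gid), agent_argv)
    for cmd in route_cmds:
        subprocess.run(cmd, check=True)
    with open("/etc/resolv.conf", "w") as fh:
        fh.write(resolv_conf)
    # Replace this process, so the agent never holds CAP_NET_ADMIN.
    os.execvp(exec_argv[0], exec_argv)


if __name__ == "__main__":
    main(sys.argv[1:])
```

The container still has to start with CAP_NET_ADMIN for the init to use. The point of this design is that the capability lives only in a short, backend-owned step before the exec, not in the agent runtime.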

Owner

Ok, will keep the proxy env vars for the initial release.


Reference: didericis/bot-bottle#230