docs: add Apple Container transparent egress spike

This commit is contained in:
2026-06-10 19:19:52 -04:00
parent 13e4af421d
commit 7644da4280
@@ -0,0 +1,476 @@
# Apple Container transparent egress spike
Issue: https://gitea.dideric.is/didericis/bot-bottle/issues/230#issuecomment-1994
## Summary
Transparent egress is mechanically possible on Apple Container 1.0.0,
but it is not a free property of the platform and it is not a drop-in
replacement for `HTTP_PROXY` yet.
The spike proved two separate things:
- Plain routing/NAT works if the sidecar has `CAP_NET_ADMIN`, IP
forwarding, and masquerade rules, and if the agent default route is
changed to the sidecar's host-only-network IP.
- Transparent mitmproxy interception works if the sidecar redirects
agent-facing TCP 80/443 traffic to `mitmdump --mode transparent`.
Direct HTTP was logged by mitmproxy. Direct HTTPS reached mitmproxy;
it failed with normal certificate verification until the client
skipped verification, which is consistent with bot-bottle's existing
requirement that agents trust the sidecar CA.
- Running DNS on the sidecar and pointing the agent at the sidecar's
host-only IP also works. This is cleaner than relying on forwarded
UDP DNS to a public resolver and gives the backend a natural place to
enforce or observe DNS policy.
The hard blocker is agent routing. Apple Container 1.0.0 exposes no
documented `--network` gateway option. An ordinary agent container
cannot replace its default route:
```console
$ container exec bb-spike-230t-agent sh -c \
'ip route replace default via 192.168.128.2 dev eth0; ip route'
default via 192.168.128.1 dev eth0
192.168.128.0/24 dev eth0 scope link src 192.168.128.3
ip: RTNETLINK answers: Operation not permitted
```
The successful route-through-sidecar tests used `--cap-add
CAP_NET_ADMIN` on the agent so the route could be changed after start.
That is not an acceptable final design by itself: it expands the
agent's kernel-facing privilege and lets the agent mutate its own
network namespace. A production design needs either a backend-owned
init/shim that sets the route then drops privilege in a way the agent
cannot regain, a platform-supported gateway option, or a different
network attachment layer.
## Environment
Tested on 2026-06-10:
```console
$ sw_vers
ProductName: macOS
ProductVersion: 26.5.1
BuildVersion: 25F80
$ uname -m
arm64
$ container --version
container CLI version 1.0.0 (build: release, commit: ee848e3)
```
Apple Container system status:
```json
{
"apiServerAppName": "container-apiserver",
"apiServerBuild": "release",
"apiServerCommit": "ee848e3ebfd7c73b04dd419683be54fb450b8779",
"apiServerVersion": "container-apiserver version 1.0.0 (build: release, commit: ee848e3)",
"appRoot": "/Users/didericis/Library/Application Support/com.apple.container/",
"installRoot": "/usr/local/",
"status": "running"
}
```
## Baseline
Networks:
```bash
container network create bb-spike-230t-agent \
--internal \
--label bot-bottle.spike=transparent-egress
container network create bb-spike-230t-egress \
--label bot-bottle.spike=transparent-egress
```
Sidecar, dual-homed with NAT first:
```bash
container run --name bb-spike-230t-sidecar \
--label bot-bottle.spike=transparent-egress \
--network bb-spike-230t-egress \
--network bb-spike-230t-agent \
--dns 1.1.1.1 \
--detach docker.io/alpine:latest sleep 1800
```
Agent, host-only network:
```bash
container run --name bb-spike-230t-agent \
--label bot-bottle.spike=transparent-egress \
--network bb-spike-230t-agent \
--detach docker.io/alpine:latest sleep 1800
```
Observed sidecar addresses:
```console
eth0 192.168.66.2/24 # NAT egress network
eth1 192.168.128.2/24 # host-only agent network
default via 192.168.66.1 dev eth0
nameserver 1.1.1.1
```
Observed agent baseline:
```console
eth0 192.168.128.3/24
default via 192.168.128.1 dev eth0
nameserver 192.168.128.1
wget: bad address 'pypi.org'
```
That confirms the previous spike's baseline: sidecar can egress, agent
cannot egress directly.
## Plain NAT Test
Relaunch sidecar and agent with `CAP_NET_ADMIN`:
```bash
container run --name bb-spike-230t-sidecar \
--label bot-bottle.spike=transparent-egress \
--network bb-spike-230t-egress \
--network bb-spike-230t-agent \
--dns 1.1.1.1 \
--cap-add CAP_NET_ADMIN \
--detach docker.io/alpine:latest sleep 1800
container run --name bb-spike-230t-agent \
--label bot-bottle.spike=transparent-egress \
--network bb-spike-230t-agent \
--cap-add CAP_NET_ADMIN \
--detach docker.io/alpine:latest sleep 1800
```
Configure sidecar forwarding:
```bash
container exec bb-spike-230t-sidecar sh -c '
apk add --no-cache iptables iproute2
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -s 192.168.128.0/24 -o eth0 -j MASQUERADE
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
'
```
Point the agent at the sidecar:
```bash
container exec bb-spike-230t-agent sh -c '
ip route replace default via 192.168.128.4 dev eth0
printf "nameserver 1.1.1.1\n" > /etc/resolv.conf
'
```
Normal direct PyPI fetch from the agent, with no proxy variables set:
```bash
container exec bb-spike-230t-agent sh -c '
for v in HTTP_PROXY HTTPS_PROXY http_proxy https_proxy ALL_PROXY all_proxy; do
if [ -n "$(printenv "$v")" ]; then echo "$v=SET"; fi
done
wget -T 10 -O- https://pypi.org/simple/pip/ | head -c 120
'
```
Observed:
```console
Connecting to pypi.org (151.101.0.223:443)
<!DOCTYPE html>
<html lang="en">
<head>
<meta name="pypi:repository-version" content="1.4">
```
Sidecar NAT counters increased:
```console
POSTROUTING MASQUERADE 3 packets / 168 bytes
FORWARD eth1 -> eth0 22 packets / 2806 bytes
FORWARD eth0 -> eth1 29 packets / 54781 bytes
```
Verdict: plain transparent routing through the sidecar works, but this
is only NAT. It does not apply bot-bottle's existing route allowlist,
authorization stripping/injection, or DLP logic.
## Transparent Mitmproxy Test
The current sidecar launcher uses explicit proxy mode:
```sh
MODE="--mode regular@9099"
exec mitmdump $CONFDIR_FLAG $MODE $LISTEN_HOST_FLAG $TRUST_FLAG -s /app/egress_addon.py
```
So transparent egress needs a launcher mode change plus iptables
redirects.
Run a test mitmproxy container:
```bash
container run --name bb-spike-230t-mitm \
--label bot-bottle.spike=transparent-egress \
--network bb-spike-230t-egress \
--network bb-spike-230t-agent \
--dns 1.1.1.1 \
--cap-add CAP_NET_ADMIN \
--detach mitmproxy/mitmproxy:11.1.3 \
sh -c 'apt-get update >/tmp/apt.log &&
apt-get install -y --no-install-recommends iptables iproute2 >>/tmp/apt.log &&
echo 1 > /proc/sys/net/ipv4/ip_forward &&
iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 -j REDIRECT --to-port 8080 &&
iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 443 -j REDIRECT --to-port 8080 &&
mitmdump --mode transparent@8080 --set showhost=true --set ssl_insecure=true --set confdir=/tmp/mitm -v'
```
The container listened successfully:
```console
Transparent Proxy listening at *:8080.
```
It had an agent-facing address of `192.168.128.7`. Point the agent at
it and set DNS:
```bash
container exec bb-spike-230t-agent sh -c '
ip route replace default via 192.168.128.7 dev eth0
printf "nameserver 1.1.1.1\n" > /etc/resolv.conf
'
```
DNS also needs NAT/forwarding because only TCP 80/443 is redirected:
```bash
container exec bb-spike-230t-mitm sh -c '
iptables -t nat -A POSTROUTING -s 192.168.128.0/24 -o eth0 -j MASQUERADE
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
'
```
An alternative, and likely better, DNS shape is to run a DNS forwarder on
the sidecar's host-only IP and point the agent at it. This was tested
with `dnsmasq`:
```bash
container exec bb-spike-230t-mitm sh -c '
apt-get install -y --no-install-recommends dnsmasq
cat >/tmp/dnsmasq.conf <<EOF
no-daemon
listen-address=192.168.128.7
bind-interfaces
server=1.1.1.1
log-queries
log-facility=-
EOF
(dnsmasq --conf-file=/tmp/dnsmasq.conf >/tmp/dnsmasq.log 2>&1 &)
sleep 1
ss -lunp | grep :53
'
```
Observed:
```console
UNCONN 0 0 192.168.128.7:53 0.0.0.0:* users:(("dnsmasq",pid=515,fd=4))
```
Point the agent to sidecar DNS:
```bash
container exec bb-spike-230t-agent sh -c '
printf "nameserver 192.168.128.7\n" > /etc/resolv.conf
nslookup pypi.org
'
```
Observed:
```console
Server: 192.168.128.7
Address: 192.168.128.7:53
Non-authoritative answer:
Name: pypi.org
Address: 151.101.128.223
Name: pypi.org
Address: 151.101.192.223
Name: pypi.org
Address: 151.101.64.223
Name: pypi.org
Address: 151.101.0.223
```
Direct HTTP from the agent worked and mitmproxy logged the request:
```console
$ container exec bb-spike-230t-agent sh -c \
'wget -T 10 -O- http://example.com | head -c 100'
Connecting to example.com (172.66.147.243:80)
<!doctype html><html lang="en"><head><title>Example Domain</title>
```
Mitmproxy log:
```console
192.168.128.5:39742: GET http://example.com/
Host: example.com
User-Agent: Wget
<< 200 OK 559b
```
After switching the agent to sidecar DNS, direct HTTP still hit
mitmproxy:
```console
192.168.128.5:50784: GET http://example.com/
Host: example.com
User-Agent: Wget
<< 200 OK 559b
```
Direct HTTPS from the agent reached mitmproxy but failed certificate
verification, as expected when the client does not trust the mitmproxy
CA:
```console
$ container exec bb-spike-230t-agent sh -c \
'wget -T 10 -O- https://pypi.org/simple/pip/ | head -c 100'
Connecting to pypi.org (151.101.128.223:443)
... certificate verify failed ...
```
Mitmproxy log:
```console
Client TLS handshake failed. The client does not trust the proxy's
certificate for pypi.org (tlsv1 alert unknown ca)
```
With verification disabled, the same direct URL succeeded and mitmproxy
logged the full HTTPS request:
```console
$ container exec bb-spike-230t-agent sh -c \
'wget --no-check-certificate -T 10 -O- https://pypi.org/simple/pip/ | head -c 100'
Connecting to pypi.org (151.101.128.223:443)
<!DOCTYPE html>
<html lang="en">
<head>
<meta name="pypi:repository-version" content="1.4">
```
Mitmproxy log:
```console
192.168.128.5:32802: GET https://pypi.org/simple/pip/
Host: pypi.org
User-Agent: Wget
<< 200 OK 103k
```
After switching the agent to sidecar DNS, direct HTTPS still hit
mitmproxy:
```console
192.168.128.5:50254: GET https://pypi.org/simple/pip/
Host: pypi.org
User-Agent: Wget
<< 200 OK 103k
```
Verdict: transparent mitmproxy mode works in this topology. The bot
agent would still need the egress CA installed, which bot-bottle already
does for explicit proxy mode.
## Answers
### Can the sidecar become the agent network's default gateway?
Not directly through Apple Container's documented CLI. The installed
`container run --help` documents `--network
<name>[,mac=XX:XX:XX:XX:XX:XX][,mtu=VALUE]`; it does not document a
gateway option.
The route can be changed after container start only if the agent has
`CAP_NET_ADMIN`. Without it, `ip route replace default via <sidecar>`
fails with `Operation not permitted`.
### Can Apple Container support sidecar forwarding/NAT/transparent proxying?
Yes. A dual-homed sidecar with `CAP_NET_ADMIN` can enable IP forwarding,
set iptables NAT/forwarding rules, and route agent traffic out through
the NAT network.
Transparent mitmproxy interception also works with `PREROUTING`
redirects to `mitmdump --mode transparent`.
### What capabilities/custom image are required?
At minimum:
- sidecar needs `CAP_NET_ADMIN`;
- sidecar image needs `iptables`/`iproute2` or equivalent nftables
tooling;
- sidecar should run a DNS listener on its host-only IP, or otherwise
provide a controlled resolver path for the agent;
- sidecar launcher needs a transparent mode variant;
- agent route must be changed to the sidecar's host-only IP;
- agent DNS should point to the sidecar DNS listener;
- agent must trust the sidecar CA for HTTPS interception.
The tested agent route mutation required agent `CAP_NET_ADMIN`, which
should not be accepted as the final design without a privilege-dropping
init/shim story.
### Can host-level `pf` or vmnet rules replace agent route mutation?
Not tested. The successful transparent paths did not use host `pf`;
they used container-local routing and iptables. Host-level `pf` remains
a possible escape hatch if Apple Container cannot set a custom gateway
and we reject agent `CAP_NET_ADMIN`.
### Can existing route policy and DLP semantics be preserved?
Likely, but not fully validated in this spike. Mitmproxy transparent
mode produced normal HTTP flows with correct `Host` values for both
HTTP and HTTPS. The existing `egress_addon.py` hooks should still see
`flow.request.pretty_host`, method, path, headers, and response bodies.
But the current sidecar entrypoint only starts `mitmdump` in regular
explicit-proxy mode. A real implementation must add a transparent mode
launcher and then run the existing egress addon test suite against
transparent flows.
## Recommendation
Do not switch `macos-container` to transparent egress yet, but keep it
as a plausible implementation path.
The next implementation spike should focus on removing the agent
`CAP_NET_ADMIN` requirement. Acceptable options:
- find or add an Apple Container-supported default-gateway setting;
- start the agent through a tiny root init that sets route/DNS, drops
capabilities, and then execs the agent as the normal user;
- include a sidecar DNS service and set the agent resolver to the
sidecar's host-only IP as part of that init/setup path;
- avoid routing mutation by using host/vmnet-level packet redirection;
- explicitly decide that route mutation is only a convenience layer and
keep explicit proxy env vars for v1.
Bluntly: transparent egress is feasible, but not production-ready until
the agent route can be controlled without leaving network-admin power in
the agent runtime.