9b929d0684
_log_request and _log_response wrote headers and bodies to stderr verbatim. _log_request also included the sidecar-injected upstream Authorization value, exposing live bearer tokens on every allowed request under LOG_FULL. Apply redact_tokens to all header values and bodies in both log functions; exclude the authorization header from _log_request entirely since its value is always a live sidecar-injected credential by the time _log_request runs. Closes #257
86 lines
4.2 KiB
Markdown
# PRD prd-new: LOG_FULL egress logging credential redaction

- **Status:** Draft
- **Author:** claude
- **Created:** 2026-06-25
- **Issue:** #257
## Summary

The `LOG_FULL` egress logging path (`_log_request` and `_log_response` in `egress_addon.py`) writes request/response headers and bodies to stderr without redaction and includes the sidecar-injected upstream `Authorization` header verbatim. This PR applies `redact_tokens` to header values and bodies in both log functions and strips the injected `Authorization` header from request logs entirely.
## Problem

`LOG_FULL` (log level 2) is intended for debugging egress traffic. When active, it calls `_log_request` and `_log_response`. Together, these functions have two related bugs:

1. **Injected `Authorization` header exposure.** `_log_request` is called *after* the sidecar injects upstream credentials (`flow.request.headers["authorization"] = decision.inject_authorization`). The full header dict, including the live credential, is serialized to stderr. Any log collector that ingests the egress container's stderr will receive the upstream bearer token in plaintext.

2. **Unredacted bodies and header values.** Neither `_log_request` nor `_log_response` passes body or header values through `redact_tokens`. By contrast, `_req_ctx` (used for block/warn events) already calls `redact_tokens` on path and host. Any provisioned secret or recognized token pattern that appears in a request body, response body, or non-Authorization header value will be logged verbatim under `LOG_FULL`.

These two bugs compose: an agent that enables `LOG_FULL` and simultaneously triggers a request that carries a known token gains a write path from credentials to the egress logs.
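The ordering problem in item 1 can be illustrated with a minimal sketch. The helper names and the token value below are hypothetical stand-ins, not code from `egress_addon.py`; they only mimic the sequence "inject, then serialize everything".

```python
import json


def inject_credential(headers: dict, credential: str) -> None:
    """Mimics the sidecar step that sets the upstream credential on the request."""
    headers["authorization"] = credential


def log_headers_verbatim(headers: dict) -> str:
    """Mimics the pre-fix behaviour: serialize the whole header dict, unfiltered."""
    return json.dumps({"event": "egress_request", "headers": headers})


# Made-up token value, for illustration only.
FAKE_TOKEN = "Bearer upstream-secret-123"

headers = {"accept": "application/json"}
inject_credential(headers, FAKE_TOKEN)    # injection happens first...
log_line = log_headers_verbatim(headers)  # ...so the log line already contains it
```

Because injection precedes logging, any filter has to live inside the logging function itself; nothing upstream of it can keep the credential out of the serialized dict.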
## Goals / Success Criteria

- `_log_request` never logs the `authorization` header in any form.
- `_log_request` applies `redact_tokens(value, env=os.environ)` to every other header value before serializing.
- `_log_request` applies `redact_tokens(body, env=os.environ)` to the request body before logging.
- `_log_response` applies `redact_tokens(value, env=os.environ)` to every response header value before logging.
- `_log_response` applies `redact_tokens(body, env=os.environ)` to the response body before logging.
- Unit tests cover each of the five cases above.
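One possible shape for the request-side tests is sketched below. It does not import the real addon: `redact_tokens` is a hypothetical stand-in that masks values of `*_TOKEN` environment entries, `log_request` is a local mirror of the request logger reduced to headers and body, and `fake_flow` replaces `http.HTTPFlow` with a `SimpleNamespace`. An explicit `ENV` dict is used instead of `os.environ` to keep the sketch deterministic.

```python
import contextlib
import io
import json
import sys
from types import SimpleNamespace

ENV = {"UPSTREAM_TOKEN": "s3cr3t-value"}  # made-up secret


def redact_tokens(text: str, env: dict) -> str:
    """Hypothetical stand-in: mask the value of any env entry named *_TOKEN."""
    for name, value in env.items():
        if name.endswith("_TOKEN") and value:
            text = text.replace(value, "[REDACTED]")
    return text


def log_request(flow, env: dict) -> None:
    """Local mirror of the request logger, reduced to headers and body."""
    headers = {
        k: redact_tokens(v, env=env)
        for k, v in flow.request.headers.items()
        if k.lower() != "authorization"
    }
    body = redact_tokens(flow.request.get_text(strict=False) or "", env=env)
    sys.stderr.write(json.dumps({"headers": headers, "body": body}) + "\n")


def fake_flow(headers: dict, body: str):
    """Builds just enough of a flow object for log_request to run."""
    request = SimpleNamespace(headers=headers, get_text=lambda strict=False: body)
    return SimpleNamespace(request=request)


def captured(flow) -> dict:
    """Runs log_request with stderr redirected and parses the emitted JSON line."""
    buf = io.StringIO()
    with contextlib.redirect_stderr(buf):
        log_request(flow, ENV)
    return json.loads(buf.getvalue())


def test_authorization_never_logged():
    event = captured(fake_flow({"Authorization": "Bearer s3cr3t-value"}, ""))
    assert event["headers"] == {}


def test_other_header_values_redacted():
    event = captured(fake_flow({"x-api-key": "s3cr3t-value"}, ""))
    assert event["headers"] == {"x-api-key": "[REDACTED]"}


def test_request_body_redacted():
    event = captured(fake_flow({}, "token=s3cr3t-value"))
    assert event["body"] == "token=[REDACTED]"
```

The two response-side cases would follow the same pattern against `_log_response`, without the header-name filter.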
## Non-goals

- Redacting host or path in the full-log path (already covered by `_req_ctx` for block/warn events; `_log_request` already calls `redact_tokens` on host and path).
- Suppressing `LOG_FULL` or adding a new log level.
- Changing the outbound DLP scan logic.
## Design

### `_log_request`
```python
def _log_request(self, flow: http.HTTPFlow) -> None:
    # Drop the authorization header entirely; scrub every other header value.
    headers = {
        k: redact_tokens(v, env=os.environ)
        for k, v in flow.request.headers.items()
        if k.lower() != "authorization"
    }
    body = redact_tokens(flow.request.get_text(strict=False) or "", env=os.environ)
    sys.stderr.write(
        json.dumps({
            "event": "egress_request",
            "host": redact_tokens(flow.request.pretty_host, env=os.environ),
            "method": flow.request.method,
            "path": redact_tokens(flow.request.path, env=os.environ),
            "headers": headers,
            "body": body,
        })
        + "\n"
    )
```

The `authorization` key is excluded because by the time `_log_request` is called the sidecar has already injected the upstream credential (`decision.inject_authorization`). Logging it would write a live bearer token to stderr on every allowed request. There is no safe subset to log: the value is always either a live credential or empty.
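The name filter can be exercised on its own with plain dicts. This is a sketch: `drop_authorization` is a hypothetical helper that isolates the comprehension's `if` clause, and the header values are invented. It assumes header names may arrive in any casing.

```python
def drop_authorization(headers: dict) -> dict:
    """Keeps every header except `authorization`, compared case-insensitively."""
    return {k: v for k, v in headers.items() if k.lower() != "authorization"}


sample = {
    "Authorization": "Bearer injected",
    "Proxy-Authorization": "Basic abc",
    "Accept": "*/*",
}
kept = drop_authorization(sample)
```

Note that the filter matches only the exact name `authorization`. A header such as `Proxy-Authorization` survives by name, so its value is protected only to the extent that `redact_tokens` recognizes it.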
### `_log_response`
```python
def _log_response(self, flow: http.HTTPFlow) -> None:
    # Keep every header name; scrub each value.
    headers = {
        k: redact_tokens(v, env=os.environ)
        for k, v in flow.response.headers.items()
    }
    body = redact_tokens(flow.response.get_text(strict=False) or "", env=os.environ)
    sys.stderr.write(
        json.dumps({
            "event": "egress_response",
            "host": flow.request.pretty_host,
            "status": flow.response.status_code,
            "headers": headers,
            "body": body,
        })
        + "\n"
    )
```

Response headers don't carry injected credentials, so no header name is suppressed; only the values are scrubbed by `redact_tokens`.
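As a small illustration of "names kept, values scrubbed", the sketch below runs the same comprehension over a plain dict. `redact_tokens` here is a hypothetical stand-in that masks values of `*_TOKEN` entries in an explicit `ENV` dict; the real function's matching rules are not shown in this document.

```python
def redact_tokens(text: str, env: dict) -> str:
    """Hypothetical stand-in: mask the value of any env entry named *_TOKEN."""
    for name, value in env.items():
        if name.endswith("_TOKEN") and value:
            text = text.replace(value, "[REDACTED]")
    return text


ENV = {"SESSION_TOKEN": "abc123xyz"}  # made-up secret

response_headers = {
    "set-cookie": "sid=abc123xyz; HttpOnly",
    "content-type": "application/json",
}
scrubbed = {k: redact_tokens(v, env=ENV) for k, v in response_headers.items()}
```

Under these assumptions the `set-cookie` name is still visible in the log, which keeps the entry useful for debugging, while the secret inside its value is masked.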