YAML route matching formats: paths, headers, and methods
Question
Bot-bottle's egress manifest currently supports exact-host matching and
a flat list of path prefixes (path_allowlist). As the DLP work (PRD 0053)
and future route hardening evolve, we may want more expressive matching:
glob-style path patterns (/api/*/data), header predicates (Content-Type,
Accept), and per-method rules (GET allowed, POST blocked). What established
YAML-based formats exist for declaring this kind of route matching, and
which design choices should bot-bottle adopt?
Summary
Four formats stand out as well-designed, widely deployed references:
Kubernetes Gateway API HTTPRoute, Envoy RouteConfiguration,
AWS ALB listener rules, and Traefik dynamic routing. A fifth,
Istio VirtualService, is worth noting but is largely superseded by
Gateway API for new designs.
Recommendation for bot-bottle: adopt the Gateway API HTTPRoute
match vocabulary as a direct model. It is the most carefully designed of
the four, has a published spec, handles all three requirements cleanly, and
its match object nests naturally into a YAML route block alongside
bot-bottle's existing host, path_allowlist, and auth fields.
Envoy's format is more powerful but far more verbose and harder to
validate by hand; ALB rules use a flat predicate list that does not
compose well; Traefik uses string expressions rather than structured YAML.
Current bot-bottle route schema
egress:
  routes:
    - host: api.github.com
      path_allowlist:
        - /repos/myorg/
      auth:
        scheme: Bearer
        token_ref: EGRESS_TOKEN_0
Matching today: exact host + path-prefix list. No method or header awareness.
Format 1: Kubernetes Gateway API HTTPRoute
Spec: gateway.networking.k8s.io/v1
Maturity: GA (v1.0+, 2023). Backed by SIG Network; shipping in GKE, EKS, AKS, Istio, Envoy Gateway, Cilium, Traefik v3.
Match object
rules:
  - matches:
      - path:
          type: Exact             # Exact | PathPrefix | RegularExpression
          value: /api/v1/data
        headers:
          - name: Content-Type
            type: Exact           # Exact | RegularExpression
            value: application/json
        queryParams:
          - name: version
            type: Exact
            value: "2"
        method: GET               # GET | POST | PUT | DELETE | PATCH | …
A matches entry is a logical AND across all predicates within it. Multiple
entries in the matches list are ORed: the rule fires if any entry matches.
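The AND-within-an-entry, OR-across-entries rule can be sketched in a few lines of Python. This is a minimal illustration rather than Gateway API code: the request dict shape and the helper names (`entry_matches`, `rule_matches`, `under_api_v1`, `method_is`) are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List

Request = Dict[str, str]              # hypothetical shape: {"path": ..., "method": ...}
Predicate = Callable[[Request], bool]


def entry_matches(predicates: Iterable[Predicate], request: Request) -> bool:
    """One matches entry: every predicate must hold (AND)."""
    return all(p(request) for p in predicates)


def rule_matches(entries: Iterable[Iterable[Predicate]], request: Request) -> bool:
    """A rule fires when at least one entry matches (OR)."""
    return any(entry_matches(entry, request) for entry in entries)


def under_api_v1(request: Request) -> bool:
    """Segment-aware prefix check for /api/v1."""
    path = request["path"]
    return path == "/api/v1" or path.startswith("/api/v1/")


def method_is(name: str) -> Predicate:
    """Build a predicate that checks the request method."""
    return lambda request: request["method"] == name


# Two entries: (prefix AND GET) OR (prefix AND POST).
ENTRIES: List[List[Predicate]] = [
    [under_api_v1, method_is("GET")],
    [under_api_v1, method_is("POST")],
]
```

With these entries, a POST to /api/v1/items satisfies the second entry, while a DELETE to the same path satisfies neither and the rule does not fire.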
Path matching
| type | Semantics |
|---|---|
| Exact | Full path must equal value (no trailing-slash equivalence) |
| PathPrefix | Path must start with value, compared segment by segment; /api matches /api/v1 but not /apiv1 |
| RegularExpression | Regular expression; support is implementation-specific, so the dialect (RE2, PCRE, POSIX) and anchoring may differ |
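A rough Python sketch of the three path types follows. The function name `path_matches` is hypothetical, and two choices are assumptions of the sketch rather than anything the spec mandates: regexes are anchored as full matches, and Python's `re` module stands in for whatever dialect a real implementation uses.

```python
import re


def path_matches(match_type: str, value: str, path: str) -> bool:
    """Evaluate one Gateway-API-style path match against a request path."""
    if match_type == "Exact":
        # No trailing-slash equivalence: /a and /a/ are different paths.
        return path == value
    if match_type == "PathPrefix":
        # Compare on segment boundaries, so /api does not match /apiv1.
        prefix = value.rstrip("/")
        return path == prefix or path.startswith(prefix + "/")
    if match_type == "RegularExpression":
        # Assumption: full-match anchoring; implementations may differ.
        return re.fullmatch(value, path) is not None
    raise ValueError(f"unknown path match type: {match_type!r}")
```

The `rstrip("/")` step means a prefix of `/` matches every path, since the stripped prefix is empty and every path starts with `/`.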
Glob-style paths (/api/*/data): Gateway API does not define a glob
type. The intent is to use RegularExpression for that case:
/api/[^/]+/data replaces /api/*/data. This is unambiguous and widely
understood.
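The glob-to-regex rewrite is mechanical enough to automate. The helper below is a hypothetical sketch that assumes `*` is meant to match exactly one non-empty path segment; it is not part of Gateway API.

```python
import re


def segment_glob_to_regex(glob: str) -> str:
    """Translate a glob whose * means "one path segment" into a regex.

    Literal text between the stars is escaped so characters such as "."
    keep their literal meaning.
    """
    literal_parts = glob.split("*")
    return "[^/]+".join(re.escape(part) for part in literal_parts)


PATTERN = segment_glob_to_regex("/api/*/data")
```

Used with a full-match check, the generated pattern accepts /api/v1/data and rejects /api/v1/extra/data, because `[^/]+` cannot cross a slash.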
Header matching
headers:
  - name: Content-Type
    type: Exact
    value: application/json
  - name: X-Request-Id
    type: RegularExpression
    value: "[0-9a-f]{8}-.*"
All headers entries must match (AND semantics). Missing a header is a
non-match (no "header absent" type in v1; implementations add it as an
extension).
Method matching
method: GET
Single method per match entry. To allow GET and POST, use two match entries (OR semantics at the matches level):
matches:
  - path:
      type: PathPrefix
      value: /api/v1
    method: GET
  - path:
      type: PathPrefix
      value: /api/v1
    method: POST
Strengths / weaknesses
Strengths: spec-backed, implementation-tested, composable AND/OR
semantics, explicit about what is not supported (no glob, no header-absent),
good field naming (type + value pattern is consistent throughout).
Weaknesses: verbosity when expressing OR across methods; regex is the only path wildcard mechanism; no body matching.
Format 2: Envoy RouteConfiguration
Spec: envoy.config.route.v3.RouteMatch
Maturity: Widely deployed (Istio data plane, AWS App Mesh, solo.io Gloo). Defined in protobuf; YAML is the human-readable rendering.
Match object
match:
  path: /exact/path              # exact match
  # OR
  prefix: /api/                  # prefix match
  # OR
  safe_regex:
    google_re2: {}
    regex: "/api/v[0-9]+/.*"
  # OR
  path_separated_prefix: /api/v1 # prefix with segment boundary enforcement
  headers:
    - name: content-type
      string_match:
        exact: application/json
        # OR
        prefix: text/
        # OR
        safe_regex:
          google_re2: {}
          regex: "application/(json|xml)"
      invert_match: false        # negate the predicate
    - name: x-custom-header
      present_match: true        # just check presence
  query_parameters:
    - name: version
      string_match:
        exact: "2"
Method is matched via a pseudo-header:
headers:
  - name: :method
    string_match:
      exact: GET
The v3 HeaderMatcher has no OR combinator of its own, so matching
several methods takes either a regex alternation on :method or one
route per method:
headers:
  - name: :method
    string_match:
      safe_regex:
        google_re2: {}
        regex: "GET|POST"
Path matching
| Field | Semantics |
|---|---|
| prefix | Path starts with value (any suffix allowed) |
| path | Exact match |
| safe_regex | RE2 regex (Google RE2 safety guarantees) |
| path_separated_prefix | Like prefix but only matches at segment boundaries (/api/v1 won't match /api/v10) |
| connect_matcher | CONNECT method only |
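The difference between prefix and path_separated_prefix is easy to miss, so here is a small Python sketch of the two behaviours as the table describes them. The function names are hypothetical, and query-string handling is ignored for brevity.

```python
def prefix_match(value: str, path: str) -> bool:
    """Plain string prefix: any suffix is allowed."""
    return path.startswith(value)


def path_separated_prefix_match(value: str, path: str) -> bool:
    """Prefix that only matches on a segment boundary."""
    return path == value or path.startswith(value + "/")
```

For the value /api/v1, the plain prefix accepts /api/v10 while the segment-aware variant rejects it; both accept /api/v1/users.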
Glob (/api/*/data): use safe_regex: /api/[^/]+/data.
Strengths / weaknesses
Strengths: most expressive format surveyed; invert_match, present_match,
regex on every string field, pseudo-header method matching; handles nearly
every edge case.
Weaknesses: very verbose; protobuf-origin field names are not
self-evident; OR across header values needs a regex or duplicated routes;
hard to validate in a lightweight schema check; not appropriate as a
user-facing YAML format without a wrapping DSL.
Format 3: AWS ALB Listener Rules
Spec: AWS Elastic Load Balancing API — Conditions
Maturity: GA, widely used in AWS infrastructure-as-code (CloudFormation,
Terraform aws_lb_listener_rule).
Match object (Terraform / CloudFormation rendering)
conditions:
  - field: path-pattern
    path_pattern_config:
      values:
        - /api/*
        - /health
  - field: http-header
    http_header_config:
      http_header_name: Content-Type
      values:
        - application/json
        - application/x-www-form-urlencoded
  - field: http-request-method
    http_request_method_config:
      values:
        - GET
        - POST
  - field: host-header
    host_header_config:
      values:
        - "*.example.com"
        - api.example.com
  - field: query-string
    query_string_config:
      values:
        - key: version
          value: "2"
All conditions in a rule are ANDed. Multiple values within a single condition are ORed. Up to 5 conditions per rule.
Path matching
ALB natively supports glob patterns in path-pattern: * matches any sequence
of characters (including /) and ? matches any single character.
This is the only surveyed format with first-class glob support. /api/*/data
is valid and unambiguous. No regex support.
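One consequence of * also matching / is that an ALB-style /api/*/data is broader than the segment-bounded regex /api/[^/]+/data used elsewhere in this document. The Python sketch below illustrates the gap, using the standard library's `fnmatchcase` as an approximation of ALB globbing. That approximation is an assumption: `fnmatch` also understands `[seq]` character classes, which nothing here claims for ALB.

```python
import re
from fnmatch import fnmatchcase

ALB_PATTERN = "/api/*/data"          # * may span several segments
SEGMENT_REGEX = "/api/[^/]+/data"    # exactly one segment in the middle


def alb_style(path: str) -> bool:
    """Glob match where * matches any characters, including /."""
    return fnmatchcase(path, ALB_PATTERN)


def segment_style(path: str) -> bool:
    """Regex match where the wildcard cannot cross a slash."""
    return re.fullmatch(SEGMENT_REGEX, path) is not None
```

Both functions accept /api/v1/data, but only `alb_style` accepts /api/v1/users/data. Anyone porting ALB rules to a regex-based schema would need to decide which of the two meanings was intended.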
Header matching
Header conditions match against the header value. Multiple values are ORed.
The header name is fixed per condition block; to AND two header predicates,
add two separate http-header conditions. Case-insensitive matching on
values.
Method matching
- field: http-request-method
  http_request_method_config:
    values:
      - GET
      - POST
Multiple values are ORed (GET or POST). Up to 40 methods per rule.
Strengths / weaknesses
Strengths: first-class glob path matching (the only format surveyed
with * and ?); multi-value OR within a condition block is concise for
the common case; method matching is a flat list, easy to write.
Weaknesses: maximum 5 conditions per rule; no regex; no header-absent
predicate; no request-body matching; the field + *_config naming is
awkward (the field name is a string enum that determines which sibling key
is relevant — a schema-validation anti-pattern); tied to AWS semantics
(target groups, priority integers).
Format 4: Traefik Dynamic Routing
Spec: Traefik Router Rule syntax
Maturity: GA, widely deployed in Kubernetes (IngressRoute CRD) and
Docker-Compose setups. Traefik v3 aligns with Gateway API for Kubernetes
routes but keeps its own expression syntax for the rule field.
Match expression (string, embedded in YAML)
http:
  routers:
    my-router:
      rule: >
        Host(`api.example.com`) &&
        PathPrefix(`/api/v1`) &&
        (Method(`GET`) || Method(`POST`)) &&
        Header(`Content-Type`, `application/json`)
      service: my-service
&& = AND, || = OR. Parentheses for grouping.
Available matchers:
| Matcher | Example |
|---|---|
| Host | Host("api.example.com") |
| HostRegexp | HostRegexp(".*\.example\.com") |
| Path | Path("/exact/path") |
| PathPrefix | PathPrefix("/api/v1") |
| PathRegexp | PathRegexp("/api/v[0-9]+/.*") |
| Method | Method("GET") |
| Header | Header("Content-Type", "application/json") |
| HeaderRegexp | HeaderRegexp("Accept", "application/.*") |
| Query | Query("version", "2") |
| QueryRegexp | QueryRegexp("id", "[0-9]+") |
| ClientIP | ClientIP("10.0.0.0/8") |
Glob paths: not supported directly. Use PathRegexp instead.
Strengths / weaknesses
Strengths: the most expressive and concise format for complex boolean
combinations (AND/OR/NOT in a single line); full regex support on every
field; Traefik v3 supports this inside Kubernetes CRDs. Under the v3 rule
syntax each matcher takes a single value, so several methods are combined
with the OR operator rather than listed inside one Method() call.
Weaknesses: the rule is a string embedded in YAML, not a structured object — it cannot be validated with JSON Schema and is harder to generate programmatically; no structured round-trip; no glob, only regex.
Comparison table
| | Gateway API | Envoy | AWS ALB | Traefik |
|---|---|---|---|---|
| Path: exact | ✅ Exact | ✅ path | ✅ exact value | ✅ Path() |
| Path: prefix | ✅ PathPrefix | ✅ prefix / path_separated_prefix | ✅ (via glob /*) | ✅ PathPrefix() |
| Path: glob (/a/*/b) | ❌ (use regex) | ❌ (use regex) | ✅ native | ❌ (use regex) |
| Path: regex | ✅ RegularExpression | ✅ safe_regex | ❌ | ✅ PathRegexp() |
| Header: exact | ✅ | ✅ | ✅ | ✅ |
| Header: regex | ✅ | ✅ | ❌ | ✅ |
| Header: absent | ❌ (extension) | ✅ present_match: false | ❌ | ❌ |
| Method matching | ✅ (one per entry; OR via multiple entries) | ✅ (via :method pseudo-header) | ✅ (list = OR) | ✅ Method() |
| AND semantics | predicates within one matches entry | all conditions | all conditions entries | && operator |
| OR semantics | multiple matches entries | regex alternation or multiple routes | multiple values in one condition | \|\| operator |
| Schema-validatable | ✅ (CRD/JSON Schema) | ✅ (protobuf) | ✅ (CloudFormation schema) | ❌ (embedded string) |
| Human-writable | ✅ | ⚠️ verbose | ✅ | ✅ |
| Generatable | ✅ | ✅ | ✅ | ⚠️ (string concat) |
Design choices worth adopting
1. Match object as a structured peer to host
Gateway API's separation of concerns maps well onto bot-bottle's existing
schema. Instead of a flat path_allowlist, a match block nests all
predicates:
egress:
  routes:
    - host: api.github.com
      match:
        paths:
          - type: prefix          # exact | prefix | regex
            value: /repos/myorg/
        headers:
          - name: Content-Type
            value: application/json
        methods: [GET, POST]
      auth:
        scheme: Bearer
        token_ref: EGRESS_TOKEN_0
All predicates within match are ANDed. A list of paths entries is
ORed (first match wins — same as the current path_allowlist semantics).
2. Path type enum (exact | prefix | regex)
Use three named types rather than inferring from the value's syntax. This
avoids the ambiguity that plagues .gitignore and nginx location patterns
where the same string can mean different things depending on leading characters.
- prefix: mirrors current path_allowlist semantics.
- regex: RE2 for wildcard and advanced cases. Reject at load time if the pattern fails to compile. Covers every case glob would handle: /api/[^/]+/data is the /api/*/data equivalent.
Glob-style syntax is not included: it adds a third path-matching language on top of prefix and regex without meaningful operator benefit, since regex is already required for any non-trivial wildcard.
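A minimal sketch of load-time validation for the path type enum is shown below, in Python. The names `PathType`, `PathMatch`, and `load_path_match` are illustrative rather than bot-bottle's actual types, and Python's `re.compile` stands in for an RE2 compile step, so a few patterns that Python accepts (backreferences, lookarounds) would be rejected by real RE2.

```python
import re
from dataclasses import dataclass
from enum import Enum


class PathType(Enum):
    EXACT = "exact"
    PREFIX = "prefix"
    REGEX = "regex"


@dataclass(frozen=True)
class PathMatch:
    type: PathType
    value: str


def load_path_match(raw: dict) -> PathMatch:
    """Validate one paths entry when the manifest is loaded."""
    try:
        path_type = PathType(raw["type"])
    except ValueError:
        raise ValueError(
            f"unknown path type {raw['type']!r}; expected exact, prefix, or regex"
        ) from None
    value = raw["value"]
    if path_type is PathType.REGEX:
        try:
            re.compile(value)  # stand-in for an RE2 compile
        except re.error as exc:
            raise ValueError(f"invalid regex {value!r}: {exc}") from exc
    return PathMatch(path_type, value)
```

Because the type is an explicit enum, a manifest that still says `type: glob` fails with a clear error at load time instead of being silently interpreted as something else.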
3. Header matching as a list of {name, value, type} objects
Mirrors Gateway API exactly. ALL headers must match (AND). type defaults
to exact; regex is available. No header-absent for now (adds complexity,
low immediate need).
headers:
  - name: Content-Type
    value: application/json   # type: exact (default)
  - name: X-Internal-Key
    value: "dev-[0-9]+"
    type: regex
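The header rules above could be evaluated along the following lines. This Python sketch rests on several assumptions that the design does not settle: header names are compared case-insensitively (as HTTP requires), exact values are compared case-sensitively, and regex values must match the whole header value. The function name `headers_match` is hypothetical.

```python
import re
from typing import Dict, List


def headers_match(specs: List[dict], headers: Dict[str, str]) -> bool:
    """Return True only if every spec matches (AND). A missing header fails."""
    by_name = {name.lower(): value for name, value in headers.items()}
    for spec in specs:
        actual = by_name.get(spec["name"].lower())
        if actual is None:
            return False  # no header-absent predicate: absence is a non-match
        if spec.get("type", "exact") == "regex":
            if re.fullmatch(spec["value"], actual) is None:
                return False
        elif actual != spec["value"]:
            return False
    return True


SPECS = [
    {"name": "Content-Type", "value": "application/json"},
    {"name": "X-Internal-Key", "value": "dev-[0-9]+", "type": "regex"},
]
```

A request carrying both `Content-Type: application/json` and `X-Internal-Key: dev-42` matches; dropping either header, or sending `dev-42x`, does not.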
4. Method list as a flat enum list
Adopts ALB's conciseness. An empty or absent methods list means all
methods are permitted. Values are uppercased HTTP method names.
methods: [GET, HEAD]
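As a rough Python sketch of the method rule: names are uppercased when the manifest is parsed (in line with the case-insensitivity decision recorded under Decisions), and an empty or absent list permits every method. Both helper names are hypothetical.

```python
from typing import FrozenSet, Iterable, Optional


def normalise_methods(raw: Optional[Iterable[str]]) -> FrozenSet[str]:
    """Uppercase method names; None or an empty list yields an empty set."""
    return frozenset(method.upper() for method in (raw or ()))


def method_allowed(allowed: FrozenSet[str], method: str) -> bool:
    """An empty allowed set means all methods are permitted."""
    return not allowed or method.upper() in allowed
```

Storing the normalised set once at load time keeps the per-request check to a single set lookup.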
5. Multiple match entries per route: OR semantics at the route level
If a route needs GET on one path and POST on a different path, use a
matches (plural) list where entries are ORed:
routes:
  - host: api.example.com
    matches:
      - paths: [{type: prefix, value: /read}]
        methods: [GET, HEAD]
      - paths: [{type: exact, value: /write}]
        methods: [POST, PUT]
This mirrors Gateway API's top-level OR; each entry is an AND of its predicates.
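Putting the pieces together, a route-level evaluator might look like the Python sketch below. It is illustrative only: the function names are hypothetical, `prefix` is treated as a plain string prefix on the assumption that this is what the current path_allowlist does, an absent paths or methods list is taken to mean "no constraint", and a route with no matches entries is treated as matching nothing. Headers are omitted to keep the example short.

```python
import re


def path_ok(spec: dict, path: str) -> bool:
    """Evaluate a single {type, value} path spec."""
    kind, value = spec["type"], spec["value"]
    if kind == "exact":
        return path == value
    if kind == "prefix":
        return path.startswith(value)  # assumed path_allowlist behaviour
    if kind == "regex":
        return re.fullmatch(value, path) is not None
    raise ValueError(f"unknown path type: {kind!r}")


def entry_ok(entry: dict, method: str, path: str) -> bool:
    """One matches entry: paths are ORed, then ANDed with the method check."""
    paths = entry.get("paths", [])
    methods = {m.upper() for m in entry.get("methods", [])}
    path_hit = not paths or any(path_ok(spec, path) for spec in paths)
    method_hit = not methods or method.upper() in methods
    return path_hit and method_hit


def route_allows(route: dict, method: str, path: str) -> bool:
    """Entries in matches are ORed."""
    return any(entry_ok(entry, method, path) for entry in route.get("matches", []))


ROUTE = {
    "host": "api.example.com",
    "matches": [
        {"paths": [{"type": "prefix", "value": "/read"}], "methods": ["GET", "HEAD"]},
        {"paths": [{"type": "exact", "value": "/write"}], "methods": ["POST", "PUT"]},
    ],
}
```

With this route, GET /read/items and POST /write are allowed, while POST /read/items and PUT /write/extra are not: each request has to satisfy both the path and the method of the same entry.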
Decisions
The open questions raised during research were resolved in PR #196 review:
- Backward compatibility: Hard cutover. The new matches structure replaces path_allowlist entirely, with no compatibility shim and no fallback parsing for the old format. Manifests using path_allowlist must be migrated.
- Glob support: Dropped. Not strictly necessary: regex covers every case glob would handle. Fewer path-matching languages to document and validate.
- Header value OR: Stick with Gateway API. OR across header values requires a separate entry in the matches list, not multiple values inside one headers block.
- Method name case: Case-insensitive at parse time. get, GET, and Get are all accepted and normalised to uppercase internally.
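To make the hard cutover concrete, the Python sketch below pairs a loader check that rejects the old key with a one-off migration helper that manifest owners could run by hand. Both functions are hypothetical illustrations, not bot-bottle code. The helper is meant as an offline conversion, so the loader itself still carries no fallback parsing, and it assumes each old prefix maps to one `prefix` path entry.

```python
def check_route(route: dict) -> None:
    """Loader-side check: the old key is an error, not a fallback."""
    if "path_allowlist" in route:
        raise ValueError(
            f"route {route.get('host')!r}: path_allowlist is no longer "
            "supported; migrate to matches"
        )


def migrate_route(route: dict) -> dict:
    """One-off conversion of an old-style route to the matches structure."""
    migrated = {key: value for key, value in route.items() if key != "path_allowlist"}
    prefixes = route.get("path_allowlist")
    if prefixes is not None:
        # All old prefixes become ORed paths inside a single matches entry.
        migrated["matches"] = [
            {"paths": [{"type": "prefix", "value": prefix} for prefix in prefixes]}
        ]
    return migrated
```

Applied to the route shown under "Current bot-bottle route schema", the helper yields a single matches entry with one prefix path of /repos/myorg/ and leaves host and auth untouched.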