docs: research on YAML route matching formats (paths, headers, methods)
This commit is contained in:
@@ -0,0 +1,486 @@
|
||||
# YAML route matching formats: paths, headers, and methods
|
||||
|
||||
## Question
|
||||
|
||||
Bot-bottle's egress manifest currently supports exact-host matching and
|
||||
a flat list of path prefixes (`path_allowlist`). As the DLP work (PRD 0053)
|
||||
and future route hardening evolve, we may want more expressive matching:
|
||||
glob-style path patterns (`/api/*/data`), header predicates (Content-Type,
|
||||
Accept), and per-method rules (GET allowed, POST blocked). What established
|
||||
YAML-based formats exist for declaring this kind of route matching, and
|
||||
which design choices should bot-bottle adopt?
|
||||
|
||||
## Summary
|
||||
|
||||
Four formats stand out as well-designed, widely deployed references:
|
||||
**Kubernetes Gateway API `HTTPRoute`**, **Envoy `RouteConfiguration`**,
|
||||
**AWS ALB listener rules**, and **Traefik dynamic routing**. A fifth,
|
||||
Istio `VirtualService`, is worth noting but is largely superseded by
|
||||
Gateway API for new designs.
|
||||
|
||||
**Recommendation for bot-bottle:** adopt the Gateway API `HTTPRoute`
|
||||
match vocabulary as a direct model. It is the most carefully designed of
|
||||
the four, has a published spec, handles all three requirements cleanly, and
|
||||
its match object nests naturally into a YAML route block alongside
|
||||
bot-bottle's existing `host`, `path_allowlist`, and `auth` fields.
|
||||
Envoy's format is more powerful but far more verbose and harder to
|
||||
validate by hand; ALB rules use a flat predicate list that does not
|
||||
compose well; Traefik uses string expressions rather than structured YAML.
|
||||
|
||||
## Current bot-bottle route schema
|
||||
|
||||
```yaml
|
||||
egress:
|
||||
routes:
|
||||
- host: api.github.com
|
||||
path_allowlist:
|
||||
- /repos/myorg/
|
||||
auth:
|
||||
scheme: Bearer
|
||||
token_ref: EGRESS_TOKEN_0
|
||||
```
|
||||
|
||||
Matching today: exact host + path-prefix list. No method or header
|
||||
awareness.
|
||||
|
||||
---
|
||||
|
||||
## Format 1: Kubernetes Gateway API `HTTPRoute`
|
||||
|
||||
**Spec:** [gateway.networking.k8s.io/v1](https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io/v1.HTTPRouteMatch)
|
||||
**Maturity:** GA (v1.0+, 2023). Backed by SIG Network; shipping in GKE,
|
||||
EKS, AKS, Istio, Envoy Gateway, Cilium, Traefik v3.
|
||||
|
||||
### Match object
|
||||
|
||||
```yaml
|
||||
rules:
|
||||
- matches:
|
||||
- path:
|
||||
type: Exact # Exact | PathPrefix | RegularExpression
|
||||
value: /api/v1/data
|
||||
headers:
|
||||
- name: Content-Type
|
||||
type: Exact # Exact | RegularExpression
|
||||
value: application/json
|
||||
queryParams:
|
||||
- name: version
|
||||
type: Exact
|
||||
value: "2"
|
||||
method: GET # GET | POST | PUT | DELETE | PATCH | …
|
||||
```
|
||||
|
||||
A `matches` entry is a logical AND across all predicates within it. Multiple
|
||||
entries in the `matches` list are ORed: the rule fires if any entry matches.
|
||||
|
||||
### Path matching
|
||||
|
||||
| `type` | Semantics |
|
||||
|--------|-----------|
|
||||
| `Exact` | Full path must equal `value` (no trailing-slash equivalence) |
|
||||
| `PathPrefix` | Path must start with `value`; `/api` matches `/api/v1` but not `/apiv1` |
|
||||
| `RegularExpression` | RE2-syntax regex; implementations may differ on anchoring |
|
||||
|
||||
**Glob-style paths (`/api/*/data`):** Gateway API does not define a glob
|
||||
type. The intent is to use `RegularExpression` for that case:
|
||||
`/api/[^/]+/data` replaces `/api/*/data`. This is unambiguous and widely
|
||||
understood.
|
||||
|
||||
### Header matching
|
||||
|
||||
```yaml
|
||||
headers:
|
||||
- name: Content-Type
|
||||
type: Exact
|
||||
value: application/json
|
||||
- name: X-Request-Id
|
||||
type: RegularExpression
|
||||
value: "[0-9a-f]{8}-.*"
|
||||
```
|
||||
|
||||
All `headers` entries must match (AND semantics). Missing a header is a
|
||||
non-match (no "header absent" type in v1; implementations add it as an
|
||||
extension).
|
||||
|
||||
### Method matching
|
||||
|
||||
```yaml
|
||||
method: GET
|
||||
```
|
||||
|
||||
Single method per match entry. To allow GET and POST, use two match
|
||||
entries (OR semantics at the matches level):
|
||||
|
||||
```yaml
|
||||
matches:
|
||||
- path:
|
||||
type: PathPrefix
|
||||
value: /api/v1
|
||||
method: GET
|
||||
- path:
|
||||
type: PathPrefix
|
||||
value: /api/v1
|
||||
method: POST
|
||||
```
|
||||
|
||||
### Strengths / weaknesses
|
||||
|
||||
**Strengths:** spec-backed, implementation-tested, composable AND/OR
|
||||
semantics, explicit about what is not supported (no glob, no header-absent),
|
||||
good field naming (`type` + `value` pattern is consistent throughout).
|
||||
|
||||
**Weaknesses:** verbosity when expressing OR across methods; regex is
|
||||
the only path wildcard mechanism; no body matching.
|
||||
|
||||
---
|
||||
|
||||
## Format 2: Envoy `RouteConfiguration`
|
||||
|
||||
**Spec:** [envoy.config.route.v3.RouteMatch](https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#config-route-v3-routematch)
|
||||
**Maturity:** Widely deployed (Istio data plane, AWS App Mesh, solo.io
|
||||
Gloo). Defined in protobuf; YAML is the human-readable rendering.
|
||||
|
||||
### Match object
|
||||
|
||||
```yaml
|
||||
match:
|
||||
path: /exact/path # exact match
|
||||
# OR
|
||||
prefix: /api/ # prefix match
|
||||
# OR
|
||||
safe_regex:
|
||||
google_re2: {}
|
||||
regex: "/api/v[0-9]+/.*"
|
||||
# OR
|
||||
path_separated_prefix: /api/v1 # prefix with segment boundary enforcement
|
||||
|
||||
headers:
|
||||
- name: content-type
|
||||
string_match:
|
||||
exact: application/json
|
||||
# OR
|
||||
prefix: text/
|
||||
# OR
|
||||
safe_regex:
|
||||
google_re2: {}
|
||||
regex: "application/(json|xml)"
|
||||
invert_match: false # negate the predicate
|
||||
|
||||
- name: x-custom-header
|
||||
present_match: true # just check presence
|
||||
|
||||
query_parameters:
|
||||
- name: version
|
||||
string_match:
|
||||
exact: "2"
|
||||
```
|
||||
|
||||
Method is matched via a pseudo-header:
|
||||
|
||||
```yaml
|
||||
headers:
|
||||
- name: :method
|
||||
string_match:
|
||||
exact: GET
|
||||
```
|
||||
|
||||
Multiple methods require an OR combinator (`or_match`), available in
|
||||
Envoy v1.21+:
|
||||
|
||||
```yaml
|
||||
headers:
|
||||
- name: :method
|
||||
or_match:
|
||||
value_matchers:
|
||||
- string_match:
|
||||
exact: GET
|
||||
- string_match:
|
||||
exact: POST
|
||||
```
|
||||
|
||||
### Path matching
|
||||
|
||||
| Field | Semantics |
|
||||
|-------|-----------|
|
||||
| `prefix` | Path starts with value (any suffix allowed) |
|
||||
| `path` | Exact match |
|
||||
| `safe_regex` | RE2 regex (Google RE2 safety guarantees) |
|
||||
| `path_separated_prefix` | Like `prefix` but only matches at segment boundaries (`/api/v1` won't match `/api/v10`) |
|
||||
| `connect_matcher` | CONNECT method only |
|
||||
|
||||
Glob (`/api/*/data`): use `safe_regex`: `/api/[^/]+/data`.
|
||||
|
||||
### Strengths / weaknesses
|
||||
|
||||
**Strengths:** most expressive format surveyed; `invert_match`, `present_match`,
|
||||
OR combinators, pseudo-header method matching; handles every edge case.
|
||||
|
||||
**Weaknesses:** very verbose; protobuf-origin field names are not
|
||||
self-evident; `or_match` nesting is awkward; hard to validate in a
|
||||
lightweight schema check; not appropriate as a user-facing YAML format
|
||||
without a wrapping DSL.
|
||||
|
||||
---
|
||||
|
||||
## Format 3: AWS ALB Listener Rules
|
||||
|
||||
**Spec:** [AWS Elastic Load Balancing API — Conditions](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-listeners.html#rule-condition-types)
|
||||
**Maturity:** GA, widely used in AWS infrastructure-as-code (CloudFormation,
|
||||
Terraform `aws_lb_listener_rule`).
|
||||
|
||||
### Match object (Terraform / CloudFormation rendering)
|
||||
|
||||
```yaml
|
||||
conditions:
|
||||
- field: path-pattern
|
||||
path_pattern_config:
|
||||
values:
|
||||
- /api/*
|
||||
- /health
|
||||
- field: http-header
|
||||
http_header_config:
|
||||
http_header_name: Content-Type
|
||||
values:
|
||||
- application/json
|
||||
- application/x-www-form-urlencoded
|
||||
- field: http-request-method
|
||||
http_request_method_config:
|
||||
values:
|
||||
- GET
|
||||
- POST
|
||||
- field: host-header
|
||||
host_header_config:
|
||||
values:
|
||||
- "*.example.com"
|
||||
- api.example.com
|
||||
- field: query-string
|
||||
query_string_config:
|
||||
values:
|
||||
- key: version
|
||||
value: "2"
|
||||
```
|
||||
|
||||
All conditions in a rule are ANDed. Multiple values within a single
|
||||
condition are ORed. Up to 5 conditions per rule.
|
||||
|
||||
### Path matching
|
||||
|
||||
ALB natively supports glob patterns in `path-pattern`:
|
||||
- `*` matches any sequence of characters (including `/`).
|
||||
- `?` matches any single character.
|
||||
|
||||
This is the only surveyed format with first-class glob support. `/api/*/data`
|
||||
is valid and unambiguous. No regex support.
|
||||
|
||||
### Header matching
|
||||
|
||||
Header conditions match against the header value. Multiple values are ORed.
|
||||
The header name is fixed per condition block; to AND two header predicates,
|
||||
add two separate `http-header` conditions. Case-insensitive matching on
|
||||
values.
|
||||
|
||||
### Method matching
|
||||
|
||||
```yaml
|
||||
- field: http-request-method
|
||||
http_request_method_config:
|
||||
values:
|
||||
- GET
|
||||
- POST
|
||||
```
|
||||
|
||||
Multiple values are ORed (GET or POST). Up to 40 methods per rule.
|
||||
|
||||
### Strengths / weaknesses
|
||||
|
||||
**Strengths:** first-class glob path matching (the only format surveyed
|
||||
with `*` and `?`); multi-value OR within a condition block is concise for
|
||||
the common case; method matching is a flat list, easy to write.
|
||||
|
||||
**Weaknesses:** maximum 5 conditions per rule; no regex; no header-absent
|
||||
predicate; no request-body matching; the `field` + `*_config` naming is
|
||||
awkward (the field name is a string enum that determines which sibling key
|
||||
is relevant — a schema-validation anti-pattern); tied to AWS semantics
|
||||
(target groups, priority integers).
|
||||
|
||||
---
|
||||
|
||||
## Format 4: Traefik Dynamic Routing
|
||||
|
||||
**Spec:** [Traefik Router Rule syntax](https://doc.traefik.io/traefik/routing/routers/#rule)
|
||||
**Maturity:** GA, widely deployed in Kubernetes (IngressRoute CRD) and
|
||||
Docker-Compose setups. Traefik v3 aligns with Gateway API for Kubernetes
|
||||
routes but keeps its own expression syntax for the `rule` field.
|
||||
|
||||
### Match expression (string, embedded in YAML)
|
||||
|
||||
```yaml
|
||||
http:
|
||||
routers:
|
||||
my-router:
|
||||
rule: >
|
||||
Host(`api.example.com`) &&
|
||||
PathPrefix(`/api/v1`) &&
|
||||
Method(`GET`, `POST`) &&
|
||||
Header(`Content-Type`, `application/json`)
|
||||
service: my-service
|
||||
```
|
||||
|
||||
`&&` = AND, `||` = OR. Parentheses for grouping.
|
||||
|
||||
Available matchers:
|
||||
|
||||
| Matcher | Example |
|
||||
|---------|---------|
|
||||
| `Host` | `Host("api.example.com")` |
|
||||
| `HostRegexp` | `HostRegexp(".*\.example\.com")` |
|
||||
| `Path` | `Path("/exact/path")` |
|
||||
| `PathPrefix` | `PathPrefix("/api/v1")` |
|
||||
| `PathRegexp` | `PathRegexp("/api/v[0-9]+/.*")` |
|
||||
| `Method` | `Method("GET", "POST")` |
|
||||
| `Header` | `Header("Content-Type", "application/json")` |
|
||||
| `HeaderRegexp` | `HeaderRegexp("Accept", "application/.*")` |
|
||||
| `Query` | `Query("version", "2")` |
|
||||
| `QueryRegexp` | `QueryRegexp("id", "[0-9]+")` |
|
||||
| `ClientIP` | `ClientIP("10.0.0.0/8")` |
|
||||
|
||||
Glob paths: not supported directly. Use `PathRegexp` instead.
|
||||
|
||||
### Strengths / weaknesses
|
||||
|
||||
**Strengths:** the most expressive and concise format for complex boolean
|
||||
combinations (AND/OR/NOT in a single line); `Method("GET", "POST")` is
|
||||
the cleanest multi-method syntax surveyed; full regex support on every
|
||||
field; Traefik v3 supports this inside Kubernetes CRDs.
|
||||
|
||||
**Weaknesses:** the rule is a *string* embedded in YAML, not a structured
|
||||
object — it cannot be validated with JSON Schema and is harder to generate
|
||||
programmatically; no structured round-trip; no glob, only regex.
|
||||
|
||||
---
|
||||
|
||||
## Comparison table
|
||||
|
||||
| | Gateway API | Envoy | AWS ALB | Traefik |
|
||||
|---|---|---|---|---|
|
||||
| **Path: exact** | ✅ `Exact` | ✅ `path` | ✅ exact value | ✅ `Path()` |
|
||||
| **Path: prefix** | ✅ `PathPrefix` | ✅ `prefix` / `path_separated_prefix` | ✅ (via glob `/*`) | ✅ `PathPrefix()` |
|
||||
| **Path: glob** (`/a/*/b`) | ❌ (use regex) | ❌ (use regex) | ✅ native | ❌ (use regex) |
|
||||
| **Path: regex** | ✅ `RegularExpression` | ✅ `safe_regex` | ❌ | ✅ `PathRegexp()` |
|
||||
| **Header: exact** | ✅ | ✅ | ✅ | ✅ |
|
||||
| **Header: regex** | ✅ | ✅ | ❌ | ✅ |
|
||||
| **Header: absent** | ❌ (extension) | ✅ `present_match: false` | ❌ | ❌ |
|
||||
| **Method matching** | ✅ (one per entry; OR via multiple entries) | ✅ (via `:method` pseudo-header) | ✅ (list = OR) | ✅ `Method("GET","POST")` |
|
||||
| **AND semantics** | predicates within one `matches` entry | all conditions | all `conditions` entries | `&&` operator |
|
||||
| **OR semantics** | multiple `matches` entries | `or_match` combinator | multiple values in one condition | `\|\|` operator |
|
||||
| **Schema-validatable** | ✅ (CRD/JSON Schema) | ✅ (protobuf) | ✅ (CloudFormation schema) | ❌ (embedded string) |
|
||||
| **Human-writable** | ✅ | ⚠️ verbose | ✅ | ✅ |
|
||||
| **Generatable** | ✅ | ✅ | ✅ | ⚠️ (string concat) |
|
||||
|
||||
---
|
||||
|
||||
## Design choices worth adopting
|
||||
|
||||
### 1. Match object as a structured peer to `host`
|
||||
|
||||
Gateway API's separation of concerns maps well onto bot-bottle's existing
|
||||
schema. Instead of a flat `path_allowlist`, a `match` block nests all
|
||||
predicates:
|
||||
|
||||
```yaml
|
||||
egress:
|
||||
routes:
|
||||
- host: api.github.com
|
||||
match:
|
||||
paths:
|
||||
- type: prefix # exact | prefix | glob | regex
|
||||
value: /repos/myorg/
|
||||
headers:
|
||||
- name: Content-Type
|
||||
value: application/json
|
||||
methods: [GET, POST]
|
||||
auth:
|
||||
scheme: Bearer
|
||||
token_ref: EGRESS_TOKEN_0
|
||||
```
|
||||
|
||||
All predicates within `match` are ANDed. A list of `paths` entries is
|
||||
ORed (first match wins — same as the current `path_allowlist` semantics).
|
||||
|
||||
### 2. Path type enum (`exact` | `prefix` | `glob` | `regex`)
|
||||
|
||||
Use four named types rather than inferring from the value's syntax. This
|
||||
avoids the ambiguity that plagues `.gitignore` and `nginx location` patterns
|
||||
where the same string can mean different things depending on leading characters.
|
||||
|
||||
- `prefix`: mirrors current `path_allowlist` semantics.
|
||||
- `glob`: adopts ALB-style `*` (single segment) and `**` (multi-segment,
|
||||
covering `/api/*/data` and `/api/**/data`). Simpler for operators than
|
||||
writing regex.
|
||||
- `regex`: RE2 for advanced cases. Reject at load time if the pattern fails
|
||||
to compile.
|
||||
|
||||
**Glob semantics decision needed:** ALB's `*` matches across `/`; most
|
||||
shell-glob conventions treat `*` as intra-segment and `**` as cross-segment.
|
||||
The shell convention (`*` = no slash, `**` = slash-crossing) is less
|
||||
surprising to operators and avoids accidental over-matching.
|
||||
|
||||
### 3. Header matching as a list of `{name, value, type}` objects
|
||||
|
||||
Mirrors Gateway API exactly. ALL headers must match (AND). `type` defaults
|
||||
to `exact`; `regex` is available. No header-absent for now (adds complexity,
|
||||
low immediate need).
|
||||
|
||||
```yaml
|
||||
headers:
|
||||
- name: Content-Type
|
||||
value: application/json # type: exact (default)
|
||||
- name: X-Internal-Key
|
||||
value: "dev-[0-9]+"
|
||||
type: regex
|
||||
```
|
||||
|
||||
### 4. Method list as a flat enum list
|
||||
|
||||
Adopts ALB's conciseness. An empty or absent `methods` list means all
|
||||
methods are permitted. Values are uppercased HTTP method names.
|
||||
|
||||
```yaml
|
||||
methods: [GET, HEAD]
|
||||
```
|
||||
|
||||
### 5. Multiple `match` entries per route: OR semantics at the route level
|
||||
|
||||
If a route needs GET on one path and POST on a different path, use a
|
||||
`matches` (plural) list where entries are ORed:
|
||||
|
||||
```yaml
|
||||
routes:
|
||||
- host: api.example.com
|
||||
matches:
|
||||
- paths: [{type: prefix, value: /read}]
|
||||
methods: [GET, HEAD]
|
||||
- paths: [{type: exact, value: /write}]
|
||||
methods: [POST, PUT]
|
||||
```
|
||||
|
||||
This mirrors Gateway API's top-level OR; each entry is an AND of its
|
||||
predicates.
|
||||
|
||||
---
|
||||
|
||||
## Open questions
|
||||
|
||||
1. **Backward compatibility:** `path_allowlist` is the current field. If
|
||||
adopting a `match`/`matches` structure, keep `path_allowlist` as a
|
||||
deprecated alias? Or treat this as a breaking manifest version bump?
|
||||
2. **Glob segment semantics:** adopt shell convention (`*` = intra-segment,
|
||||
`**` = cross-segment) or ALB convention (`*` = anything including `/`)?
|
||||
The shell convention is safer; ALB's is simpler.
|
||||
3. **Header value OR:** Gateway API requires a separate match entry to OR
|
||||
header values. ALB allows multiple values in one condition. Which is
|
||||
less surprising for bot-bottle operators? The ALB approach is more
|
||||
concise for the common case (e.g., `Content-Type: [application/json,
|
||||
application/x-www-form-urlencoded]`).
|
||||
4. **Case sensitivity on method names:** normalize to uppercase at parse
|
||||
time (fail on unrecognised values) or case-insensitively?
|
||||
Reference in New Issue
Block a user