# YAML route matching formats: paths, headers, and methods

## Question

Bot-bottle's egress manifest currently supports exact-host matching and a flat list of path prefixes (`path_allowlist`). As the DLP work (PRD 0052) and future route hardening evolve, we may want more expressive matching: glob-style path patterns (`/api/*/data`), header predicates (Content-Type, Accept), and per-method rules (GET allowed, POST blocked). What established YAML-based formats exist for declaring this kind of route matching, and which design choices should bot-bottle adopt?

## Summary

Four formats stand out as well-designed, widely deployed references: **Kubernetes Gateway API `HTTPRoute`**, **Envoy `RouteConfiguration`**, **AWS ALB listener rules**, and **Traefik dynamic routing**. A fifth, Istio `VirtualService`, is worth noting but is largely superseded by Gateway API for new designs.

**Recommendation for bot-bottle:** adopt the Gateway API `HTTPRoute` match vocabulary as a direct model. It is the most carefully designed of the four, has a published spec, handles all three requirements cleanly, and its match object nests naturally into a YAML route block alongside bot-bottle's existing `host`, `path_allowlist`, and `auth` fields. Envoy's format is more powerful but far more verbose and harder to validate by hand; ALB rules use a flat predicate list that does not compose well; Traefik uses string expressions rather than structured YAML.

## Current bot-bottle route schema

```yaml
egress:
  routes:
    - host: api.github.com
      path_allowlist:
        - /repos/myorg/
      auth:
        scheme: Bearer
        token_ref: EGRESS_TOKEN_0
```

Matching today: exact host + path-prefix list. No method or header awareness.

---

## Format 1: Kubernetes Gateway API `HTTPRoute`

**Spec:** [gateway.networking.k8s.io/v1](https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io/v1.HTTPRouteMatch)

**Maturity:** GA (v1.0+, 2023). Backed by SIG Network; shipping in GKE, EKS, AKS, Istio, Envoy Gateway, Cilium, Traefik v3.
### Match object

```yaml
rules:
  - matches:
      - path:
          type: Exact            # Exact | PathPrefix | RegularExpression
          value: /api/v1/data
        headers:
          - name: Content-Type
            type: Exact          # Exact | RegularExpression
            value: application/json
        queryParams:
          - name: version
            type: Exact
            value: "2"
        method: GET              # GET | POST | PUT | DELETE | PATCH | …
```

A `matches` entry is a logical AND across all predicates within it. Multiple entries in the `matches` list are ORed: the rule fires if any entry matches.

### Path matching

| `type` | Semantics |
|--------|-----------|
| `Exact` | Full path must equal `value` (no trailing-slash equivalence) |
| `PathPrefix` | Path must start with `value` on a segment boundary; `/api` matches `/api/v1` but not `/apiv1` |
| `RegularExpression` | Regex match; the dialect (RE2, PCRE, or other) and anchoring are implementation-specific |

**Glob-style paths (`/api/*/data`):** Gateway API does not define a glob type. The intent is to use `RegularExpression` for that case: `/api/[^/]+/data` replaces `/api/*/data`. This is unambiguous and widely understood.

### Header matching

```yaml
headers:
  - name: Content-Type
    type: Exact
    value: application/json
  - name: X-Request-Id
    type: RegularExpression
    value: "[0-9a-f]{8}-.*"
```

All `headers` entries must match (AND semantics). A request that lacks a listed header does not match. There is no "header absent" type in v1; implementations add it as an extension.

### Method matching

```yaml
method: GET
```

Single method per match entry. To allow GET and POST, use two match entries (OR semantics at the `matches` level):

```yaml
matches:
  - path:
      type: PathPrefix
      value: /api/v1
    method: GET
  - path:
      type: PathPrefix
      value: /api/v1
    method: POST
```

### Strengths / weaknesses

**Strengths:** spec-backed, implementation-tested, composable AND/OR semantics, explicit about what is not supported (no glob, no header-absent), good field naming (`type` + `value` pattern is consistent throughout).

**Weaknesses:** verbosity when expressing OR across methods; regex is the only path wildcard mechanism; no body matching.

---

## Format 2: Envoy `RouteConfiguration`

**Spec:** [envoy.config.route.v3.RouteMatch](https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#config-route-v3-routematch)

**Maturity:** Widely deployed (Istio data plane, AWS App Mesh, solo.io Gloo). Defined in protobuf; YAML is the human-readable rendering.

### Match object

The path specifiers below are alternatives (pick one per match), as are the `string_match` variants.

```yaml
match:
  path: /exact/path                 # exact match
  # OR
  prefix: /api/                     # prefix match
  # OR
  safe_regex:
    google_re2: {}
    regex: "/api/v[0-9]+/.*"
  # OR
  path_separated_prefix: /api/v1    # prefix with segment boundary enforcement
  headers:
    - name: content-type
      string_match:
        exact: application/json
        # OR
        prefix: text/
        # OR
        safe_regex:
          google_re2: {}
          regex: "application/(json|xml)"
      invert_match: false           # set to true to negate the predicate
    - name: x-custom-header
      present_match: true           # just check presence
  query_parameters:
    - name: version
      string_match:
        exact: "2"
```

Method is matched via a pseudo-header:

```yaml
headers:
  - name: ":method"
    string_match:
      exact: GET
```

Multiple methods require an OR combinator (`or_match`), available in Envoy v1.21+:

```yaml
headers:
  - name: ":method"
    or_match:
      value_matchers:
        - string_match:
            exact: GET
        - string_match:
            exact: POST
```

### Path matching

| Field | Semantics |
|-------|-----------|
| `prefix` | Path starts with value (any suffix allowed) |
| `path` | Exact match |
| `safe_regex` | RE2 regex (Google RE2 safety guarantees) |
| `path_separated_prefix` | Like `prefix` but only matches at segment boundaries (`/api/v1` won't match `/api/v10`) |
| `connect_matcher` | CONNECT method only |

Glob (`/api/*/data`): use `safe_regex`: `/api/[^/]+/data`.

### Strengths / weaknesses

**Strengths:** most expressive format surveyed; `invert_match`, `present_match`, OR combinators, pseudo-header method matching; handles every edge case.

**Weaknesses:** very verbose; protobuf-origin field names are not self-evident; `or_match` nesting is awkward; hard to validate in a lightweight schema check; not appropriate as a user-facing YAML format without a wrapping DSL.

---

## Format 3: AWS ALB Listener Rules

**Spec:** [AWS Elastic Load Balancing API: Conditions](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-listeners.html#rule-condition-types)

**Maturity:** GA, widely used in AWS infrastructure-as-code (CloudFormation, Terraform `aws_lb_listener_rule`).

### Match object (Terraform / CloudFormation rendering)

```yaml
conditions:
  - field: path-pattern
    path_pattern_config:
      values:
        - /api/*
        - /health
  - field: http-header
    http_header_config:
      http_header_name: Content-Type
      values:
        - application/json
        - application/x-www-form-urlencoded
  - field: http-request-method
    http_request_method_config:
      values:
        - GET
        - POST
  - field: host-header
    host_header_config:
      values:
        - "*.example.com"
        - api.example.com
  - field: query-string
    query_string_config:
      values:
        - key: version
          value: "2"
```

All conditions in a rule are ANDed. Multiple values within a single condition are ORed. A rule may contain at most 5 match evaluations, counted per value across all of its conditions, so the block above illustrates the available condition types rather than a single deployable rule.

### Path matching

ALB natively supports glob patterns in `path-pattern`:

- `*` matches any sequence of characters (including `/`).
- `?` matches any single character.

This is the only surveyed format with first-class glob support. `/api/*/data` is valid and unambiguous. No regex support.

### Header matching

Header conditions match against the header value. Multiple values are ORed. The header name is fixed per condition block; to AND two header predicates, add two separate `http-header` conditions. Value matching is case-insensitive.

### Method matching

```yaml
- field: http-request-method
  http_request_method_config:
    values:
      - GET
      - POST
```

Multiple values are ORed (GET or POST). Method names are matched exactly (case-sensitive, no wildcards) and may be up to 40 characters long.
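
The AND/OR rules described for ALB can be combined in a single rule. The following is a minimal sketch in the same Terraform/CloudFormation-style rendering used above, not a tested deployment: it admits GET or HEAD requests under `/api/` that carry both a JSON `Accept` header and a second header. The `X-Env` header name and its `staging` value are hypothetical, chosen only for illustration. The block uses five values in total, which keeps it within the per-rule limit.

```yaml
conditions:
  # Every condition block below must hold (AND across blocks).
  - field: path-pattern
    path_pattern_config:
      values:
        - /api/*
  - field: http-request-method
    http_request_method_config:
      values:                       # values inside one block are ORed
        - GET
        - HEAD
  - field: http-header
    http_header_config:
      http_header_name: Accept
      values:
        - application/json
  - field: http-header              # a second header predicate needs its own block
    http_header_config:
      http_header_name: X-Env       # hypothetical header name
      values:
        - staging                   # hypothetical value
```

Because OR lives only inside a block's `values` list, there is no way in this shape to say "GET on one path or POST on another" within one rule; that requires two rules.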
### Strengths / weaknesses

**Strengths:** first-class glob path matching (the only format surveyed with `*` and `?`); multi-value OR within a condition block is concise for the common case; method matching is a flat list, easy to write.

**Weaknesses:** maximum 5 conditions per rule; no regex; no header-absent predicate; no request-body matching; the `field` + `*_config` naming is awkward (the field name is a string enum that determines which sibling key is relevant, a schema-validation anti-pattern); tied to AWS semantics (target groups, priority integers).

---

## Format 4: Traefik Dynamic Routing

**Spec:** [Traefik Router Rule syntax](https://doc.traefik.io/traefik/routing/routers/#rule)

**Maturity:** GA, widely deployed in Kubernetes (IngressRoute CRD) and Docker-Compose setups. Traefik v3 aligns with Gateway API for Kubernetes routes but keeps its own expression syntax for the `rule` field.

### Match expression (string, embedded in YAML)

```yaml
http:
  routers:
    my-router:
      rule: >
        Host(`api.example.com`) &&
        PathPrefix(`/api/v1`) &&
        Method(`GET`, `POST`) &&
        Header(`Content-Type`, `application/json`)
      service: my-service
```

`&&` = AND, `||` = OR. Parentheses for grouping.

Available matchers:

| Matcher | Example |
|---------|---------|
| `Host` | `Host("api.example.com")` |
| `HostRegexp` | `HostRegexp(".*\.example\.com")` |
| `Path` | `Path("/exact/path")` |
| `PathPrefix` | `PathPrefix("/api/v1")` |
| `PathRegexp` | `PathRegexp("/api/v[0-9]+/.*")` |
| `Method` | `Method("GET", "POST")` |
| `Header` | `Header("Content-Type", "application/json")` |
| `HeaderRegexp` | `HeaderRegexp("Accept", "application/.*")` |
| `Query` | `Query("version", "2")` |
| `QueryRegexp` | `QueryRegexp("id", "[0-9]+")` |
| `ClientIP` | `ClientIP("10.0.0.0/8")` |

Glob paths: not supported directly. Use `PathRegexp` instead.
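
As a sketch of that substitution, the glob `/api/*/data` can be written as a `PathRegexp` that allows exactly one path segment in the wildcard position. This assumes the matcher names listed in the table above; the router name is a placeholder, the explicit `^`/`$` anchors are added here so the pattern does not depend on default anchoring behaviour, and the methods are written one per `Method` call joined with `||`, a form that does not rely on multi-value support.

```yaml
http:
  routers:
    data-router:                    # placeholder router name
      rule: >
        Host(`api.example.com`) &&
        PathRegexp(`^/api/[^/]+/data$`) &&
        (Method(`GET`) || Method(`HEAD`))
      service: my-service
```

The folded scalar (`>`) joins the three lines into one rule string, so the line breaks are purely for readability.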
### Strengths / weaknesses

**Strengths:** the most expressive and concise format for complex boolean combinations (AND/OR/NOT in a single line); `Method("GET", "POST")` is the cleanest multi-method syntax surveyed; full regex support on every field; Traefik v3 supports this inside Kubernetes CRDs.

**Weaknesses:** the rule is a *string* embedded in YAML, not a structured object, so it cannot be validated with JSON Schema and is harder to generate programmatically; no structured round-trip; no glob, only regex.

---

## Comparison table

| | Gateway API | Envoy | AWS ALB | Traefik |
|---|---|---|---|---|
| **Path: exact** | ✅ `Exact` | ✅ `path` | ✅ exact value | ✅ `Path()` |
| **Path: prefix** | ✅ `PathPrefix` | ✅ `prefix` / `path_separated_prefix` | ✅ (via glob `/*`) | ✅ `PathPrefix()` |
| **Path: glob** (`/a/*/b`) | ❌ (use regex) | ❌ (use regex) | ✅ native | ❌ (use regex) |
| **Path: regex** | ✅ `RegularExpression` | ✅ `safe_regex` | ❌ | ✅ `PathRegexp()` |
| **Header: exact** | ✅ | ✅ | ✅ | ✅ |
| **Header: regex** | ✅ | ✅ | ❌ | ✅ |
| **Header: absent** | ❌ (extension) | ✅ `present_match: false` | ❌ | ❌ |
| **Method matching** | ✅ (one per entry; OR via multiple entries) | ✅ (via `:method` pseudo-header) | ✅ (list = OR) | ✅ `Method("GET","POST")` |
| **AND semantics** | predicates within one `matches` entry | all conditions | all `conditions` entries | `&&` operator |
| **OR semantics** | multiple `matches` entries | `or_match` combinator | multiple values in one condition | `\|\|` operator |
| **Schema-validatable** | ✅ (CRD/JSON Schema) | ✅ (protobuf) | ✅ (CloudFormation schema) | ❌ (embedded string) |
| **Human-writable** | ✅ | ⚠️ verbose | ✅ | ✅ |
| **Generatable** | ✅ | ✅ | ✅ | ⚠️ (string concat) |

---

## Design choices worth adopting

### 1. Match object as a structured peer to `host`

Gateway API's separation of concerns maps well onto bot-bottle's existing schema.
Instead of a flat `path_allowlist`, a `match` block nests all predicates:

```yaml
egress:
  routes:
    - host: api.github.com
      match:
        paths:
          - type: prefix            # exact | prefix | regex
            value: /repos/myorg/
        headers:
          - name: Content-Type
            value: application/json
        methods: [GET, POST]
      auth:
        scheme: Bearer
        token_ref: EGRESS_TOKEN_0
```

All predicates within `match` are ANDed. A list of `paths` entries is ORed (first match wins, as with the current `path_allowlist`).

### 2. Path type enum (`exact` | `prefix` | `regex`)

Use three named types rather than inferring from the value's syntax. This avoids the ambiguity that plagues `.gitignore` and `nginx location` patterns where the same string can mean different things depending on leading characters.

- `exact`: the full path must equal `value`.
- `prefix`: mirrors current `path_allowlist` semantics.
- `regex`: RE2 for wildcard and advanced cases. Reject at load time if the pattern fails to compile. Covers every case glob would handle: `/api/[^/]+/data` is the `/api/*/data` equivalent.

Glob-style syntax is not included: it would add a further path-matching language on top of prefix and regex without meaningful operator benefit, since regex is already required for any non-trivial wildcard.

### 3. Header matching as a list of `{name, value, type}` objects

Mirrors the Gateway API header match object, with lowercase `type` values. ALL headers must match (AND). `type` defaults to `exact`; `regex` is available. No header-absent for now (adds complexity, low immediate need).

```yaml
headers:
  - name: Content-Type
    value: application/json
    # type: exact (default)
  - name: X-Internal-Key
    value: "dev-[0-9]+"
    type: regex
```

### 4. Method list as a flat enum list

Adopts ALB's conciseness. An empty or absent `methods` list means all methods are permitted. Values are uppercased HTTP method names.

```yaml
methods: [GET, HEAD]
```

### 5. Multiple `match` entries per route: OR semantics at the route level

If a route needs GET on one path and POST on a different path, use a `matches` (plural) list where entries are ORed:

```yaml
routes:
  - host: api.example.com
    matches:
      - paths: [{type: prefix, value: /read}]
        methods: [GET, HEAD]
      - paths: [{type: exact, value: /write}]
        methods: [POST, PUT]
```

This mirrors Gateway API's top-level OR; each entry is an AND of its predicates.

---

## Decisions

The open questions raised during research were resolved in PR #196 review:

1. **Backward compatibility:** Hard cutover. The new `matches` structure replaces `path_allowlist` entirely with no compatibility shim and no fallback parsing for the old format. Manifests using `path_allowlist` must be migrated.
2. **Glob support:** Dropped. Not strictly necessary, since `regex` covers every case glob would handle. Fewer path-matching languages to document and validate.
3. **Header value OR:** Stick with Gateway API. OR across header values requires a separate entry in the `matches` list, not multiple values inside one `headers` block.
4. **Method name case:** Case-insensitive at parse time. `get`, `GET`, and `Get` are all accepted and normalised to uppercase internally.
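
Because the cutover is hard, every existing manifest needs a one-time rewrite. The sketch below shows how the route from "Current bot-bottle route schema" might look after migration, followed by a second route that illustrates decisions 3 and 4. It assumes the field names proposed in this document (`matches`, `paths`, `headers`, `methods`); the final schema may differ, and the second route's host, path, and header values are hypothetical.

```yaml
# Before (no longer accepted after the cutover):
#   - host: api.github.com
#     path_allowlist:
#       - /repos/myorg/
egress:
  routes:
    # After: each allowlisted prefix becomes a `paths` entry of type `prefix`.
    - host: api.github.com
      matches:
        - paths:
            - type: prefix
              value: /repos/myorg/
          # `methods` omitted: all methods remain permitted, as before
      auth:
        scheme: Bearer
        token_ref: EGRESS_TOKEN_0

    # Hypothetical route: OR across header values needs one entry per value.
    - host: api.example.com
      matches:
        - paths: [{type: prefix, value: /v1/}]
          headers: [{name: Accept, value: application/json}]
          methods: [get]            # accepted, normalised to GET at parse time
        - paths: [{type: prefix, value: /v1/}]
          headers: [{name: Accept, value: application/xml}]
          methods: [GET]
```

The migrated first route deliberately adds no method or header predicates, so its behaviour should match the old prefix-only rule; tightening can then happen as a separate change.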