The situation
A platform team runs four web applications behind a single shared Application Load Balancer in eu-west-1:
- Marketing site (www.example.com). Static React SPA served by a small Fargate service.
- Customer app (app.example.com and customers.example.com, both as aliases). Authenticated dashboard; Fargate service, larger fleet.
- Admin console (admin.example.com). Internal tool; private-only, behind VPN.
- API (api.example.com, with path prefixes /v1/*, /v2/*, /internal/*). Multiple backend services mounted under path prefixes.
One ALB, one TLS listener on 443 with a multi-SAN certificate, multiple target groups. The catalog started small and is growing; a second team is about to add their service under api.example.com/v3/*. The current rules were written incrementally and have started to misroute: a recent PR added a rule for /v3/* that, because of priority ordering, never fired, so an older catch-all rule for api.example.com sent /v3/* traffic to the v1 backend. The incident was resolved in minutes by swapping two rule priorities, but the team realised nobody actually understood the rules file.
The questions: how should the rules be organised so they don’t fight each other? Where do host-based rules end and path-based rules begin? And is one ALB the correct answer for four different services, or should the admin console split out onto its own?
What actually matters
Before refactoring the rules, it’s worth being precise about what the ALB’s rules engine actually does.
The first thing is that rules are evaluated in priority order, lowest number first, and the first match wins. Each listener has a default action (typically a catch-all fixed-response 404); the default quota is 100 rules per load balancer, not counting default rules, and it can be raised. Each rule has a priority (1-50000), a set of conditions (host-header, path-pattern, http-header, http-request-method, query-string, source-ip), and an action (forward to a target group, redirect, fixed-response, or authenticate via OIDC or Cognito). The ALB walks the rules in priority order; the first rule whose conditions all match fires, and subsequent rules are ignored for that request.
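As a minimal sketch of the listener side of this, assuming the Terraform AWS provider and hypothetical resource names (`aws_lb.shared` and `aws_acm_certificate.shared` are not from the original setup), the default action can be an explicit 404 so that anything no rule claims is rejected rather than routed:

```hcl
# Sketch only: resource names and the TLS policy choice are assumptions.
resource "aws_lb_listener" "main" {
  load_balancer_arn = aws_lb.shared.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate.shared.arn

  # Fires only when no listener rule matches the request.
  default_action {
    type = "fixed-response"

    fixed_response {
      content_type = "text/plain"
      message_body = "Not found"
      status_code  = "404"
    }
  }
}
```

Every `aws_lb_listener_rule` attached to this listener is then evaluated ahead of the default action, in ascending priority order.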
The second thing is that conditions within a rule compose with AND, not OR. A single rule with host-header=api.example.com AND path-pattern=/v1/* only matches when both are true. To match multiple patterns with OR semantics, you add multiple values to the same condition (path-pattern: [/v1/*, /v1beta/*]) or you write multiple rules.
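To make the AND/OR distinction concrete, here is an illustrative variant of a v1 rule (the resource name and the /v1beta/* path are for illustration only, and `aws_lb_listener.main` and `aws_lb_target_group.api_v1` are assumed to be defined elsewhere). The two `condition` blocks are ANDed; the two values inside `path_pattern` are ORed:

```hcl
# Illustrative variant: it would replace, not sit alongside, a plain /v1/* rule.
resource "aws_lb_listener_rule" "api_v1_with_beta" {
  listener_arn = aws_lb_listener.main.arn
  priority     = 100

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.api_v1.arn
  }

  # Condition 1: the Host header must be api.example.com ...
  condition {
    host_header {
      values = ["api.example.com"]
    }
  }

  # ... AND condition 2: the path must match /v1/* OR /v1beta/*.
  condition {
    path_pattern {
      values = ["/v1/*", "/v1beta/*"]
    }
  }
}
```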
The third thing is that host-based and path-based routing cost the same. Both are string comparisons against the Host header or the request path. There’s no performance reason to prefer one over the other; the choice is organisational: “which axis of routing maps to which axis of ownership?”
The fourth thing is that priority is the real design. If you have 30 rules and two of them could match the same request (e.g. path-pattern /v2/* and path-pattern /*), the priority decides which fires. The typical failure mode is someone adding a new rule whose priority places it ahead of a more-specific existing rule, so it catches traffic meant for the specific rule; the mirror image is a new specific rule placed behind a broader one that swallows its traffic. Rule ordering is the source of most routing bugs.
The fifth thing is the choice between one ALB and many. Per-balancer baseline cost is modest; four balancers cost four times that. Cost alone doesn’t drive the decision; the real decisions are blast radius (a misconfigured rule takes down all services sharing the ALB), team boundaries (is it reasonable for the admin team to share a listener config with the marketing team?), and security posture (the admin console might need different WAF rules, different client-authentication requirements, different SGs in front).
The sixth thing is the set of routing options beyond host and path. Listener rules can match on HTTP headers (e.g. X-Tenant: acme-*), HTTP methods (GET vs POST), query-string parameters (?tenant=acme), and source IP. These are useful for canary and blue-green deployments (route the subset of requests that carry a given cookie or header; a fixed percentage split such as 10% comes from weighted target groups on the forward action rather than from a condition), feature flagging at the edge, and internal-only routes (source-ip ∈ office-cidrs). They also clutter the rules file if overused.
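A header-based rule might look like the following sketch. The `X-Canary` header value, the canary target group name, and the priority are assumptions for illustration; the point is that the header condition is simply a third AND-ed condition on an otherwise ordinary host + path rule, given a low priority number so it is evaluated first:

```hcl
# Sketch: requests to api.example.com/v2/* that carry "X-Canary: true"
# go to a hypothetical canary target group; everything else falls through.
resource "aws_lb_listener_rule" "api_v2_canary" {
  listener_arn = aws_lb_listener.main.arn
  priority     = 10

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.api_v2_canary.arn
  }

  condition {
    host_header {
      values = ["api.example.com"]
    }
  }

  condition {
    path_pattern {
      values = ["/v2/*"]
    }
  }

  condition {
    http_header {
      http_header_name = "X-Canary"
      values           = ["true"]
    }
  }
}
```

Deleting this one resource ends the canary without touching any other rule.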
What we’ll filter on
Filters for each routing strategy:
- Matches the ownership boundary: does the rule structure mirror who owns what?
- Evaluates correctly under priority: does adding a rule risk breaking existing rules?
- Legible as the catalog grows: can a new engineer read the rules and predict routing?
- Cost at current + projected scale: ALB hour charges, LCUs, cert management.
- Blast radius of a bad change: how many services go down if someone breaks the listener?
- Supports per-service operational needs: WAF, access logs, SG rules, TLS cert differentiation.
The routing landscape
- Host-based only. One rule per host-header. api.example.com → api target group. www.example.com → marketing. Different services mean different hosts. Downside: if the API team wants to split /v1 and /v2 onto different target groups, host-based routing forces a subdomain (v1.api.example.com), which is awkward.
- Path-based only. One rule per path pattern. /api/v1/* → api-v1 target group. /app/* → customer app. All services share one hostname; paths carve them up. Works for single-domain architectures; doesn’t map cleanly when each service has its own branded domain.
- Host-based primary, path-based secondary. The hybrid that fits most real catalogs. Host-header identifies the service; path identifies the version or sub-route. api.example.com AND /v1/* → api-v1; api.example.com AND /v2/* → api-v2; app.example.com AND /* → customer-app. Each rule has two conditions; priority ordering goes from most-specific to least-specific.
- Header-based for canary. On top of host and path, an HTTP-header condition routes a subset of traffic (X-Canary: true) to a canary target group. The rest falls through to the main rule.
- Multiple ALBs, one per service. Sacrifice the shared listener for isolation. Each ALB has its own listener, its own WAF, its own cert, its own SG. More monthly baseline cost; less blast radius; clear team boundaries. The admin console splitting to its own ALB with tighter IP allow-lists is a sensible argument for this.
Side by side
| Option | Ownership fit | Priority-safety | Legibility | Cost | Blast radius | Per-service ops |
|---|---|---|---|---|---|---|
| Host-based only | ✓ (per-service) | ✓ (non-overlapping by host) | ✓ | One ALB | Shared | ✗ (shared listener config) |
| Path-based only | — | ✗ (overlap-prone) | — | One ALB | Shared | ✗ |
| Host + path hybrid | ✓✓ | ✓ with discipline | ✓ with discipline | One ALB | Shared | — |
| Header-based canary | ✓ | ✓ | — (harder to read) | One ALB | Shared | — |
| Multiple ALBs | ✓✓ | n/a (no sharing) | ✓✓ | 4× baseline | Isolated | ✓ |
For this situation: host-based primary, path-based secondary, with the admin console split to its own ALB for security isolation. The marketing, customer, and API apps can share one ALB because they have similar security posture; the admin console’s tighter requirements (internal-only, VPN-gated, WAF rules that differ) earn it an ALB to itself.
Rule table layout and priority strategy
The listener config in depth
Three tiers of priority. The priority ladder codifies the pattern: the most-specific rules (header-based canaries) at the top in the 10s, specific path-within-host rules in the middle in the 100s, host-only catch-alls at the bottom in the 1000s. Starting the tiers at 10, 100, and 1000 leaves numeric room to insert new rules without having to renumber neighbours. Within a tier, gaps of 10 between rules do the same at a smaller scale.
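One way to keep the tiers visible in the Terraform itself, sketched here with hypothetical names (`priority_tier` and the `api_v2` resource layout are assumptions, not the team's actual config), is to derive each rule's priority from a named tier base instead of writing a bare number:

```hcl
locals {
  # Tier bases; rules within a tier step by 10 from the base.
  priority_tier = {
    canary    = 10   # header-based canaries
    host_path = 100  # host + path rules
    host_only = 1000 # host-only rules
  }
}

resource "aws_lb_listener_rule" "api_v2" {
  listener_arn = aws_lb_listener.main.arn
  priority     = local.priority_tier.host_path + 10 # evaluates to 110

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.api_v2.arn
  }

  condition {
    host_header {
      values = ["api.example.com"]
    }
  }

  condition {
    path_pattern {
      values = ["/v2/*"]
    }
  }
}
```

A reviewer can then see which tier a rule claims to belong to without decoding the number.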
No two rules can match the same request in a way that priority doesn’t disambiguate. That’s the invariant. Either two rules have non-overlapping conditions, or their priorities are explicit about which wins. Reviewing a PR means asking: “if this new rule fires, does any existing rule that should have fired get skipped?” A simple check: for each new rule, list the requests it would match, then check whether any higher-priority rule already matches that set.
TLS certificate. One ACM-issued certificate on the listener with subject alternative names covering www.example.com, app.example.com, customers.example.com, api.example.com. Validated via DNS; auto-renews. Multiple SANs on one cert beats multiple certs on one listener (ALB supports SNI, but the operational overhead of separate certs is not worth it for a single-owner domain). The admin console, on its own ALB, has its own cert for admin.example.com.
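A sketch of the shared certificate, assuming the Terraform AWS provider (the resource name `shared` is hypothetical, and the Route 53 validation records and `aws_acm_certificate_validation` resource that DNS validation also needs are omitted for brevity):

```hcl
resource "aws_acm_certificate" "shared" {
  domain_name = "www.example.com"

  # The remaining public hostnames ride along as SANs.
  subject_alternative_names = [
    "app.example.com",
    "customers.example.com",
    "api.example.com",
  ]

  validation_method = "DNS"

  # Create the replacement before destroying the old cert, so the
  # listener is never left pointing at a deleted certificate.
  lifecycle {
    create_before_destroy = true
  }
}
```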
Health checks per target group. Each target group owns its health-check config: path, interval, threshold, success codes. tg-api-v1 checks /v1/health, tg-customer-app checks /health, etc. The health check runs per-target regardless of which listener rule directed traffic to the target group.
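For instance, the v1 target group might be declared roughly as follows. Only the health-check path comes from the text above; the port, interval, thresholds, and the `vpc_id` variable are placeholder assumptions:

```hcl
variable "vpc_id" {
  description = "VPC that hosts the Fargate tasks (assumed input)"
  type        = string
}

resource "aws_lb_target_group" "api_v1" {
  name        = "tg-api-v1"
  port        = 8080 # assumed container port
  protocol    = "HTTP"
  target_type = "ip" # Fargate tasks register by IP
  vpc_id      = var.vpc_id

  health_check {
    path                = "/v1/health"
    interval            = 15 # seconds between checks (placeholder)
    timeout             = 5
    healthy_threshold   = 3
    unhealthy_threshold = 3
    matcher             = "200" # success codes
  }
}
```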
WAF. A regional WAF web ACL on the ALB with a common baseline (AWS managed rule groups: common rule set, known bad inputs, SQL injection). Per-service WAF rules could be scoped to path patterns, but the one service that would need stricter rate limits and IP allow-lists, the admin console, is on a separate ALB anyway. For the shared ALB, WAF is uniform across the three services that share it.
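A baseline web ACL along those lines could be sketched as below. The ACL name, rule priorities, and metric names are assumptions, and `aws_lb.shared` is a hypothetical name for the shared load balancer; the three managed rule group names are the AWS-published ones for the common, known-bad-inputs, and SQL injection sets:

```hcl
resource "aws_wafv2_web_acl" "shared" {
  name  = "shared-alb-baseline"
  scope = "REGIONAL" # required scope for ALB associations

  default_action {
    allow {}
  }

  # One rule per AWS managed rule group; the map value is the WAF rule priority.
  dynamic "rule" {
    for_each = {
      AWSManagedRulesCommonRuleSet         = 10
      AWSManagedRulesKnownBadInputsRuleSet = 20
      AWSManagedRulesSQLiRuleSet           = 30
    }

    content {
      name     = rule.key
      priority = rule.value

      # Keep the rule group's own block/count actions.
      override_action {
        none {}
      }

      statement {
        managed_rule_group_statement {
          name        = rule.key
          vendor_name = "AWS"
        }
      }

      visibility_config {
        cloudwatch_metrics_enabled = true
        metric_name                = rule.key
        sampled_requests_enabled   = true
      }
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "shared-alb-baseline"
    sampled_requests_enabled   = true
  }
}

resource "aws_wafv2_web_acl_association" "shared" {
  resource_arn = aws_lb.shared.arn
  web_acl_arn  = aws_wafv2_web_acl.shared.arn
}
```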
Access logs. ALB access logs go to S3, delivered in 5-minute batches, and each log record includes the priority of the rule that matched (matched_rule_priority) along with the target group ARN. Queryable in Athena to answer “what rule caught this request?”, which is exactly the kind of question the Tuesday incident needed to answer quickly.
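Enabling the logs is a small block on the load balancer resource. This is a sketch with assumed names and inputs (the `shared` resource name, the variables, and the prefix are all hypothetical), and it presumes the S3 bucket already has a bucket policy that permits Elastic Load Balancing log delivery:

```hcl
variable "public_subnet_ids" {
  type = list(string)
}

variable "shared_alb_security_group_id" {
  type = string
}

variable "access_log_bucket" {
  description = "Name of an existing S3 bucket with an ELB log-delivery policy"
  type        = string
}

resource "aws_lb" "shared" {
  name               = "shared-public"
  load_balancer_type = "application"
  subnets            = var.public_subnet_ids
  security_groups    = [var.shared_alb_security_group_id]

  access_logs {
    bucket  = var.access_log_bucket
    prefix  = "shared-alb"
    enabled = true
  }
}
```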
Why the admin console splits. The admin ALB has:
- Source IP allow-list in the SG (office CIDRs, VPN CIDR), rejects the internet before the listener even evaluates rules.
- WAF with a stricter managed rule set (AWS-AWSManagedRulesAdminProtectionRuleSet) and a much lower rate-based rule threshold.
- Different TLS cert, separate lifecycle.
- No shared blast radius with marketing or customer-app rule changes.
The cost is roughly $16-$18/month in fixed hourly charges for the separate ALB, before LCU usage. The gain is clarity of ownership and a much smaller target surface for misconfiguration.
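The network side of that split might be sketched as follows. All variable and resource names here are hypothetical; the idea is an internal ALB whose security group admits HTTPS only from the office and VPN ranges, so other traffic is dropped before any listener rule is evaluated:

```hcl
variable "admin_vpc_id" {
  type = string
}

variable "admin_vpc_cidr" {
  description = "CIDR of the VPC holding the admin targets (assumed input)"
  type        = string
}

variable "admin_subnet_ids" {
  description = "Private subnets for the internal ALB"
  type        = list(string)
}

variable "admin_allowed_cidrs" {
  description = "Office and VPN CIDR ranges"
  type        = list(string)
}

resource "aws_security_group" "admin_alb" {
  name_prefix = "admin-alb-"
  vpc_id      = var.admin_vpc_id

  ingress {
    description = "HTTPS from office and VPN ranges only"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = var.admin_allowed_cidrs
  }

  egress {
    description = "Allow the ALB to reach targets inside the VPC"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = [var.admin_vpc_cidr]
  }
}

resource "aws_lb" "admin" {
  name               = "admin-internal"
  internal           = true # no public IPs; reachable only via VPN/private network
  load_balancer_type = "application"
  security_groups    = [aws_security_group.admin_alb.id]
  subnets            = var.admin_subnet_ids
}
```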
A worked refactor
We take the existing listener rules and restructure them into the three-tier scheme. The starting state:

    priority 1: host=api.example.com path=/v2/*  → tg-api-v2
    priority 2: host=api.example.com path=/v1/*  → tg-api-v1
    priority 3: host=www.example.com             → tg-marketing
    priority 4: host=api.example.com             → tg-api-v1 (fallback)
    priority 5: host=app.example.com             → tg-customer-app
    priority 6: host=api.example.com path=/v3/*  → tg-api-v3 (new, added last)

The bug that bit: priority 6 is the highest number (lowest priority) of the rules, so it sits below priority 4 (the api-catchall). A request to /v3/foo matches priority 4 first (host=api.example.com), gets routed to tg-api-v1, and the v3 rule never fires.
Refactor to the tiered ladder:

    priority 100:  host=api.example.com path=/v1/*                      → tg-api-v1
    priority 110:  host=api.example.com path=/v2/*                      → tg-api-v2
    priority 120:  host=api.example.com path=/v3/*                      → tg-api-v3
    priority 130:  host=api.example.com path=/internal/* src-ip∈office  → tg-api-internal
    priority 1000: host=www.example.com                                 → tg-marketing
    priority 1010: host IN (app, customers)                             → tg-customer-app
    default:       fixed-response 404

Notice: there’s no host-api catch-all rule. If someone requests api.example.com/random, no path rule matches, no host rule at that tier matches, the default fires: 404. This is deliberate; the old catch-all was the source of the bug. A request that isn’t to a specific version is a request that shouldn’t be routed anywhere.
We apply this via Terraform:

    # Host + path tier: api.example.com/v1/* -> tg-api-v1
    resource "aws_lb_listener_rule" "api_v1" {
      listener_arn = aws_lb_listener.main.arn
      priority     = 100

      action {
        type             = "forward"
        target_group_arn = aws_lb_target_group.api_v1.arn
      }

      # Both conditions must match (AND).
      condition {
        host_header {
          values = ["api.example.com"]
        }
      }

      condition {
        path_pattern {
          values = ["/v1/*"]
        }
      }
    }

    # ... and so on for each rule ...

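Two of the remaining rules are worth sketching because they differ in shape from the v1 rule: the /internal/* rule adds a third condition, and the customer-app rule lists two hosts in one condition. The `office_cidrs` variable and the `api_internal` and `customer_app` target group names are assumptions. Note that a source-ip condition matches the address of the connection the ALB sees, so a client behind a proxy is matched on the proxy's address:

```hcl
variable "office_cidrs" {
  description = "Office/VPN CIDR ranges (keep the list short; ALB caps condition values per rule)"
  type        = list(string)
}

# Priority 130: host AND path AND source IP.
resource "aws_lb_listener_rule" "api_internal" {
  listener_arn = aws_lb_listener.main.arn
  priority     = 130

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.api_internal.arn
  }

  condition {
    host_header {
      values = ["api.example.com"]
    }
  }

  condition {
    path_pattern {
      values = ["/internal/*"]
    }
  }

  condition {
    source_ip {
      values = var.office_cidrs
    }
  }
}

# Priority 1010: either hostname (OR within one condition), any path.
resource "aws_lb_listener_rule" "customer_app" {
  listener_arn = aws_lb_listener.main.arn
  priority     = 1010

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.customer_app.arn
  }

  condition {
    host_header {
      values = ["app.example.com", "customers.example.com"]
    }
  }
}
```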
Deploy during a quiet window. CloudWatch metric HTTPCode_Target_2XX_Count per target group stays flat; access-log spot-checks confirm requests are matching the expected rules. The admin console extraction is a separate, larger piece of work scheduled for the following sprint.
Growing the catalog safely
With the ladder in place, the PR checklist for any new rule:
- What priority tier does this belong to? (canary/header → 10s; host+path → 100s; host-only → 1000s)
- Is there a priority gap of at least 5 between this rule and its neighbours?
- Could this rule match a request already caught by a higher-priority rule? If yes, which one wins?
- Could this rule be caught by a lower-priority host-catch-all? If yes, is that the desired fall-through behaviour, or does the default 404 need to fire instead?
- What’s the access-log sample showing after deployment? First handful of requests confirm the rule is catching what it should.
Answering those five questions at PR time prevents the Tuesday incident from recurring. The tiered priority layout plus the “no unintended catch-all” posture makes the routing predictable as the catalog grows.
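The first checklist question can be partly automated. The sketch below assumes Terraform 1.5 or later (for `check` blocks) and a hypothetical convention in which priorities are kept in per-tier maps; it only warns when a rule's number falls outside its tier and does not detect overlapping conditions, which still needs a human reviewer:

```hcl
locals {
  # Hypothetical registry of rule priorities, grouped by tier.
  host_path_priorities = {
    api_v1       = 100
    api_v2       = 110
    api_v3       = 120
    api_internal = 130
  }

  host_only_priorities = {
    marketing    = 1000
    customer_app = 1010
  }
}

check "priority_tiers" {
  assert {
    condition = alltrue([
      for p in values(local.host_path_priorities) : p >= 100 && p < 1000
    ])
    error_message = "Host+path rules must use priorities in the 100-999 tier."
  }

  assert {
    condition = alltrue([
      for p in values(local.host_only_priorities) : p >= 1000
    ])
    error_message = "Host-only rules must use priorities of 1000 or above."
  }
}
```

Rules would then read their priority from these maps (for example `local.host_path_priorities.api_v3`), so the registry and the deployed rules cannot drift apart.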
What’s worth remembering
- Rules evaluate in priority order, first match wins. Lower numeric priority is higher precedence. The default action fires only when no rule matches.
- Conditions within a rule compose with AND; multiple values in a condition compose with OR. host=api AND path=/v1/* vs path IN (/v1/*, /v1beta/*).
- Host + path together is the usual shape for real catalogs. Host identifies the service, path identifies the version or sub-route. Both conditions in one rule; priority tier based on specificity.
- Leave gaps in priority numbers. Tiers of 10, 100, 1000 with gaps of 5-10 inside each tier mean inserting a new rule doesn’t require renumbering half the listener.
- Avoid implicit catch-all rules. A “host=api” with no path condition at the bottom looks innocuous but catches everything that falls through. Prefer the default action (404) to explicit catch-alls when no default route is desired.
- Header-based canary rules sit at the top. Most specific, shortest-lived; removing them when the canary is done should leave the rest of the ladder undisturbed.
- One ALB per security posture, not per service. Services with similar WAF/SG/cert requirements share an ALB cheaply. Services with different requirements, especially “internal only” vs “public”, earn their own.
- Access logs with the matched rule priority are the fastest debugging tool. The log record includes chosen_cert_arn and matched_rule_priority; Athena queries make “why did this request go to the wrong place?” a two-minute investigation.
One load balancer can carry many sites; keeping the rules legible as the catalog grows is a matter of discipline, not technology. Priority tiers, gaps for future rules, and a default action that says “not here”: that’s the structure that stops Tuesday’s bug from coming back next month under a different name.