The situation
Three workloads, one VPC, three very different shapes.
- An HTTPS API under api.example.com. Serves REST at /v1/*, gRPC at /grpc/*, and a WebSocket endpoint at /ws. Needs TLS termination, host- and path-based routing, AWS WAF in front, and sticky sessions for the long-lived WebSocket connections. Peak 2,000 requests per second; a mix of short REST calls and long-lived WebSocket connections.
- A TCP service on port 6379: a custom Redis-compatible cache accessed from an adjacent VPC and from on-prem over Direct Connect. Connection setup is part of the latency budget; the service needs to see the real client IP for per-tenant rate limiting. Handles 50,000 connections with sub-millisecond connection establishment.
- A fleet of firewall appliances, bought from a third-party vendor, deployed as 4 EC2 instances. Every packet leaving the VPC on a route to the internet has to traverse one of them first, transparently: the firewall inspects, logs, and decides pass/drop. The appliances preserve the original source and destination addresses; no NAT.
Three services, three shapes. The temptation is to pick one load balancer and make it fit: ALB for everything, or a stack of NLBs. That’s where the wheels come off. Each shape has a load balancer designed for it.
What actually matters
It’s worth starting with which OSI layer the balancer operates at, because that is the clarifying question.
An Application Load Balancer operates at layer 7: it reads the HTTP request, sees the path, the host header, the cookies, and can route on any of them. It terminates TLS, inspects the decrypted payload, and forwards to targets over a fresh connection. This is a lot of capability and a lot of work per request, and it’s why an ALB does not match an NLB on raw throughput or latency: the protocol awareness adds latency to every request. When the routing decision genuinely depends on the URL path, the ALB earns its keep; when it doesn’t, the layer-7 work is waste.
A Network Load Balancer operates at layer 4: TCP or UDP, source and destination IPs and ports. It doesn’t read the payload; it just hashes the connection’s 4-tuple to a target and stays out of the way. Connection establishment is fast because there’s no TLS handshake to terminate (the NLB can terminate TLS, but it can also pass it through), no HTTP parsing, no routing rules to evaluate. The target sees the balancer’s IP as the source unless proxy protocol or NLB’s target-side client-IP preservation is configured. For latency-sensitive and connection-rate-sensitive workloads, NLB is the default.
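The hashing idea can be sketched in a few lines. This is only an illustration: the hash NLB actually uses is not documented here, so SHA-256 stands in for it, and the `pick_target` helper and the target addresses are hypothetical.

```python
import hashlib

def pick_target(src_ip, src_port, dst_ip, dst_port, targets, protocol="tcp"):
    """Map one connection tuple onto one target, deterministically.

    SHA-256 is a stand-in for the balancer's real (unspecified) flow hash.
    """
    key = f"{protocol}|{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(targets)
    return targets[index]

# Hypothetical cache targets, one per AZ.
targets = ["10.0.14.203", "10.0.15.17", "10.0.16.44"]

first = pick_target("10.0.12.87", 44892, "10.0.14.10", 6379, targets)
again = pick_target("10.0.12.87", 44892, "10.0.14.10", 6379, targets)
assert first == again  # the same connection always lands on the same target
```

Nothing in the key depends on the payload, which is the point: the decision is made from the packet headers alone.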
A Gateway Load Balancer operates at layer 3: it’s a transparent insertion point for virtual appliances. The balancer encapsulates packets in GENEVE (UDP port 6081) and sends them to registered appliances; the appliances inspect, decide, and send the packets back for the balancer to forward to the original destination. The source and destination addresses the appliance sees are the original addresses from before the detour. This is not a load balancer in the traditional “distribute requests across application tier” sense; it’s a packet-level redirection mechanism for middleboxes.
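To see why the appliance still observes the original addresses, it helps to look at what encapsulation does to a packet. The sketch below packs the 8-byte GENEVE base header from RFC 8926 in front of an untouched inner packet. It is a simplification under stated assumptions: the helper names are hypothetical, the inner protocol type is assumed to be IPv4, and the TLV options a real GWLB attaches are not modeled.

```python
import struct

GENEVE_UDP_PORT = 6081   # GENEVE's registered UDP port
ETHERTYPE_IPV4 = 0x0800  # assumed inner protocol type for this sketch

def geneve_header(vni, protocol_type=ETHERTYPE_IPV4, options=b""):
    """Build the GENEVE base header (RFC 8926) followed by raw options."""
    if len(options) % 4 or len(options) > 252:
        raise ValueError("options must be a multiple of 4 bytes, at most 252")
    if not 0 <= vni < 2**24:
        raise ValueError("VNI is a 24-bit value")
    ver_optlen = (0 << 6) | (len(options) // 4)  # version 0, length in 4-byte words
    flags = 0                                    # O and C bits clear
    base = struct.pack("!BBH", ver_optlen, flags, protocol_type)
    base += struct.pack("!I", vni << 8)          # 24-bit VNI plus 8 reserved bits
    return base + options

def encapsulate(inner_packet, vni):
    """Prefix the inner packet with a GENEVE header; the packet is not modified."""
    return geneve_header(vni) + inner_packet

def decapsulate(payload):
    """Strip the GENEVE header and options, returning the inner packet."""
    opt_len = (payload[0] & 0x3F) * 4
    return payload[8 + opt_len:]

inner = b"\x45" + bytes(19)  # placeholder 20-byte IPv4 header
assert decapsulate(encapsulate(inner, vni=0)) == inner
```

The inner packet round-trips byte for byte, so whatever source and destination it carried before the detour are what the appliance inspects.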
Second question worth thinking about: what does the balancer need to see? ALB sees HTTP, and that’s the whole point; without HTTP visibility it can’t route on paths or hosts. NLB sees the TCP connection and nothing else, which is correct for a Redis-compatible cache that doesn’t speak HTTP. GWLB sees every byte of every packet but doesn’t need to understand any of them; it ships them to appliances that do the understanding.
Third: client IP preservation. ALB terminates TCP at the balancer; the target always sees the ALB’s IP on the connection, so the application has to read X-Forwarded-For to learn the caller. NLB can preserve client IP on the target side: the target sees the real source IP on the connection, not the balancer’s, which matters when the application needs to rate-limit per tenant or write audit logs that record the actual caller. GWLB preserves everything: the appliances see the original 5-tuple.
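Behind an ALB, recovering the caller means parsing X-Forwarded-For. A minimal sketch follows, assuming a single ALB in front of the application with its default behaviour of appending the connecting client's address, and no port suffix on the entries. The function name is hypothetical.

```python
import ipaddress

def client_ip_from_xff(header_value):
    """Return the right-most X-Forwarded-For entry, or None if unusable.

    With one ALB in front, the right-most entry is the one the ALB appended;
    anything to its left was supplied by the client and is not trustworthy.
    """
    entries = [part.strip() for part in header_value.split(",") if part.strip()]
    if not entries:
        return None
    try:
        return str(ipaddress.ip_address(entries[-1]))
    except ValueError:
        return None

# A client that forged a left-most entry still cannot hide its real address.
assert client_ip_from_xff("203.0.113.9, 82.34.12.7") == "82.34.12.7"
```

Taking the left-most entry instead is a common mistake: it lets any caller choose the address a rate limiter or audit log will record.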
Fourth: price and scaling shape. ALBs and NLBs scale automatically; you don’t size them, AWS does. ALBs bill per LCU (Load Balancer Capacity Unit), NLBs per NLCU with a different mix of cost dimensions, and GWLBs per GLCU. The rule of thumb is that NLBs are cheaper than ALBs at comparable traffic, because they’re doing less work per request.
Fifth: protocol support beyond HTTP. ALB speaks HTTP/1.1, HTTP/2, WebSocket, and gRPC. NLB speaks TCP, UDP, TCP with TLS termination, and TLS passthrough. GWLB sits under everything and doesn’t care about the protocol at all: a GENEVE tunnel full of IP packets is a GENEVE tunnel full of IP packets, whether the inner protocol is HTTPS or SMB or ICMP.
And finally: where to put them. ALBs are typically internet-facing or internal, in public or private subnets. NLBs can be placed in either, and their static per-AZ IPs are a feature: you hand those IPs to a customer’s firewall team once and the allowlist stays stable. GWLB sits in a “security” VPC of its own, with the appliances, and other VPCs route traffic into it via GWLB endpoints (a VPC endpoint type backed by a VPC endpoint service).
What we’ll filter on
- OSI layer: does the balancer need to read application-layer data?
- Protocols supported: HTTP/HTTPS/WebSocket/gRPC, TCP/UDP, or arbitrary IP?
- Client IP visibility at the target: does the target see the original source?
- Routing capability: host, path, header, method, query string, or just 4-tuple hash?
- Target types supported: instance, IP, Lambda, ALB-as-target, appliance fleet?
- Use case fit: is this a request-level router, a connection-level splitter, or a packet-level redirector?
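As a rough illustration, the first and last filters alone already separate the three workloads. The sketch below is a deliberately crude decision function with hypothetical names and fields; it ignores cost, target types, and the combinations discussed later.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    routes_on_http: bool = False          # needs host/path/header routing
    transparent_inspection: bool = False  # packets must detour via appliances

def choose_balancer(workload):
    """Pick a balancer by the layer the workload's problem sits at."""
    if workload.transparent_inspection:
        return "GWLB"  # packet-level redirector
    if workload.routes_on_http:
        return "ALB"   # request-level router
    return "NLB"       # connection-level splitter

workloads = [
    Workload("https-api", routes_on_http=True),
    Workload("tcp-cache"),
    Workload("firewall-fleet", transparent_inspection=True),
]
picks = {w.name: choose_balancer(w) for w in workloads}
```

For the three workloads in this article, `picks` maps the API to ALB, the cache to NLB, and the firewall fleet to GWLB.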
The load balancer landscape
- Application Load Balancer (ALB). Layer 7. Listens on HTTP or HTTPS; evaluates listener rules against host, path, HTTP method, query string, header, or source IP; forwards to one of up to 100 target groups. Supports HTTP/1.1, HTTP/2, gRPC, and WebSocket (via connection upgrade and long-lived connections). Integrates natively with AWS WAF, Cognito for user-pool authentication, and Lambda targets. Target types: instance, IP, Lambda. Terminates TLS; the target sees the ALB’s IP on the connection (client IP in X-Forwarded-For).
- Network Load Balancer (NLB). Layer 4. Listens on TCP, UDP, or TCP_UDP; optionally terminates TLS with ACM certificates. Forwards via 4-tuple hash or via source-IP stickiness. Handles millions of connections with ultra-low latency and supports static per-AZ IP addresses (Elastic IPs when internet-facing), which is what makes it the correct answer when a customer’s firewall needs a stable allowlist. Target types: instance, IP, ALB-as-target (useful for attaching an NLB-fronted static IP story to an ALB-routed app). Client IP preservation is on by default for instance targets and for UDP and TCP_UDP target groups; for IP targets on TCP or TLS it is off by default and is enabled through the preserve_client_ip.enabled target group attribute.
- Gateway Load Balancer (GWLB). Layer 3. Accepts all IP traffic on all ports and exchanges it with registered appliance targets (third-party firewalls, IDS/IPS, deep packet inspection boxes) as GENEVE-encapsulated packets on UDP port 6081. Appliances process and send packets back; GWLB forwards the decapsulated packets to the real destination. Target types: instance or IP, but the instance is expected to be a virtual appliance with GENEVE support. Typically paired with a GWLB Endpoint (GWLBe) in every consumer VPC, which is a VPC endpoint that injects traffic into the GWLB via an endpoint service.
- Classic Load Balancer (CLB). Still exists, mostly for EC2-Classic nostalgia. Layer 4 or layer 7 but with less of both than NLB or ALB respectively. New designs do not start here; old designs on CLB should migrate.
Side by side
| Load balancer | OSI layer | Protocols | Client IP at target | Routing | Typical use |
|---|---|---|---|---|---|
| ALB | 7 | HTTP, HTTPS, WebSocket, gRPC | ✗ (use XFF) | host, path, header, method | HTTPS API with path routing |
| NLB | 4 | TCP, UDP, TLS | ✓ | 4-tuple hash, source-IP sticky | TCP service, static IP allowlist |
| GWLB | 3 | all IP | ✓ (appliance sees original 5-tuple) | n/a (transparent) | virtual appliance insertion |
| CLB | 4 or 7 | HTTP, HTTPS, TCP, SSL | ✗ | limited | legacy only |
Reading the table: each balancer is best at the layer it operates at, and terrible at pretending to be another. Putting an ALB in front of a Redis-compatible cache works only if the cache speaks HTTP (it doesn’t) or if you tunnel it (why would you). Putting an NLB in front of the HTTPS API means giving up path-based routing and WAF. Putting a GWLB anywhere except in front of appliances is a category error.
The picks in depth
The HTTPS API → ALB. The routing requirements live entirely at the HTTP layer: /v1/* to a REST target group, /grpc/* to a gRPC target group (gRPC over HTTP/2), /ws to a WebSocket target group with session stickiness. WAF attached to the ALB via wafv2:AssociateWebACL applies rate-based rules, managed rule groups like AWSManagedRulesCommonRuleSet, and a custom rule blocking traffic by country if needed. The listener configuration:
Listener: HTTPS :443
  Certificate: arn:aws:acm:eu-west-1:...:certificate/abc
  SSL policy: ELBSecurityPolicy-TLS13-1-2-2021-06
  Default action: forward → rest-tg
  Rules:
    - if path /v1/* → forward to rest-tg
    - if path /grpc/* → forward to grpc-tg (HTTP/2, gRPC health)
    - if path /ws → forward to ws-tg (stickiness 3600s)
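The way these rules resolve can be sketched as first-match-wins over an ordered list. This is an approximation only: `fnmatchcase` from the standard library stands in for the ALB's own wildcard matching (it also accepts `[...]` character classes, which ALB path patterns do not), and the rule table simply mirrors the listing above.

```python
from fnmatch import fnmatchcase

# Ordered by priority; the first matching pattern decides the target group.
RULES = [
    ("/v1/*", "rest-tg"),
    ("/grpc/*", "grpc-tg"),
    ("/ws", "ws-tg"),
]
DEFAULT_TARGET_GROUP = "rest-tg"

def route(path):
    """Return the target group for a request path, case-sensitively."""
    for pattern, target_group in RULES:
        if fnmatchcase(path, pattern):
            return target_group
    return DEFAULT_TARGET_GROUP

assert route("/v1/orders") == "rest-tg"
assert route("/grpc/orders.Orders/Get") == "grpc-tg"
assert route("/ws") == "ws-tg"
```

Note that `/ws` is an exact pattern here, so a path such as `/ws/extra` would fall through to the default action rather than reach the WebSocket target group.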
The WebSocket behaviour deserves attention: the ALB upgrades the connection and holds it open. The idle timeout defaults to 60 seconds and is raised here to 3600 for long-lived connections. An upgraded WebSocket connection stays on its target for its lifetime; lb_cookie duration-based stickiness covers the handshake and any reconnect, where the cookie tells the ALB which target to send the same user back to.
The TCP cache → NLB. The workload is TCP on a fixed port, latency-sensitive, and needs the client IP at the target. NLB is the obvious pick. Target type instance preserves client IP by default. Because this NLB is internal, each AZ gets one static private IP (Elastic IPs apply only to internet-facing NLBs), so the on-prem firewall team gets three IPs once and never has to update the allowlist. Cross-zone load balancing is off by default for NLB (and billable when on). With three AZs and healthy targets in each, leaving it off is fine; if one AZ’s target fleet is unusually small, turn it on and pay the cross-AZ data charge.
NLB: cache-nlb (internal)
  Subnets: 3 AZs, static private IP per AZ
  Listeners:
    TCP :6379 → cache-tg (target type: instance)
  Target group:
    Health check: TCP on 6379, 10s interval
    Preserve client IP: true
    Deregistration delay: 30s
Preserve client IP at the target lets the cache’s per-tenant rate limiter see the real caller; deregistration delay of 30 seconds means in-flight connections drain during instance replacement.
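Those two settings correspond to target group attributes. The sketch below only builds the attribute list; it assumes the values would then be passed to the ELBv2 `modify_target_group_attributes` call (for example through boto3, which is not imported here), and the helper name is hypothetical.

```python
def cache_target_group_attributes(preserve_client_ip=True, deregistration_delay_s=30):
    """Build the Attributes list for the cache target group.

    Attribute values are strings in the ELBv2 API, including booleans.
    """
    return [
        {
            "Key": "preserve_client_ip.enabled",
            "Value": "true" if preserve_client_ip else "false",
        },
        {
            "Key": "deregistration_delay.timeout_seconds",
            "Value": str(deregistration_delay_s),
        },
    ]

attributes = cache_target_group_attributes()

# Assumed usage, left as a comment because it needs credentials and a real ARN:
#   boto3.client("elbv2").modify_target_group_attributes(
#       TargetGroupArn=target_group_arn, Attributes=attributes)
```

Keeping the values in one function makes it easy to review the drain time and the client IP setting together when the target group is recreated.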
The firewall appliances → GWLB. The appliances live in a dedicated security VPC. The GWLB has a target group of the four appliance instances, with health checks on whatever status endpoint the vendor exposes. In every consumer VPC, a GWLB Endpoint points at the GWLB endpoint service, and the consumer VPC’s route table is edited so that the default route (0.0.0.0/0) points to the GWLBe rather than the IGW. Packets bound for the internet are redirected through the GWLBe to the GWLB, GENEVE-tunneled to an appliance, inspected, tunneled back, and finally forwarded to the IGW with the original source and destination intact.
Route table (consumer VPC, private subnets):
  10.0.0.0/16 → local
  0.0.0.0/0 → vpce-0123abc… (GWLBe)
Security VPC route table (GWLB subnets):
  <security VPC CIDR> → local
  0.0.0.0/0 → igw-0456def…
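Route selection in these tables is longest-prefix match, which is why traffic inside the VPC stays local while everything else takes the detour. A small sketch of that lookup for the consumer VPC table follows; the `next_hop` helper and the short target labels are hypothetical.

```python
import ipaddress

# Consumer VPC, private subnets: mirrors the route table above.
CONSUMER_ROUTES = [
    ("10.0.0.0/16", "local"),
    ("0.0.0.0/0", "gwlbe"),
]

def next_hop(destination, routes):
    """Return the target of the most specific route containing the destination."""
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if address in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]

assert next_hop("10.0.14.203", CONSUMER_ROUTES) == "local"  # cache traffic stays in the VPC
assert next_hop("52.85.12.3", CONSUMER_ROUTES) == "gwlbe"   # internet-bound traffic is inspected
```

The workload never chooses the appliance path itself; the route table makes that choice for every packet that does not match the local prefix.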
The appliance fleet is scaled independently; GWLB’s target group health checks remove dead appliances automatically. The source IP the internet sees is whatever SNAT the firewall applies or, when the appliances pass addresses through untouched, the public address applied by a NAT Gateway layered on top; the workload’s private IP never reaches the internet as-is. Transparent insertion means zero changes to workload application code.
A worked trace: one request, three balancers
A single user interaction produces three load-balancer events, one per balancer. The user loads the API from the browser, which calls api.example.com/v1/orders. The API backend looks up a cache entry in the TCP cache. The API’s outbound metrics push goes to a third-party metrics endpoint on the internet.
14:02:18.104 ALB ws-api-alb elb_status=200 target_status=200
GET /v1/orders tls=TLSv1.3 user_agent=Mozilla/...
target_processing_time=0.027 request_processing_time=0.001
client=82.34.12.7:51204 target=10.0.12.87:8080
14:02:18.115 NLB cache-nlb elb_status=- target_status=-
TCP :6379 state=new client_ip=10.0.12.87
target=10.0.14.203:6379
14:02:18.209 GWLB egress-gwlb (via firewall-02 in security VPC)
outbound tcp 10.0.12.87:44892 → 52.85.12.3:443
inspect=ok action=allow post-nat-src=52.90.10.4
appliance latency: 0.4 ms
Three balancers, three layers, three jobs. The ALB logs the HTTP request with paths and status codes. The NLB record shows the TCP connection with its client IP, which is the workload’s private IP, and that is correct: the API is the “client” of the cache. For a plain TCP listener that record comes from VPC Flow Logs, since NLB access logs cover only TLS listeners. The GWLB operation is invisible to the workload entirely, but the appliance records every byte.
When to combine them
Two common combinations are worth naming.
ALB-as-target of an NLB. The NLB gets the static IP story; the ALB gets the HTTP routing. Clients reach a stable NLB IP (allowlist-friendly); the NLB forwards to the ALB (one target); the ALB routes on path. Works when external firewalls need stable IPs and the app needs path-based routing. The application still reads the client IP from X-Forwarded-For, because the ALB terminates the connection before forwarding to targets.
GWLB under ALB/NLB. Every VPC’s egress passes through the GWLB for appliance inspection, regardless of whether ingress came through an ALB, an NLB, or Direct Connect. The balancers are independent; GWLB handles egress inspection, the others handle load balancing for the workloads.
What’s worth remembering
- OSI layer is the clarifying question. ALB is layer 7 (HTTP); NLB is layer 4 (TCP/UDP); GWLB is layer 3 (IP, via GENEVE tunnels). Pick by the layer your problem sits at; you cannot fake it from an adjacent layer without compromise.
- ALB for HTTP routing, WAF, and TLS termination. Host- and path-based rules, WebSocket and gRPC support, WAF attachment, Lambda and container targets, and Cognito integration are all specific to layer 7.
- NLB for TCP, UDP, static IPs, and client IP preservation. Ultra-low latency, static per-AZ IPs, high connection rate, and client IP visible at instance targets by default (and at IP targets once preservation is enabled).
- GWLB for transparent virtual appliance insertion. Packets are GENEVE-encapsulated (UDP 6081), shipped to appliance targets, returned, and forwarded to the real destination with original addresses intact. Workloads don’t know the appliance exists.
- Client IP is a separate axis from OSI layer. NLB preserves it natively (by default on instance targets); ALB makes you use X-Forwarded-For; GWLB hands the appliance the original 5-tuple.
- Target types differ. ALB targets: instance, IP, Lambda. NLB targets: instance, IP, ALB. GWLB targets: instance or IP of an appliance that speaks GENEVE.
- Static IPs only come from NLB. If a customer’s firewall needs a stable allowlist, the answer is NLB with Elastic IPs, not ALB. Front the ALB with an NLB if you need both.
- Don’t fight the shape. Every war story about load balancer performance or feature limits ends with “…and then we moved it to the right balancer.” The initial pick matters more than any tuning that follows.
Three workloads, three balancers, and they each sit at the layer their problem lives at. The HTTPS API goes to the ALB because the routing decision is on the URL; the cache goes to the NLB because speed and client IP matter; the firewalls go behind the GWLB because packets need to be redirected transparently. Picking correctly first saves a migration later.