Designing Centralised Packet Inspection With GWLB

March 07, 2029 · 15 min read

The situation

The security requirement:

Every packet leaving any VPC to the internet must pass through the vendor firewall appliance fleet.
Every packet entering an internet-facing service in the DMZ VPC must pass through the same fleet before reaching the origin.
East-west inspection (VPC-to-VPC through the TGW) also passes through the fleet, at least for some segments.
The firewall vendor is already contracted; their virtual appliance runs on EC2 with a custom AMI. Licensing is per-appliance, and licenses are expensive enough that spreading them across N VPCs is not an option.

The constraints:

Packets must be inspected in both directions of a flow, and both directions must land on the same firewall appliance (stateful inspection requires it).
The appliance must see the original source and destination IP, not a NAT’d IP from a load balancer in front of it. Deep inspection relies on 5-tuple visibility.
Throughput must scale horizontally: if one appliance is at 80% CPU, add another; if two fail, the remaining pool picks up the load.
Appliance failure must fail over to healthy peers without breaking active flows (or with the smallest possible disruption).

Options considered:

Firewall per VPC. Every VPC runs its own pair of appliances. Simplest from a routing perspective. Licenses multiply by VPC count. Not viable.
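NLB in front of an appliance fleet. An NLB balances connections across appliances, but it only accepts traffic addressed to its own listener and rewrites the destination to the chosen target (and, unless client IP preservation is on, the source to its own private IP). The appliance never sees the original destination, and flow symmetry across directions isn’t guaranteed. Breaks the 5-tuple visibility requirement.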
Gateway Load Balancer (GWLB) fleet in a dedicated firewall VPC. GWLB transparently forwards packets to appliances via GENEVE tunnels, preserves source and destination, hashes flows to stick to one appliance, and scales horizontally. The purpose-built primitive.
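PrivateLink or TGW direct to appliance. PrivateLink NATs the packet, losing the original addresses; TGW routing straight to an appliance preserves the packet but offers no flow balancing and no symmetry guarantees.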

What actually matters

Before reaching for a service, it’s worth being explicit about what centralised deep-packet inspection actually requires of the insertion mechanism between the workload and the appliance.

The first thing worth thinking about is what the appliance has to see. Deep inspection (IDS/IPS, TLS decryption, application-aware rules) depends on the appliance seeing the original packet, with its original source IP, original destination IP, original ports and original transport protocol. Anything that NATs the packet on the way in (a proxy load balancer, a NAT gateway in front, a PrivateLink endpoint that rewrites addresses) destroys the visibility the appliance needs. The insertion mechanism has to be transparent: it can route, and it can encapsulate for transport, but it cannot rewrite the packet that the appliance sees.

The second is flow coherence. A stateful firewall tracks connections; the appliance that inspected the SYN must also inspect the SYN-ACK and every subsequent packet of the same flow, in both directions. If the outbound packet lands on appliance A and the return packet lands on appliance B, B has no state for the flow and either drops it as unsolicited or resets the connection. The mechanism has to hash flows consistently to a target and hash the reversed-direction tuple to the same target; otherwise the inspection layer makes the network worse.
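A minimal sketch of the property this paragraph asks for, assuming a SHA-256 hash over a canonicalised 5-tuple. Nothing here claims to be GWLB’s actual hash function; the point is only that sorting the two endpoints before hashing makes A→B and B→A produce the same key. Flow, pick_appliance and the fw-N names are hypothetical, and 203.0.113.10 is a documentation address.

```python
import hashlib
from typing import NamedTuple


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int      # IP protocol number, e.g. 6 for TCP
    src_port: int
    dst_port: int


def reverse(flow: Flow) -> Flow:
    """The same flow seen in the return direction."""
    return Flow(flow.dst_ip, flow.src_ip, flow.proto, flow.dst_port, flow.src_port)


def symmetric_key(flow: Flow) -> bytes:
    # Put the two (ip, port) endpoints in a canonical order so that a flow and
    # its reverse serialise to identical bytes.
    lo, hi = sorted([(flow.src_ip, flow.src_port), (flow.dst_ip, flow.dst_port)])
    return f"{flow.proto}|{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}".encode()


def pick_appliance(flow: Flow, appliances: list[str]) -> str:
    digest = hashlib.sha256(symmetric_key(flow)).digest()
    return appliances[int.from_bytes(digest[:8], "big") % len(appliances)]


fleet = ["fw-1", "fw-2", "fw-3"]
outbound = Flow("10.10.5.22", "203.0.113.10", 6, 49152, 443)
print(pick_appliance(outbound, fleet), pick_appliance(reverse(outbound), fleet))
```

Both calls print the same appliance name. Hashing the raw, unsorted tuple would send the two directions to independent slots, which is exactly the A/B split described above. The modulo step is only for brevity; the horizontal-scaling requirement explains why it isn’t good enough on its own.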

The third is horizontal scaling. Firewall throughput is finite per appliance and traffic grows; the cost of a second appliance shouldn’t be a routing rewrite. The mechanism needs a target group that can grow and shrink without changing route tables in every consumer VPC, and the hashing needs to be consistent enough that adding or removing a target doesn’t shuffle every existing flow onto a different appliance and break their state.
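To see why the hashing has to be consistent, compare naive modulo hashing with jump consistent hash (Lamping and Veach’s algorithm). Jump hash is a general-purpose technique used here purely for illustration, not a claim about how GWLB assigns flows internally. The sketch counts how many of 20,000 synthetic flow keys change appliance when the pool grows from three targets to four.

```python
import hashlib


def flow_key(flow_id: str) -> int:
    """Stable 64-bit key for a flow identifier (any stable string will do)."""
    return int.from_bytes(hashlib.sha256(flow_id.encode()).digest()[:8], "big")


def modulo_bucket(key: int, n: int) -> int:
    return key % n


def jump_bucket(key: int, n: int) -> int:
    """Jump consistent hash: growing n to n + 1 moves ~1/(n + 1) of keys, all to the new bucket."""
    b, j = -1, 0
    while j < n:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF  # 64-bit LCG step
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b


def moved_fraction(bucket_fn, keys: list[int], old_n: int, new_n: int) -> float:
    return sum(bucket_fn(k, old_n) != bucket_fn(k, new_n) for k in keys) / len(keys)


keys = [flow_key(f"flow-{i}") for i in range(20_000)]
print(f"modulo 3 -> 4 appliances: {moved_fraction(modulo_bucket, keys, 3, 4):.2f} of flows move")
print(f"jump   3 -> 4 appliances: {moved_fraction(jump_bucket, keys, 3, 4):.2f} of flows move")
```

Modulo moves roughly three quarters of the flows (a key keeps its slot only when k mod 3 equals k mod 4); jump hash moves roughly a quarter, and every key it moves goes to the new fourth target, so no existing appliance has to absorb flows it holds no state for.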

The fourth is multi-VPC consumption. The whole reason to centralise is licensing: one fleet, many VPCs. The mechanism has to expose the inspection fleet to many consumer VPCs without giving each VPC direct routing into the firewall VPC’s address space. A managed cross-VPC consumption pattern (VPC endpoints backed by an endpoint service) is the right shape; full-mesh routing isn’t.

The fifth is health and failure. An appliance that locks up has to be evicted from the pool before too many flows get pinned to it; a recovered appliance should rejoin. The mechanism has to health-check targets, drop unhealthy ones, and re-hash the affected flows onto survivors with the smallest possible blast radius. The team can tolerate a brief flow-reset blip; they cannot tolerate “the firewall fleet had a bad host and traffic stopped.”
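Evict-and-rejoin behaviour is usually driven by consecutive-result thresholds, so one dropped probe doesn’t evict an appliance and one lucky probe doesn’t readmit it. A minimal sketch, assuming hypothetical thresholds of two failures to evict and three passes to rejoin; on a real target group these correspond to the health-check settings (HealthyThresholdCount and UnhealthyThresholdCount in the ELBv2 API). TargetHealth and in_service are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class TargetHealth:
    unhealthy_threshold: int = 2  # hypothetical: consecutive failures to evict
    healthy_threshold: int = 3    # hypothetical: consecutive passes to rejoin
    healthy: bool = True
    streak: int = 0               # consecutive results contradicting `healthy`

    def record(self, passed: bool) -> bool:
        """Feed one health-check result; return the (possibly new) state."""
        if passed == self.healthy:
            self.streak = 0  # result agrees with the current state
            return self.healthy
        self.streak += 1
        limit = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self.streak >= limit:
            self.healthy, self.streak = not self.healthy, 0
        return self.healthy


def in_service(fleet: dict[str, TargetHealth]) -> list[str]:
    """Appliances that should currently receive new flows."""
    return sorted(name for name, h in fleet.items() if h.healthy)


fleet = {name: TargetHealth() for name in ("fw-1", "fw-2", "fw-3")}
for result in (False, False):      # fw-2 misses two probes in a row
    fleet["fw-2"].record(result)
print(in_service(fleet))           # ['fw-1', 'fw-3']
for result in (True, True, True):  # and then recovers
    fleet["fw-2"].record(result)
print(in_service(fleet))           # ['fw-1', 'fw-2', 'fw-3']
```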

The sixth is symmetric routing in the consumer VPC’s own route tables. Even with a flow-sticky hash, if the route tables on either side of the flow don’t both send traffic through the inspection layer, return traffic bypasses inspection entirely. The mechanism by itself doesn’t enforce route symmetry; the design has to. East-west flows across the TGW need an extra discipline (TGW appliance mode, which pins both directions of a flow to a single AZ of the firewall VPC), or asymmetric routing creeps back in the moment traffic crosses an AZ boundary.

And finally, what the firewall appliance itself expects. Vendor virtual appliances are built around specific tunnelling protocols that carry flow metadata for the appliance to unpack. The mechanism has to do its own encapsulation, invisibly to the workloads: the appliance terminates the tunnel, inspects the inner packet (with the 5-tuple intact) and, if the packet passes, re-encapsulates it and sends it back. Anything else either NATs the packet (failing visibility) or requires the appliance vendor to support a protocol they don’t.
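To make “terminates the tunnel, inspects the inner packet, re-encapsulates it” concrete, here is a minimal sketch of the GENEVE header handling an appliance performs, following the layout in RFC 8926 (an 8-byte base header, then variable-length TLV options, then the inner packet). Assumptions: the outer UDP/IP layers are already stripped, the inner payload is an IPv4 packet (protocol type 0x0800), and the option class/type values are placeholders; GWLB defines its own options for the metadata it sends.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # assumption: the inner payload is a bare IPv4 packet


def build_option(opt_class: int, opt_type: int, data: bytes) -> bytes:
    """One GENEVE TLV option: class (16 bits), type (8), length in 4-byte words (5)."""
    if len(data) % 4 or len(data) // 4 > 0x1F:
        raise ValueError("option data must be whole 4-byte words, at most 31")
    return struct.pack("!HBB", opt_class, opt_type, len(data) // 4) + data


def encapsulate(inner: bytes, options: bytes = b"", vni: int = 0,
                proto: int = ETHERTYPE_IPV4) -> bytes:
    if len(options) % 4 or len(options) // 4 > 0x3F or not 0 <= vni < 1 << 24:
        raise ValueError("bad option length or VNI")
    first = len(options) // 4  # version 0 in the top 2 bits, Opt Len below
    header = struct.pack("!BBH", first, 0, proto) + (vni << 8).to_bytes(4, "big")
    return header + options + inner


def decapsulate(packet: bytes):
    """Return (vni, protocol type, [(class, type, data)], inner packet)."""
    first, _flags, proto = struct.unpack_from("!BBH", packet, 0)
    if first >> 6 != 0:
        raise ValueError("unsupported GENEVE version")
    vni = int.from_bytes(packet[4:7], "big")
    end = 8 + (first & 0x3F) * 4  # options end here; the inner packet follows
    options, off = [], 8
    while off < end:
        opt_class, opt_type, length = struct.unpack_from("!HBB", packet, off)
        size = (length & 0x1F) * 4  # low 5 bits are the length; top 3 reserved
        options.append((opt_class, opt_type, packet[off + 4:off + 4 + size]))
        off += 4 + size
    return vni, proto, options, packet[end:]


# Placeholder option (class/type are hypothetical) carrying a 4-byte flow cookie.
opts = build_option(0xFF01, 0x03, bytes.fromhex("1234abcd"))
wire = encapsulate(b"\x45" + bytes(19), opts)  # stand-in 20-byte IPv4 header
vni, proto, options, inner = decapsulate(wire)
# After a "pass" verdict, send back the same inner packet with the same options.
reply = encapsulate(inner, b"".join(build_option(*o) for o in options), vni, proto)
print(reply == wire)
```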

What we’ll filter on

Transparent 5-tuple preservation: does the appliance see the original source and destination?
Flow stickiness across appliances: does the same flow reach the same appliance?
Horizontal scaling: can appliance capacity be added by registering more targets?
Multi-VPC consumption: can many VPCs route through one central fleet?
Symmetric return path: does return traffic reach the same appliance?

The inspection-insertion landscape

1. Firewall per VPC. Simplest; licensing-expensive; operationally linear. Doesn’t scale in cost terms.

2. NLB in front of appliances. The NLB is a layer-4 load balancer that rewrites the destination, not a transparent gateway: the appliance sees the NLB’s private IP as the source unless client IP preservation is on, and even then it doesn’t see the real destination, because traffic has to be addressed to the NLB’s own listener. Flow-level symmetry isn’t guaranteed across directions. Fails on preservation and symmetry.

3. Gateway Load Balancer + GENEVE. Transparent, flow-sticky, horizontally scaling, multi-VPC via endpoint services, symmetric with careful routing. The purpose-built primitive.

4. Direct routing via TGW. Route traffic from VPCs through the TGW to a firewall VPC whose route tables point at individual appliance ENIs. No load balancing; manual failover; no stickiness. Works in small deployments with one or two appliances; doesn’t scale.

Side by side

Option	5-tuple preserved	Flow-sticky	Horizontal scale	Multi-VPC	Symmetric return
FW per VPC	✓	n/a (one appliance)	✗ (per VPC)	✗	✓
NLB in front	✗	✗ (not across directions)	✓	✓	Partial
GWLB + GENEVE	✓	✓	✓	✓	✓ (with routing)
TGW direct	✓	Static	✗	✓	Manual

Three VPCs, one firewall fleet, GENEVE in the middle

Workload VPCs route through GWLB endpoints; GWLB forwards via GENEVE to firewall appliances; appliances return packets; GWLB forwards onward to NAT or IGW. One fleet inspects all VPCs, with flow stickiness.

The pick(s) in depth

Gateway Load Balancer fleet in a central firewall VPC, GWLB endpoints in every workload VPC, symmetric routing on both sides. The design that satisfies all five attributes when configured carefully.

The components:

1. Firewall VPC. Dedicated VPC (10.200.0.0/16) with:

Public subnets for egress (NAT Gateway, Internet Gateway).
A private subnet per AZ for the firewall appliances’ primary ENIs.
A dedicated subnet per AZ for the GWLB (AWS places one GWLB node in each subnet you give it).

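2. The GWLB. aws elbv2 create-load-balancer --name gwlb-fw --type gateway --subnets subnet-a subnet-b subnet-c. A GWLB takes all IP traffic on all ports, so its listener has no protocol or port, just a default action that forwards to the target group from step 3: aws elbv2 create-listener --load-balancer-arn <gwlb-arn> --default-actions Type=forward,TargetGroupArn=<tg-arn>.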

3. The target group. Type ip (or instance), protocol GENEVE, port 6081. aws elbv2 create-target-group --name tg-fw --protocol GENEVE --port 6081 --target-type ip --vpc-id vpc-firewall. Health check on a TCP port the appliance listens on (e.g., SSH 22, or the vendor’s health port).

4. Appliance targets. Register each firewall appliance’s primary ENI IP with the target group: aws elbv2 register-targets --target-group-arn <tg-arn> --targets Id=10.200.1.10 Id=10.200.2.10 Id=10.200.3.10.

5. GWLB endpoint service. aws ec2 create-vpc-endpoint-service-configuration --gateway-load-balancer-arns <gwlb-arn>. The endpoint service has a name like com.amazonaws.vpce.eu-west-1.vpce-svc-0xyz. Allowlist the principals of the workload accounts that will consume it.

6. GWLB endpoints in workload VPCs. In each workload VPC, create a VPC endpoint of type GatewayLoadBalancer pointing at the GWLB endpoint service: aws ec2 create-vpc-endpoint --vpc-endpoint-type GatewayLoadBalancer --vpc-id <workload-vpc-id> --service-name <service-name> --subnet-ids <firewall-subnet-in-workload>. The endpoint gets an ENI in a designated “firewall-insertion” subnet in the workload VPC; each GWLB endpoint lives in a single subnet, so create one per AZ.

7. The routing. This is where it gets delicate. In each workload VPC:

The main subnet’s route table has 0.0.0.0/0 → the GWLB endpoint (the route target is the endpoint’s vpce- ID) instead of the IGW/NAT route. Outbound traffic goes to the endpoint.
The firewall-insertion subnet’s route table sends inspected traffic onward: 0.0.0.0/0 → the workload VPC’s own NAT Gateway or IGW if egress is distributed, or, more commonly in this design, 0.0.0.0/0 → TGW so that egress is centralised in the firewall VPC.

Typically, the centralised pattern is:

Workload VPC → GWLB endpoint → (across endpoint service to) GWLB → (GENEVE) → firewall → (GENEVE) → GWLB → back to GWLB endpoint ENI → out via the firewall-insertion subnet’s route table → TGW → NAT/IGW in the firewall VPC.

Critically, return traffic for stateful protocols must traverse the GWLB on the way back. If the return path bypasses the GWLB, the replies reach the workload uninspected and the appliance only ever sees half of the flow; if the replies enter through a different AZ’s endpoint, they can land on an appliance with no state for the flow, which drops them. For east-west flows that cross the TGW, the “Appliance Mode” flag on TGW attachments is what keeps them symmetric, so always set appliance mode on the firewall VPC’s TGW attachment (aws ec2 modify-transit-gateway-vpc-attachment --transit-gateway-attachment-id <attachment-id> --options ApplianceModeSupport=enable).
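Steps 2 to 7 can be strung together with boto3, the AWS SDK for Python. This is a sketch under simplifying assumptions, not a production script: it runs everything with one set of credentials (in practice the endpoint and route calls run in each workload account, after that account has been allowlisted), it omits waiters and error handling, and every name, ID and the port-22 health check are placeholders. The clients are passed in, so a small recording stub (DryRunClient, hypothetical) lets you check the call sequence without touching AWS.

```python
class DryRunClient:
    """Hypothetical stand-in for a boto3 client: records calls, returns canned data."""
    def __init__(self, responses=None):
        self.calls, self.responses = [], responses or {}

    def __getattr__(self, name):
        def call(**kwargs):
            self.calls.append((name, kwargs))
            return self.responses.get(name, {})
        return call


def provision_inspection(elbv2, ec2, *, gwlb_subnets, firewall_vpc_id, appliance_ips,
                         consumer_principals, workload_vpc_id, insertion_subnet_id,
                         workload_route_table_id, fw_tgw_attachment_id):
    lb = elbv2.create_load_balancer(Name="gwlb-fw", Type="gateway", Subnets=gwlb_subnets)
    gwlb_arn = lb["LoadBalancers"][0]["LoadBalancerArn"]

    tg = elbv2.create_target_group(Name="tg-fw", Protocol="GENEVE", Port=6081,
                                   TargetType="ip", VpcId=firewall_vpc_id,
                                   HealthCheckProtocol="TCP", HealthCheckPort="22")
    tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]
    elbv2.register_targets(TargetGroupArn=tg_arn,
                           Targets=[{"Id": ip} for ip in appliance_ips])
    # A GWLB listener has no protocol or port: it forwards every packet.
    elbv2.create_listener(LoadBalancerArn=gwlb_arn,
                          DefaultActions=[{"Type": "forward", "TargetGroupArn": tg_arn}])

    svc = ec2.create_vpc_endpoint_service_configuration(
        GatewayLoadBalancerArns=[gwlb_arn], AcceptanceRequired=False)["ServiceConfiguration"]
    ec2.modify_vpc_endpoint_service_permissions(
        ServiceId=svc["ServiceId"], AddAllowedPrincipals=consumer_principals)

    endpoint_id = ec2.create_vpc_endpoint(
        VpcEndpointType="GatewayLoadBalancer", VpcId=workload_vpc_id,
        ServiceName=svc["ServiceName"], SubnetIds=[insertion_subnet_id],
    )["VpcEndpoint"]["VpcEndpointId"]
    # Workload subnet: default route to the GWLB endpoint (target is the endpoint ID).
    ec2.create_route(RouteTableId=workload_route_table_id,
                     DestinationCidrBlock="0.0.0.0/0", VpcEndpointId=endpoint_id)
    # Pin east-west flows to one AZ of the firewall VPC.
    ec2.modify_transit_gateway_vpc_attachment(
        TransitGatewayAttachmentId=fw_tgw_attachment_id,
        Options={"ApplianceModeSupport": "enable"})
    return gwlb_arn, tg_arn, endpoint_id


elbv2 = DryRunClient({
    "create_load_balancer": {"LoadBalancers": [{"LoadBalancerArn": "arn:gwlb"}]},
    "create_target_group": {"TargetGroups": [{"TargetGroupArn": "arn:tg"}]},
})
ec2 = DryRunClient({
    "create_vpc_endpoint_service_configuration":
        {"ServiceConfiguration": {"ServiceId": "vpce-svc-example", "ServiceName": "svc-example"}},
    "create_vpc_endpoint": {"VpcEndpoint": {"VpcEndpointId": "vpce-example"}},
})
print(provision_inspection(
    elbv2, ec2, gwlb_subnets=["subnet-a", "subnet-b", "subnet-c"],
    firewall_vpc_id="vpc-firewall",
    appliance_ips=["10.200.1.10", "10.200.2.10", "10.200.3.10"],
    consumer_principals=["arn:aws:iam::111122223333:root"],
    workload_vpc_id="vpc-prod", insertion_subnet_id="subnet-insertion",
    workload_route_table_id="rtb-prod-main", fw_tgw_attachment_id="tgw-attach-fw"))
```

Swapping the stubs for boto3.client("elbv2") and boto3.client("ec2") makes the same function issue real calls, which is why a real run needs waiters between steps that depend on resources becoming available.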

A worked packet (egress)

An EC2 instance in the prod VPC at 10.10.5.22 calls https://api.github.com:

Prod VPC route table for the main subnet: 0.0.0.0/0 → gwlb-ep-prod. The VPC router forwards the packet to the endpoint; its destination IP is unchanged.
GWLB endpoint receives the packet at its ENI in the prod VPC. The endpoint forwards via the endpoint service to the GWLB in the firewall VPC.
GWLB hashes the flow’s 5-tuple (10.10.5.22:<sport> → 140.82.x.y:443, TCP) and maps it to (say) appliance 2.
GWLB encapsulates the packet in GENEVE, carrying metadata as GENEVE options (the GWLB endpoint ID, the attachment ID and a flow cookie), and sends it over UDP 6081 to appliance 2’s ENI.
Appliance 2 receives the GENEVE packet, decaps, sees the original 10.10.5.22 → 140.82.x.y packet, runs inspection. Assume pass.
Appliance 2 re-encapsulates the same packet in GENEVE and sends it back to GWLB.
GWLB forwards the (unchanged) packet out; it returns to the GWLB endpoint ENI in the prod VPC.
Firewall-insertion subnet route table (in the prod VPC’s dedicated subnet for this endpoint): 0.0.0.0/0 → TGW (centralised egress via the firewall VPC) or 0.0.0.0/0 → the prod VPC’s own NAT Gateway or IGW (distributed egress).
With centralised egress, the packet reaches the NAT Gateway in the firewall VPC, is SNAT’d, and exits to the internet through the IGW.
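Each routing decision in that walk-through is an ordinary longest-prefix match. A minimal sketch using Python’s ipaddress module, assuming hypothetical route tables for the prod VPC (10.10.0.0/16) with centralised egress via the TGW; the target names are labels, not resource IDs, and 203.0.113.10 is a documentation address standing in for the destination.

```python
import ipaddress

ROUTE_TABLES = {
    # Workload (main) subnet: everything off-VPC goes to the GWLB endpoint.
    "prod-main": {"10.10.0.0/16": "local", "0.0.0.0/0": "gwlb-ep-prod"},
    # Firewall-insertion subnet: inspected traffic continues to the TGW.
    "prod-insertion": {"10.10.0.0/16": "local", "0.0.0.0/0": "tgw"},
}


def lookup(table: str, dst: str) -> str:
    """Longest-prefix match of dst against one route table."""
    addr = ipaddress.ip_address(dst)
    routes = {ipaddress.ip_network(cidr): target for cidr, target in ROUTE_TABLES[table].items()}
    best = max((net for net in routes if addr in net), key=lambda net: net.prefixlen)
    return routes[best]


def egress_hops(dst: str) -> list[str]:
    """Next hops for a packet leaving the workload subnet."""
    hops = [lookup("prod-main", dst)]
    if hops[0] == "gwlb-ep-prod":
        # After inspection the GWLB hands the packet back to the endpoint, and
        # the insertion subnet's route table picks the onward hop.
        hops.append(lookup("prod-insertion", dst))
    return hops


print(egress_hops("203.0.113.10"))  # ['gwlb-ep-prod', 'tgw']
print(egress_hops("10.10.7.5"))     # ['local']: intra-VPC traffic isn't inspected
```

The useful check is the inverse one: if someone points prod-main’s default route straight at the TGW, egress_hops returns ['tgw'] and the endpoint never appears, which is exactly the bypass that breaks inspection.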

Return traffic comes back via the same path in reverse: IGW → NAT Gateway (reverses the SNAT) → TGW → prod VPC, where routing hands it to the GWLB endpoint → GWLB → the same appliance 2 (the reversed 5-tuple lands on the same target because GWLB hashes symmetrically) → decap → inspect return → re-encap → GWLB → endpoint → EC2 instance.

What’s worth remembering

GWLB is a transparent gateway, not a proxy. It doesn’t NAT, doesn’t terminate connections and doesn’t modify packets; only the appliance can change what comes back. Appliances get the original 5-tuple.
GENEVE on UDP 6081 is the tunnel protocol. All GWLB-to-appliance traffic is GENEVE-encapsulated. Appliance security groups must allow UDP 6081 from the GWLB nodes (their IPs in the GWLB subnets), plus the health-check port.
Flow stickiness via 5-tuple hash. Same flow → same appliance, consistently. Return traffic of the same flow → same appliance, because GWLB hashes the 5-tuple symmetrically and treats A→B and B→A as one flow.
GWLB endpoints are how other VPCs consume. VPC endpoints of type GatewayLoadBalancer, pointed at the GWLB’s endpoint service. The endpoint ENI in the workload VPC is the insertion point.
Route tables do the real work. Workload subnet → GWLB endpoint; firewall-insertion subnet → onward (NAT, IGW, or TGW). Symmetry matters; misrouted return paths break stateful inspection.
TGW Appliance Mode keeps east-west flows symmetric. When the firewall VPC is attached to a TGW, set appliance mode on the attachment to force TGW to pin flows to a single AZ of the firewall VPC, keeping return paths on the same appliance.
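Appliances scale horizontally. Add targets to the target group; new flows spread across the larger pool while existing flows stay on the appliance they are pinned to, so adding capacity doesn’t break established state.
Health-check failures steer flows around failed appliances. An appliance that goes unhealthy stops receiving new flows. Its existing flows depend on the target group’s target_failover.on_unhealthy attribute: with the default, no_rebalance, they keep going to the failed appliance; with rebalance, they are re-hashed onto survivors. Either way it’s a brief flow-reset blip, not a total outage.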
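A toy model of the two target-failover behaviours: established flows stay pinned to their appliance, new flows hash across the healthy pool, and the failover mode decides whether flows pinned to a failed appliance are moved. The mode strings mirror the target group attribute values (no_rebalance, rebalance), but the flow table and hash are illustrative assumptions, not GWLB’s implementation.

```python
import hashlib


def _slot(flow: str, targets: list[str]) -> str:
    h = int.from_bytes(hashlib.sha256(flow.encode()).digest()[:8], "big")
    return targets[h % len(targets)]


class FlowTable:
    def __init__(self, targets: list[str], on_unhealthy: str = "no_rebalance"):
        self.healthy = list(targets)
        self.on_unhealthy = on_unhealthy
        self.pinned: dict[str, str] = {}  # flow id -> appliance

    def route(self, flow: str) -> str:
        # Known flows keep their appliance; new flows hash over healthy targets.
        if flow not in self.pinned:
            self.pinned[flow] = _slot(flow, self.healthy)
        return self.pinned[flow]

    def mark_unhealthy(self, target: str) -> None:
        self.healthy.remove(target)
        if self.on_unhealthy == "rebalance":
            # Re-hash only the flows that were pinned to the failed appliance.
            for flow, owner in self.pinned.items():
                if owner == target:
                    self.pinned[flow] = _slot(flow, self.healthy)


flows = [f"flow-{i}" for i in range(300)]
for mode in ("no_rebalance", "rebalance"):
    table = FlowTable(["fw-1", "fw-2", "fw-3"], on_unhealthy=mode)
    for f in flows:
        table.route(f)
    table.mark_unhealthy("fw-2")
    stuck = sum(table.route(f) == "fw-2" for f in flows)
    print(mode, "flows still sent to fw-2:", stuck)
```

With no_rebalance, roughly a third of the flows stay aimed at the failed appliance and die with it, while new connections avoid it. With rebalance they move to survivors, though the survivor holds no state for them, so how those mid-flow packets fare depends on the appliance. On a real target group the attribute is changed with aws elbv2 modify-target-group-attributes.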

A vendor firewall per VPC is how small organisations do it. A single, shared firewall fleet handling every VPC via GWLB is how large ones do it, and GWLB is the AWS primitive designed for that job. The work isn’t picking the fanciest topology; it’s getting the route tables right in both directions, pinning flows to appliances with the 5-tuple hash, and trusting GENEVE to do its invisible job.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.