VPC Peering or Transit Gateway

The situation

A platform engineering team operates 52 VPCs across three regions (us-east-1, eu-west-1, ap-southeast-2). Each VPC houses a separate workload: production, staging, and dev environments for several product lines, plus shared services for auth, observability, secrets, and internal tooling.

They need any-to-any connectivity between VPCs in the same region and most cross-region paths reachable. They need a single egress point per region so all outbound traffic leaves through one inspection hop. They want east-west inter-VPC traffic to flow through a third-party firewall fleet. Adding a 53rd VPC should not mean touching route tables in 52 other VPCs. And they need a cost shape that scales with traffic, not with the count of pairwise connections.

Today everything is stitched together with VPC peering. The networking engineer who’s been holding the duct tape has asked whether there’s a better shape.

What actually matters

Before picking a primitive, it’s worth asking what properties the architecture needs to have in a few years, because the connectivity substrate is one of the hardest things to change once workloads depend on it.

The first property is transitivity. A well-behaved network lets A reach C via B without needing a direct A-to-C edge. Peering doesn’t; every pair needs its own connection and its own route table entries on both sides. The number of edges grows as N × (N-1) / 2, and the operational work compounds because every new VPC requires updates to all existing ones. That’s a subtle failure mode: creating the 53rd VPC is fine; enrolling it into a mesh with 52 others is 52 changes, each one a chance to typo a CIDR.
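The growth described above is easy to check numerically. The sketch below is only arithmetic; the function names are illustrative, and the route-entry count assumes one route table per VPC, which is the minimum.

```python
def peering_mesh(n: int) -> dict:
    """Counts for a full VPC peering mesh of n VPCs."""
    return {
        # One connection per unordered pair of VPCs.
        "connections": n * (n - 1) // 2,
        # Each VPC needs at least one route per other VPC
        # (more if a VPC has several route tables).
        "min_route_entries": n * (n - 1),
        # Enrolling VPC number n + 1 touches every existing VPC.
        "existing_vpcs_touched_by_next": n,
    }


def hub(n: int) -> dict:
    """Counts for n VPCs attached to a single transitive hub."""
    return {"attachments": n, "existing_vpcs_touched_by_next": 0}


if __name__ == "__main__":
    for n in (3, 52, 200):
        print(n, peering_mesh(n), hub(n))
```

At n = 52 the mesh needs 1,326 connections and the next VPC touches all 52 existing ones; the hub needs 52 attachments and the next VPC touches none.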

The second is where control lives. Pairwise connections put every routing decision in the spoke’s own route tables. Hub-and-spoke shapes put the decisions in one place, which is both a feature (segmentation policy is centralised and auditable) and a risk (the hub is a blast-radius concentrator). The question isn’t “do we want central control” (at this scale we do); it’s what shape of central control the candidates offer. Some hubs expose imperative route-table editing; some expose a declarative policy layer over the top. The difference matters more as the topology grows.

The third is the data path. “Reach each other” is the easy bit. “Reach each other via a specific inspection point” is what security review actually cares about. A hub has an obvious place to insert a firewall fleet between segments. Peering doesn’t: packets travel peer-to-peer over the AWS backbone, with nowhere to land a Gateway Load Balancer (GWLB) in the middle. If inspection is a hard requirement, the primitive has to have a seam for it.

The fourth is cost shape. Pairwise peering is free for the connections themselves but bills for data transfer; hub-style primitives charge per attachment-hour plus per-GB processed; the higher-level managed-network options add their own per-attachment and per-policy fees on top. The interesting number isn’t the unit cost; it’s whether total cost, operational effort included, grows with pairs (O(N²)) or with VPCs (O(N)). At 52 VPCs the O(N²) shape is already painful; at 200 it’s a full-time job.

The fifth is headroom. The team today runs 52 VPCs in 3 regions. If in eighteen months they’re at 150 VPCs in 6 regions, will today’s choice still fit? A regional-hub primitive’s cross-region story turns into its own N × (N-1) / 2 problem once the region count gets large. The choice isn’t only about today; it’s about whether today’s choice constrains the next decision.

And the sixth is operational simplicity as a feature. A network that a small team can confidently change at 2am during an incident is worth a lot more than a network that requires a specialist to touch. The team here has other things to do. Whatever they pick needs to be boring to operate in the steady state, not clever.

What we’ll filter on

Transitive any-to-any reachability without O(N²) configuration.
A single egress point per region that all VPC traffic funnels through.
A seam in the data path where a security appliance can sit.
Cross-region connectivity that doesn’t explode into another mesh.
Operational simplicity: adding a new VPC is a few operations, not dozens.
Cost scaling with VPCs, not with pairwise connections.

The connectivity landscape

VPC Peering. A point-to-point connection between exactly two VPCs. Non-transitive: if A peers with B and B peers with C, A cannot reach C through B. To connect N VPCs in a mesh you need N × (N-1) / 2 peering connections, and you must add a route table entry to every VPC for every other VPC’s CIDR. For N = 52, that’s 1,326 peering connections and at least 2,652 route table entries to maintain. Free per connection, billed for data transfer. Right primitive for two or three VPCs. Wrong for 52.
AWS Transit Gateway (TGW). A regional hub. Each VPC gets one attachment to the TGW; the TGW has route tables that decide which attachments can route to which. Connectivity is transitive: A to TGW to C works. For N VPCs in one region: N attachments, one TGW. Cross-region works via TGW peering attachments between regional TGWs. Centralised egress is a dedicated VPC, containing a NAT Gateway, that is attached to the TGW; centralised inspection is a GWLB-fronted appliance fleet in an inspection VPC. Pricing in us-east-1 is roughly $0.05 per attachment per hour plus $0.02 per GB processed.
AWS PrivateLink. Application-level connectivity to a single service exposed via NLB or GWLB. Not a general-purpose mesh; it’s “expose service X from VPC A to consumers in VPC B and C”. Excellent for SaaS-style producer/consumer exposure; doesn’t satisfy the general L3 reachability this scenario needs. Worth knowing for adjacent problems.
AWS Cloud WAN. A managed global network on top of TGWs. You define a core network with regions (TGWs are provisioned for you) and segments (logical groupings of VPCs that may or may not communicate). Policies are JSON documents describing segment isolation, sharing rules, and routing. The thing to reach for at hundreds or thousands of VPCs across many regions where declarative policy beats imperative route-table editing.

Side by side

Option	Transitive	Central egress	Inspection seam	Cross-region	Ops simple	Cost scales with VPCs
VPC Peering	✗	✗	✗	✓ (pairwise)	✗	✗
Transit Gateway	✓	✓	✓	✓	✓	✓
PrivateLink	✗	✗	✗	✓	—	✓
Cloud WAN	✓	✓	✓	✓	✓ (at scale)	✓

Cloud WAN technically ticks every box, but for 52 VPCs in 3 regions it’s a policy-engine layer of complexity the team doesn’t yet need. Transit Gateway lands.

Matching the shape to the primitive

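Three shapes, three primitives: a handful of VPCs suits peering, tens to a few hundred VPCs in a few regions suit Transit Gateway, and hundreds of VPCs across many regions suit Cloud WAN. The 52-VPC, 3-region shape sits firmly in the middle.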

Transit Gateway, in depth

Attachments and route tables. Each VPC connects to the TGW via an attachment (an ENI per AZ, in subnets you specify). The TGW has one or more route tables, and each attachment is associated with exactly one route table for its outbound routing decisions and propagates routes (or has them statically added) into one or more route tables.

Association answers “which routes does this attachment see”: the associated route table determines where this attachment’s traffic can go. Propagation answers “which route tables learn about this attachment’s CIDRs”, and so which other attachments can route to it. The separation is what lets you build segmented topologies.

A common shape for 52 VPCs:

Production route table: production VPCs associate here and propagate here, so they see only production routes plus shared services and the egress VPC.
Non-production route table: dev and staging VPCs associate here and see each other plus shared services and the egress VPC, but not production.
Shared services route table: auth, secrets, and observability VPCs propagate into both the production and non-production tables (everyone can reach shared services) but associate with their own table.
Egress route table: the egress VPC associates here and sees all VPCs, so return traffic finds its way back.
Inspection route table: controls which east-west pairs go via the inspection VPC versus direct.

Five route tables, not 1,326 peering connections.
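The association/propagation split can be made concrete with a toy model. This is a sketch in plain Python, not an AWS API: the class, attachment names, and table names are all hypothetical, and the egress and inspection tables are left out to keep it short. It also assumes workload VPCs propagate into the shared-services table so that return traffic has a route.

```python
from dataclasses import dataclass, field


@dataclass
class TransitGatewayModel:
    # attachment name -> the single route table it is associated with
    association: dict = field(default_factory=dict)
    # route table name -> set of attachments whose CIDRs it has learned
    routes: dict = field(default_factory=dict)

    def associate(self, attachment: str, table: str) -> None:
        self.association[attachment] = table  # exactly one table per attachment
        self.routes.setdefault(table, set())

    def propagate(self, attachment: str, *tables: str) -> None:
        for table in tables:
            self.routes.setdefault(table, set()).add(attachment)

    def has_route(self, src: str, dst: str) -> bool:
        """True if the table src is associated with holds a route to dst."""
        return dst in self.routes.get(self.association[src], set())

    def can_talk(self, a: str, b: str) -> bool:
        """Two-way reachability needs a route in each direction."""
        return self.has_route(a, b) and self.has_route(b, a)


tgw = TransitGatewayModel()
for vpc in ("prod-a", "prod-b"):
    tgw.associate(vpc, "production")
    tgw.propagate(vpc, "production", "shared-services")
for vpc in ("dev-a", "staging-a"):
    tgw.associate(vpc, "non-production")
    tgw.propagate(vpc, "non-production", "shared-services")
tgw.associate("auth", "shared-services")
tgw.propagate("auth", "production", "non-production", "shared-services")

if __name__ == "__main__":
    print(tgw.can_talk("prod-a", "auth"))   # prod reaches shared services
    print(tgw.can_talk("dev-a", "prod-a"))  # non-prod does not reach prod
```

Note that segmentation falls out of which tables learn which routes; nothing in the model is a deny rule.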

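Bandwidth and limits. Each VPC attachment supports tens of Gbps per Availability Zone (50 Gbps is the long-quoted figure; check the current quotas page, since AWS revises these numbers), and aggregate TGW capacity is much higher. The default quota of 5,000 attachments per TGW is generous for most environments.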

Cross-region (TGW peering). A peering attachment between two TGWs runs over the AWS backbone. Routes are static (no BGP between TGWs), so the remote region’s CIDRs are explicitly added to the local TGW’s route table. Bandwidth is high, but you pay the cross-region data-transfer rate (~$0.02/GB) on top of TGW data processing, which is charged when traffic enters the TGW from the source VPC attachment rather than again on arrival over the peering. For three regions that’s three peerings (us-east-1 to eu-west-1, us-east-1 to ap-southeast-2, eu-west-1 to ap-southeast-2). Manageable.
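Because the routes are static, the peering mesh and its route entries can be enumerated up front. The sketch below assumes one supernet per region; those CIDRs are hypothetical, since the scenario gives no addressing plan, and a real design needs the static route repeated in every TGW route table that should reach the remote region.

```python
from itertools import combinations

# Hypothetical per-region supernets (not from the scenario).
REGION_CIDRS = {
    "us-east-1": "10.0.0.0/12",
    "eu-west-1": "10.16.0.0/12",
    "ap-southeast-2": "10.32.0.0/12",
}


def peering_plan(region_cidrs: dict) -> tuple:
    """Return (peerings, static_routes) for a full mesh of regional TGWs."""
    peerings = list(combinations(sorted(region_cidrs), 2))
    static_routes = []
    for a, b in peerings:
        # No dynamic routing over the peering, so each side gets a static route.
        static_routes.append(
            {"tgw_in": a, "destination": region_cidrs[b], "via_peering_to": b}
        )
        static_routes.append(
            {"tgw_in": b, "destination": region_cidrs[a], "via_peering_to": a}
        )
    return peerings, static_routes


if __name__ == "__main__":
    peerings, routes = peering_plan(REGION_CIDRS)
    print(len(peerings), "peerings,", len(routes), "static routes")
```

With three regions this yields three peerings and six static routes; with six regions the same function yields fifteen peerings, which is the mesh problem reappearing one layer up.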

Centralised egress pattern. The egress VPC contains a NAT Gateway (or several, one per AZ for HA), optionally an AWS Network Firewall or third-party fleet behind a GWLB, and a TGW attachment. Workload VPCs add a default route 0.0.0.0/0 -> TGW. The TGW route tables that the workload attachments associate with carry a static 0.0.0.0/0 route to the egress VPC’s attachment, while the egress route table holds the workload CIDRs for the return path. Inside the egress VPC, traffic goes TGW subnet to firewall (via GWLB endpoint) to NAT Gateway to IGW. Return traffic reverses the path.

This collapses what would otherwise be 52 NAT Gateways (one per VPC, ~$33/month each) into as few as 3 NAT Gateways, one per regional egress VPC, or one per AZ in each egress VPC if you want HA.
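The egress path is a chain of longest-prefix-match lookups, which the sketch below traces using Python's standard `ipaddress` module. The CIDRs, attachment names, and hop labels are hypothetical. The default route lives in the TGW table that workload attachments associate with, since that is the table consulted for traffic arriving from a workload VPC.

```python
import ipaddress


def lookup(route_table: dict, dst_ip: str) -> str:
    """Longest-prefix match: return the target of the most specific route."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if dst in ipaddress.ip_network(cidr)
    ]
    if not matches:
        raise LookupError(f"no route to {dst_ip}")
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Route table inside workload VPC A (10.1.0.0/16).
WORKLOAD_VPC_RT = {"10.1.0.0/16": "local", "0.0.0.0/0": "tgw"}
# TGW table the workload attachments associate with.
TGW_WORKLOAD_RT = {
    "10.1.0.0/16": "att-workload-a",
    "10.2.0.0/16": "att-workload-b",
    "0.0.0.0/0": "att-egress",
}
# TGW table the egress attachment associates with (return path).
TGW_EGRESS_RT = {"10.1.0.0/16": "att-workload-a", "10.2.0.0/16": "att-workload-b"}
# Fixed hop order inside the egress VPC.
EGRESS_VPC_PATH = ["gwlb-endpoint", "nat-gateway", "internet-gateway"]


def trace_outbound(dst_ip: str) -> list:
    """Hops taken by a packet leaving workload VPC A for dst_ip."""
    hops = [lookup(WORKLOAD_VPC_RT, dst_ip)]
    if hops[-1] == "tgw":
        hops.append(lookup(TGW_WORKLOAD_RT, dst_ip))
    if hops[-1] == "att-egress":
        hops.extend(EGRESS_VPC_PATH)
    return hops


if __name__ == "__main__":
    print(trace_outbound("93.184.216.34"))  # internet-bound
    print(trace_outbound("10.2.0.5"))       # another workload VPC
```

Internet-bound traffic falls through to the default route and lands on the egress attachment; traffic for another workload VPC matches the more specific /16 and never enters the egress VPC.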

Centralised inspection pattern. An inspection VPC sits between workload VPCs in the TGW topology. East-west traffic between the production VPC and shared-services, say, is routed via the inspection VPC, where the GWLB-fronted appliance fleet sees every packet.

A worked architecture

For the 52-VPC, 3-region scenario:

Three TGWs, one per region. Each region’s TGW has the regional VPCs as attachments.
Three TGW peering connections, forming a regional mesh of three nodes.
Per-region egress VPC with NAT Gateway and firewall, attached to the regional TGW.
Per-region inspection VPC with GWLB and appliances for east-west inspection.
Workload VPCs segmented across three TGW route tables (prod, non-prod, shared services), alongside the egress and inspection tables.

Adding a 53rd VPC: create the VPC, create one TGW attachment, associate with the appropriate TGW route table, propagate where needed. Three to four operations, one region, no changes to any other VPC’s route tables.
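Those operations map onto a handful of EC2 API calls. The sketch below takes a boto3 EC2 client as an argument and uses its `create_transit_gateway_vpc_attachment`, `associate_transit_gateway_route_table`, `enable_transit_gateway_route_table_propagation`, and `create_route` methods. The function name and its parameters are hypothetical. It assumes the VPC already exists and that the TGW's default association and propagation are disabled, and it omits the wait for the attachment to become available that a real run needs.

```python
def onboard_vpc(ec2, *, tgw_id, vpc_id, subnet_ids, associate_with,
                propagate_to, vpc_route_table_ids):
    """Attach one VPC to its regional TGW; `ec2` is a boto3 EC2 client."""
    # 1. One attachment, with one subnet per AZ the VPC uses.
    response = ec2.create_transit_gateway_vpc_attachment(
        TransitGatewayId=tgw_id, VpcId=vpc_id, SubnetIds=subnet_ids
    )
    attachment_id = response["TransitGatewayVpcAttachment"][
        "TransitGatewayAttachmentId"
    ]
    # NOTE: a real run must wait here until the attachment is "available".

    # 2. Associate with exactly one TGW route table (prod, non-prod, ...).
    ec2.associate_transit_gateway_route_table(
        TransitGatewayRouteTableId=associate_with,
        TransitGatewayAttachmentId=attachment_id,
    )
    # 3. Propagate the VPC's CIDR into every table that should reach it.
    for table_id in propagate_to:
        ec2.enable_transit_gateway_route_table_propagation(
            TransitGatewayRouteTableId=table_id,
            TransitGatewayAttachmentId=attachment_id,
        )
    # 4. Routes inside the new VPC only; no other VPC is touched.
    for route_table_id in vpc_route_table_ids:
        ec2.create_route(
            RouteTableId=route_table_id,
            DestinationCidrBlock="0.0.0.0/0",
            TransitGatewayId=tgw_id,
        )
    return attachment_id
```

In use, the first argument would be something like `boto3.client("ec2", region_name="us-east-1")`. Every identifier the function touches belongs either to the TGW or to the new VPC, which is the property the peering mesh lacks.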

Cost estimate (rough, us-east-1 prices):

52 VPC attachments × $0.05/hr × 730 hrs = ~$1,900/month
3 TGW peerings × 2 billed attachment ends × $0.05/hr × 730 hrs = ~$220/month
Data processing at ~$0.02/GB and 100 TB/month: ~$2,000/month
3 egress NAT Gateways × ~$33/month + per-GB processing
Plus the inspection appliances themselves

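Total: ballpark $5,000/month for the full 52-VPC, 3-region topology with centralised egress and east-west inspection. The itemised TGW and NAT hourly lines come to a little over $4,000; the remainder is headroom for NAT per-GB processing and the inspection appliances, neither of which is priced above. The peering alternative is free for the connections but requires a dedicated networking engineer to maintain, with no centralised egress or inspection capability available at all.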
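The itemised lines can be reproduced with a few multiplications. The sketch below uses the rough us-east-1 unit prices quoted above and takes 100 TB as 100,000 GB. Whether a peering is counted as one billed attachment or as one per side is left as a parameter, on the assumption that each TGW owner may be billed for its own end. NAT per-GB processing and appliance costs are deliberately excluded.

```python
HOURS_PER_MONTH = 730
ATTACHMENT_HOUR_USD = 0.05    # per TGW attachment-hour
PROCESSING_PER_GB_USD = 0.02  # TGW data processing
NAT_GATEWAY_MONTH_USD = 33.0  # hourly charge only, per NAT Gateway


def monthly_estimate(*, billed_ends_per_peering, vpc_attachments=52,
                     peerings=3, processed_gb=100_000, nat_gateways=3):
    """Itemised monthly estimate in USD, excluding NAT per-GB and appliances."""
    lines = {
        "vpc_attachments": vpc_attachments * ATTACHMENT_HOUR_USD * HOURS_PER_MONTH,
        "peering_attachments": (
            peerings * billed_ends_per_peering
            * ATTACHMENT_HOUR_USD * HOURS_PER_MONTH
        ),
        "data_processing": processed_gb * PROCESSING_PER_GB_USD,
        "nat_gateways_hourly": nat_gateways * NAT_GATEWAY_MONTH_USD,
    }
    lines["subtotal"] = sum(lines.values())
    return lines


if __name__ == "__main__":
    for ends in (1, 2):
        print(ends, {k: round(v) for k, v in monthly_estimate(
            billed_ends_per_peering=ends).items()})
```

Either way the subtotal lands a little over $4,000, and the data-processing line dominates as traffic grows, which is the "scales with traffic" shape the team asked for.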

When Cloud WAN takes over

The move from TGW to Cloud WAN tends to come when at least one of these is true:

VPC count exceeds a few hundred and TGW route table editing becomes its own operational drag.
Many regions (5+) where the TGW peering mesh mirrors the VPC peering problem at a higher layer.
Declarative policy is preferred over imperative route tables: segments are expressed as intent (“prod can talk to prod and shared services, never to non-prod”) and Cloud WAN materialises the routes.
SD-WAN integration with on-prem branches is in scope.

Cloud WAN is TGWs orchestrated by a higher-level policy engine, with a global console. For 52 VPCs in 3 regions, manual TGW topology is fine. For 500 VPCs in 8 regions, Cloud WAN saves real operational pain.
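The difference between intent and imperative route tables can be illustrated with a toy expansion. The sketch below is not the Cloud WAN policy schema and calls no AWS API; the segment names, attachment names, and `materialise` function are hypothetical. It only shows the idea of deriving per-segment routes from a statement of who may talk to whom.

```python
# Intent: which segments may exchange traffic (must be symmetric).
INTENT = {
    "prod": {"prod", "shared"},
    "nonprod": {"nonprod", "shared"},
    "shared": {"prod", "nonprod", "shared"},
}

# Attachment -> segment membership (in practice often driven by tags).
ATTACHMENTS = {
    "prod-a": "prod",
    "prod-b": "prod",
    "dev-a": "nonprod",
    "auth": "shared",
}


def materialise(intent: dict, attachments: dict) -> dict:
    """Expand segment intent into segment -> reachable attachments."""
    for segment, peers in intent.items():
        for peer in peers:
            if segment not in intent.get(peer, set()):
                raise ValueError(f"{segment} -> {peer} is not mirrored")
    return {
        segment: sorted(
            name for name, member_of in attachments.items() if member_of in peers
        )
        for segment, peers in intent.items()
    }


if __name__ == "__main__":
    for segment, reachable in materialise(INTENT, ATTACHMENTS).items():
        print(segment, reachable)
```

Adding a VPC here means adding one entry to `ATTACHMENTS`; the route tables are regenerated rather than edited, which is the operational difference that starts to matter at hundreds of VPCs.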

What’s worth remembering

VPC peering is non-transitive and pairwise, so connection count grows as N × (N-1) / 2, roughly N²/2. Wrong primitive beyond a handful of VPCs.
Transit Gateway is the regional hub: one attachment per VPC, transitive routing, and route tables that segment traffic between groups.
TGW peering between regions is how TGW topologies scale across regions; it is static-routed over the AWS backbone.
Centralised egress uses an egress VPC with NAT Gateway and optional firewall attached to the TGW, with workload VPCs default-routing to the TGW.
Centralised inspection uses Gateway Load Balancer in an inspection VPC, with TGW route tables shaping east-west traffic through the inspection path.
TGW costs are per-attachment-hour plus per-GB-processed; at 50+ VPCs the operational savings dwarf the per-hour fee.
Cloud WAN is the next step up for hundreds of VPCs across many regions: a managed policy engine on top of TGWs.
PrivateLink is for service exposure, not general mesh connectivity. Different primitive, different problem.