The situation
The network today:
- 30 Regions, each with a Transit Gateway. VPCs attach to the local TGW; the TGWs are peered in a partial mesh (full mesh would be 435 peerings, which nobody wants). The partial mesh carries most inter-region traffic through three “hub” Regions: us-east-1, eu-west-1, ap-southeast-1.
- Branch offices connect via Site-to-Site VPN to the nearest hub’s TGW or via SD-WAN appliances also attached to the hub TGWs.
- Segmentation is enforced by TGW route tables. Three broad domains today (prod, nonprod, shared), implemented as separate TGW route tables per Region, with propagations and static routes configured per attachment. Prefix filtering on peering attachments controls what crosses Regions.
- Operational reality: adding a Region takes a fortnight. Peerings, route tables, association and propagation logic, prefix filters, VPN re-homing: all scripted, but the script has grown to the point that nobody fully owns it.
The business wants to open three more Regions this quarter. The network team wants a sit-down.
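The 435 figure is the full-mesh formula n(n-1)/2 at n = 30. A quick sketch of how that number moves with three more Regions (the function name is illustrative, not part of any tooling we have):

```python
def full_mesh_peerings(regions: int) -> int:
    """Point-to-point peerings needed to fully mesh `regions` Transit Gateways."""
    return regions * (regions - 1) // 2


for n in (30, 33):
    print(n, full_mesh_peerings(n))
# 30 435
# 33 528
```

The partial mesh keeps the real count well below this, but the growth is still quadratic in the worst case, which is why each new Region costs more than the last.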
Options in scope:
- Status quo, but better scripts. Lean into Terraform modules and accept the operational tax.
- AWS Cloud WAN, the managed global network with segments defined in JSON policy.
- Third-party SD-WAN overlay, a vendor-managed mesh on top of AWS transport.
- Flatten segmentation, collapse the route-table complexity by giving up some isolation.
What actually matters
Before picking, it’s worth asking what the TGW mesh is actually costing and what we’d gain from replacing it.
The first cost is configuration distance from intent. The business asks for “production is isolated from non-production globally.” Delivering that intent today means maintaining 30 TGWs × 3 route tables × N attachments, plus peering-attachment prefix filters, plus propagation rules. Auditing whether a given production VPC in ap-northeast-2 can actually reach a non-production VPC in sa-east-1 requires walking five route tables and two peering filters. The intent-to-configuration ratio is terrible.
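To make the audit cost concrete, here is a minimal sketch of what "walking the route tables" means: a longest-prefix lookup repeated hop by hop. Every table name, prefix, and next hop below is hypothetical, and peering prefix filters are not modelled, so this illustrates the shape of the problem rather than our real topology.

```python
import ipaddress


def lookup(route_table, dst):
    """Longest-prefix match: return the next hop for dst, or None if no route."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in route_table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def walk(tables, start, dst, max_hops=10):
    """Follow next hops table by table; return (path, delivered)."""
    path, current = [start], start
    for _ in range(max_hops):
        hop = lookup(tables[current], dst)
        if hop is None:
            return path, False   # no route in this table: traffic is dropped
        if hop == "local":
            return path, True    # delivered to an attachment on this table
        path.append(hop)
        current = hop
    return path, False           # hop limit reached: probable loop


# Hypothetical prod route tables, one per Region on the path.
TABLES = {
    "apne2-prod": {"10.0.0.0/8": "apse1-prod"},
    "apse1-prod": {"10.0.0.0/8": "use1-prod"},
    "use1-prod": {"10.40.0.0/16": "sae1-prod"},
    "sae1-prod": {"10.40.0.0/16": "local"},
}

print(walk(TABLES, "apne2-prod", "10.40.1.5"))
print(walk(TABLES, "apne2-prod", "10.90.1.5"))
```

The first walk crosses four tables before it is delivered; the second dies at the hub because no table there carries the prefix. Answering the same question for every pair of domains in every pair of Regions is the audit nobody wants to do by hand.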
The second cost is blast radius of per-Region change. A change to segmentation, say, adding a fourth domain for a new regulatory boundary, touches every TGW. Thirty change windows, thirty rollback plans, and a non-trivial chance of inconsistency. There’s no single place where “our global segmentation” lives.
The third cost is inter-Region attachment modelling. TGW peering is point-to-point. A hub-and-spoke routed through us-east-1 works, but every prefix that crosses Regions is explicitly routed via the hub’s TGW route table, and transit through us-east-1 for eu-west-2 to ap-southeast-1 traffic is a latency and dollar cost that accumulates.
The fourth cost is what we’d give up to flip the abstraction. Replacing the mesh with a single global object means a different way of expressing segmentation, not per-Region route tables but a policy document that defines isolation across the whole estate. The trade-off is flexibility: any pattern that relies on per-attachment routing tricks (asymmetric peerings, selective prefix leaking between route tables) takes more thought in a policy-defined world, and the team has to learn the new operational model and policy-as-code flow before the operational saving lands.
What we’ll filter on
- Global single object, one thing to reason about across all Regions, not N-per-Region.
- Policy-defined segmentation, segments described in code, applied atomically.
- Automatic inter-Region routing, no manual peering mesh.
- Branch integration. VPN and SD-WAN attach naturally.
- Migration path from TGW, can we get there without a flag day?
The global-network landscape
1. Transit Gateway mesh (status quo). Per-Region TGWs, partial-mesh peering, route tables per Region per domain. Mature, well-understood, but doesn’t scale cleanly past roughly 10-15 Regions before the operational tax becomes untenable. Everything is explicit; nothing is global.
2. AWS Cloud WAN. A single core network object spanning Regions. Segments are the segmentation primitive; each segment has attachments, sharing rules, and routing behaviour defined in the core network policy. Edge locations are Regions the core network extends into, chosen at core-network creation and updateable via policy. Attachments come in flavours: VPC, Site-to-Site VPN, Transit Gateway (for hybrid migration), SD-WAN via Connect attachments. Policy is a JSON document applied atomically; changes go through a two-step flow (create change set, execute change set) so you can see the delta before committing.
3. Third-party SD-WAN overlay. Vendor appliances in each Region VPC, running a control plane across public or Direct Connect transport. Replaces AWS’s inter-Region connectivity with the vendor’s. Useful when the same vendor already runs the on-prem WAN and we want one pane of glass. Adds appliance licensing, throughput licensing, and a data-path the vendor controls rather than AWS.
4. Flat network, less segmentation. Collapse the three domains into one and eliminate the per-Region route-table duplication. The simplest answer; also the wrong one in any serious organisation, because regulated workloads need real isolation and flat networks are audit nightmares.
Side by side
| Option | Global single object | Policy-defined segmentation | Automatic inter-Region routing | Branch integration | TGW migration path |
|---|---|---|---|---|---|
| TGW mesh | ✗ | Per-Region only | ✗ (manual peering) | ✓ | n/a |
| Cloud WAN | ✓ | ✓ | ✓ | ✓ | ✓ (TGW attachment) |
| SD-WAN overlay | Vendor-defined | Vendor-defined | ✓ (vendor fabric) | ✓ | Parallel |
| Flat network | ✓ (trivially) | ✗ | ✓ | ✓ | ✗ |
The Cloud WAN policy in a picture
The pick(s) in depth
Cloud WAN for the global backbone, keeping TGWs where they still earn their place. The realistic path is not a big-bang migration; it’s a gradual one.
The core network lives as a JSON policy document. The minimum viable policy for our situation, written out in YAML form for readability:
    version: "2021.12"
    core-network-configuration:
      asn-ranges: ["64512-64555"]
      edge-locations:
        - location: eu-west-1
        - location: us-east-1
        - location: ap-southeast-1
    segments:
      - name: prod
        edge-locations: [eu-west-1, us-east-1, ap-southeast-1]
        isolate-attachments: false
      - name: nonprod
        edge-locations: [eu-west-1, us-east-1, ap-southeast-1]
        isolate-attachments: false
      - name: shared
        edge-locations: [eu-west-1, us-east-1]
        isolate-attachments: false
    segment-actions:
      - action: share
        mode: attachment-route
        segment: shared
        share-with: [prod, nonprod]
    attachment-policies:
      - rule-number: 100
        conditions:
          - type: tag-value
            operator: equals
            key: env
            value: prod
        action:
          association-method: constant
          segment: prod
      - rule-number: 200
        conditions:
          - type: tag-value
            operator: equals
            key: env
            value: nonprod
        action:
          association-method: constant
          segment: nonprod
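Since the service consumes JSON rather than the YAML-style listing above, a small generator can help keep the two in step. This is a sketch under assumptions: the tag match is written as a tag-value condition with an equals operator, and the share action carries mode attachment-route, which is my reading of the policy schema and worth checking against the current reference. The helper name tag_rule is made up for illustration.

```python
import json

REGIONS = ["eu-west-1", "us-east-1", "ap-southeast-1"]


def tag_rule(number, env):
    """Attachment-policy rule: attachments tagged env=<env> join segment <env>."""
    return {
        "rule-number": number,
        "conditions": [
            # Assumed schema: a tag-value condition with an explicit operator.
            {"type": "tag-value", "operator": "equals", "key": "env", "value": env}
        ],
        "action": {"association-method": "constant", "segment": env},
    }


policy = {
    "version": "2021.12",
    "core-network-configuration": {
        "asn-ranges": ["64512-64555"],
        "edge-locations": [{"location": r} for r in REGIONS],
    },
    "segments": [
        {"name": "prod", "edge-locations": REGIONS, "isolate-attachments": False},
        {"name": "nonprod", "edge-locations": REGIONS, "isolate-attachments": False},
        # shared lives only in the two Regions listed in the policy above.
        {"name": "shared", "edge-locations": REGIONS[:2], "isolate-attachments": False},
    ],
    "segment-actions": [
        {
            "action": "share",
            "mode": "attachment-route",  # assumed to be required for share actions
            "segment": "shared",
            "share-with": ["prod", "nonprod"],
        }
    ],
    "attachment-policies": [tag_rule(100, "prod"), tag_rule(200, "nonprod")],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The resulting string is the kind of document that aws networkmanager put-core-network-policy takes as its --policy-document argument; generating it from code also gives code review a single file to look at.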
The four patterns that earn understanding:
Segments define isolation by default. A segment is a distinct routing domain. Attachments in the same segment can reach each other (across Regions, automatically, no peering to configure). Attachments in different segments cannot, unless a segment-actions rule shares routes between them.
share-with is not transitive. Sharing shared with [prod, nonprod] installs shared’s attachment routes in prod and nonprod, along with the return routes in shared, so prod and nonprod can both reach the shared-services VPCs and get answers back. What the share does not do is pass prod’s routes on to nonprod, or the reverse. That is the point: a shared DNS resolver or monitoring VPC should be reachable from everywhere without the everywhere being able to reach each other through it.
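The reachability rule for shared segments can be written down in a few lines. This sketch assumes, as I read the share action's attachment-route mode, that routes are installed in both directions between the sharing segment and each share-with segment, and never between two share-with segments. The function name is illustrative.

```python
def reachable(segment_actions, src, dst):
    """True if attachments in segment `src` have a route to segment `dst`."""
    if src == dst:
        return True  # same segment (isolate-attachments is false in our policy)
    for action in segment_actions:
        if action["action"] != "share":
            continue
        owner, targets = action["segment"], action["share-with"]
        # ASSUMPTION: a share links owner <-> each target, but never target <-> target.
        if (src == owner and dst in targets) or (dst == owner and src in targets):
            return True
    return False


ACTIONS = [{"action": "share", "segment": "shared", "share-with": ["prod", "nonprod"]}]

print(reachable(ACTIONS, "prod", "shared"))    # True
print(reachable(ACTIONS, "nonprod", "shared")) # True
print(reachable(ACTIONS, "prod", "nonprod"))   # False
```

A check like this is cheap to run in CI against the policy file, so a change that accidentally connects prod to nonprod fails before it reaches a change set.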
Attachment policies auto-associate VPCs to segments by tag. A VPC attachment lands in prod because its CloudFormation template set env=prod on the attachment itself. No manual association step, no per-Region route-table editing. Spin up a new VPC in eu-central-2 next week, tag it, attach it, and it inherits the segment and all the routing, Region-to-Region included, provided eu-central-2 is listed as an edge location in the policy.
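The tag-to-segment assignment is easy to reason about as code. A minimal sketch, assuming rules are evaluated in ascending rule-number order with the first match winning, and treating multiple conditions in one rule as AND (a simplification; the real policy has an explicit condition-logic setting). Only the two condition types shown are modelled, and the function names are made up.

```python
def matches(condition, tags):
    """Evaluate one attachment-policy condition against an attachment's tags."""
    if condition["type"] == "tag-exists":
        return condition["key"] in tags
    if condition["type"] == "tag-value":  # only the "equals" operator is modelled
        return tags.get(condition["key"]) == condition["value"]
    return False


def assign_segment(attachment_policies, tags):
    """Return the segment for an attachment, or None if no rule matches."""
    for rule in sorted(attachment_policies, key=lambda r: r["rule-number"]):
        if all(matches(c, tags) for c in rule["conditions"]):
            return rule["action"]["segment"]
    return None


RULES = [
    {"rule-number": 200,
     "conditions": [{"type": "tag-value", "operator": "equals", "key": "env", "value": "nonprod"}],
     "action": {"association-method": "constant", "segment": "nonprod"}},
    {"rule-number": 100,
     "conditions": [{"type": "tag-value", "operator": "equals", "key": "env", "value": "prod"}],
     "action": {"association-method": "constant", "segment": "prod"}},
]

print(assign_segment(RULES, {"env": "prod"}))      # prod
print(assign_segment(RULES, {"env": "nonprod"}))   # nonprod
print(assign_segment(RULES, {"team": "payments"})) # None
```

The None case matters operationally: an untagged or mistagged attachment matches no rule, so it is worth deciding up front whether that should fail the deployment.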
TGW attachments bridge the mesh. A TGW can be attached to the core network as a first-class attachment. During migration, the existing TGW mesh sits in a dedicated segment with share-with rules that match the old topology; new VPCs come up attached directly to Cloud WAN; old VPCs migrate one at a time by detaching from TGW and attaching to Cloud WAN. No flag day.
A worked migration step
Monday morning, we cut over a single non-critical VPC in ap-northeast-2 from the TGW mesh to Cloud WAN. The VPC currently has a TGW attachment with routes to 10.0.0.0/8 via the local TGW, and that TGW is peered to the ap-southeast-1 hub TGW, which in turn peers to us-east-1 and eu-west-1.
The cutover steps:
- Tag the VPC. Set env=nonprod on the Cloud WAN attachment configuration.
- Create the Cloud WAN VPC attachment. Attach the VPC to the core network in ap-northeast-2, which must already be listed as an edge location in the policy (the minimal policy above only names the three hubs). The attachment lands in segment nonprod by rule 200. Routes start propagating: the attachment sees all other nonprod attachments’ prefixes globally, plus shared services’ prefixes via share-with.
- Update VPC route tables. Replace the TGW route target on the VPC subnets with the Cloud WAN core network attachment. Two routes swap over atomically within the VPC.
- Remove the TGW attachment. Once the new path is verified, detach from the local TGW. The TGW mesh loses one endpoint.
- Validate. aws networkmanager get-network-routes on the segment shows the VPC’s prefix propagating globally. Pings from a shared-services VPC in us-east-1 and from a nonprod VPC in eu-west-1 both land; a ping from a prod VPC should not.
No peering edits, no route-table surgery across multiple TGWs, no change windows in three other Regions. The operational cost of the migration is in this one account, and every subsequent VPC migration is almost the same script.
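The cutover can be sketched as an ordered plan of API calls that a reviewer can read before anything runs. The operation and parameter names follow my understanding of the boto3 networkmanager and ec2 clients and should be verified against the SDK reference; every identifier in the example input is a placeholder, and the function itself is hypothetical.

```python
def cutover_plan(vpc):
    """Ordered (client, operation, params) steps for one VPC; nothing is executed."""
    steps = [
        # Steps 1-2: create the tagged Cloud WAN attachment.
        ("networkmanager", "create_vpc_attachment", {
            "CoreNetworkId": vpc["core_network_id"],
            "VpcArn": vpc["vpc_arn"],
            "SubnetArns": vpc["subnet_arns"],
            "Tags": [{"Key": "env", "Value": vpc["env"]}],
        }),
    ]
    # Step 3: repoint each VPC route table from the TGW to the core network.
    for route_table_id in vpc["route_table_ids"]:
        steps.append(("ec2", "replace_route", {
            "RouteTableId": route_table_id,
            "DestinationCidrBlock": "10.0.0.0/8",
            "CoreNetworkArn": vpc["core_network_arn"],
        }))
    # Step 4: detach from the local TGW once the new path is verified.
    steps.append(("ec2", "delete_transit_gateway_vpc_attachment", {
        "TransitGatewayAttachmentId": vpc["tgw_attachment_id"],
    }))
    # Step 5: read back the segment's routes at this edge location.
    steps.append(("networkmanager", "get_network_routes", {
        "GlobalNetworkId": vpc["global_network_id"],
        "RouteTableIdentifier": {"CoreNetworkSegmentEdge": {
            "CoreNetworkId": vpc["core_network_id"],
            "SegmentName": vpc["env"],
            "EdgeLocation": vpc["region"],
        }},
    }))
    return steps


# Placeholder identifiers only; none of these refer to real resources.
EXAMPLE_VPC = {
    "region": "ap-northeast-2",
    "env": "nonprod",
    "global_network_id": "global-network-EXAMPLE",
    "core_network_id": "core-network-EXAMPLE",
    "core_network_arn": "arn:aws:networkmanager::111122223333:core-network/core-network-EXAMPLE",
    "vpc_arn": "arn:aws:ec2:ap-northeast-2:111122223333:vpc/vpc-EXAMPLE",
    "subnet_arns": ["arn:aws:ec2:ap-northeast-2:111122223333:subnet/subnet-EXAMPLE"],
    "route_table_ids": ["rtb-EXAMPLE1", "rtb-EXAMPLE2"],
    "tgw_attachment_id": "tgw-attach-EXAMPLE",
}

for client, operation, _params in cutover_plan(EXAMPLE_VPC):
    print(client, operation)
```

Keeping the plan as data means the same function serves the dry-run review and the real execution, where a thin runner would look up each operation on the matching SDK client and wait for the attachment to become available between steps.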
What’s worth remembering
- Cloud WAN is one global object; TGW is N regional objects. A core network spans Regions; adding a Region is an edge-location edit in the policy. TGWs are per-Region and peer point-to-point; adding a Region is a mesh-edit exercise.
- Segments are the isolation primitive, not route tables. Attachments live in one segment. Segments are isolated by default. Cross-segment connectivity is explicit via segment-actions with share-with.
- share-with is not transitive. Sharing shared with prod and nonprod does not connect prod to nonprod. That is what makes shared-services VPCs clean.
- Attachment policies auto-associate via tags. Tag the VPC attachment with env=prod, and the policy puts it in the prod segment. No manual association step per VPC.
- Edge locations are Regions; CNEs are the managed infrastructure inside them. AWS stands up the Core Network Edge automatically when a Region is added to the policy. You don’t run appliances.
- Cloud WAN accepts VPC, VPN, TGW, and Connect attachments. TGW attachment is the migration bridge: keep the existing mesh running as a segment while new VPCs go direct to Cloud WAN.
- Policy changes go via change sets. Create, review the delta, execute. The change is atomic across edge locations, no half-applied policies.
- Inter-Region data transfer is billed. Cloud WAN isn’t free; inter-edge-location traffic is priced per GB, similar to TGW peering. The saving is operational, not always financial, so model the traffic before committing.
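To show how small the "add a Region" edit is in policy terms, here is a sketch that adds an edge location to a policy dictionary and reports the difference. The real delta is whatever the Cloud WAN change set computes; this only illustrates the size of the edit, and the helper names and the trimmed-down policy are illustrative.

```python
import copy


def add_region(policy, region, segments):
    """Return a new policy with `region` as an edge location and in the named segments."""
    new = copy.deepcopy(policy)  # leave the current policy untouched for comparison
    new["core-network-configuration"]["edge-locations"].append({"location": region})
    for segment in new["segments"]:
        if segment["name"] in segments:
            segment["edge-locations"].append(region)
    return new


def edge_delta(old, new):
    """Edge locations present in `new` but not in `old`."""
    def locations(policy):
        return {e["location"] for e in policy["core-network-configuration"]["edge-locations"]}
    return sorted(locations(new) - locations(old))


CURRENT = {
    "core-network-configuration": {
        "edge-locations": [{"location": "eu-west-1"}, {"location": "us-east-1"}],
    },
    "segments": [
        {"name": "prod", "edge-locations": ["eu-west-1", "us-east-1"]},
        {"name": "shared", "edge-locations": ["eu-west-1", "us-east-1"]},
    ],
}

PROPOSED = add_region(CURRENT, "ap-northeast-2", segments={"prod"})
print(edge_delta(CURRENT, PROPOSED))        # ['ap-northeast-2']
print(PROPOSED["segments"][0]["edge-locations"])
```

Compare that with the TGW version of the same change: a new gateway, new peerings, new route tables per domain, and new prefix filters on every peer.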
Thirty Regions and a partial-mesh TGW topology is an operational debt that only grows. Cloud WAN turns the global network into one object defined by policy, with segments as the isolation primitive and attachment tags as the assignment rule. The migration is gradual via TGW attachments; the end state is a network that can add a Region in an afternoon instead of a fortnight. The work isn’t throwing away what we’ve built, it’s moving the global topology into one place we can actually reason about.