Plugging SD-WAN Into a Transit Gateway

February 26, 2029 · 14 min read

Advanced Networking · ANS-C01 · part of The Exam Room

The situation

The estate:

  • Branch offices each have an SD-WAN CPE (customer premises equipment, a vendor box running their SD-WAN software). Branches build encrypted overlay tunnels to a hub SD-WAN appliance running in the cloud, carrying branch traffic over the public internet to AWS.
  • The hub appliance is a vendor virtual appliance (Cisco, Aruba, Versa, Fortinet, Palo Alto, Silver Peak, etc.) running as an EC2 instance in a dedicated “SD-WAN transit” VPC. It terminates the branch overlay tunnels and presents the branch traffic on the VPC interfaces.
  • Connectivity to the rest of AWS is via Transit Gateway. Today, the appliance has a VPN attachment to the TGW: two IPsec tunnels, BGP on top, carrying the branch prefixes into AWS and AWS prefixes back to branches.
  • The throughput problem. Each VPN tunnel tops out at 1.25 Gbps aggregate (bidirectional). ECMP across two tunnels gets ~2.5 Gbps. The hub appliance is a c6i.12xlarge with 18.75 Gbps of instance networking, and during peak the VPN attachment is the bottleneck.
  • The double-encryption. Branch-to-appliance is SD-WAN’s own encrypted overlay. Appliance-to-TGW is IPsec. Every packet gets encrypted twice; CPU on the appliance is spent on encryption it didn’t need.

The network team wants:

  • Higher throughput between the SD-WAN appliance and the TGW, beyond the VPN attachment ceiling.
  • Dynamic routing (BGP) so the appliance advertises branch prefixes to the TGW and learns AWS prefixes.
  • Fewer layers of encryption (trust the SD-WAN overlay’s encryption; don’t encrypt again inside AWS).
  • The same multi-region model already in place: hub appliance per region, TGW per region, peered.

What actually matters

Before picking, it’s worth understanding what makes the VPN-style attachment awkward for SD-WAN specifically.

A VPN attachment to a TGW is IPsec-based. The attachment establishes encrypted tunnels between the TGW and the customer gateway device (the appliance in our case). AWS terminates IPsec on its side and enforces a per-tunnel throughput cap that’s low relative to modern instance networking. BGP runs over the tunnels. For a typical customer gateway (an on-prem router reached over the internet) this is fine: the internet hop is the bottleneck anyway, and encryption is the point.

For an SD-WAN appliance in AWS, talking to a TGW in AWS, the IPsec is redundant. The traffic never leaves AWS’s infrastructure; it goes appliance ENI → TGW over the AWS fabric. Encrypting that hop with IPsec burns CPU on both ends for nothing. And the per-tunnel cap is a real throughput ceiling in a place where AWS’s own fabric could handle tens of Gbps.

The wanted shape is an attachment that:

  • Carries the appliance’s branch prefixes into the TGW dynamically (BGP), without re-encrypting traffic that’s already inside the AWS trust boundary.
  • Treats throughput as a function of the underlying VPC fabric, not a per-tunnel service cap.
  • Scales out to multiple appliances for HA without an exponential routing-plumbing exercise.

That points at a non-encrypting overlay on top of an existing VPC attachment, with BGP running across it for route exchange and the TGW treating the appliances as logical peers rather than IPsec-terminating customer gateways.

What we’ll filter on

  1. Throughput beyond the per-tunnel cap of an IPsec attachment: does this shape lift the ceiling that the VPN attachment imposes?
  2. Inside-AWS path: does the traffic stay on the AWS fabric rather than leaving it?
  3. Avoids redundant encryption: is the packet encrypted only where encryption is needed?
  4. Dynamic routing: does BGP carry branch prefixes and AWS prefixes dynamically?
  5. Vendor-neutral: does it work with the major SD-WAN vendors?

The SD-WAN integration landscape

1. VPN attachment from appliance to TGW. Two IPsec tunnels from the appliance’s EIP to the TGW’s VPN endpoints. 1.25 Gbps per tunnel. BGP runs over IPsec. Double-encrypts inside AWS. Throughput-limited, wastes CPU.

2. Transit Gateway Connect attachment + Connect peers. GRE tunnels + BGP from the appliance to the TGW, on top of an underlying VPC attachment. Up to 4 tunnels, ~5 Gbps each. No IPsec between appliance and TGW (the encryption is upstream, in the SD-WAN overlay). The purpose-built shape.

3. Direct ENI routing (no attachment mediation). Configure the appliance’s ENIs to be the next-hop for branch prefixes in the VPC route table; use static routing or a BGP session between the appliance and a separate router. Works in small deployments; doesn’t scale across multiple VPCs the way TGW does.

4. Cloud WAN Connect attachment. Cloud WAN supports Connect attachments with equivalent semantics (GRE + BGP) if the estate is on Cloud WAN rather than TGW. Same architecture, different parent service.

5. Gateway Load Balancer (GWLB) for traffic inspection. Not an SD-WAN integration per se, but worth naming. GWLB is for putting vendor appliances in the data path for packet inspection, not for exchanging routes.
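
To put the throughput figures from options 1 and 2 side by side, here is a minimal Python sketch. It uses only the approximate per-tunnel numbers quoted above (1.25 Gbps per IPsec tunnel, ~5 Gbps per GRE tunnel) and assumes ECMP spreads flows perfectly evenly, which real traffic rarely does. The function and constant names are illustrative.

```python
# Per-tunnel figures quoted in the text; approximate ceilings, not guarantees.
VPN_GBPS_PER_TUNNEL = 1.25
CONNECT_GBPS_PER_TUNNEL = 5.0


def aggregate_gbps(tunnels: int, gbps_per_tunnel: float) -> float:
    """Best-case aggregate when ECMP spreads flows evenly across all tunnels."""
    if tunnels < 0:
        raise ValueError("tunnels must be non-negative")
    return tunnels * gbps_per_tunnel


vpn_ceiling = aggregate_gbps(2, VPN_GBPS_PER_TUNNEL)          # two IPsec tunnels
connect_ceiling = aggregate_gbps(4, CONNECT_GBPS_PER_TUNNEL)  # four GRE tunnels

print(f"VPN attachment:     {vpn_ceiling} Gbps")
print(f"Connect attachment: {connect_ceiling} Gbps")
print(f"Headroom factor:    {connect_ceiling / vpn_ceiling:.0f}x")
```

A single flow still hashes onto one tunnel, so the per-tunnel figure remains the ceiling for any individual flow; the aggregate only helps when there are many flows.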

Side by side

Option Throughput beyond VPN cap Inside-AWS path No redundant encryption Dynamic routing Vendor-neutral
VPN attachment ✓ (but IPsec encrypts) ✗ (double-encrypts) ✓ (BGP over IPsec)
TGW Connect ✓ (BGP over GRE)
Direct ENI Limited Partial
Cloud WAN Connect
GWLB n/a n/a n/a

Two Connect peers, four GRE tunnels, one TGW

[Diagram: SD-WAN hub appliances attached to a Transit Gateway via a Connect attachment]

  • Branch offices (CPE appliances): London, Munich, Madrid and Warsaw, each with an encrypted SD-WAN overlay to the hub.
  • Transit VPC 10.90.0.0/16, the hub for the SD-WAN appliances.
  • Appliance A (c6i.12xlarge): ENI 10.90.1.10, public EIP, terminates the branch overlay. Connect peer A, BGP AS 65100, GRE inside IPs 169.254.100.1/.5.
  • Appliance B (c6i.12xlarge): ENI 10.90.2.10, public EIP, HA peer of A. Connect peer B, BGP AS 65101, GRE inside IPs 169.254.101.1/.5.
  • VPC attachment to TGW: transport for the Connect attachment.
  • TGW Connect attachment: protocol gre, transport is the VPC attachment above. 4 GRE tunnels total, 2 per Connect peer; ~5 Gbps per tunnel, up to 20 Gbps aggregate. BGP advertises branch prefixes in and AWS VPC prefixes out. No IPsec here (the SD-WAN overlay already encrypts the branch side).
  • Transit Gateway, eu-west-1, BGP AS 64512. Route table prod: learns branch prefixes via Connect. Route table hub: Connect propagates here.
  • Workload VPCs: prod 10.10/16, shared 10.20/16, dev 10.30/16; each attached and associated with the prod route table.

The SD-WAN overlay stays encrypted from branch to appliance. Inside AWS, Connect attachment carries plain GRE tunnels with BGP between each appliance and the TGW. No redundant encryption, no VPN throughput cap.

The pick(s) in depth

TGW Connect attachment with two Connect peers (one per appliance), each running two GRE tunnels. The shape the TGW was literally built for.

Walking through the configuration:

1. The underlying VPC attachment. The transit VPC, where the appliances live, has a normal VPC attachment to the TGW. This is the transport layer. The Connect attachment will ride on top of it.

2. Create the Connect attachment:

    aws ec2 create-transit-gateway-connect --transport-transit-gateway-attachment-id <vpc-attachment-id> --options Protocol=gre

The Protocol is always gre; it’s the only choice. The Connect attachment gets its own attachment ID and can be associated with and propagated into TGW route tables independently of the transport VPC attachment.

3. Create Connect peers. One per appliance:

    aws ec2 create-transit-gateway-connect-peer --transit-gateway-attachment-id <connect-att-id> --peer-address 10.90.1.10 --inside-cidr-blocks 169.254.100.0/29 --bgp-options PeerAsn=65100 --transit-gateway-address <optional>

The PeerAddress is the appliance’s VPC IP; the InsideCidrBlocks is the /29 (or /125 for IPv6) used for the GRE tunnel’s inside addresses. TGW picks two IPs in this range for the tunnel endpoints (one for the TGW side, one for the peer side); the extra IPs handle dual-tunnel redundancy per peer. The PeerAsn is the appliance’s BGP AS; TGW’s AS is implicit (either the TGW’s configured AS or Amazon’s default 64512).

AWS assigns GRE tunnel inside IPs and publishes them back (TransitGatewayAddress, PeerAddress). The appliance is configured with the matching GRE tunnel endpoints and BGP neighbor statements.

4. Repeat for the second appliance. PeerAsn=65101, different peer address, different inside CIDR.
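
Before creating the peers, it can help to sanity-check the inside CIDRs. The sketch below uses Python’s standard ipaddress module and checks only what the text states: each block is a /29 from 169.254.0.0/16, and the two peers use different blocks. The second block (169.254.101.0/29) is an assumption inferred from the diagram, the peer names are hypothetical, and the sketch does not model which address in the block ends up on which side of the tunnel.

```python
import ipaddress

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def check_inside_cidrs(cidrs):
    """Validate IPv4 inside CIDRs for a set of Connect peers.

    Raises ValueError if a block is not a /29, is outside 169.254.0.0/16,
    or overlaps another peer's block. Returns the parsed networks.
    """
    nets = [ipaddress.ip_network(c) for c in cidrs]
    for net in nets:
        if net.prefixlen != 29:
            raise ValueError(f"{net} is not a /29")
        if not net.subnet_of(LINK_LOCAL):
            raise ValueError(f"{net} is outside {LINK_LOCAL}")
    for i, a in enumerate(nets):
        for b in nets[i + 1:]:
            if a.overlaps(b):
                raise ValueError(f"{a} overlaps {b}")
    return nets


# Hypothetical peer names; the second CIDR is inferred from the diagram.
peers = {"appliance-a": "169.254.100.0/29", "appliance-b": "169.254.101.0/29"}

for name, net in zip(peers, check_inside_cidrs(peers.values())):
    usable = [str(h) for h in net.hosts()]
    print(name, net, usable)
```

Each /29 yields six usable addresses, which is the pool the tunnel endpoint addresses are drawn from.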

5. Route table association and propagation. The Connect attachment associates with a route table (typically the same as the transport VPC attachment’s, or a separate “hub” route table). Branch prefixes become reachable when the Connect attachment propagates into the route tables that the workload VPCs are associated with, so enable propagation from the Connect attachment on each of those tables.
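
The difference between association and propagation is easy to blur, so here is a toy Python model of it. This is a sketch of the semantics only, not of any AWS API: the class, the attachment names and the choice of a separate “hub” table for the Connect attachment are all assumptions for illustration, and 10.5.8.0/24 stands in for one branch prefix.

```python
from collections import defaultdict


class TgwModel:
    """Toy model of TGW route tables.

    Association picks the table used for lookups on traffic arriving from an
    attachment; propagation installs an attachment's prefixes into a table.
    """

    def __init__(self):
        self.association = {}            # attachment -> route table
        self.tables = defaultdict(dict)  # route table -> {prefix: attachment}

    def associate(self, attachment, table):
        self.association[attachment] = table

    def propagate(self, attachment, prefixes, table):
        for prefix in prefixes:
            self.tables[table][prefix] = attachment


tgw = TgwModel()
tgw.associate("connect-att", "hub")    # hypothetical attachment names
tgw.associate("vpc-prod-att", "prod")

# Branch prefixes learned over BGP become visible to the workload VPCs.
tgw.propagate("connect-att", ["10.5.8.0/24"], table="prod")
# VPC prefixes land in the table the Connect attachment uses for lookups.
tgw.propagate("vpc-prod-att", ["10.10.0.0/16"], table="hub")

print(dict(tgw.tables))
```

If the Connect attachment shares a route table with the workload VPCs instead, both propagations simply target the same table.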

6. Appliance configuration. The SD-WAN vendor’s config syntax varies, but in BGP-over-GRE terms:

  • Two GRE tunnels to the TGW’s inside addresses (one per redundancy pair).
  • BGP neighbors on each tunnel’s TGW inside address, advertising branch prefixes and accepting AWS prefixes.
  • ECMP across the tunnels for active/active use.
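
ECMP implementations generally hash a flow’s 5-tuple so that every packet of one flow stays on one tunnel and is not reordered. The Python sketch below illustrates that idea only; the hash your SD-WAN vendor actually uses will differ, and the tunnel names are hypothetical.

```python
import hashlib

# Hypothetical names for one appliance's two GRE tunnels.
TUNNELS = ["gre-a1", "gre-a2"]


def pick_tunnel(src_ip, dst_ip, proto, src_port, dst_port, tunnels=TUNNELS):
    """Map a flow's 5-tuple to one tunnel, deterministically."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(tunnels)
    return tunnels[index]


# The same flow always lands on the same tunnel.
flow = ("10.5.8.100", "10.10.5.22", 6, 40000, 443)
print(pick_tunnel(*flow))
```

Because the choice is per flow, a handful of heavy flows can still load one tunnel unevenly; the aggregate figure assumes many flows.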

Each GRE tunnel sustains roughly 5 Gbps; with two tunnels per peer and two peers, aggregate is closer to 20 Gbps if the appliances can push it.

Why two tunnels per peer rather than one? AWS places the TGW’s inside addresses on different underlying capacity. Two tunnels per peer gives active/active redundancy within the peer’s BGP sessions; if one TGW GRE endpoint has issues, the other tunnel still carries BGP and data traffic. It’s belt-and-braces within one peer; adding a second peer is the appliance-level HA.

A worked traffic path

Branch Munich sends a packet to an instance in VPC prod (10.10.5.22):

  1. Branch CPE encrypts the packet (SD-WAN overlay, IPsec or vendor-proprietary), sends to the hub appliance’s EIP over the public internet.
  2. Hub appliance A receives the overlay packet and decrypts it, leaving an IP packet with source 10.5.8.100 (Munich branch) and destination 10.10.5.22. The appliance’s route table says “route AWS prefixes via the GRE tunnel to TGW.”
  3. Appliance encapsulates the packet in GRE. Outer IP is appliance’s VPC IP → TGW’s GRE (outer) address. No encryption, just GRE encap. Sends into the VPC.
  4. VPC routing forwards the GRE packet via the VPC attachment to the TGW. The TGW’s Connect attachment unpacks GRE; the TGW now has an inner IP packet 10.5.8.100 → 10.10.5.22.
  5. TGW route table lookup: destination 10.10.5.22 in 10.10.0.0/16, matched by propagation from the prod VPC attachment. Forwarded to prod VPC attachment.
  6. prod VPC delivers to the instance. Instance responds.
  7. Return packet takes the reverse path: the TGW matches 10.5.8.100 against 10.5.8.0/24, learned via BGP on the Connect attachment, and GRE-encapsulates the packet back through the Connect peer’s tunnel; the appliance re-encrypts it into the SD-WAN overlay and sends it out to the Munich branch.
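
Steps 5 and 7 both come down to a longest-prefix match in the TGW route table. Here is a rough Python sketch of that lookup using the standard ipaddress module. The attachment names are placeholders, and the /24 for the Munich branch is an assumption based on the example addresses.

```python
import ipaddress

# Prefix -> next-hop attachment, roughly as a TGW route table would hold them.
ROUTES = {
    "10.10.0.0/16": "vpc-prod-attachment",  # propagated from the prod VPC
    "10.5.8.0/24": "connect-attachment",    # learned via BGP from the appliance
}


def lookup(dst, routes=ROUTES):
    """Return the attachment for the most specific prefix containing dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, target in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, target)
    return best[1] if best else None


print(lookup("10.10.5.22"))  # forward direction: branch -> prod instance
print(lookup("10.5.8.100"))  # return direction: prod instance -> branch
```

An address that matches no prefix returns None here, which corresponds to the TGW having no route and dropping the packet.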

Throughput at the Connect attachment layer: ~5 Gbps per GRE tunnel, 20 Gbps aggregate across the four tunnels. The VPN ceiling is gone.

What’s worth remembering

  1. Connect attachments ride on VPC or DX attachments. The Connect attachment is a logical overlay; the transport attachment provides the IP layer.
  2. GRE, not IPsec. Connect attachments do not encrypt. The assumption is that encryption is upstream (in the SD-WAN overlay, or inside the AWS fabric’s own trust boundary). Don’t use Connect for paths that require wire-level encryption between the appliance and AWS; use VPN attachments for that.
  3. Up to 4 GRE tunnels per Connect attachment. Each tunnel ~5 Gbps, aggregate up to 20 Gbps. Compare to ~1.25 Gbps per VPN tunnel.
  4. Connect peers are per-appliance BGP endpoints. One Connect attachment, N Connect peers. Two peers for HA (active/active) is the canonical pattern.
  5. BGP over GRE carries routes dynamically. Branch prefixes advertised in; AWS prefixes advertised out. TGW-side BGP AS is 64512 by default; peer AS is the appliance’s AS.
  6. Inside CIDRs are tunnel endpoint addresses. /29 from the link-local 169.254.0.0/16 range; TGW picks the specific addresses from the range.
  7. TGW route table semantics are unchanged. Associate the Connect attachment with a route table; propagate into workload route tables; workload VPCs learn branch prefixes normally.
  8. Cloud WAN has equivalent Connect attachments. Same idea, just the parent service is Cloud WAN’s core network instead of a TGW.

A VPN attachment is the correct tool for over-the-internet connectivity where encryption matters and throughput is the internet’s problem anyway. It is the wrong tool for SD-WAN appliances that live inside AWS and just need a fast GRE pipe to the TGW. Connect attachments are that pipe. The work isn’t learning a new networking concept (GRE and BGP are venerable); it’s noticing that the TGW grew a purpose-built attachment type for this exact shape, and using it.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.