How to Carry Market Data Multicast Into AWS

February 28, 2029 · 13 min read

Advanced Networking · ANS-C01 · part of The Exam Room

The situation

The firm runs a market data platform that consumes a handful of multicast feeds from an exchange. Each feed is a multicast group: a specific IPv4 group address (239.10.x.y) plus a UDP port. Packets arrive at the firm’s edge, get distributed on the LAN via multicast routing, and dozens of consumer processes on dozens of hosts join the relevant groups and receive the packets at line rate.

The business wants to move the downstream analytics from on-prem EC2-equivalent hardware into AWS, behind a Direct Connect link. The feed itself still arrives on-prem (the exchange requires physical proximity); the firm will re-publish the multicast streams from on-prem into AWS over Direct Connect, and the analytics consumers will run in VPCs.

Constraints:

  • The protocol is multicast. Rewriting every consumer to pull from a queue is not on the table for timeline reasons. The consumers expect IGMP group membership and multicast UDP delivery.
  • Many consumers, many VPCs. Roughly 40 consumer processes across 8 VPCs, many of them in different accounts.
  • Latency-sensitive. Packets that are seconds late are worthless. End-to-end budget from packet-arrival-on-prem to consumer-receipt is tens of milliseconds.
  • Durable against consumer churn. Consumers come and go (deploys, autoscaling); the multicast infrastructure must handle group membership changes cleanly.

Options:

  • Unicast fan-out at the publisher. On-prem publisher opens TCP (or UDP) connections to every consumer and replicates. Trivial at small scale, but publisher CPU and bandwidth grow O(N); not sustainable at 40+ consumers.
  • Managed pub-sub (MSK, Kinesis Data Streams). Replace the multicast protocol with a Kafka or Kinesis stream. Requires consumer rewrite.
  • Transit Gateway Multicast. TGW as a multicast router; publishers and consumers join groups; TGW replicates.
  • Overlay multicast via a software router (PIM on Linux EC2). Run multicast routers in AWS as EC2; software-only solution.

What actually matters

Before picking, it’s worth recalling how multicast actually works at the IP layer.

Classical multicast has two components:

  • A sender (source) emits UDP packets to a group address, e.g., 239.10.1.5:30001.
  • Receivers (members) signal their interest in the group by sending IGMPv2 Membership Reports on their LAN. Routers listen for these reports and build forwarding state: “group 239.10.1.5 has members on interfaces A, B, and C; replicate incoming packets for that group onto A, B, and C.”

Within a LAN, IGMP snooping on a switch ensures multicast traffic is only delivered to ports that have joined the group; without snooping, multicast floods to every port. Between LANs, multicast routing protocols (PIM, DVMRP) build distribution trees so a single source packet fans out to all members efficiently.
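To make the forwarding state concrete, here is a minimal sketch of a router that listens for IGMPv2 messages and keeps a per-group member table. The names ToyMulticastRouter, build_igmp, and inet_checksum are illustrative, not from any library. The model is deliberately simplified: a real IGMPv2 router sends a group-specific query before dropping a member on leave, and none of this is how the TGW is implemented internally.

```python
import socket
import struct
from collections import defaultdict

IGMP_V2_REPORT = 0x16  # IGMPv2 Membership Report (RFC 2236)
IGMP_LEAVE = 0x17      # IGMPv2 Leave Group


def inet_checksum(data: bytes) -> int:
    """16-bit ones'-complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_igmp(msg_type: int, group: str) -> bytes:
    """Build the 8-byte IGMPv2 message: type, max resp time, checksum, group."""
    raw_group = socket.inet_aton(group)
    unchecked = struct.pack("!BBH4s", msg_type, 0, 0, raw_group)
    return struct.pack("!BBH4s", msg_type, 0, inet_checksum(unchecked), raw_group)


class ToyMulticastRouter:
    """Hypothetical model: tracks which interfaces have members per group."""

    def __init__(self):
        self.members = defaultdict(set)  # group address -> interface names

    def handle_igmp(self, iface: str, packet: bytes) -> None:
        if inet_checksum(packet[:8]) != 0:
            return  # checksum does not verify; ignore the message
        msg_type, _resp, _csum, raw_group = struct.unpack("!BBH4s", packet[:8])
        group = socket.inet_ntoa(raw_group)
        if msg_type == IGMP_V2_REPORT:
            self.members[group].add(iface)
        elif msg_type == IGMP_LEAVE:
            # Simplification: real routers query the group before removing.
            self.members[group].discard(iface)

    def replicate(self, group: str, payload: bytes) -> dict:
        """Return one copy of the payload per interface with members."""
        return {iface: payload for iface in sorted(self.members.get(group, ()))}
```

Feeding it two reports for 239.10.1.5 from interfaces A and B, then a leave from A, leaves only B in the replication set for that group.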

In AWS VPCs, the fabric traditionally didn’t replicate multicast. An ENI could send a multicast-group-destined packet, and no other ENI would receive it. Whatever we choose has to put a multicast-aware replicator somewhere, either in the fabric or in a software appliance we run, with three properties:

  • A logical container that tracks group membership across all participating subnets, so consumers can be in different VPCs and accounts without the publisher caring.
  • An ingress mechanism for sources, since Direct Connect carries unicast and the multicast feed has to be re-injected as multicast once it’s inside AWS.
  • A signalling path for members. Either each ENI joins explicitly via API, or the consumer’s standard socket call generates the on-wire IGMP report the network already knows how to handle.

The on-prem hop is the other piece. Direct Connect doesn’t carry multicast itself, so the publisher on-prem has to re-transmit the multicast stream over a unicast tunnel to an in-AWS relay, and the relay emits packets to the group inside AWS. The replicator distributes from there.
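As a rough illustration of what the relay does with each tunnelled packet, the sketch below strips a GRE header and pulls the group address, UDP port, and payload out of the inner IPv4/UDP datagram. It assumes the input is the GRE packet with the outer IP header already removed; InnerDatagram, decap_gre_ipv4_udp, and emit are hypothetical names, and socket options such as multicast TTL or the outgoing interface are left out.

```python
import socket
import struct
from typing import NamedTuple


class InnerDatagram(NamedTuple):
    group: str      # inner destination address (the multicast group)
    port: int       # inner UDP destination port
    payload: bytes  # UDP payload to re-emit inside AWS


def decap_gre_ipv4_udp(gre_packet: bytes) -> InnerDatagram:
    """Strip a GRE header (RFC 2784/2890) and parse the inner IPv4/UDP datagram."""
    flags, proto = struct.unpack("!HH", gre_packet[:4])
    if proto != 0x0800:
        raise ValueError("inner protocol is not IPv4")
    offset = 4
    # Optional 4-byte fields: checksum (C), key (K), sequence number (S).
    for bit in (0x8000, 0x2000, 0x1000):
        if flags & bit:
            offset += 4
    ip = gre_packet[offset:]
    if ip[0] >> 4 != 4 or ip[9] != socket.IPPROTO_UDP:
        raise ValueError("inner packet is not IPv4/UDP")
    ihl = (ip[0] & 0x0F) * 4  # inner IP header length in bytes
    group = socket.inet_ntoa(ip[16:20])
    _sport, dport, udp_len, _csum = struct.unpack("!HHHH", ip[ihl:ihl + 8])
    return InnerDatagram(group, dport, ip[ihl + 8:ihl + udp_len])


def emit(sock: socket.socket, dgram: InnerDatagram) -> None:
    """Re-send the payload to the group address from the relay's own interface."""
    sock.sendto(dgram.payload, (dgram.group, dgram.port))
```

The point of the split is that the relay keeps the original group and port from the inner header, so consumers in AWS see the same group address they joined on-prem.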

What we’ll filter on

  1. Native multicast on the wire: consumers use standard sockets and IGMP, no rewrite?
  2. Cross-VPC distribution: members in different VPCs receive?
  3. Cross-account distribution: members in different accounts receive?
  4. Low-latency replication: the replication fabric doesn’t add seconds of latency?
  5. Managed, not appliance-based: no software routers to run?

The multicast landscape

1. Unicast fan-out at publisher. Publisher maintains O(N) TCP/UDP sessions. Works for small N; bandwidth and publisher CPU scale linearly. Fails on native multicast and on scale.

2. Managed pub-sub (MSK, Kinesis). Replace the wire protocol. Rewrite consumers. Mature, scalable, but the wire protocol is the reason we’re using multicast in the first place. Fails on “no rewrite.”

3. Transit Gateway Multicast. TGW-managed multicast domain, IGMPv2 membership, cross-VPC, cross-account. No software routers. Low replication latency. Ticks native, cross-VPC, cross-account, managed.

4. Software PIM router on EC2. Run a Linux box with smcroute or a commercial router in the VPC; consumers tunnel to it via GRE. Possible but operationally expensive: there is no multicast-aware VPC fabric underneath, so the software router just tunnels multicast as unicast GRE to each subscriber. Back to fan-out, plus more stuff to run.

Side by side

Option | Native multicast | Cross-VPC | Cross-account | Low-latency replication | Managed
Unicast fan-out | No | | | |
Managed pub-sub | No | | | |
TGW Multicast | Yes | Yes | Yes | Yes | Yes
Software PIM | Partial | Partial | Partial | Partial | No

Sources, domains, and members

[Figure: on-prem market data publisher emits 239.10.1.5:30001 into a GRE tunnel over DX → relay VPC, where the multicast injector EC2 terminates GRE and emits on the group address (source ENI registered via register-transit-gateway-multicast-group-sources) → TGW multicast domain tgw-mcast-dom-0abc1234 (IGMPv2 support: true, StaticSourcesSupport: true), group 239.10.1.5 → replication to member ENIs in consumer VPC A (acct 1111: analytics-1, analytics-2), consumer VPC B (acct 2222: risk-engine-1, risk-engine-2), and consumer VPC C (acct 3333: recorder-1, recorder-2), each joined via IGMPv2 report.]
The multicast domain is the logical container. An injector ENI registers as source; consumer ENIs send IGMPv2 membership reports and become members; the TGW replicates packets to every member.

The pick(s) in depth

TGW Multicast Domain with IGMP support, a multicast injector EC2 as registered source, consumer ENIs as IGMP members. The managed shape that gets multicast working across VPCs without rewriting consumers.

1. Create the TGW Multicast Domain. aws ec2 create-transit-gateway-multicast-domain --transit-gateway-id tgw-0abc --options Igmpv2Support=enable,StaticSourcesSupport=enable,AutoAcceptSharedAssociations=enable. Enabling IGMP means the domain will listen for IGMPv2 reports from ENIs in associated VPCs; enabling static sources means sources are registered via API rather than by joining the group as a sender, because standard IGMP has no sender semantics.

2. Associate VPC attachments with the domain. For the relay VPC and each consumer VPC: aws ec2 associate-transit-gateway-multicast-domain --transit-gateway-multicast-domain-id <dom-id> --transit-gateway-attachment-id <vpc-att-id> --subnet-ids subnet-<multicast-enabled-subnet>. Associating at the subnet level limits which subnets can contain sources or members.

3. Register the source ENI. aws ec2 register-transit-gateway-multicast-group-sources --transit-gateway-multicast-domain-id <dom-id> --group-ip-address 239.10.1.5 --network-interface-ids eni-<injector-eni>. Now the domain knows where packets for this group come from.

4. Consumer ENIs join via IGMPv2. On a Linux EC2 instance, a process that calls setsockopt(IP_ADD_MEMBERSHIP, ...) on a UDP socket bound to the group triggers an IGMPv2 Membership Report on the ENI. The AWS dataplane picks it up and the TGW records the ENI as a member of the group. No AWS API call is needed from the consumer side: standard socket code produces standard IGMP packets and AWS handles them.

5. The injector. A small EC2 running a daemon that receives the unicast-encapsulated stream from on-prem (over Direct Connect, via GRE or a vendor-specific relay) and transmits the decapsulated packets to the multicast group address on the appropriate UDP port. The injector sends packets with destination 239.10.1.5 via its ENI; the fabric delivers them to the multicast domain; the TGW replicates to members.

6. IPv4 allow-multicast on ENIs. ENIs serving as multicast sources or members must have source/destination checking set appropriately (the specific setting varies with configuration). Subnet route tables typically need no extra entries for the multicast group addresses; what matters is that the subnet is part of a VPC attachment association with the multicast domain.

7. Network ACL and security group rules. The ENI’s security group must allow inbound UDP from the source to the group port; the subnet NACL must allow the same. Many “multicast not working” issues turn out to be a closed security group or a NACL that allows unicast ranges but denies the multicast 239.x range.
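Steps 1 to 3 can also be driven from code. The following is a sketch only, assuming `ec2` is a boto3 EC2 client whose method names mirror the CLI commands above; provision_domain is a hypothetical helper, and the option values are copied from step 1 rather than independently validated. A real script would also wait for the domain to become available before associating subnets.

```python
def provision_domain(ec2, tgw_id, attachments, group_ip, injector_eni):
    """Create the domain, associate subnets, and register the source ENI.

    ec2          -- assumed to be a boto3 EC2 client (or an object with the same methods)
    attachments  -- dict mapping a TGW VPC attachment id to its subnet ids
    """
    created = ec2.create_transit_gateway_multicast_domain(
        TransitGatewayId=tgw_id,
        Options={
            "Igmpv2Support": "enable",
            "StaticSourcesSupport": "enable",
            "AutoAcceptSharedAssociations": "enable",
        },
    )
    domain_id = created["TransitGatewayMulticastDomain"][
        "TransitGatewayMulticastDomainId"
    ]

    # Step 2: one association call per VPC attachment (relay and consumers).
    for attachment_id, subnet_ids in attachments.items():
        ec2.associate_transit_gateway_multicast_domain(
            TransitGatewayMulticastDomainId=domain_id,
            TransitGatewayAttachmentId=attachment_id,
            SubnetIds=list(subnet_ids),
        )

    # Step 3: tell the domain which ENI sources packets for the group.
    ec2.register_transit_gateway_multicast_group_sources(
        TransitGatewayMulticastDomainId=domain_id,
        GroupIpAddress=group_ip,
        NetworkInterfaceIds=[injector_eni],
    )
    return domain_id
```

Passing the client in as an argument keeps the sequence easy to dry-run against a stub before pointing it at a real account.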

A worked join

A new analytics consumer instance boots in consumer-a VPC:

  1. Userspace process calls setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, &group_req, sizeof(group_req)) on a UDP socket, where group_req specifies group 239.10.1.5 and interface eth0.
  2. The kernel generates an IGMPv2 Membership Report packet and sends it on eth0.
  3. The ENI forwards the packet to the TGW (via the VPC attachment’s multicast-domain association).
  4. The TGW records the ENI as a member of group 239.10.1.5 in the multicast domain.
  5. Within seconds, incoming packets to that group are replicated to this ENI alongside existing members’.
  6. The consumer process’s recvfrom on the UDP socket starts receiving packets.

When the process exits (or calls setsockopt with IP_DROP_MEMBERSHIP), the kernel sends an IGMPv2 Leave Group message; the TGW removes the ENI from the group’s member list, and replication to that ENI stops.
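The same join and leave, written with Python's socket module, might look like the sketch below. The helper names build_mreq, join_group, and leave_group are illustrative; passing 0.0.0.0 as the interface address lets the kernel pick the interface, which is an assumption that suits a single-ENI instance.

```python
import socket
import struct


def build_mreq(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Pack struct ip_mreq: group address, then local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def join_group(group: str, port: int, iface_ip: str = "0.0.0.0") -> socket.socket:
    """Open a UDP socket on the feed port and join the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # The join is what makes the kernel emit the membership report.
    sock.setsockopt(
        socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group, iface_ip)
    )
    return sock


def leave_group(sock: socket.socket, group: str, iface_ip: str = "0.0.0.0") -> None:
    """Drop membership explicitly, then close the socket."""
    sock.setsockopt(
        socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, build_mreq(group, iface_ip)
    )
    sock.close()
```

A consumer would call join_group("239.10.1.5", 30001) at startup, loop on sock.recvfrom(...), and call leave_group on shutdown.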

Autoscaling: new instances that join the ASG run the consumer code, which joins the group on startup; instances that terminate send their leave messages. The TGW tracks membership dynamically across the churn.
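During churn it can help to check what the domain currently believes. This sketch assumes a boto3 EC2 client and its search_transit_gateway_multicast_groups call; the "group-ip-address" filter name and the response field names are written from memory of the API reference and should be checked against it. list_members is a hypothetical helper, and pagination is ignored.

```python
def list_members(ec2, domain_id, group_ip):
    """Return the ENI ids the domain currently lists as members of a group.

    ec2 is assumed to be a boto3 EC2 client; only the first page of
    results is read, which is enough for a small group.
    """
    resp = ec2.search_transit_gateway_multicast_groups(
        TransitGatewayMulticastDomainId=domain_id,
        Filters=[{"Name": "group-ip-address", "Values": [group_ip]}],
    )
    return sorted(
        entry["NetworkInterfaceId"]
        for entry in resp.get("MulticastGroups", [])
        if entry.get("GroupMember")  # skip the registered source entry
    )
```

Comparing this list against the ENIs of the instances in the ASG is one way to spot a consumer that is running but never joined.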

What’s worth remembering

  1. TGW Multicast is the AWS-native multicast primitive. VPC fabric alone doesn’t route multicast; TGW Multicast does, via multicast domains that span VPCs and accounts.
  2. Domain options matter at creation. Igmpv2Support, StaticSourcesSupport, AutoAcceptSharedAssociations. Enable IGMP for consumer-side dynamic join; enable static sources for publisher-side registration; set auto-accept if the domain is shared via RAM and you want new attachments to associate without approval.
  3. Sources are registered via API. Consumers (members) join via IGMPv2. One is explicit, the other is implicit from application sockets.
  4. Association is per-subnet within a VPC attachment. Only ENIs in associated subnets can participate as sources or members.
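  5. Cross-region is not a domain feature. Multicast routing is not supported over TGW peering attachments, so neither IGMPv2 membership nor group traffic propagates across a peering. A second Region needs its own multicast domain, fed over unicast by a relay and injector in the same way as the Direct Connect hop.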
  6. Direct Connect doesn’t carry multicast natively. You need a unicast encapsulation (GRE, vendor relay) for the on-prem → AWS hop, and an injector EC2 that decapsulates and emits to the multicast group inside AWS.
  7. Security groups and NACLs must allow the multicast traffic. Inbound UDP from the source to the group port, including the multicast destination address ranges.
  8. Not all instance types support multicast receive. Check the supported list before deploying; current-gen Nitro instances generally do. Older platforms may not.

Multicast in the cloud used to mean “run your own overlay”; with TGW Multicast, it means “associate the VPC, register the source, let consumers join via socket API.” The operating model is the same as an on-prem switch with IGMP snooping; the control plane is an AWS API. The work isn’t re-inventing multicast; it’s knowing that the primitive exists and plumbing the registration correctly so the fabric does the replication you already know how to consume.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.