Choosing Private Connectivity Between VPCs

February 10, 2027 · 14 min read

Solutions Architect Pro · SAP-C02 · part of The Exam Room

The situation

The organisation has a mesh of internal APIs. The pricing team runs a service in their VPC that the checkout team calls from its VPC. Today:

  • Pricing has a public ALB with a TLS cert.
  • Checkout’s Fargate tasks egress through NAT and hit api.pricing.internal.company.com, which resolves to the ALB.
  • Security groups on the ALB allow specific source IPs, or, more often, 0.0.0.0/0 because NAT gateway IPs are hard to manage and the TLS cert is “good enough.”
  • Flow Logs record the call as outbound traffic through NAT to the internet. Security dashboards can’t tell it from a SaaS call.

Security has asked for internal calls to stay on AWS’s network. We need to walk through the options for private VPC-to-VPC service connectivity and pick the one that scales when the mesh grows to twenty services.

What actually matters

The core trade in internal service connectivity is privacy in exchange for coupling. Public HTTPS is zero-coupling (any client, any network, any IP); a private path is tighter (only allow-listed consumers, only AWS network, no public IP). The privacy is what we want; the coupling is what we pay.

The first thing to ask is: who initiates? A consumer-initiated model (the calling VPC creates an endpoint into the provider’s service) decouples ownership cleanly: the provider doesn’t have to know consumer subnets, and the consumer doesn’t have to know provider topology. A network-peer model (both sides know each other) couples them tighter and gets harder as the mesh grows.

The second is: what shape does the provider expose? A load balancer fronting the service is the usual abstraction, but not every load balancer type plays nicely with every connectivity option. Whether the provider has to swap out their existing fronting layer is part of the cost.

The third is: who sees whom? The more the consumer learns about the provider’s topology (VPC CIDRs, subnet IDs, instance IPs), the more brittle the relationship. A model where the provider’s topology stays opaque to the consumer scales better; a model where both sides see each other’s CIDRs locks them into avoiding overlap forever.

The fourth is: how is consent granted? IP-based allow-lists rot (NAT addresses change, teams forget). A principal-based access model, “this AWS account or this role may connect”, survives the next reorg without an audit.

The fifth is: DNS. Consumers want to call api.pricing.internal.company.com, not a generated endpoint name. Whichever option we pick has to either let the provider publish its own domain to the consumer or leave each consumer to maintain their own CNAME, with the operational cost that implies.

What we’ll filter on

  1. Public internet path: does traffic ever leave AWS?
  2. Cross-account and cross-VPC: how do we connect accounts without peering?
  3. Principal-based access control: can we allow-list who can connect?
  4. Client-IP preservation: does the provider see the original client?
  5. Cost profile: per-endpoint, per-hour, per-GB?

The connectivity landscape

  1. PrivateLink (Interface Endpoint + Endpoint Service). Consumer creates an interface endpoint in their VPC; provider publishes a service fronted by an NLB. Traffic never touches the internet. Principal-based access control; optional auto-acceptance. DNS via private DNS on the endpoint or a consumer-owned Route 53 PHZ. Per-endpoint hourly charge + per-GB processed. The answer for one-to-many provider-consumer relationships.

  2. VPC peering. Direct VPC-to-VPC connection; routes added to both VPC route tables. Access is controlled by security groups; the two VPCs’ CIDRs must not overlap. No transitive routing: peering is pairwise. Works cleanly for small meshes; fails at scale because a full mesh of N VPCs needs N(N-1)/2 peering connections.

  3. Transit Gateway. Hub-and-spoke routing across many VPCs. One attachment per VPC replaces the pairwise peerings, but attached VPCs still need non-overlapping CIDRs to route to each other, and route-table design needs care. Workloads in different VPCs communicate via the hub, with security groups on each side controlling access. Exposes the VPCs’ topologies to each other by default: any instance in VPC A can reach any instance in VPC B unless route tables or security groups say otherwise.

  4. Public ALB with WAF and IP allow-list. The status quo. Traffic goes over the internet; security is the TLS cert + WAF rules + security group source-IP allow-list. Operationally simple, security-wise blunt (NAT IPs change, allow-lists rot).

  5. VPC Lattice. AWS’s service-mesh-like product for internal service connectivity. Auth via IAM auth policies, routing by service name, no per-service load balancers required. Useful for many-to-many service meshes at scale.

  6. PrivateLink with NLB-ALB target. A variation: the provider’s NLB has an ALB as its target, so the provider gets layer-7 routing, TLS termination, etc. The consumer still uses a PrivateLink endpoint; the extra hop is transparent.
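
To make the scaling difference between options 2 and 3 concrete, here is a small shell sketch that counts connections. The function names are illustrative, and it assumes the worst case where every VPC must reach every other VPC.

```shell
# Connections needed to link n VPCs (helper names are illustrative).

# Full-mesh VPC peering: one peering per pair of VPCs, n*(n-1)/2.
peering_links() {
  echo $(( $1 * ($1 - 1) / 2 ))
}

# Transit Gateway: one attachment per VPC.
tgw_attachments() {
  echo "$1"
}

for n in 5 10 20; do
  echo "$n VPCs: $(peering_links "$n") peerings, $(tgw_attachments "$n") TGW attachments"
done
```

At twenty VPCs the full mesh needs 190 peerings against 20 attachments, which is the gap the "fails at scale" remark is pointing at.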

Side by side

Option | Public internet | Cross-account | Principal access | Client IP | Cost
PrivateLink | | | ✓ (principals list) | Optional (preserve-client-ip) | Per-endpoint + per-GB
VPC peering | | ✓ (per peering) | SG only | ✓ (native) | Per-GB cross-AZ
Transit Gateway | | ✓ (shared) | SG + routes | ✓ (native) | Per-attachment + per-GB
Public ALB + WAF | | | IP allow-list + WAF + TLS | ✓ (X-Forwarded-For) | ALB + WAF + NAT + transit
VPC Lattice | | | IAM auth policy | Via X-Forwarded-For | Per-service + per-request
PrivateLink + NLB->ALB | | | Principals + WAF on ALB | Optional | Per-endpoint + per-GB + ALB

Reading by use case:

  • One-to-many internal API (one provider, many consumers): PrivateLink. Each consumer gets an endpoint; provider has one service.
  • Many-to-many mesh of 10+ services: VPC Lattice if the organisation is ready for it; TGW + security groups otherwise.
  • Two VPCs owned by the same team: VPC peering, the simplest option.
  • Provider requires layer-7 routing: PrivateLink with NLB->ALB.
  • Service needs to be consumed from outside the organisation: PrivateLink with account allow-list for the external AWS account, or public ALB + mTLS.

Figure: Consumer VPC (checkout, account A), with Fargate checkout-api and an interface endpoint, connected over the AWS backbone (no public internet) to Provider VPC (pricing, account B), with the endpoint service, an internal Network Load Balancer, an internal ALB, and Fargate pricing-api.
Consumer's endpoint ENIs connect over AWS backbone to the provider's endpoint service and NLB. Provider can chain NLB to ALB for layer-7 routing. Consumer never sees the provider VPC.

The picks in depth

Provider side: the NLB and the endpoint service. Pricing’s VPC has an internal NLB in two AZs. The NLB’s target group uses the “ALB” target type, pointing at an internal ALB that terminates TLS and does path-based routing to Fargate tasks. This is the common shape: NLB is required for PrivateLink, ALB gives you the layer-7 features.

Create the endpoint service:

aws ec2 create-vpc-endpoint-service-configuration \
  --network-load-balancer-arns arn:aws:elasticloadbalancing:...:loadbalancer/net/pricing-nlb/... \
  --acceptance-required \
  --tag-specifications 'ResourceType=vpc-endpoint-service,Tags=[{Key=Name,Value=pricing}]'

--acceptance-required means new connections wait for provider approval. For an internal allow-listed use case, flip this to --no-acceptance-required and use modify-vpc-endpoint-service-permissions to allow specific principals:

aws ec2 modify-vpc-endpoint-service-permissions \
  --service-id vpce-svc-xxx \
  --add-allowed-principals arn:aws:iam::AAAAAAAAAAAA:root arn:aws:iam::BBBBBBBBBBBB:root

The principals can be entire accounts (with :root), specific IAM roles or users, or * for every AWS principal. The allow-list takes IAM principal ARNs rather than an organisation ID, so organisation-wide allow-listing means adding each member account, which is easy to script: every account in the org can consume, nothing external can.
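
When several accounts need allow-listing, the :root ARNs can be generated rather than typed. The helper below is a sketch with a hypothetical name (account_principals); it only checks that each argument looks like a 12-digit account ID, not that the account exists.

```shell
# Print one account-level principal ARN per 12-digit account ID.
# account_principals is an illustrative helper, not an AWS CLI command.
account_principals() {
  for id in "$@"; do
    case "$id" in
      ''|*[!0-9]*) echo "not an account ID: $id" >&2; return 1 ;;
    esac
    [ "${#id}" -eq 12 ] || { echo "not an account ID: $id" >&2; return 1; }
    printf 'arn:aws:iam::%s:root\n' "$id"
  done
}

# Usage (service ID is a placeholder, as in the commands above):
# aws ec2 modify-vpc-endpoint-service-permissions \
#   --service-id vpce-svc-xxx \
#   --add-allowed-principals $(account_principals 111111111111 222222222222)
```

The account IDs in the usage comment are placeholders; in practice the list would come from wherever the organisation keeps its account inventory.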

Private DNS on the service. If the provider owns pricing.internal.company.com, they can prove ownership via a DNS TXT record (AWS issues a verification token) and enable private DNS on the endpoint service. Consumers’ VPCs with “DNS resolution” and “DNS hostnames” enabled will automatically resolve the provider’s FQDN to the endpoint’s private IPs, as long as the consumer’s endpoint has PrivateDnsEnabled=true. No per-consumer Route 53 work.
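
On the provider side, the private DNS setup could be scripted roughly as follows. This is a sketch, not a tested runbook: the function names are hypothetical, the service ID and FQDN are passed in by the caller, and publishing the TXT record happens in whatever DNS system hosts the domain, between the two steps.

```shell
# Step 1: attach the private DNS name to the endpoint service and print
# the TXT record (name, value, state) that AWS wants published.
set_private_dns_name() {  # $1 = service ID, $2 = FQDN
  aws ec2 modify-vpc-endpoint-service-configuration \
    --service-id "$1" --private-dns-name "$2"
  aws ec2 describe-vpc-endpoint-service-configurations \
    --service-ids "$1" \
    --query 'ServiceConfigurations[0].PrivateDnsNameConfiguration'
}

# Step 2: once the TXT record is live, ask AWS to verify ownership.
start_dns_verification() {  # $1 = service ID
  aws ec2 start-vpc-endpoint-service-private-dns-verification \
    --service-id "$1"
}

# Usage:
# set_private_dns_name vpce-svc-xxx api.pricing.internal.company.com
# ...publish the TXT record, then:
# start_dns_verification vpce-svc-xxx
```

Re-running the describe call afterwards should show whether the verification state has moved on from pending.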

Consumer side: the interface endpoint. Checkout creates the endpoint in their VPC:

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-consumer \
  --service-name com.amazonaws.vpce.eu-west-1.vpce-svc-xxx \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-a subnet-b \
  --security-group-ids sg-endpoint \
  --private-dns-enabled

ENIs appear in each chosen subnet. The endpoint’s security group controls who can reach the endpoint ENI (typically: checkout’s Fargate task SG). That security group is the control that narrows access to checkout tasks rather than everything in the VPC; an endpoint policy naming arn:aws:iam::AAAAAAAAAAAA:role/checkout-task does not do that job for a custom endpoint service, because calls to the pricing API carry no IAM-signed identity for the policy to match.

Endpoint policies and least privilege. Endpoint policies are enforced by AWS services that support them (S3, DynamoDB and so on), where each request is signed by an IAM principal. A custom endpoint service behind an NLB receives plain TCP, so a policy with Principal: arn:...:role/checkout-task has nothing to evaluate. Least privilege on the consumer side comes from the endpoint’s security group, which admits only the task security groups that should reach the provider, backed by application-layer identity (mTLS or a token) on the provider side. This matters when the consumer VPC has multiple workloads and only some should reach the provider.

Private DNS and legacy FQDNs. Most internal APIs already have DNS names like api.pricing.internal.company.com. With private DNS on the service, the consumer’s resolver returns the endpoint’s IPs automatically, with no code change in checkout’s clients. Without private DNS, each consumer creates a Route 53 private hosted zone for the pricing FQDN with a CNAME to the endpoint’s DNS name. The former is simpler when the provider owns the FQDN; the latter is necessary when FQDNs are owned per-consumer.

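Client-IP preservation. Traffic that arrives through PrivateLink reaches the targets with the NLB’s private IPs as the source; the target group’s client-IP-preservation attribute has no effect on PrivateLink traffic. If the provider needs the consumer’s address, the NLB-level option is Proxy Protocol v2 on the target group, which prepends the client IP and the endpoint ID to the TCP stream. That works when the targets are IPs or instances that parse the header, not when the target is an ALB, which does not. In the NLB -> ALB chain, X-Forwarded-For therefore carries NLB addresses, and caller identity has to come from the application layer (a header, a token, or mTLS).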

Cost model. Interface endpoints cost roughly $0.01/hour per endpoint per AZ plus $0.01/GB processed. For a 24/7 endpoint in two AZs, that’s about $14.60/month in fixed cost (2 x $0.01 x 730 hours) plus per-GB. Cheaper than NAT + transit + ALB for most internal traffic; more expensive than peering if volume is very high. Costs compound across endpoints: 20 consumer services with 3 AZs each = 60 endpoint-AZs = ~$440/month fixed, before a byte flows.
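
The fixed part of the bill is just endpoints x AZs x hourly rate x hours. A minimal sketch, assuming the article's rough $0.01 per endpoint-AZ-hour and a 730-hour month (both are assumptions, and real prices vary by region); the function name is illustrative.

```shell
# Fixed monthly cost (USD) of interface endpoints, before any per-GB charge.
# Assumes $0.01 per endpoint per AZ per hour and 730 hours per month.
endpoint_fixed_monthly() {  # $1 = number of endpoints, $2 = AZs per endpoint
  LC_ALL=C awk -v n="$1" -v az="$2" \
    'BEGIN { printf "%.2f\n", n * az * 0.01 * 730 }'
}

endpoint_fixed_monthly 1 2    # one endpoint in two AZs: 14.60
endpoint_fixed_monthly 20 3   # twenty endpoints in three AZs each: 438.00
```

Swapping in the current regional rate for the 0.01 constant gives a figure that can go straight into a capacity plan.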

A worked cross-account connect

Checkout in account A wants to use pricing in account B.

  1. Pricing team’s SRE adds arn:aws:iam::AAAAAAAAAAAA:root to the endpoint service’s allowed principals.
  2. Checkout team’s SRE creates an interface endpoint pointing at com.amazonaws.vpce.eu-west-1.vpce-svc-xxx, with private DNS enabled.
  3. The endpoint state transitions pending -> available within a minute (auto-accepted, because the service has acceptance required turned off and account A is an allowed principal).
  4. Checkout tasks call https://api.pricing.internal.company.com/v1/quote; DNS resolves to the endpoint’s IPs; TLS handshake goes through the endpoint ENI to the NLB to the ALB to the task.
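  5. The pricing task sees the request arrive via the ALB. X-Forwarded-For holds an NLB private IP rather than the checkout task’s IP, because PrivateLink traffic reaches the provider from the NLB’s addresses.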
  6. CloudWatch metrics on both sides show the bytes. Flow Logs show traffic between the endpoint ENI and the consumer workload; the provider sees traffic between its ALB and its Fargate tasks. No NAT, no public internet, no IPs the security team has to allow-list on a public edge.
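
The consumer can sanity-check steps 3 and 4 from a shell inside the VPC. The sketch below uses hypothetical helper names and assumes the VPC uses RFC 1918 address space; a VPC built on other ranges would need a different check.

```shell
# Current state of an interface endpoint, e.g. "available".
endpoint_state() {  # $1 = endpoint ID
  aws ec2 describe-vpc-endpoints --vpc-endpoint-ids "$1" \
    --query 'VpcEndpoints[0].State' --output text
}

# Succeeds if the address is in 10/8, 172.16/12 or 192.168/16.
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage, run from a host or task inside the consumer VPC:
# endpoint_state vpce-xxx
# for ip in $(dig +short api.pricing.internal.company.com); do
#   is_private_ipv4 "$ip" || echo "unexpected public address: $ip"
# done
```

If the FQDN still resolves to a public address, the endpoint's private DNS setting or the private hosted zone is the first place to look.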

A worked failure: acceptance required

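A new consumer, reports-prod, creates an endpoint to the pricing service. Their account is in the allowed principals list, but this time the service has acceptance required turned on. (An account that is not in the list cannot create the endpoint in the first place, so nothing would reach the provider’s console.)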

Endpoint state: pendingAcceptance. Pricing’s console shows a pending connection request with the consumer’s account ID. Pricing SRE accepts manually. Endpoint transitions to available.
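
The same review can be done from the CLI on the provider side. A sketch with hypothetical function names; the service and endpoint IDs are whatever the provider's account shows.

```shell
# List endpoint connections waiting for approval: endpoint ID and owner account.
pending_connections() {  # $1 = endpoint service ID
  aws ec2 describe-vpc-endpoint-connections \
    --filters "Name=service-id,Values=$1" \
              "Name=vpc-endpoint-state,Values=pendingAcceptance" \
    --query 'VpcEndpointConnections[].[VpcEndpointId,VpcEndpointOwner]' \
    --output text
}

# Accept one pending connection.
accept_connection() {  # $1 = endpoint service ID, $2 = endpoint ID
  aws ec2 accept-vpc-endpoint-connections \
    --service-id "$1" --vpc-endpoint-ids "$2"
}

# Usage:
# pending_connections vpce-svc-xxx
# accept_connection vpce-svc-xxx vpce-yyy
```

Keeping the list and the accept as separate steps leaves room for a human to check the owner account before approving.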

For a mesh with many consumers, the manual acceptance overhead is annoying but also a useful audit trail: every new consumer is a conscious decision by the provider. Organisations with a larger mesh usually switch to auto-acceptance with org-wide principal allow-listing, accepting the looser model in exchange for operational simplicity.

What’s worth remembering

  1. PrivateLink is consumer-initiated, provider-gated. The consumer creates an endpoint; the provider controls who can use the service via the principal allow-list.
  2. NLB is required; ALB is allowed via NLB-ALB target type. Pure ALB can’t be fronted by PrivateLink directly, but NLB -> ALB works and is the common pattern when layer-7 features are needed.
  3. Private DNS lets consumers use the provider’s FQDN unchanged. Requires DNS domain ownership verification on the provider side and “private DNS enabled” on each consumer endpoint.
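  4. Least privilege on the consumer side is the endpoint’s security group. Endpoint policies are evaluated for AWS services that support them, not for a custom endpoint service behind an NLB; pair the security group with application-layer identity (mTLS or a token) when the provider needs to know who is calling.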
  5. Traffic never touches the internet. The whole value proposition; the NAT, firewall, and public IP concerns disappear for internal calls.
  6. Client IP is not preserved through PrivateLink. Targets see the NLB’s private IPs whatever the target group’s client-IP-preservation setting; Proxy Protocol v2 carries the original address to IP or instance targets, but not to ALB targets.
  7. Cost compounds with endpoints. The hourly charge is per endpoint per AZ. For a consumer account that calls 20 services from 3 AZs, that’s 20 endpoints and 60 endpoint-AZ charges. Plan for it.
  8. For many-to-many meshes, consider VPC Lattice. PrivateLink per service scales to tens of services; beyond that the endpoint-management overhead starts to hurt, and Lattice’s per-service-name routing becomes attractive.

Two teams, one AWS backbone, no public internet. The endpoint service, the NLB-ALB chain, and the principal allow-list are the three pieces; once they’re wired up, internal calls look the same from application code and completely different from the network security dashboard.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.