The situation
Vantage Observability runs an ingestion API for metrics, traces, and logs. Today it sits behind an Application Load Balancer with a public hostname, ingest.vantageobs.example, and customers authenticate with short-lived tokens. Four of eight enterprise customers signed this year have asked the same question in procurement: “Can you deliver this without exposing our telemetry to the public internet? Our security team will not approve egress to an internet destination for production data.”
Vantage needs a single endpoint, offered from one account, that many customer accounts can consume privately, with per-customer authorisation, and with the vendor keeping control of the server side. The tools in the box are VPC peering, Transit Gateway, PrivateLink endpoint services, and “keep the public endpoint and lock it down”. The question is which one fits the shape of a SaaS relationship between separate companies.
What actually matters
Private connectivity between AWS accounts is rich with primitives, but the interesting constraints here are contractual and organisational, not just technical. It is worth pulling apart what the customer is actually asking for and what each primitive demands in exchange.
Contractual semantics matter as much as packet paths. The customers said “must not traverse the public internet”. That’s about the path the bytes take, not the authentication at the endpoint. A heavily-authenticated public endpoint still fails this requirement. An unauthenticated private endpoint meets it (though we obviously still need authentication for other reasons). Packet path and credential strength are separate properties; confusing them is how procurement conversations end poorly.
Overlapping CIDRs are a given, not a bug. Vendor and customer networks were designed independently and both use RFC1918 liberally. Almost any Fortune-500 customer will use parts of 10.0.0.0/8 that Vantage also uses. A primitive that requires non-overlapping address space (peering, TGW) forces the customer either to renumber or to carve out a dedicated /24, an ask so large that most enterprise deals die on it. The primitive has to let both sides keep their address spaces unchanged.
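To make the overlap problem concrete, here is a minimal sketch using Python's standard ipaddress module. The address plans are illustrative assumptions, not taken from any real Vantage or customer network.

```python
import ipaddress


def overlapping_pairs(vendor_cidrs, customer_cidrs):
    """Return (vendor, customer) CIDR pairs that share at least one address."""
    pairs = []
    for v in vendor_cidrs:
        v_net = ipaddress.ip_network(v)
        for c in customer_cidrs:
            if v_net.overlaps(ipaddress.ip_network(c)):
                pairs.append((v, c))
    return pairs


# Hypothetical address plans, chosen only for illustration.
vendor = ["10.20.0.0/16"]
customer = ["10.0.0.0/8", "172.16.0.0/12"]

conflicts = overlapping_pairs(vendor, customer)
print(conflicts)
```

Any non-empty result rules out peering or Transit Gateway for that customer without renumbering, which is the situation the paragraph above treats as the default.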
The relationship is asymmetric. One vendor, many customers, vendor publishes, customer consumes. Primitives designed for shared governance (TGW) fit one-tenant-plus-branches topologies, not vendor-to-customer. Whichever primitive we pick has to express publisher and consumer as distinct roles with distinct controls, not as “both sides agree to a peering”.
Per-customer authorisation with revocation. Eight customers this year, more next year, churn expected. Adding or removing a customer must be a small, reviewable change, ideally an ARN on an allow-list, not a network engineering project with a peering connection to retire and route tables to clean up.
Hostnames the customer already uses. Existing SDKs and telemetry agents are configured with ingest.vantageobs.example. A solution that forces every customer to reconfigure their telemetry to a vendor-specific endpoint URL per account is a deployment burden on every customer workload. The primitive should let customers resolve the familiar hostname from inside their VPC and have it point at the private endpoint, transparently.
Vendor-managed service, customer-consumed handle. Vantage controls the load balancer, instances, scaling, TLS certificates, deployment cadence. Customers control their own VPC, their own ENI subnets, their own security groups. The primitive has to offer a clean seam: vendor on one side, customer on the other, neither operating anything on the other’s account.
And the cross-region reality. Most AWS-native private connectivity primitives are regional by construction. Cross-region options where they exist tend to be opt-in, with per-byte cost and gaps that matter for production data planes. For most SaaS at scale, the clean answer is per-region deployments, one customer base per region.
What we’ll filter on
- No IP coordination: overlapping CIDRs must not be a problem.
- No public traffic path: bytes stay on the AWS backbone.
- Multi-customer isolation: one customer cannot see another’s connection.
- Vendor-managed service: customers don’t operate anything server-side.
- Per-customer authorisation: an allow-list of ARNs, with reviewable revocation.
- Unchanged client-side hostname: existing SDKs keep working.
- Scales operationally: onboarding a new customer is a policy change.
The private-connectivity landscape
PrivateLink endpoint services. A service provider creates a VPC endpoint service: a Network Load Balancer (or Gateway Load Balancer) fronted by an AWS-managed service name of the form com.amazonaws.vpce.<region>.vpce-svc-<hash>. Customers in other accounts create an interface VPC endpoint that places AWS-managed ENIs into the customer’s chosen subnets with IPs from the customer’s own CIDR. Traffic flows from the customer ENI, over the AWS backbone, to the vendor NLB. No public IPs, no route-table changes between accounts, no shared CIDR. The relationship is asymmetric by design.
VPC peering. Direct Layer-3 connectivity between two VPCs. It requires non-overlapping CIDRs and route-table entries on each side; routing is non-transitive, and security-group references work only within the same region. Right for one-to-one, IP-aligned networks under shared governance. Wrong for SaaS because IP coordination with customers is impractical.
Transit Gateway. A regional router that VPCs and VPNs attach to. It scales many-to-many but demands non-overlapping CIDRs across all attachments. Which account owns it is a governance question neither party wants to answer. Overkill and CIDR-fragile for vendor-customer shapes.
Public endpoint with strong authentication. An ALB or CloudFront with a public hostname, backed by WAF, SigV4, mTLS, or tokens. Authentication is fine; the path the packet takes is public. Fails the explicit constraint.
Side by side
| Mechanism | No IP coord | No public traffic | Multi-customer isolation | Vendor-managed | Cost / scale |
|---|---|---|---|---|---|
| PrivateLink endpoint service | ✓ | ✓ | ✓ | ✓ | ✓ |
| VPC peering | ✗ | ✓ | — | ✗ | ✗ |
| Transit Gateway | ✗ | ✓ | — | ✗ | ✗ |
| Public endpoint + IAM | ✓ | ✗ | ✓ | ✓ | ✓ |
One row is all ticks. PrivateLink lands.
Matching the relationship to the primitive
PrivateLink endpoint services, in depth
Origin. The origin must be an NLB (TCP / UDP / TLS listeners) or a GWLB. ALBs can’t front an endpoint service directly, though an NLB can target an ALB since 2021 for teams who need HTTP-layer features.
Acceptance. The provider decides whether acceptance is manual (acceptance-required) or automatic. Manual means each consumer request must be accepted before data flows, which is the common choice for enterprise SaaS.
Allow-list. Names AWS principals: a whole account (arn:aws:iam::<acct>:root), a specific IAM role, a specific user, or * for open services. The allow-list gates creating the consumer endpoint; acceptance gates the connection. Both apply.
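The interaction of the two gates can be sketched as a toy model. This is only an illustration of the logic described above, not a reproduction of how AWS evaluates principals: it matches ARNs by exact string, the "denied" label is hypothetical, and the endpoint ID is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointService:
    """Toy model of the allow-list and acceptance gates (illustrative only)."""
    allowed_principals: set = field(default_factory=set)
    acceptance_required: bool = True
    accepted_endpoints: set = field(default_factory=set)

    def request_state(self, principal_arn, endpoint_id):
        # Gate 1: the allow-list decides whether the endpoint can be created.
        if "*" not in self.allowed_principals and principal_arn not in self.allowed_principals:
            return "denied"  # hypothetical label for a refused creation
        # Gate 2: acceptance decides whether the created endpoint goes live.
        if self.acceptance_required and endpoint_id not in self.accepted_endpoints:
            return "pendingAcceptance"
        return "available"


customer = "arn:aws:iam::200000000001:root"
service = EndpointService(allowed_principals={customer})

print(service.request_state(customer, "vpce-0example"))  # state before the vendor accepts
service.accepted_endpoints.add("vpce-0example")
print(service.request_state(customer, "vpce-0example"))  # state after the vendor accepts
```

The point of the model is that passing the first gate alone never yields a live connection when acceptance is manual.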
Private DNS. Lets the provider publish a custom hostname that resolves privately inside each consumer’s VPC. The provider verifies domain ownership once via a TXT record; AWS then creates the private hosted zone and CNAME automatically in each consumer VPC that enables it. Existing SDKs pointed at the vendor’s hostname keep working; resolution now comes from inside the VPC instead of public DNS.
Cross-region isn’t transparent. A consumer endpoint in eu-west-1 can’t reach a service in us-east-1 with the baseline feature. A limited opt-in cross-region option (launched late 2024) exists but excludes a specific list of AZs and has no cross-region failover.
The vendor side, step by step
NLB in front of the ingest fleet. A TLS listener on 443, terminating the vendor’s certificate for ingest.vantageobs.example, targeting an auto-scaling group of ingest instances. Cross-zone load balancing enabled so a single-AZ customer still hits the whole fleet.
Endpoint service configuration. aws ec2 create-vpc-endpoint-service-configuration --network-load-balancer-arns <nlb-arn> --acceptance-required. The response includes a service name of the form com.amazonaws.vpce.eu-west-1.vpce-svc-<hash>. At this point the service is discoverable by no one: no principal is allowed yet.
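For teams scripting this step rather than using the CLI, a hedged sketch of the equivalent request follows. It only builds the keyword arguments that boto3's create_vpc_endpoint_service_configuration call accepts, so it runs without credentials; the helper name and the NLB ARN are hypothetical placeholders.

```python
def endpoint_service_request(nlb_arns, acceptance_required=True):
    """Build kwargs for ec2.create_vpc_endpoint_service_configuration."""
    if not nlb_arns:
        raise ValueError("at least one NLB ARN is required")
    return {
        "NetworkLoadBalancerArns": list(nlb_arns),
        # Mirrors the --acceptance-required flag in the CLI command above.
        "AcceptanceRequired": acceptance_required,
    }


# Placeholder ARN, for illustration only.
request = endpoint_service_request(
    ["arn:aws:elasticloadbalancing:eu-west-1:111111111111:loadbalancer/net/ingest-nlb/0123456789abcdef"]
)
print(request)

# With credentials configured, the request would be sent roughly like this:
#   import boto3
#   ec2 = boto3.client("ec2", region_name="eu-west-1")
#   response = ec2.create_vpc_endpoint_service_configuration(**request)
```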
Principal allow-list. aws ec2 modify-vpc-endpoint-service-permissions --service-id vpce-svc-<hash> --add-allowed-principals arn:aws:iam::200000000001:root .... Using <acct>:root grants the whole customer account permission to create an endpoint. Finer-grained principals are allowed for particularly sensitive customers.
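Because onboarding and offboarding are meant to be small, reviewable changes, it can help to generate the allow-list request from a list of account IDs. The sketch below builds kwargs for boto3's modify_vpc_endpoint_service_permissions without calling AWS; the helper names are hypothetical, and the service ID is the example value used elsewhere in this article.

```python
import re

ACCOUNT_ID = re.compile(r"^\d{12}$")


def account_root_arn(account_id):
    """Return the whole-account principal ARN for a 12-digit account ID."""
    if not ACCOUNT_ID.match(account_id):
        raise ValueError(f"not a 12-digit AWS account ID: {account_id!r}")
    return f"arn:aws:iam::{account_id}:root"


def permission_change(service_id, onboard=(), offboard=()):
    """Build kwargs for ec2.modify_vpc_endpoint_service_permissions."""
    kwargs = {"ServiceId": service_id}
    if onboard:
        kwargs["AddAllowedPrincipals"] = [account_root_arn(a) for a in onboard]
    if offboard:
        kwargs["RemoveAllowedPrincipals"] = [account_root_arn(a) for a in offboard]
    return kwargs


change = permission_change("vpce-svc-0abc123def456", onboard=["200000000001"])
print(change)
```

The resulting dictionary is small enough to review in a pull request, which is the property the allow-list approach is chosen for.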
Private DNS name. The vendor wants customers to use ingest.vantageobs.example, not the generated vpce-svc-*.eu-west-1.vpce.amazonaws.com. They verify ownership of vantageobs.example via a TXT record AWS hands them, then enable private DNS on the endpoint service. From that point, any customer who enables private DNS on their interface endpoint gets a private hosted zone automatically created inside their VPC with a CNAME from the hostname to the endpoint’s regional DNS.
The customer side, step by step
Confirm the service name. The vendor sends com.amazonaws.vpce.eu-west-1.vpce-svc-0abc123def456 and the region.
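A customer can sanity-check the name they were sent before creating anything. The sketch below assumes the com.amazonaws.vpce.<region>.vpce-svc-<hash> shape described earlier; the regular expression is an assumption derived from that shape, not an official grammar.

```python
import re

# Pattern derived from the service-name format quoted in this article (assumption).
SERVICE_NAME = re.compile(
    r"^com\.amazonaws\.vpce\.(?P<region>[a-z0-9-]+)\.(?P<service_id>vpce-svc-[0-9a-f]+)$"
)


def parse_service_name(name):
    """Split an endpoint service name into (region, service_id)."""
    match = SERVICE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected endpoint service name: {name!r}")
    return match.group("region"), match.group("service_id")


region, service_id = parse_service_name(
    "com.amazonaws.vpce.eu-west-1.vpce-svc-0abc123def456"
)
print(region, service_id)
```

Comparing the parsed region with the region of the target VPC catches the most common onboarding mistake early, since the baseline feature is regional.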
Create the interface endpoint. aws ec2 create-vpc-endpoint --vpc-id <cust-vpc> --service-name com.amazonaws.vpce.eu-west-1.vpce-svc-0abc123def456 --vpc-endpoint-type Interface --subnet-ids <subnet-a> <subnet-b> --security-group-ids <sg>. AWS creates one ENI per specified subnet, each with a private IP from that subnet; the vendor service is now reachable on IPs from the customer’s own address space.
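The same request expressed as boto3 keyword arguments, as a sketch that runs without credentials. The helper name and the VPC, subnet, and security-group IDs are placeholders; private DNS is left off here because the text treats enabling it as its own step.

```python
def interface_endpoint_request(vpc_id, service_name, subnet_ids, security_group_ids,
                               private_dns_enabled=False):
    """Build kwargs for ec2.create_vpc_endpoint (interface type)."""
    if not subnet_ids:
        raise ValueError("an interface endpoint needs at least one subnet")
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": service_name,
        "SubnetIds": list(subnet_ids),            # one ENI is created per subnet
        "SecurityGroupIds": list(security_group_ids),
        "PrivateDnsEnabled": private_dns_enabled,
    }


# Placeholder resource IDs, for illustration only.
request = interface_endpoint_request(
    vpc_id="vpc-0example",
    service_name="com.amazonaws.vpce.eu-west-1.vpce-svc-0abc123def456",
    subnet_ids=["subnet-0a", "subnet-0b"],
    security_group_ids=["sg-0example"],
)
print(len(request["SubnetIds"]), "ENIs expected")
```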
Security group on the endpoint. Allow 443 inbound from whichever subnets run the workload that talks to Vantage.
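A minimal sketch of that rule, in the IpPermissions shape that boto3's authorize_security_group_ingress accepts. The workload CIDRs and the rule description are assumptions for illustration.

```python
import ipaddress


def https_ingress(workload_cidrs):
    """Build an IpPermissions list allowing TCP 443 from the given subnets."""
    ranges = []
    for cidr in workload_cidrs:
        ipaddress.ip_network(cidr)  # raises ValueError if the CIDR is malformed
        ranges.append({"CidrIp": cidr, "Description": "telemetry agents to Vantage endpoint"})
    return [{"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "IpRanges": ranges}]


# Hypothetical workload subnets inside the customer VPC.
permissions = https_ingress(["10.1.10.0/24", "10.1.11.0/24"])
print(permissions)

# With credentials configured, this would be applied roughly like this:
#   ec2.authorize_security_group_ingress(GroupId="sg-0example", IpPermissions=permissions)
```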
Enable private DNS. The customer’s VPC gets the auto-created private hosted zone; applications resolve ingest.vantageobs.example to the endpoint’s ENI IPs. DNS hostnames and DNS resolution must be enabled on the VPC.
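One way to confirm the hostname now resolves privately is to check that every address it returns falls inside the VPC's CIDR. The sketch below takes the resolver as a parameter so it can be demonstrated offline with a stub; the VPC CIDR and the stubbed addresses are assumptions.

```python
import ipaddress
import socket


def resolves_inside_vpc(hostname, vpc_cidr, resolver=socket.gethostbyname_ex):
    """True if the hostname resolves and every address lies within vpc_cidr."""
    vpc = ipaddress.ip_network(vpc_cidr)
    _name, _aliases, addresses = resolver(hostname)
    return bool(addresses) and all(ipaddress.ip_address(a) in vpc for a in addresses)


# Stub resolver standing in for a lookup made from inside the customer VPC.
def stub_resolver(hostname):
    return (hostname, [], ["10.1.10.25", "10.1.11.40"])


print(resolves_inside_vpc("ingest.vantageobs.example", "10.1.0.0/16", resolver=stub_resolver))
```

Run from a workload instance with the default resolver, a False result would suggest the lookup is still being answered by public DNS.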
Wait for acceptance. Because the vendor set acceptance-required, the endpoint state is pendingAcceptance until the vendor accepts, then flips to available.
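Customer automation usually needs to wait out this state. A hedged polling sketch follows: get_state is a hypothetical callable that would wrap a describe-vpc-endpoints lookup, states are compared case-insensitively, and treating "pending" as a transient state and anything else as terminal is an assumption about the wider set of endpoint states.

```python
import time


def wait_until_available(get_state, attempts=30, delay_seconds=10, sleep=time.sleep):
    """Poll get_state() until the endpoint is available; False on timeout."""
    for _ in range(attempts):
        state = get_state().lower()
        if state == "available":
            return True
        if state not in ("pendingacceptance", "pending"):
            # e.g. a rejected request: waiting longer will not help
            raise RuntimeError(f"endpoint entered state {state!r}")
        sleep(delay_seconds)
    return False


# Canned states standing in for successive API responses; sleep is a no-op here.
states = iter(["pendingAcceptance", "pendingAcceptance", "available"])
print(wait_until_available(lambda: next(states), sleep=lambda seconds: None))
```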
Quotas worth knowing
- Interface endpoints per VPC: 50, adjustable. A customer consuming many SaaS PrivateLink services burns this faster than it sounds.
- Connections per endpoint service: soft-limited, adjustable via Service Quotas.
- Principal allow-list entries per endpoint service: no published hard ceiling; Service Quotas is source of truth.
- NLBs per endpoint service: many-to-one, meaning several NLBs can sit behind a single endpoint service.
- Bandwidth: 10 Gbps per AZ baseline, autoscaling to 100 Gbps per AZ.
Cross-region, honestly
PrivateLink endpoint services are regional. For a vendor with customers across regions:
Deploy the service in each region. Separate endpoint services in eu-west-1, us-east-1, ap-southeast-2. Each customer creates a regional endpoint in their own region. Regional deployments replicate between themselves by whatever mechanism already exists. The clean answer.
Enable cross-region connectivity. Launched late 2024. Requires opt-in on both sides; excludes specific AZs; carries no cross-region failover; has per-byte data-transfer cost. Reasonable for a handful of low-volume cross-region customers. Expensive and fragile for a SaaS data plane at scale.
What’s worth remembering
- PrivateLink endpoint services are the SaaS-shaped AWS primitive for private connectivity. One vendor account publishes; many customer accounts consume. Endpoint ENIs live in the customer’s address space, so CIDRs can overlap freely.
- Two authorisation gates. The principal allow-list controls who may create an endpoint; acceptance-required controls whether that endpoint goes live.
- NLB and GWLB are the producer origins, not ALB. ALB-in-front is achievable by making the ALB a target behind an NLB.
- Private DNS on the endpoint service unifies the client-facing hostname. Existing SDKs pointed at the branded hostname work unchanged from inside a VPC with the endpoint enabled.
- VPC peering and Transit Gateway are the wrong shape for SaaS. Both demand CIDR coordination; peering is bilateral; TGW is a shared-governance object.
- Public endpoint with IAM does not satisfy “no public traffic”. Authentication is about who; PrivateLink is about where the bytes go.
- Cross-region is not free and not transparent. Default PrivateLink is regional. Multi-region customers are usually best served by per-region deployments.
- The per-VPC endpoint quota (50 by default) hits customers before vendors. Worth warning customers during onboarding.
- Onboarding a new customer is a two-API-call policy change (add to the allow-list, accept the connection), not a network engineering project.