The situation
We have roughly 40 internal services spread across 6 VPCs in 4 accounts. Shapes vary:
- A big monolith in its own VPC, fronted by an ALB, called by everyone.
- Half a dozen customer-facing APIs in the “customer” VPC, each with an NLB, exposed to internal consumers via PrivateLink endpoints.
- Twenty-odd microservices across the remaining four VPCs, some calling each other via internal ALBs, some via service-mesh sidecars we added in 2023 and never fully owned.
- Consumers span accounts; roughly 200 consumer-side endpoint resources across the estate.
The result is a hairball:
- Service discovery is ad-hoc. Route 53 Private Hosted Zones, a couple of internal DNS forwarders, hostnames hardcoded in several consumers.
- Authorisation lives everywhere. Some services check JWTs in application code; some rely on SG membership; some check source IPs seen at the NLB; some use IAM with a custom SigV4 signing step.
- Cross-account exposure is either PrivateLink (expensive in consumer-endpoint hours at this scale) or peering-plus-SGs (which doesn’t scale).
- Observability is inconsistent: some services emit access logs, some don’t, and there’s no unified request trace.
The platform team asks: is there one thing that replaces load balancers, endpoints, IAM-per-service, and DNS plumbing with a shared primitive?
Options:
- Status quo with better tooling. Terraform modules for load balancers, endpoints, and IAM policies.
- Service mesh (Istio / Linkerd / App Mesh). A client-side library or sidecar in every workload that handles identity, routing, retries, and tracing.
- VPC Lattice. A managed service-to-service networking primitive with service discovery, IAM auth, and cross-VPC routing built in.
- API Gateway internal stages. Put every service behind a private API Gateway stage; consumers call via private endpoints.
What actually matters
Before picking, it’s worth naming what “service-to-service networking” ought to do.
The first ask is service identity, not network identity. Today we often authorise by “which SG is the caller in” or “which source IP appears on the NLB.” These are network-identity proxies, and they leak: an EC2 role reused across services, an SG shared by several apps, a source IP that doesn’t survive NAT. Each one weakens authorisation. What we actually want is “the caller is the orders-api service, running in account 2222, with IAM role X, and IAM policy says that role may call payments-api with POST /charge.”
The second ask is a service registry we don’t maintain. Route 53 PHZs work but they’re DNS records, one per service, per environment, and maintenance is a deploy-time burden. A service registry that knows about services and their versions, and resolves them automatically, cuts the DNS plumbing.
The third ask is path-based routing without ALBs everywhere. A lot of the microservice ALBs exist purely to do /orders/* → orders-api, /payments/* → payments-api. A shared router that understands HTTP and does the path routing across all services replaces N ALBs with one listener.
The fourth ask is cross-VPC and cross-account without endpoint sprawl. PrivateLink is the correct tool for exposing one service; it’s the wrong tool for exposing 40, because we end up with 40 endpoint services and 200+ consumer endpoints. A shared networking layer that every service plugs into, and consumers call through without creating per-service endpoints, is the abstraction we’re missing.
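The N × M versus N + M difference is plain arithmetic. A minimal sketch, assuming a hypothetical `consumption` map from consumer VPC to the services it calls, and counting only connectivity objects (endpoint services, interface endpoints, network associations), not security groups or DNS records:

```python
def privatelink_objects(consumption: dict[str, set[str]]) -> int:
    """One endpoint service per exposed service, plus one interface endpoint
    per (consumer VPC, service) pair: grows with the number of pairs."""
    services = set().union(*consumption.values())
    endpoints = sum(len(called) for called in consumption.values())
    return len(services) + endpoints


def lattice_objects(consumption: dict[str, set[str]]) -> int:
    """One service-network association per service plus one per consumer VPC:
    grows with services + VPCs, regardless of who calls whom."""
    services = set().union(*consumption.values())
    return len(services) + len(consumption)


if __name__ == "__main__":
    # Hypothetical estate: 40 services, 10 consumer VPCs, each calling 20 services.
    services = [f"svc-{i}" for i in range(40)]
    consumption = {
        f"vpc-{v}": {services[(v * 4 + k) % 40] for k in range(20)} for v in range(10)
    }
    print(privatelink_objects(consumption), lattice_objects(consumption))  # 240 50
```

With one consumer and one service the two models need the same number of objects; the gap opens as caller/service pairs multiply.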
The fifth ask is auth policy in AWS-native terms. Instead of custom JWT in every service, let IAM be the auth primitive: “service A’s role may call service B” as a policy sentence, enforced by the networking layer before the packet reaches the service.
What we’ll filter on
- Shared router across services, one networking layer, not N per-service LBs.
- IAM-based identity and auth, policy sentences against service principals, not SG tricks.
- Cross-VPC / cross-account without per-service endpoints, linear scaling, not N × M.
- Managed service discovery, caller → service name → resolved automatically.
- Operational model fit for 40+ services, the abstraction matches the shape of the problem.
The inter-service landscape
1. Status quo (ALBs + PrivateLink + IAM). Each service has its own load balancer, its own exposure shape, its own auth story. Mature, well-understood, and genuinely fine at 5 services. Gets expensive and inconsistent at 40. Fails on shared router and scaling.
2. Service mesh (Istio/Linkerd/App Mesh). A sidecar proxy (Envoy, usually) next to every service; a control plane manages the mesh. Strong on mTLS, retries, traffic shifting, observability. Operationally heavy: control plane to run, sidecars to deploy, certificate rotation, debugging the proxy layer. Capable but expensive to run in-house.
3. VPC Lattice. A service network is the top-level construct, a logical boundary that many services and many VPCs associate with. Services register with the network, each carrying one or more listeners (HTTP, HTTPS, TLS) and target groups (Lambda functions, EC2 instances, IP addresses, or ALBs; ECS tasks and Kubernetes Services, the latter via the AWS Gateway API Controller, register as IP targets). Auth policies on the service network or individual services use IAM-style JSON with principal, action, resource, and conditions; the data plane enforces them per-request. Consumers in VPCs associated with the network call services by their Lattice-provided DNS name; Lattice routes the request to healthy targets transparently across VPC boundaries and, with the network shared through AWS RAM, across accounts: no peering, no endpoint sprawl.
4. API Gateway internal stages. Private REST APIs in API Gateway (HTTP APIs have no private endpoint type), fronted by private endpoints in consumer VPCs. Good at request transformation, throttling, and auth. A poor fit for service-to-service gRPC or long-lived TCP connections, and at 40 services it gets expensive.
Side by side
| Option | Shared router | IAM-based auth | Cross-VPC without endpoint sprawl | Service discovery | Fit at 40+ services |
|---|---|---|---|---|---|
| Status quo | ✗ | Partial | ✗ | PHZ-based | ✗ |
| Service mesh | ✓ | mTLS (not IAM) | ✓ (at mesh layer) | Mesh | With heavy ops |
| VPC Lattice | ✓ | ✓ | ✓ | ✓ | ✓ |
| API Gateway private | ✓ (per API) | ✓ | ✗ (endpoints again) | Per API | Partial |
[Figure: The Lattice shape]
The pick(s) in depth
VPC Lattice for the service-to-service layer, PrivateLink and TGW retained for non-service use cases. Lattice doesn’t replace everything; it replaces the 40-services-behind-load-balancers-with-endpoints hairball specifically.
Concepts worth understanding:
Service network. The top-level Lattice construct. A logical boundary that VPCs and services associate with. VPC associations are “this VPC’s clients can reach services in this network”; service associations are “this service is available in this network.” One service can belong to multiple networks, but a VPC takes only one direct service-network association; reaching a further network from the same VPC goes through a service-network VPC endpoint instead. The association is what grants reachability; without it, there is no routing.
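The reachability rule is a set intersection: a VPC reaches a service only through a network both are associated with. A hypothetical in-memory model (the `ServiceNetwork` class and `reachable` function are illustrative, not a Lattice API; association quotas are not enforced):

```python
from dataclasses import dataclass, field


@dataclass
class ServiceNetwork:
    """Hypothetical stand-in for a Lattice service network's associations."""
    name: str
    vpcs: set[str] = field(default_factory=set)      # VPC associations: whose clients get in
    services: set[str] = field(default_factory=set)  # service associations: what is offered


def reachable(vpc: str, service: str, networks: list[ServiceNetwork]) -> bool:
    """True if some network has both the caller's VPC and the service associated.

    Reachability only: auth policy is a separate check, evaluated per request.
    """
    return any(vpc in net.vpcs and service in net.services for net in networks)


if __name__ == "__main__":
    prod = ServiceNetwork("prod", vpcs={"vpc-a"}, services={"orders-api", "payments-api"})
    partners = ServiceNetwork("partners", vpcs={"vpc-b"}, services={"payments-api"})
    nets = [prod, partners]
    print(reachable("vpc-a", "orders-api", nets))    # True
    print(reachable("vpc-b", "orders-api", nets))    # False: no shared network
    print(reachable("vpc-b", "payments-api", nets))  # True: via "partners"
```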
Service. A Lattice-managed entity with:
- One or more listeners (HTTP, HTTPS, or TLS passthrough); HTTP and HTTPS listeners carry rules for path-based, header-based, and method-based routing.
- One or more target groups per rule. Target types are `LAMBDA`, `INSTANCE` (EC2 instance IDs), `IP`, and `ALB`; Kubernetes Services (via the AWS Gateway API Controller) and ECS tasks register as `IP` targets. Target groups can sit in different VPCs from one another; Lattice’s data plane handles the routing.
- An optional service-specific auth policy (IAM JSON). If present, it’s evaluated after the service-network auth policy.
- An auto-generated DNS name (of the form `<name>-<id>.<hash>.vpc-lattice-svcs.<region>.on.aws`) or a custom domain mapped via Route 53.
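Listener rules behave like an ordered first-match table. A hypothetical model of that behaviour (`Rule` and `route` are our names, not Lattice’s API; it handles path prefixes, method, and exact header values, but not Lattice’s exact-path or case-sensitivity options) shows `/orders/*` and `/payments/*` sharing one listener:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rule:
    """Hypothetical listener rule; lower priority numbers are evaluated first."""
    priority: int
    target_group: str
    path_prefix: Optional[str] = None
    method: Optional[str] = None
    headers: dict[str, str] = field(default_factory=dict)  # name -> exact value

    def matches(self, method: str, path: str, headers: dict[str, str]) -> bool:
        if self.path_prefix is not None and not path.startswith(self.path_prefix):
            return False
        if self.method is not None and method.upper() != self.method.upper():
            return False
        # Header names compare case-insensitively; values must match exactly.
        sent = {name.lower(): value for name, value in headers.items()}
        return all(sent.get(name.lower()) == value for name, value in self.headers.items())


def route(rules: list[Rule], default_target: str, method: str, path: str,
          headers: Optional[dict[str, str]] = None) -> str:
    """Return the first matching rule's target group, else the default action."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(method, path, headers or {}):
            return rule.target_group
    return default_target


if __name__ == "__main__":
    rules = [
        Rule(10, "orders-tg", path_prefix="/orders/"),
        Rule(20, "payments-tg", path_prefix="/payments/", method="POST"),
        Rule(5, "orders-canary-tg", path_prefix="/orders/", headers={"x-canary": "1"}),
    ]
    print(route(rules, "monolith-tg", "GET", "/orders/42"))                     # orders-tg
    print(route(rules, "monolith-tg", "GET", "/orders/42", {"X-Canary": "1"}))  # orders-canary-tg
    print(route(rules, "monolith-tg", "GET", "/payments/7"))                    # monolith-tg
```

The last call falls through to the default because the payments rule only matches POST.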
Auth policy. IAM JSON at the service-network level or per-service. Principals are IAM roles/users; the action is `vpc-lattice-svcs:Invoke`; resources are service ARNs; conditions can match `aws:PrincipalTag`, source VPC, method, and path (Lattice’s own condition keys include `vpc-lattice-svcs:SourceVpc`, `vpc-lattice-svcs:RequestMethod`, and `vpc-lattice-svcs:RequestPath`). Evaluated per-request. This makes the sentence “role service-orders-frontend may invoke service orders-api with method POST on path /v1/orders” enforceable without any app-side code.
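That sentence, written out as a policy document. A minimal sketch: `invoke_policy` is a hypothetical helper, the account IDs and service ID are placeholders, and the resource is assumed to take the form `<service ARN>/*`:

```python
import json


def invoke_policy(caller_role_arn: str, service_arn: str, method: str, path: str) -> dict:
    """Auth-policy document allowing one IAM role one method on one path.

    Assumes the Invoke resource takes the form "<service ARN>/*".
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": caller_role_arn},
                "Action": "vpc-lattice-svcs:Invoke",
                "Resource": f"{service_arn}/*",
                "Condition": {
                    "StringEquals": {"vpc-lattice-svcs:RequestMethod": method},
                    "StringLike": {"vpc-lattice-svcs:RequestPath": path},
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account IDs and service ID, not real resources.
    policy = invoke_policy(
        "arn:aws:iam::222222222222:role/service-orders-frontend",
        "arn:aws:vpc-lattice:eu-west-1:111111111111:service/svc-0abc1234",
        "POST",
        "/v1/orders",
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of document you would attach with `aws vpc-lattice put-auth-policy`; to cover a path prefix rather than one path, put a wildcard such as `/v1/orders*` in the `StringLike` value.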
Identity on the wire. When the caller has an IAM role, it signs the request with SigV4, using the AWS SDK’s signer, awscurl, or curl’s `--aws-sigv4` option. Lattice validates the signature, extracts the principal, evaluates the auth policy, and either forwards to the target (with the caller identity in request headers) or returns 403. Unsigned requests can be let through if the policy permits them (an allow on `vpc-lattice-svcs:Invoke` with no `aws:PrincipalType` condition excluding anonymous callers), but that’s a footgun: the point is authenticated service calls.
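To make the signing step concrete, here is a standard-library sketch of SigV4 for a body-less GET with signing service name `vpc-lattice-svcs`. It follows the generic SigV4 recipe (canonical request, string to sign, derived key), not a Lattice-specific API. Assumptions: `sign_get` is a hypothetical helper, and Lattice is assumed to accept an `UNSIGNED-PAYLOAD` content hash rather than a signed body.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qsl, quote, urlsplit

SERVICE = "vpc-lattice-svcs"


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_get(url: str, region: str, access_key: str, secret_key: str,
             session_token: Optional[str] = None,
             now: Optional[datetime] = None) -> dict[str, str]:
    """SigV4 headers for a body-less GET to a Lattice service (sketch only)."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    parts = urlsplit(url)

    headers = {
        "host": parts.netloc,
        # Assumption: Lattice takes unsigned payloads, declared this way.
        "x-amz-content-sha256": "UNSIGNED-PAYLOAD",
        "x-amz-date": amz_date,
    }
    if session_token:
        headers["x-amz-security-token"] = session_token

    # 1. Canonical request: method, URI, sorted query, sorted headers, payload hash.
    pairs = sorted(
        (quote(name, safe="-_.~"), quote(value, safe="-_.~"))
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    )
    query = "&".join(f"{name}={value}" for name, value in pairs)
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k].strip()}\n" for k in sorted(headers))
    canonical_request = "\n".join([
        "GET", quote(parts.path or "/", safe="/-_.~"), query,
        canonical_headers, signed_headers, "UNSIGNED-PAYLOAD",
    ])

    # 2. String to sign, scoped to date/region/service.
    scope = f"{date_stamp}/{region}/{SERVICE}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # 3. Derive the signing key (date -> region -> service -> aws4_request), then sign.
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, SERVICE, "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return headers
```

In practice botocore’s `SigV4Auth` or `curl --aws-sigv4` does this for you; the sketch only makes the moving parts visible, in particular that the region and the `vpc-lattice-svcs` service name are baked into the signature scope.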
VPC association, not endpoint creation. The difference from PrivateLink is critical: a consumer VPC associates with the service network once, and from that point any service in the network is callable from the consumer VPC. No per-service endpoint ENIs; no per-service DNS configuration; no per-service security group management. Add a new service to the network, every associated consumer VPC can reach it immediately (modulo auth policy).
Observability built in. Access logs to CloudWatch Logs, S3, or Firehose show every invocation with principal, method, path, status, latency. Request tracing via X-Ray. No sidecars to configure.
A worked request
An EC2 instance in VPC A, running as IAM role service-orders-frontend, needs to call orders-api.
```shell
$ curl https://orders-api-0abc.xyz.vpc-lattice-svcs.eu-west-1.on.aws/v1/orders/42 \
    --aws-sigv4 'aws:amz:eu-west-1:vpc-lattice-svcs' \
    --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
    -H "x-amz-security-token: $AWS_SESSION_TOKEN"
```
What happens:
- DNS resolution. From inside an associated VPC, the hostname resolves to a link-local address in the 169.254.171.0/24 range, not to an ENI in the consumer’s subnets; the VPC routes traffic for that range to Lattice.
- Client SigV4. The SDK (or curl’s `--aws-sigv4`) signs the request with the instance’s IAM credentials. The signing service name is `vpc-lattice-svcs`.
- Lattice data plane receives the request. Traffic to the link-local address lands in Lattice’s managed infrastructure.
- Auth policy evaluation. The service-network policy runs first: the caller’s role ARN matches the `service-*` pattern, the resource matches the service ARN, and the action is `vpc-lattice-svcs:Invoke`. Allowed. orders-api has no service-level auth policy (its auth type is NONE), so there is no second check.
- Target selection. Lattice picks a healthy target in the ECS target group (round-robin, or whichever algorithm the target group is configured with).
- Request forwarded to target. The ECS task receives the request with headers Lattice has added, such as `x-amzn-source-vpc` and, for signed requests, `x-amzn-lattice-identity` carrying the caller’s principal.
- Response flows back through the data plane to the caller. An access log entry records the caller principal, method, path, status, latency, and target.
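The auth-policy step in that walk-through can be modelled as two gates in series. This is a hypothetical simulation, not Lattice’s evaluator: `Layer`, `authorize`, and the callable policies are illustrative names, and it assumes a layer whose auth type is `AWS_IAM` with no policy attached denies everything.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase
from typing import Callable, Optional


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"          # an explicit Deny statement matched
    NO_MATCH = "no_match"  # nothing matched: implicit deny


# Hypothetical stand-in for IAM policy evaluation over one request.
Policy = Callable[[dict], Decision]


@dataclass
class Layer:
    auth_type: str                   # "NONE" or "AWS_IAM"
    policy: Optional[Policy] = None


def layer_allows(layer: Layer, request: dict) -> bool:
    if layer.auth_type == "NONE":
        return True                  # this layer performs no auth check
    if layer.policy is None:
        return False                 # assumed: AWS_IAM with no policy attached denies
    return layer.policy(request) is Decision.ALLOW


def authorize(network: Layer, service: Layer, request: dict) -> int:
    """HTTP status the caller sees: network layer first, and both must allow."""
    return 200 if layer_allows(network, request) and layer_allows(service, request) else 403


def network_policy(request: dict) -> Decision:
    # Mirrors the walk-through: any role named service-* may invoke.
    if fnmatchcase(request["principal"], "arn:aws:iam::*:role/service-*"):
        return Decision.ALLOW
    return Decision.NO_MATCH


if __name__ == "__main__":
    network = Layer("AWS_IAM", network_policy)
    orders_api = Layer("NONE")       # no service-level auth on orders-api
    caller = {"principal": "arn:aws:iam::222222222222:role/service-orders-frontend"}
    print(authorize(network, orders_api, caller))  # 200
    print(authorize(network, orders_api, {"principal": "arn:aws:iam::222222222222:role/batch"}))  # 403
```

Switching orders-api to `Layer("AWS_IAM")` without a policy would turn the first call into a 403 as well, which is the usual surprise when tightening a service.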
No peering, no endpoint, no manually-managed load balancer for orders-api, no application-layer auth code. The whole chain is Lattice-native.
What’s worth remembering
- Service network is the boundary. VPCs and services both associate with it; an association grants reachability. One network can host many services and many VPCs; a service can join several networks, but a VPC has only one direct network association.
- Services register targets of many types. Lambda, EC2 instance, IP, ALB, Kubernetes Service. Targets can span VPCs and accounts. Lattice’s data plane handles the cross-VPC routing.
- Auth policies are IAM JSON against service principals. Principal, action (`vpc-lattice-svcs:Invoke`), resource (service ARN), conditions on path, method, and caller tags. They exist at network level and service level; the network policy runs first, and a request must pass both.
- SigV4 is the auth mechanism on the wire. Clients sign requests with their IAM credentials; Lattice validates the signature and enforces the policy. The caller’s identity is extracted before the request reaches the target.
- Consumer VPCs associate once, reach every service. Unlike PrivateLink where you create one endpoint per service, in Lattice the VPC associates with the network and picks up every service in it automatically.
- Listeners and rules do path/header/method routing. Replace per-service ALBs with listener rules that route `/orders/*` to the orders-api target group and `/payments/*` to payments-api.
- Observability is built in. Access logs + CloudWatch metrics + X-Ray tracing. No sidecars.
- Cost scales with services and traffic. Lattice charges per service-hour, per GB processed, and per request. At 40 services, this is usually less than 40 ALBs + 200 PrivateLink endpoints; at 3 services, it’s more expensive. Model it.
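A skeleton for that model. It deliberately contains no prices: every rate is an input to fill from the current AWS pricing pages for your region. Assumptions: a 730-hour month, free tiers ignored, and load-balancer LCU/NLCU usage charges left out; the function and field names are hypothetical.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common convention for monthly estimates


@dataclass
class LatticeRates:
    per_service_hour: float
    per_gb: float
    per_million_requests: float  # set to 0.0 if requests are modelled elsewhere


@dataclass
class LegacyRates:
    per_lb_hour: float            # hourly LB charge only; LCU/NLCU usage ignored
    per_endpoint_az_hour: float   # interface endpoints bill per AZ they sit in
    per_endpoint_gb: float


def lattice_monthly(services: int, gb: float, million_requests: float,
                    r: LatticeRates) -> float:
    """Service-hours + data processed + requests, for one month."""
    return (services * HOURS_PER_MONTH * r.per_service_hour
            + gb * r.per_gb
            + million_requests * r.per_million_requests)


def legacy_monthly(lbs: int, endpoints: int, azs: int, gb: float,
                   r: LegacyRates) -> float:
    """LB-hours + endpoint AZ-hours + endpoint data processed, for one month."""
    return (lbs * HOURS_PER_MONTH * r.per_lb_hour
            + endpoints * azs * HOURS_PER_MONTH * r.per_endpoint_az_hour
            + gb * r.per_endpoint_gb)
```

Feed both with the estate’s own counts (for us, roughly 40 services, 40 load balancers, and about 200 consumer endpoints times their AZ count) and the same traffic volume; the crossover point is whatever the comparison says, not a fixed service count.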
Forty services behind forty load balancers behind two hundred consumer endpoints is an operational bill that grows linearly with the service count. Lattice replaces that with a shared networking layer where the unit of concern is “the service,” not “the VPC” or “the NLB.” The trade is learning a new abstraction and paying per-service-hour instead of per-LB-hour. For estates past roughly 15-20 internal services, the sums and the operational simplification both tend to land in Lattice’s favour. The work isn’t picking the newest primitive; it’s noticing that the service-shaped problem has finally got a service-shaped AWS tool to solve it.