The situation
The organisation runs a hybrid estate: workloads in two AWS accounts (connected to on-prem via Direct Connect), plus two on-prem data centres with Active Directory domains corp.example.com and finance.example.com. On-prem hosts resolve via AD DNS; AWS workloads resolve via the default VPC resolver.
Today:
- An application in AWS that tries to resolve crm.corp.example.com gets NXDOMAIN because the VPC resolver doesn't know the corp domain.
- An on-prem application that tries to resolve api.payments.internal.example.com (a Route 53 private hosted zone) gets NXDOMAIN because corporate DNS doesn't know about private hosted zones.
- The workaround has been manual hosts-file entries, or forwarding individual records through a BIND server somebody set up five years ago. It's fragile, undocumented, and drifts constantly.
Before we configure anything, we need to walk through the options for bridging AWS-side and on-prem DNS, and be clear about which direction each piece carries.
What actually matters
The core trade-off in hybrid DNS is forwarding specificity versus operational simplicity. Forward everything both ways and you have a bidirectional mirror, which is conceptually simple but leaks information across the boundary. Forward specific zones and you retain control, at the cost of maintaining forwarding rules.
The first thing to ask is: in which direction do queries flow? Two separate questions:
- AWS -> on-prem: a VPC resolver needs to forward queries for corp.example.com to an on-prem DNS server.
- On-prem -> AWS: an on-prem resolver needs a target to forward queries for api.internal.example.com (a private zone hosted in AWS).
Each direction is a different forwarding plane and a different operational concern. Confusing the two is the failure mode that puts a BIND server on someone’s desk for five years.
The second is: which zones forward, and which don't? Forward exactly what needs to be resolvable and leave everything else alone. A forwarding rule on the AWS side for corp.example.com -> on-prem DNS IPs forwards only that zone and the names beneath it; every other query, public names included, is resolved by the VPC resolver as before. The same applies on the on-prem side: a conditional forwarder for internal.example.com -> AWS-side resolver IPs, with everything else staying on AD.
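To make "forward only that zone" concrete, here is a minimal Python sketch of how a suffix-based forwarding table decides where a query goes. It assumes a simplified model of Route 53 Resolver's behaviour: a rule for a domain covers that domain and every name beneath it, and when several rules match, the most specific one wins. The rule table and the `route` helper are illustrative, not an AWS API.

```python
from typing import Dict, List, Optional

# Hypothetical rule table: forwarded zone -> on-prem DNS target IPs.
Rules = Dict[str, List[str]]


def labels(name: str) -> List[str]:
    """Split a DNS name into lowercase labels, ignoring any trailing dot."""
    return name.rstrip(".").lower().split(".")


def matching_zone(qname: str, rules: Rules) -> Optional[str]:
    """Return the most specific rule zone covering qname, or None.

    Matching is done on whole labels, so corp.example.com covers
    crm.corp.example.com but not notcorp.example.com.
    """
    q = labels(qname)
    best: Optional[str] = None
    for zone in rules:
        z = labels(zone)
        if len(z) <= len(q) and q[-len(z):] == z:
            # Keep the longest (most specific) matching zone.
            if best is None or len(z) > len(labels(best)):
                best = zone
    return best


def route(qname: str, rules: Rules) -> str:
    """Describe what happens to a query: forwarded, or resolved normally."""
    zone = matching_zone(qname, rules)
    if zone is None:
        return "vpc-resolver"  # no rule: normal VPC resolution, public names included
    return "forward:" + ",".join(rules[zone])


if __name__ == "__main__":
    rules = {"corp.example.com": ["10.250.1.10", "10.251.1.10"]}
    print(route("crm.corp.example.com", rules))  # forward:10.250.1.10,10.251.1.10
    print(route("www.example.com", rules))       # vpc-resolver
```

If a rule for example.com were added to the same table, www.example.com would start matching it and be forwarded on-prem too, which is why the rules are kept to specific subzones.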
The third is: how many VPCs share this? Building the forwarding once per VPC doesn’t scale. The pattern needs cross-VPC sharing so we configure the rules once in a shared networking account and associate them with each workload VPC.
The fourth is: how do on-prem resolvers reach AWS-side resolvers? Whatever we pick lands as IPs in a VPC subnet, reachable from on-prem via Direct Connect or VPN. Route tables on both sides must carry routes between the on-prem ranges and that subnet, and security groups (plus any network ACLs) must permit port 53, UDP and TCP, from the on-prem resolver IPs.
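A cheap way to catch the classic "UDP allowed, TCP forgotten" mistake is to check the intended ingress rules against each on-prem resolver IP before anything goes live. The sketch below is a hypothetical pre-flight check in Python: `IngressRule` is a simplified stand-in for a security-group rule (it does not model the all-protocols `-1` value or prefix lists), and the CIDRs and IPs are illustrative.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class IngressRule:
    """Simplified security-group ingress rule (hypothetical model)."""
    protocol: str   # "udp" or "tcp"
    from_port: int
    to_port: int
    cidr: str       # source CIDR, e.g. the on-prem range reached over Direct Connect


def dns_gaps(rules: List[IngressRule], resolver_ips: List[str]) -> List[Tuple[str, str]]:
    """Return (source IP, protocol) pairs that no rule admits on port 53."""
    gaps = []
    for ip in resolver_ips:
        addr = ipaddress.ip_address(ip)
        # DNS needs both: TCP carries large or truncated responses.
        for proto in ("udp", "tcp"):
            allowed = any(
                r.protocol == proto
                and r.from_port <= 53 <= r.to_port
                and addr in ipaddress.ip_network(r.cidr)
                for r in rules
            )
            if not allowed:
                gaps.append((ip, proto))
    return gaps


if __name__ == "__main__":
    rules = [IngressRule("udp", 53, 53, "10.250.0.0/16")]
    print(dns_gaps(rules, ["10.250.1.10", "10.251.1.10"]))
```

With that single UDP rule, the check reports TCP missing for 10.250.1.10 and both protocols missing for 10.251.1.10, which sits outside the only allowed CIDR.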
The fifth is: query logging. Whatever resolver fronts the bridge has to capture queries for debugging and audit. A bridge with no query log is a bridge with no postmortem the next time something resolves wrong.
What we’ll filter on
- Query direction. AWS-side resolving on-prem names vs on-prem resolving AWS names.
- Zone specificity. Forward all queries, or only specific domains?
- Cross-VPC sharing. Can one endpoint serve multiple VPCs?
- Query logging. Can we audit what's being asked?
- High availability. How do endpoints survive AZ or Region failure?
The hybrid DNS landscape
- Route 53 Resolver with inbound and outbound endpoints. AWS-native. The inbound endpoint accepts queries from on-prem; the outbound endpoint forwards queries from the VPC to on-prem. Rules define which zones forward to which targets. Deploys as ENIs in the VPC; charged per ENI per hour, plus per query.
- Self-managed BIND servers in AWS. EC2 instances running BIND or Unbound, configured as forwarders. Works, but you run the infrastructure. A pre-Resolver-endpoint pattern, usually kept only by organisations with legacy BIND configs they can't retire.
- AWS Directory Service (AWS Managed Microsoft AD). A fully managed AD domain in AWS, connected to the on-prem AD via a trust. Queries for corp.example.com resolve via the AWS-side domain controllers. Heavier: you're running AD in AWS, not just DNS.
- VPC peering or TGW + shared DNS. If every VPC can use the same endpoints (reachable via TGW routing, with Resolver rules associated per VPC), one set of endpoints serves many VPCs. An infrastructure pattern, not a different tool.
- Client-side forwarding (e.g., resolv.conf pointing at on-prem). Works for a single instance; fails under autoscaling because hosts files and resolv.conf don't update themselves on new instances.
- Cloud Map / service discovery. For AWS-internal service discovery (e.g. ECS service names on Fargate), AWS Cloud Map plus a Route 53 private hosted zone is the native pattern. Orthogonal to hybrid DNS; the two coexist in most organisations.
Side by side
| Option | AWS->on-prem | On-prem->AWS | Zone specificity | Cross-VPC | Query logging |
|---|---|---|---|---|---|
| Resolver inbound + outbound | Outbound endpoint | Inbound endpoint | Per rule | ✓ via RAM | ✓ |
| Self-managed BIND | BIND forwards | BIND forwards | BIND config | Yes if routable | BIND logs |
| AWS Managed AD + trust | AD resolution | AD resolution | Per AD forwarder | Multi-VPC via AD | AD logs |
| Peering / TGW shared | Depends | Depends | Resolver rules | ✓ | Resolver log |
| resolv.conf on client | Per host | N/A | No | No | No |
| Cloud Map / PHZ | N/A (AWS-only) | N/A (AWS-only) | Per namespace | ✓ | No (but R53 logs) |
For bidirectional forwarding between AWS and on-prem AD, Route 53 Resolver endpoints are the native, managed answer, and one set can be shared across VPCs.
The endpoint layout
The picks in depth
The outbound resolver endpoint. Created in the networking VPC with ENIs in at least two AZs for HA. Security group allows outbound UDP 53 and TCP 53 to the on-prem AD DNS IPs. Endpoint deploys in minutes; one endpoint serves all VPCs that have its rules associated.
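If the endpoints are created from code rather than the console, the two-AZ requirement is easy to enforce up front. A minimal sketch, assuming boto3's `route53resolver` client: the helper only builds the keyword arguments for `create_resolver_endpoint` and refuses a single-AZ layout. Subnet IDs, AZ names, and the optional pinned IPs are placeholders.

```python
import uuid
from typing import Dict, Optional


def endpoint_request(
    name: str,
    direction: str,
    security_group_id: str,
    subnets_by_az: Dict[str, str],
    ips_by_az: Optional[Dict[str, str]] = None,
) -> dict:
    """Build kwargs for route53resolver create_resolver_endpoint.

    subnets_by_az maps an AZ name to a subnet ID in the networking VPC.
    ips_by_az optionally pins an ENI IP per AZ, which keeps inbound endpoint
    IPs stable for on-prem conditional forwarders.
    """
    if direction not in ("INBOUND", "OUTBOUND"):
        raise ValueError(f"unexpected direction: {direction}")
    if len(subnets_by_az) < 2:
        raise ValueError("an endpoint needs subnets in at least two AZs for HA")
    ips_by_az = ips_by_az or {}
    ip_addresses = []
    for az in sorted(subnets_by_az):
        entry = {"SubnetId": subnets_by_az[az]}
        if az in ips_by_az:
            entry["Ip"] = ips_by_az[az]
        ip_addresses.append(entry)
    return {
        "CreatorRequestId": str(uuid.uuid4()),  # idempotency token
        "Name": name,
        "Direction": direction,
        "SecurityGroupIds": [security_group_id],
        "IpAddresses": ip_addresses,
    }
```

The result would be passed as `boto3.client("route53resolver").create_resolver_endpoint(**req)`; the security group itself still has to allow port 53 to or from the on-prem DNS servers.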
The resolver rules. One per on-prem zone (or a single rule on a parent domain, if that is genuinely appropriate):
```shell
aws route53resolver create-resolver-rule \
  --creator-request-id $(uuidgen) \
  --name corp-forward \
  --rule-type FORWARD \
  --domain-name corp.example.com \
  --resolver-endpoint-id rslvr-out-aaaa \
  --target-ips Ip=10.250.1.10,Port=53 Ip=10.251.1.10,Port=53
```
Target IPs are the on-prem DNS servers; listing both provides HA. Where the parent zone example.com has several subzones (corp, finance, dev, etc.), prefer one rule per specific subzone over a single rule on example.com: a rule on example.com also captures every public name under it (www.example.com, for instance), so the VPCs lose the public Internet's view of example.com.
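Keeping to one rule per subzone is easy to script. The hypothetical helper below builds one `create_resolver_rule` request per on-prem subzone (parameter names as in boto3's `route53resolver` client) and refuses a parent domain that would swallow public names. The endpoint ID and target IPs mirror the CLI example; the blocked-parent list and the naming scheme are assumptions you would adapt.

```python
import uuid
from typing import Iterable, List


def forward_rule_requests(
    endpoint_id: str,
    zones: Iterable[str],
    target_ips: List[str],
    blocked_parents: Iterable[str] = ("example.com",),
) -> List[dict]:
    """One FORWARD rule request per specific zone (corp, finance, dev, ...)."""
    blocked = {p.rstrip(".").lower() for p in blocked_parents}
    requests = []
    for zone in zones:
        name = zone.rstrip(".").lower()
        if name in blocked:
            # A rule on the parent would also forward www.example.com and friends.
            raise ValueError(f"refusing broad rule on {name}; forward its subzones instead")
        requests.append({
            "CreatorRequestId": str(uuid.uuid4()),
            "Name": name.split(".")[0] + "-forward",  # hypothetical naming, e.g. corp-forward
            "RuleType": "FORWARD",
            "DomainName": name,
            "TargetIps": [{"Ip": ip, "Port": 53} for ip in target_ips],
            "ResolverEndpointId": endpoint_id,
        })
    return requests
```

Each dict can be passed to `create_resolver_rule(**req)`; adding a new on-prem subzone then means adding one string to the list.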
Sharing via RAM. The resolver rule is a resource that can be shared with other accounts via Resource Access Manager:
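```shell
# Run in the networking account. An OU ARN carries the organization's management
# account ID (MGMT here), and sharing with an OU needs RAM sharing with AWS
# Organizations enabled (aws ram enable-sharing-with-aws-organization).
aws ram create-resource-share \
  --name corp-forward-rules \
  --resource-arns arn:aws:route53resolver:eu-west-1:NET:resolver-rule/rslvr-rr-aaaa \
  --principals arn:aws:organizations::MGMT:ou/o-xxx/ou-Production
```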
Every workload account in the OU sees the rule and can associate it with their VPCs:
```shell
aws route53resolver associate-resolver-rule \
  --resolver-rule-id rslvr-rr-aaaa \
  --vpc-id vpc-payments
```
After association, the payments VPC’s resolver knows to forward *.corp.example.com queries through the outbound endpoint. The rule itself lives in the networking account; the workload account just references it.
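Because associations fan out per VPC, onboarding is mostly a matter of finding the (rule, VPC) pairs that are not associated yet. A minimal, hypothetical reconciliation sketch; the IDs are placeholders, and the existing associations would in practice come from listing them (for example with `list_resolver_rule_associations` in boto3).

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (resolver rule ID, VPC ID)


def missing_associations(
    rule_ids: Iterable[str], vpc_ids: Iterable[str], existing: Set[Pair]
) -> List[Pair]:
    """Return the (rule, VPC) pairs that still need associating, sorted."""
    vpcs = list(vpc_ids)  # materialise so it can be iterated once per rule
    return sorted((r, v) for r in rule_ids for v in vpcs if (r, v) not in existing)
```

Each returned pair maps to one `associate_resolver_rule(ResolverRuleId=rule, VPCId=vpc)` call. Once those succeed, running the check again returns an empty list, so it is safe to re-run on every onboarding.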
The inbound resolver endpoint. Also in the networking VPC, also with ENIs in two AZs. Its security group allows inbound UDP 53 and TCP 53 from the on-prem resolver IPs (the on-prem ranges reachable over Direct Connect). The endpoint's ENI IPs are what on-prem AD's conditional forwarder points at.
On-prem AD configuration:
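```powershell
# AD conditional forwarder for internal.example.com (and every name beneath it),
# pointing at the inbound endpoint's ENI IPs. The backtick is PowerShell's line continuation.
Add-DnsServerConditionalForwarderZone -Name "internal.example.com" `
    -MasterServers 10.0.11.10, 10.0.12.10 `
    -ReplicationScope Forest
```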
Now any on-prem host resolving api.internal.example.com asks AD; AD finds the conditional forwarder and forwards the query to the inbound endpoint's ENIs over Direct Connect; the endpoint resolves it against the private hosted zone and returns the answer.
Private hosted zones. The networking account (or a dedicated DNS-ops account) owns the PHZs: internal.example.com, api.internal.example.com, etc. Each PHZ is associated with every workload VPC that should resolve those names. Cross-account association uses:
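```shell
# In the PHZ owner account:
aws route53 create-vpc-association-authorization \
  --hosted-zone-id ZXXX \
  --vpc VPCRegion=eu-west-1,VPCId=vpc-payments
# Then in the workload account:
aws route53 associate-vpc-with-hosted-zone \
  --hosted-zone-id ZXXX \
  --vpc VPCRegion=eu-west-1,VPCId=vpc-payments
# Optionally, back in the owner account, remove the now-unneeded authorization:
aws route53 delete-vpc-association-authorization \
  --hosted-zone-id ZXXX \
  --vpc VPCRegion=eu-west-1,VPCId=vpc-payments
```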
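With the PHZ associated, the workload VPC's resolver can answer for the zone. The inbound endpoint, which serves queries from on-prem, resolves through the networking VPC it lives in, so the PHZ must also be associated with the networking VPC for on-prem queries to get an answer.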
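The authorise-then-associate sequence is the part most worth automating, for example in a function run when an account joins the organisation. A minimal Python sketch, assuming boto3 `route53` clients for the two accounts (obtaining cross-account credentials is out of scope): it authorises in the owner account, associates from the workload account, then deletes the no-longer-needed authorisation. `RecordingClient` is a hypothetical stand-in for dry runs, not part of boto3.

```python
from typing import Any, List, Tuple


def associate_phz(owner_r53: Any, workload_r53: Any, zone_id: str, region: str, vpc_id: str) -> None:
    """Associate a workload VPC with a PHZ owned by another account."""
    vpc = {"VPCRegion": region, "VPCId": vpc_id}
    # 1. The PHZ owner account authorises the foreign VPC.
    owner_r53.create_vpc_association_authorization(HostedZoneId=zone_id, VPC=vpc)
    # 2. The workload account (which owns the VPC) creates the association.
    workload_r53.associate_vpc_with_hosted_zone(HostedZoneId=zone_id, VPC=vpc)
    # 3. The authorisation is no longer needed once the association exists.
    owner_r53.delete_vpc_association_authorization(HostedZoneId=zone_id, VPC=vpc)


class RecordingClient:
    """Dry-run stand-in: records each API call instead of sending it."""

    def __init__(self, account: str, log: List[Tuple[str, str, dict]]):
        self.account = account
        self.log = log

    def __getattr__(self, operation: str):
        # Only called for attributes not set in __init__, i.e. API operations.
        def call(**kwargs):
            self.log.append((self.account, operation, kwargs))
            return {}
        return call
```

With real credentials, the two arguments would be `boto3.client("route53")` instances created from each account's session; with `RecordingClient` the same function prints nothing and just records the three calls in order.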
Query logging. Enable Resolver Query Logging per VPC, with CloudWatch Logs in the logging account as the destination (see the log aggregation post). Every query is logged along with its response code, source IP, and answers. Useful for:
- Debugging “why can my Fargate task not resolve X?” (see the query, see the response code).
- Security analysis (who’s querying sketchy domains?).
- Capacity planning (DNS QPS per VPC).
Log volume is moderate: tens of thousands of queries per VPC per day for a typical workload.
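For the "why can my task not resolve X?" case, counting NXDOMAIN responses per VPC and name is often enough to spot a missing rule or association. A minimal sketch over exported query-log lines, assuming one JSON record per line with `vpc_id`, `query_name`, and `rcode` fields as in the Resolver query log format; check the field names against your own logs. The sample records are made up.

```python
import json
from collections import Counter
from typing import Iterable


def nxdomain_counts(lines: Iterable[str]) -> Counter:
    """Count NXDOMAIN responses per (vpc_id, query_name)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in exported files
        record = json.loads(line)
        if record.get("rcode") == "NXDOMAIN":
            counts[(record.get("vpc_id"), record.get("query_name"))] += 1
    return counts


if __name__ == "__main__":
    sample = [  # made-up records for illustration
        '{"vpc_id": "vpc-payments", "query_name": "crm.corp.example.com.", "rcode": "NXDOMAIN"}',
        '{"vpc_id": "vpc-payments", "query_name": "crm.corp.example.com.", "rcode": "NXDOMAIN"}',
        '{"vpc_id": "vpc-payments", "query_name": "api.internal.example.com.", "rcode": "NOERROR"}',
    ]
    for (vpc, name), n in nxdomain_counts(sample).most_common():
        print(vpc, name, n)
```

Repeated NXDOMAIN for a corp.example.com name from one VPC often points at a rule that was never associated with that VPC.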
HA and multi-Region. Each endpoint has ENIs in at least two AZs, which handles AZ failure. For multi-Region, replicate the pattern in each Region: an outbound endpoint and resolver rules per Region, all pointing at the same (or Region-local) on-prem DNS. Do the same for inbound endpoints if on-prem needs to reach Region-specific PHZs.
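Replicating per Region is repetitive enough to generate. A hypothetical planning sketch: for each Region it names an outbound endpoint and one rule per zone, pointing at Region-local on-prem DNS where one exists and at the shared servers otherwise. Region names, the endpoint naming scheme, and the local-DNS mapping are illustrative.

```python
from typing import Dict, List, Optional


def regional_plan(
    regions: List[str],
    zones: List[str],
    shared_targets: List[str],
    local_targets: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, dict]:
    """Per-Region outbound endpoint plus forwarding rules for each zone."""
    local_targets = local_targets or {}
    plan = {}
    for region in regions:
        targets = local_targets.get(region, shared_targets)  # prefer Region-local DNS
        plan[region] = {
            "outbound_endpoint": f"hybrid-out-{region}",  # hypothetical naming scheme
            "rules": {zone: list(targets) for zone in zones},
        }
    return plan
```

Resolver endpoints and rules are regional resources, so each Region's rules are shared and associated separately, in the same way as in the primary Region.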
A worked outbound query
A Fargate task in payments-prod tries to resolve crm.corp.example.com:
- The task calls getaddrinfo("crm.corp.example.com"). The OS resolver queries the VPC resolver at 169.254.169.253.
- The VPC resolver checks its associated resolver rules. corp.example.com matches the corp-forward rule, so the query is forwarded via the outbound endpoint.
- The outbound endpoint in the networking VPC queries 10.250.1.10 (on-prem AD) over Direct Connect.
- AD responds with crm.corp.example.com IN A 10.250.5.47.
- The response flows back through the outbound endpoint and the VPC resolver to the task's OS.
- The task connects to 10.250.5.47 (routed via TGW -> Direct Connect).
Total latency: ~5-15ms (DNS query + response over Direct Connect). Every query logged in Query Logs for audit.
A worked inbound query
An on-prem Windows server tries to resolve api.internal.example.com:
- The Windows DNS client queries AD DNS at 10.250.1.10.
- AD checks its conditional forwarders and matches internal.example.com -> 10.0.11.10.
- AD forwards the query to 10.0.11.10 (an inbound endpoint ENI in the networking VPC) over Direct Connect.
- The inbound endpoint resolves the name against the api.internal.example.com PHZ and returns api.internal.example.com IN A 10.20.5.12 (the PrivateLink endpoint IP).
- The response flows back to AD and then to the Windows client, which connects to 10.20.5.12.
Same latency story; same logging.
What’s worth remembering
- Outbound endpoint forwards out; inbound endpoint receives from outside. Name refers to query direction; both endpoints are ENIs in your VPC.
- Resolver rules define what forwards and what doesn't. One rule per zone. Rules on broad parent domains (such as example.com) cause problems; be specific.
- Share rules via RAM to every account. Associate in each VPC. Rule definition lives in one place; associations fan out.
- On-prem needs its own conditional forwarder. AD won’t forward anything unless told. Forwarder points at the inbound endpoint’s ENI IPs.
- Private hosted zones need cross-account association. Authorisation in the PHZ owner account; association in the consuming account. Tedious for new accounts; automatable via Lambda triggered on account-join.
- Query Logging is cheap and worth having. Destination: CloudWatch Logs in the logging account. Every DNS query, logged.
- High availability via multi-AZ ENIs. Each endpoint has at least two ENIs in different AZs; on-prem forwarder points at both IPs.
- One networking account owns the hub. Endpoints, rules, PHZs all live there. Workload VPCs associate; rules are not duplicated.
Two endpoints, rules in both directions, names resolve both sides. The hosts-file hacks disappear; the fragile BIND server gets decommissioned; new VPCs joining the org get hybrid DNS as part of onboarding. The organisation’s DNS starts looking like one DNS, even when it’s running in two places.