The situation
The estate has grown into a hybrid shape:
- On-prem. Active Directory forest corp.example.com, domain controllers acting as DNS servers at 10.1.2.53 and 10.1.3.53. All on-prem hosts, servers, and internal web apps live under subdomains of corp.example.com.
- AWS. Several VPCs connected to on-prem via Direct Connect through a Transit Gateway. Workloads in AWS use Route 53 Private Hosted Zones for cloud.example.com and aws.example.com, plus VPC-scoped zones for database endpoints.
- DNS today. EC2 instances default to the VPC resolver at .2 in their CIDR (the AmazonProvidedDNS). Queries for cloud.example.com and for public names work. Queries for corp.example.com return NXDOMAIN because the VPC resolver has no path to the AD DNS.
- The reverse. On-prem servers default to the AD DNS. Their corp.example.com queries work; queries for cloud.example.com fail because AD DNS has no zone for it and no forwarder.
The migration team wants:
- AWS → on-prem. EC2 in any VPC can resolve corp.example.com names (to on-prem IPs).
- On-prem → AWS. On-prem servers can resolve cloud.example.com and aws.example.com names (to AWS private IPs).
- Split-horizon ready. A handful of names in corp.example.com have AWS-hosted alternatives during migration; queries originating in AWS should prefer the AWS record, queries on-prem should prefer the on-prem record.
- No hosts files. Everything through DNS.
Five options on the menu:
- Custom DNS on each EC2 instance, with /etc/resolv.conf pointing directly at the AD DCs over the tunnel.
- Forwarder on a proxy EC2 running Unbound or BIND.
- Route 53 Resolver endpoints + rules, the AWS-native pattern.
- AD DNS conditional forwarders for cloud zones, AWS-side doing its own thing.
- Shared services VPC with centralised resolvers used by every VPC.
What actually matters
Before picking, it’s worth asking what hybrid DNS actually has to do.
The first ask is direction. On-prem has an authoritative DNS for corp.example.com; AWS has authoritative Private Hosted Zones for cloud.example.com. Each side needs two paths: authoritative for its own names, resolver-with-forwarding for the other side’s names. That’s two separate flows: AWS-originated queries leaving for on-prem, and on-prem-originated queries arriving for AWS. The architecture has to express both.
The second ask is who decides the forwarding. If EC2 instances point their /etc/resolv.conf directly at the on-prem DC over the tunnel, every Linux or Windows box needs individual configuration, and there’s no AWS-side observability for which names are being asked. Centralising the forwarding decision, so the VPC resolver continues to be the per-instance default and a managed rule shapes what happens to specific domains, is what keeps the configuration in one place and the audit trail honest.
The third ask is which VPCs learn which rules. The forwarding configuration should be a shareable resource, defined once, associated with every VPC that needs the same behaviour, and ideally shared across accounts via the same mechanism the org already uses for resource sharing. A well-run estate has one rule saying “corp.example.com goes to the on-prem DCs” and applies it everywhere.
The fourth ask is high availability. Whatever sits at the AWS edge of the hybrid DNS path has to be multi-AZ by design, with an ENI in at least two AZs. The hop across the VPC boundary is the failure-prone bit, and losing a whole AZ should not take DNS down. On-prem forwarders must also be multi-DC for the same reason.
The fifth ask is split-horizon. Route 53 Private Hosted Zones associated with a VPC shadow public DNS for the same names inside that VPC. So if on-prem has hr.corp.example.com → 10.1.5.6 and AWS has a PHZ for corp.example.com associated with a VPC with hr.corp.example.com → 10.100.1.1, queries from inside that VPC see the AWS answer first, provided no forwarding rule for the identical domain is associated with that VPC as well (AWS documents that a Resolver rule takes precedence over a PHZ with the same domain name). That’s either a feature or a footgun, depending on whether the split is intentional. We want it to be intentional.
What we’ll filter on
- AWS → on-prem resolution, do AWS workloads resolve on-prem zones?
- On-prem → AWS resolution, do on-prem servers resolve AWS private zones?
- Centralised, shareable, can the config live once and apply to many VPCs?
- Multi-AZ / multi-DC, does it survive loss of a facility?
- Observable, can we tell what’s being asked, and what got forwarded where?
The hybrid DNS landscape
1. Point resolv.conf at on-prem DCs. Set nameserver 10.1.2.53 on each EC2 and trust the tunnel. Works for AWS → on-prem (if the tunnel is up); does nothing for on-prem → AWS; not centralised (config per host); no observability.
2. EC2 Unbound/BIND proxy. Run a resolver EC2 in a shared VPC configured with corp.example.com forwarded to AD DCs, other zones to AmazonProvidedDNS. Every other VPC’s workloads are repointed from .2 to the proxy, via DHCP options or /etc/resolv.conf. It works but ages badly: the proxy is now a thing to patch, monitor, and scale. Observable via the proxy’s own logs.
3. Route 53 Resolver Inbound + Outbound endpoints + rules. Managed, multi-AZ by design, observable via Resolver query logging, shareable via RAM. Inbound endpoint has ENIs in the VPC that listen on port 53; on-prem DCs forward queries for AWS zones to those ENI IPs. Outbound endpoint has ENIs that are the source of forwarded queries going out to on-prem DCs. Resolver rules match a domain and forward matching queries via an outbound endpoint to a list of target IPs. Rules can be associated with VPCs and shared across accounts.
4. AD conditional forwarders for cloud zones. On-prem DCs are configured with a conditional forwarder cloud.example.com → some AWS resolver. Needs the “some AWS resolver” to be reachable, usually an Inbound Resolver endpoint. Complementary to option 3, not a replacement.
5. Centralised resolver VPC. A shared-services VPC runs the inbound and outbound endpoints; every other VPC has no endpoints of its own and uses cross-account resolver rules (shared via RAM) associated with the local VPC. Reduces endpoint count and cost; requires that the shared VPC has network paths (TGW or Cloud WAN) to on-prem so its outbound endpoint can actually reach the AD DCs.
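The rule matching described in option 3 can be sketched in a few lines. This is a simplified model, not Resolver’s implementation: it assumes a rule applies to its domain and to every name beneath it, and that the most specific matching rule wins. The Rule class and the function names are invented for illustration, and Private Hosted Zones are deliberately left out of the model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a FORWARD resolver rule."""
    domain: str
    targets: tuple


def labels(domain: str) -> list:
    """Split a domain into lowercase labels, ignoring a trailing dot."""
    return [part for part in domain.lower().rstrip(".").split(".") if part]


def matches(rule: Rule, qname: str) -> bool:
    """True if qname is the rule's domain or sits beneath it (whole labels only)."""
    want, got = labels(rule.domain), labels(qname)
    return len(got) >= len(want) and got[len(got) - len(want):] == want


def pick_rule(rules: Sequence[Rule], qname: str) -> Optional[Rule]:
    """Return the most specific matching rule, or None to stay on the VPC resolver."""
    hits = [r for r in rules if matches(r, qname)]
    return max(hits, key=lambda r: len(labels(r.domain)), default=None)


# The single rule this article ends up with: corp.example.com goes to the AD DCs.
RULES = [Rule("corp.example.com", ("10.1.2.53", "10.1.3.53"))]
```

With that one rule, pick_rule(RULES, "file-server.corp.example.com") returns the corp rule, while api.cloud.example.com returns None and stays with the VPC resolver. Matching is on whole labels, so a name such as notcorp.example.com is not caught by the corp.example.com rule.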
Side by side
| Option | AWS → on-prem | On-prem → AWS | Centralised | Multi-AZ | Observable |
|---|---|---|---|---|---|
| Point resolv.conf at DCs | Partial | ✗ | ✗ | Per DC | ✗ |
| EC2 Unbound proxy | ✓ | With work | Partial | With work | Partial |
| R53 Resolver (in + out) | ✓ | ✓ | ✓ (via rules + RAM) | ✓ | ✓ (query logs) |
| AD conditional forwarders only | ✗ | ✓ | ✓ (AD-side) | Per DC | Partial |
| Centralised resolver VPC | ✓ | ✓ | ✓ | ✓ | ✓ |
The two endpoints, in a network diagram
The pick(s) in depth
Route 53 Resolver with inbound + outbound endpoints in a shared-services VPC, rules shared via RAM. The shape that earns the five attributes.
The pieces, named and placed:
1. Outbound resolver endpoint. Create in the shared-services VPC with two ENIs, one per AZ. These ENIs are the source IPs of queries leaving AWS. On-prem firewalls allow UDP/TCP 53 inbound from those IPs to 10.1.2.53 and 10.1.3.53. Size: the published figure is roughly 10,000 queries per second per ENI, and capacity is added by adding ENIs to the endpoint rather than by the endpoint growing on its own. If you’re past that you’re unusual and should open a support case. Command (the CLI also expects a unique --creator-request-id): aws route53resolver create-resolver-endpoint --creator-request-id <unique-string> --direction OUTBOUND --ip-addresses SubnetId=subnet-a,Ip=10.100.1.10 SubnetId=subnet-b,Ip=10.100.2.10 --security-group-ids sg-outbound-dns
2. Inbound resolver endpoint. Create in the same shared-services VPC with two ENIs in two AZs. These ENIs are the destination IPs that on-prem DCs forward queries to. On-prem DCs have conditional forwarders configured: cloud.example.com → 10.100.1.53, 10.100.2.53, with an equivalent forwarder for aws.example.com. On-prem firewalls allow UDP/TCP 53 outbound from the DCs to those IPs. Command: aws route53resolver create-resolver-endpoint --creator-request-id <unique-string> --direction INBOUND --ip-addresses SubnetId=subnet-a,Ip=10.100.1.53 SubnetId=subnet-b,Ip=10.100.2.53 --security-group-ids sg-inbound-dns
3. Resolver rule. One rule per on-prem zone we want AWS to forward. For our case, one rule: corp.example.com → type FORWARD → target IPs 10.1.2.53:53, 10.1.3.53:53 → via the outbound endpoint. Command: aws route53resolver create-resolver-rule --creator-request-id <unique-string> --domain-name corp.example.com --rule-type FORWARD --resolver-endpoint-id <outbound-id> --target-ips 'Ip=10.1.2.53,Port=53' 'Ip=10.1.3.53,Port=53' --name "forward corp to AD"
4. Rule association. Associate the rule with every VPC that needs to resolve corp.example.com. In a single account, this is a list of associations. Across accounts, share the rule via RAM (aws ram create-resource-share --name dns-corp --resource-arns <rule-arn> --principals <account-ids>), then the recipient accounts associate the shared rule with their own VPCs.
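5. Split-horizon awareness. Resolver uses the most specific match for a query. If a PHZ for corp.example.com and the forward rule for corp.example.com are both associated with a VPC, AWS documents that the Resolver rule takes precedence, so a PHZ at that exact name is not a dependable override. For migration overrides, create narrower PHZs (for example a zone named hr.corp.example.com) and associate them only with the VPCs that should see the AWS record: those names are answered from the PHZ, and everything else under corp.example.com still forwards to the AD DCs. A PHZ that does match is authoritative for its whole zone; a name missing from it returns NXDOMAIN rather than falling back to the forwarder. So keep each override zone as small as the name being migrated, or maintain full parity inside it.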
6. Query logging. aws route53resolver create-resolver-query-log-config --name dns-audit --destination-arn arn:aws:logs:...:/aws/r53/queries, then associate it with VPCs. Every query shows up with query name, type, source VPC, and the answer. One of the cheapest security-observability tools AWS sells; use it.
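The commands in steps 1 to 3 differ only in a handful of arguments, so they can be generated rather than retyped. The helper below is a hedged sketch: it only builds argument lists and never calls AWS. The function names are hypothetical, the subnet and security-group IDs are the placeholders used above, and it assumes the CLI wants a --creator-request-id on both create calls.

```python
import shlex
import uuid


def endpoint_cmd(direction, ips, security_group, request_id=None):
    """Build argv for create-resolver-endpoint; ips is a list of (subnet_id, ip)."""
    if direction not in ("INBOUND", "OUTBOUND"):
        raise ValueError(f"unknown direction: {direction}")
    if len({subnet for subnet, _ in ips}) < 2:
        # Design choice of this sketch: insist on two subnets (one per AZ).
        raise ValueError("need ENIs in at least two subnets")
    return [
        "aws", "route53resolver", "create-resolver-endpoint",
        "--creator-request-id", request_id or str(uuid.uuid4()),
        "--direction", direction,
        "--security-group-ids", security_group,
        "--ip-addresses",
        *[f"SubnetId={subnet},Ip={ip}" for subnet, ip in ips],
    ]


def forward_rule_cmd(domain, endpoint_id, targets, name, request_id=None):
    """Build argv for a FORWARD rule that sends domain to targets on port 53."""
    return [
        "aws", "route53resolver", "create-resolver-rule",
        "--creator-request-id", request_id or str(uuid.uuid4()),
        "--domain-name", domain,
        "--rule-type", "FORWARD",
        "--resolver-endpoint-id", endpoint_id,
        "--name", name,
        "--target-ips",
        *[f"Ip={ip},Port=53" for ip in targets],
    ]


# Placeholder IDs from the steps above; nothing here talks to AWS.
outbound = endpoint_cmd(
    "OUTBOUND",
    [("subnet-a", "10.100.1.10"), ("subnet-b", "10.100.2.10")],
    "sg-outbound-dns",
    request_id="outbound-example",
)
print(shlex.join(outbound))
```

Returning a list instead of a string keeps quoting out of the picture: the list can be handed to subprocess.run as-is, or printed with shlex.join for review before anyone runs it.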
A worked resolution, both directions
AWS → on-prem. Our EC2 at 10.10.5.22 in VPC prod runs dig file-server.corp.example.com.
- The resolver lookup goes to 10.10.0.2 (the VPC resolver).
- The VPC resolver sees a rule associated with this VPC: corp.example.com forwards via outbound endpoint to 10.1.2.53, 10.1.3.53.
- The outbound endpoint sends the query from ENI 10.100.1.10 across the TGW and Direct Connect to 10.1.2.53.
- AD DC #1 answers: file-server.corp.example.com IN A 10.1.5.40.
- The response flows back: outbound endpoint → VPC resolver → EC2. dig prints the answer.
On-prem → AWS. A corp server at 10.1.5.100 runs nslookup api.cloud.example.com.
- The server asks its configured DNS (AD DC #1 at 10.1.2.53).
- AD DC #1 has a conditional forwarder for cloud.example.com → 10.100.1.53, 10.100.2.53 (the inbound endpoint).
- The DC forwards the query across Direct Connect to the inbound endpoint ENI.
- The inbound endpoint hands it to the shared-services VPC’s resolver, which checks PHZs: cloud.example.com is a PHZ associated with this VPC, so the PHZ answers.
- Response: api.cloud.example.com IN A 10.100.4.15 (or whatever the PHZ says).
- Flows back: inbound endpoint → DC → server. nslookup prints the answer.
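Both walkthroughs can be turned into a repeatable smoke test, run once from an EC2 instance and once from an on-prem server. This is a minimal sketch: the expected addresses are the illustrative ones from the walkthroughs, lookup and check are hypothetical helper names, and the script relies on whatever resolver the host is already configured to use.

```python
import socket

# Illustrative answers taken from the two walkthroughs above.
EXPECTED = {
    "file-server.corp.example.com": "10.1.5.40",
    "api.cloud.example.com": "10.100.4.15",
}


def lookup(name):
    """Ask the host's configured resolver for the IPv4 addresses of name."""
    try:
        return sorted(socket.gethostbyname_ex(name)[2])
    except OSError:  # covers socket.gaierror and socket.herror
        return []


def check(expected, resolve=lookup):
    """Return {name: addresses_seen} for every name missing its expected address."""
    failures = {}
    for name, want in expected.items():
        seen = resolve(name)
        if want not in seen:
            failures[name] = seen
    return failures
```

Calling check(EXPECTED) on each side should return an empty dict once both directions work; a non-empty result names the direction that is still broken. The resolve parameter is there so the comparison logic can be exercised without a network.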
What’s worth remembering
- Two endpoint types, one per direction. Outbound sends AWS-originated queries out; inbound receives on-prem queries coming in. You usually need both for true two-way hybrid DNS.
- Make endpoints multi-AZ. At least two ENIs, in different subnets, in different AZs. Losing an AZ must not take DNS with it.
- Resolver rules are the forwarding logic. Type FORWARD + a domain + target IPs + an outbound endpoint. Associate with VPCs.
- Share rules across accounts via RAM. One rule, one association per consumer VPC. New accounts get the rule by being added to the share, then associate it with their own VPCs.
- Private Hosted Zones change answers inside the VPCs they are associated with. A PHZ that matches a query is authoritative for its zone inside the VPC; missing names return NXDOMAIN, not “try the forwarder.” Where a PHZ and a forwarding rule carry the identical domain, AWS documents that the rule takes precedence, so put intentional overrides in narrower zones. Plan split-horizon deliberately.
- The VPC resolver is still .2. You don’t point workloads at the endpoints directly; the endpoints exist to extend the VPC resolver’s capabilities. Keep DHCP options at default for the workload VPCs.
- Inbound endpoint IPs are what on-prem DCs forward to. Hardcode them in AD conditional forwarders; make sure they’re in the allow-list of the on-prem firewall.
- Query logging is cheap visibility. Associate a query log config with VPCs to see every DNS lookup, answer, and source. For incident response and drift detection, few tools are more useful.
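Query logs earn their keep when something reads them. As a sketch of that, the snippet below counts NXDOMAIN answers per VPC and name for one zone, which is a quick way to spot names that fell into a gap during migration. The two sample records are invented, and the field names (query_name, rcode, vpc_id) are assumed from the Resolver query log format, so check them against a real record before relying on this.

```python
import json
from collections import Counter

# Invented sample records; real logs carry more fields than shown here.
SAMPLE_LINES = [
    '{"vpc_id": "vpc-prod", "query_name": "file-server.corp.example.com.", '
    '"query_type": "A", "rcode": "NOERROR", "srcaddr": "10.10.5.22"}',
    '{"vpc_id": "vpc-prod", "query_name": "old-app.corp.example.com.", '
    '"query_type": "A", "rcode": "NXDOMAIN", "srcaddr": "10.10.5.22"}',
]


def nxdomain_counts(lines, zone):
    """Count NXDOMAIN responses per (vpc_id, name) for names inside zone."""
    zone = zone.rstrip(".").lower()
    counts = Counter()
    for line in lines:
        record = json.loads(line)
        name = record.get("query_name", "").rstrip(".").lower()
        in_zone = name == zone or name.endswith("." + zone)
        if in_zone and record.get("rcode") == "NXDOMAIN":
            counts[(record.get("vpc_id"), name)] += 1
    return counts


print(nxdomain_counts(SAMPLE_LINES, "corp.example.com"))
```

Fed one JSON record per line from the log destination, the same function gives a per-VPC list of corp.example.com names that are being asked for and not found.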
Hybrid DNS isn’t hard; it’s just two directions that both need setting up, plus a handful of rules that tell the VPC resolver what to do with specific zones. Route 53 Resolver provides the endpoints and the rules as managed, multi-AZ, RAM-shareable primitives. The work is naming the zones, placing the endpoints, and remembering that PHZ associations change query behaviour inside the associated VPC. Once those three things are right, corp.example.com resolves from the cloud and cloud.example.com resolves from on-prem, and nobody has to hardcode an IP again.