How to Allowlist Egress by Hostname

February 12, 2029 · 14 min read

Advanced Networking · ANS-C01 · part of The Exam Room

The situation

Production traffic leaves each VPC through a central egress path today: a NAT Gateway in the shared-services VPC, reached via a Transit Gateway from every workload VPC. Outbound connections land on whatever public IP the destination resolves to – github.com, registry.npmjs.org, the monitoring vendor, a handful of package mirrors, the AWS public APIs. There is currently no filter beyond the NAT Gateway and the public routing decision.

Compliance wants:

  • Allowlist by hostname. Egress is blocked by default. Connections to hostnames on a curated list succeed; everything else is denied.
  • TLS inspection without decryption. We do not want to proxy TLS; we do want to see the hostname the client is trying to reach, which means reading the SNI from the ClientHello.
  • Per-environment policy. Production allowlist is strict; non-prod is more permissive. Different VPCs, different rules.
  • Audit trail. Every denied connection logged with timestamp, source IP, destination hostname (or SNI), and rule that matched.
  • Central management. One team owns the firewall configuration; workload teams don’t create their own.

Options considered:

  • Security groups and NACLs. IP-based, stateless for NACLs, stateful for SGs, no hostname awareness.
  • Forward proxy (Squid on EC2). HTTP and HTTPS CONNECT, classic, but needs client configuration and is another fleet to run.
  • AWS Network Firewall. Managed Suricata, domain-list support, SNI inspection, stateless + stateful rules.
  • Third-party firewall appliance. Palo Alto, Check Point, Fortinet, either behind a GWLB or in-line via TGW. Capable, but vendor-sized and licensed.
  • Route 53 Resolver DNS Firewall. Blocks at the DNS layer. Stops lookups; doesn’t stop a connection to an IP the client already knows.

What actually matters

Before picking, it’s worth asking what “allow by hostname” actually means on the wire.

There is no “hostname” in an IP packet. There are two places a hostname appears in a normal outbound HTTPS call: the DNS query that preceded it, and the TLS ClientHello’s Server Name Indication (SNI) field. Both are visible to something sitting in the path:

  • DNS-layer filtering (Route 53 Resolver DNS Firewall, Squid’s DNS control, or a corporate resolver with domain lists) blocks the lookup. If a client never resolves evil.example.com, it doesn’t learn the IP. This works until someone hardcodes an IP or uses DNS-over-HTTPS to bypass the corporate resolver.
  • SNI-layer filtering happens at the first hop that sees the TCP stream. A transparent firewall or proxy can read the SNI from the ClientHello (the first client-side TLS record) and decide whether to let the handshake complete. This works even if the client brought its own DNS. It stops working if the client uses Encrypted ClientHello (ECH), which hides the real server name, but for the common case of a compute workload reaching the internet, SNI is visible.

The SNI is the field that stateful domain-list rules match on. The filter does not decrypt; it looks at the first handshake record, matches the SNI against the allowlist, and either passes the stream or drops it. That’s the main lever for our requirement.
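To make “reading the SNI without decrypting” concrete, here is a minimal sketch in Python. It builds a real ClientHello in memory with the standard ssl module (nothing goes on the wire) and pulls the server name out of the plaintext bytes. The function names client_hello_for and extract_sni are ours, and the parser assumes the whole ClientHello sits in a single TLS record; a production engine also has to handle fragmentation and TCP reassembly.

```python
import ssl
import struct
from typing import Optional


def client_hello_for(hostname: str) -> bytes:
    """Produce a real ClientHello in memory; no network traffic is sent."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the handshake stalls waiting for a ServerHello
    return outgoing.read()


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI from a ClientHello held in one TLS record, else None."""
    # Record header: type 0x16 (handshake). Handshake header: type 0x01.
    if len(record) < 9 or record[0] != 0x16 or record[5] != 0x01:
        return None
    try:
        pos = 5 + 4 + 2 + 32                  # headers, client_version, random
        pos += 1 + record[pos]                # session_id
        pos += 2 + struct.unpack_from("!H", record, pos)[0]  # cipher_suites
        pos += 1 + record[pos]                # compression_methods
        end = pos + 2 + struct.unpack_from("!H", record, pos)[0]
        pos += 2
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:                 # server_name extension
                # Layout: list length (2), name type (1), name length (2), name.
                name_len = struct.unpack_from("!H", record, pos + 3)[0]
                return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None


if __name__ == "__main__":
    print(extract_sni(client_hello_for("github.com")))
```

Nothing here touches a key or a certificate: the hostname is sitting in the clear in the first record the client sends, which is all an SNI filter needs.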

The second thing worth asking is where the inspection sits in the path. A managed filtering primitive lives somewhere we can route to, usually a dedicated subnet per AZ in the egress VPC, with the workload subnets routing 0.0.0.0/0 to it and the filter forwarding accepted traffic on to the NAT Gateway. The shape is “interception by route table,” not “in-band on every ENI.”
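“Interception by route table” is easier to see as data. The sketch below is a toy model, not an AWS API: the subnet names, CIDRs, and hop names are all made up for illustration. Each subnet has a route table, the next hop is chosen by longest-prefix match, and we follow the packet until it leaves the path.

```python
import ipaddress
from typing import Dict, List

# Hypothetical route tables, keyed by the subnet the packet is currently in.
ROUTES: Dict[str, Dict[str, str]] = {
    "workload-subnet": {"10.10.0.0/16": "local", "0.0.0.0/0": "firewall-endpoint"},
    "firewall-subnet": {"10.10.0.0/16": "local", "0.0.0.0/0": "nat-gateway"},
    "nat-subnet": {"10.10.0.0/16": "local", "0.0.0.0/0": "internet-gateway"},
}

# The subnet whose route table applies after traffic passes through a hop.
HOP_SUBNET = {"firewall-endpoint": "firewall-subnet", "nat-gateway": "nat-subnet"}


def next_hop(table: Dict[str, str], destination: str) -> str:
    """Pick the target of the most specific route containing the destination."""
    addr = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in table.items()
        if addr in ipaddress.ip_network(cidr)
    ]
    return max(candidates, key=lambda c: c[0].prefixlen)[1]


def trace(start_subnet: str, destination: str) -> List[str]:
    """Follow a packet hop by hop until it reaches a terminal target."""
    path, subnet = [start_subnet], start_subnet
    while True:
        hop = next_hop(ROUTES[subnet], destination)
        path.append(hop)
        if hop not in HOP_SUBNET:
            return path
        subnet = HOP_SUBNET[hop]


if __name__ == "__main__":
    print(" -> ".join(trace("workload-subnet", "203.0.113.10")))
```

The workload never addresses the filter; it only inherits a default route that points at it. That is why a missing or wrong route silently bypasses inspection rather than breaking loudly.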

The third is how rule evaluation needs to work for default-deny. Whatever we pick has to support an explicit priority ordering where the allowlist matches first and a final catch-all drops everything else, with both events logged. Modes where the engine picks “best match” from a set of rules without a clear priority ordering invite ambiguity at audit time.
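A small sketch of what “explicit ordering, first match wins, catch-all last” means in practice. This is plain Python, not any firewall’s real engine; the Rule class, the rule ids, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    rule_id: int
    action: str                       # "pass" or "drop"
    msg: str
    matches: Callable[[str], bool]    # predicate over the hostname (SNI)


def exact(domain: str) -> Callable[[str], bool]:
    return lambda sni: sni.lower() == domain


def subdomains_of(domain: str) -> Callable[[str], bool]:
    return lambda sni: sni.lower().endswith("." + domain)


# Order is the priority: earlier rules are evaluated first.
RULES: List[Rule] = [
    Rule(1, "pass", "allow github", exact("github.com")),
    Rule(2, "pass", "allow monitoring", subdomains_of("datadoghq.eu")),
    Rule(99, "drop", "default deny", lambda sni: True),
]


def evaluate(sni: str, rules: List[Rule], audit: List[Tuple[int, str, str]]) -> str:
    """Return the action of the first matching rule; record drops for audit."""
    for rule in rules:
        if rule.matches(sni):
            if rule.action == "drop":
                audit.append((rule.rule_id, rule.msg, sni))
            return rule.action
    return "drop"  # only reached if the catch-all is missing


if __name__ == "__main__":
    audit_log: List[Tuple[int, str, str]] = []
    for name in ("github.com", "api.datadoghq.eu", "evil.example.com"):
        print(name, evaluate(name, RULES, audit_log))
    print(audit_log)
```

Because the verdict is always “the first rule in the list that matched”, an auditor can answer “why was this dropped?” by pointing at one rule id. A best-match engine cannot promise that.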

What we’ll filter on

  1. Hostname-based policy: can rules match domains, not just IPs?
  2. TLS SNI awareness: does it read SNI from ClientHello without decryption?
  3. Per-VPC / per-environment policy: can prod and non-prod have different rules?
  4. Default deny with allowlist: does “block everything except the list” express naturally?
  5. Audit trail: are denials logged with enough context?

The egress-filtering landscape

1. Security groups + NACLs. IP-based. SGs are stateful but only in the 5-tuple sense; they don’t see hostnames. Fails on hostname.

2. Squid / HAProxy forward proxy. HTTP CONNECT tunnelling for TLS (the proxy reads the target hostname from the CONNECT request), ordinary proxying for plain HTTP. Requires client configuration (HTTPS_PROXY env var or OS-level setting). Self-managed fleet. Capable but operationally expensive. Partial on central management.

3. AWS Network Firewall. Managed Suricata-compatible firewall. Stateless + stateful rule groups. Domain-list support, SNI inspection for TLS, HTTP Host header for HTTP. Logs to CloudWatch Logs, S3, or Kinesis Data Firehose. Ticks all five attributes.

4. Third-party firewall (PA/CP/FTNT) via GWLB. Vendor-managed appliances running on EC2, exposed through a Gateway Load Balancer, traffic routed to them by GWLB endpoints. Most capable; most expensive; most complex. Suits orgs that already standardise on a vendor.

5. DNS Firewall. Route 53 Resolver DNS Firewall applies domain-list rules to DNS queries originating in associated VPCs. Blocks lookups. Complementary to Network Firewall: catching lookups is cheaper than inspecting TCP streams, but on its own it is bypassable by hardcoded IPs or out-of-band DNS.

Side by side

Option | Hostname-based | SNI / Host aware | Per-VPC policy | Default deny | Audit trail
SG + NACL | ✗ | ✗ | | ✓ (on IPs) | Flow logs only
Forward proxy | | | | | ✓ (proxy logs)
Network Firewall | ✓ | ✓ | ✓ (per policy) | ✓ | ✓ (CWL / S3)
Third-party GWLB | | | | | ✓ (vendor)
DNS Firewall | ✓ (names) | ✗ (names only in DNS) | | | ✓ (query logs)

Packet path through the engines

[Figure: packet path. An EC2 instance in the prod VPC sends a TLS ClientHello (to github.com or evil.example.com) to the per-AZ AWS Network Firewall endpoint. The stateless engine runs first and forwards to the stateful engine, which evaluates in STRICT_ORDER: allowlist passes first, a “default deny” catch-all drop last, with drop-established as the policy’s stateful default action. Allowlisted flows continue to the egress VPC NAT Gateway (Elastic IP → IGW); drops raise alerts. Alert logs go to CWL, flow logs to Firehose → S3.]
Stateless engine pre-filters, stateful engine does the hostname work. STRICT_ORDER makes rule priority explicit; the catch-all drop at priority 900 is what "default deny" looks like in Suricata syntax.

The pick(s) in depth

AWS Network Firewall with a stateful rule group in STRICT_ORDER, domain allowlists per environment, CloudWatch Logs for alerts. The shape that earns the five attributes.

The pieces:

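1. Deploy the firewall. A Firewall resource in the egress VPC, with firewall subnets in each AZ (one subnet per AZ, dedicated to the firewall). AWS provisions a firewall endpoint in each subnet. The route tables of the subnets that feed the firewall (the workload subnets when the firewall shares their VPC; the Transit Gateway attachment subnets in the centralised egress VPC described above) route 0.0.0.0/0 to the firewall endpoint in the same AZ. The firewall subnet itself routes 0.0.0.0/0 to the NAT Gateway, and the NAT Gateway subnet routes the workload CIDRs back to the firewall endpoint so return traffic takes the same path. Traffic goes workload → firewall → NAT → IGW.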

2. Define a firewall policy. One FirewallPolicy per environment (prod strict, nonprod permissive). The policy lists stateless rule groups, a stateless default action, stateful rule groups, and a stateful default action. Set stateful engine options → rule-order to STRICT_ORDER at policy creation.

3. Stateless rule group. Pre-filter the obvious: drop traffic addressed to RFC1918 destinations that has ended up on the internet path (indicates a misconfigured route), and forward anything else to the stateful engine. The stateless default is usually “forward to stateful” so that all policy lives in one place.

4. Stateful rule group: the domain allowlist. In Suricata rule syntax, domain allowlists translate to (one rule per line):

pass tls $HOME_NET any -> $EXTERNAL_NET any (msg:"allow github"; tls.sni; content:"github.com"; startswith; nocase; endswith; sid:1000001; rev:1;)
pass tls $HOME_NET any -> $EXTERNAL_NET any (msg:"allow npm"; tls.sni; content:"registry.npmjs.org"; startswith; nocase; endswith; sid:1000002; rev:1;)
pass tls $HOME_NET any -> $EXTERNAL_NET any (msg:"allow monitoring"; tls.sni; content:".datadoghq.eu"; nocase; endswith; sid:1000003; rev:1;)
pass dns $HOME_NET any -> any any (msg:"allow DNS to VPC resolver"; sid:1000010; rev:1;)
drop ip any any -> any any (msg:"default deny"; flow:established; sid:1009999; rev:1;)

Two details matter here. First, endswith on its own is a suffix match, so content:"github.com" would also pass notgithub.com; pairing it with startswith pins the SNI to the exact hostname, and the leading dot in .datadoghq.eu does the same job for subdomains. Second, the default-deny catch-all is scoped with flow:established. An unscoped drop ip any any would discard the TCP SYN before any ClientHello (and therefore any SNI) reaches the engine; scoping it to established flows lets the handshake complete, which is also what the policy’s drop-established default action does without a rule.

In STRICT_ORDER mode, rules written as Suricata-compatible strings are evaluated in the order they appear in the rule group, so priority is implicit in that order. AWS also offers a simpler stateful domain list rule type: a list of domain names with an allowlist/denylist flag, which compiles to equivalent Suricata rules under the hood. The domain-list form is friendlier; the Suricata form is more flexible (you can match HTTP host, SNI, JA3 fingerprints, and anything else Suricata supports).
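To get a feel for what “compiles to Suricata rules” involves, here is a rough sketch of a domain-list-to-rules generator in Python. It is our own illustration, not AWS’s compiler, and the function name and sid numbering are assumptions. It follows one convention: a bare hostname means an exact match (startswith plus endswith), and a leading dot means the apex and any subdomain (Suricata’s dotprefix transform).

```python
from typing import Iterable, List


def sni_allowlist_rules(domains: Iterable[str], first_sid: int = 1000001) -> List[str]:
    """Turn a domain allowlist into one Suricata pass rule per entry."""
    rules = []
    for offset, domain in enumerate(domains):
        if domain.startswith("."):
            # ".example.com": dotprefix prepends a dot to the SNI, so the
            # suffix match covers example.com and every subdomain of it.
            match = f'tls.sni; dotprefix; content:"{domain}"; nocase; endswith;'
        else:
            # "example.com": anchor both ends for an exact hostname match.
            match = f'tls.sni; content:"{domain}"; startswith; nocase; endswith;'
        rules.append(
            "pass tls $HOME_NET any -> $EXTERNAL_NET any "
            f'(msg:"allow {domain.lstrip(".")}"; {match} '
            f"sid:{first_sid + offset}; rev:1;)"
        )
    return rules


if __name__ == "__main__":
    for rule in sni_allowlist_rules(["github.com", ".datadoghq.eu"]):
        print(rule)
```

Generating the rules from a reviewed list keeps the allowlist itself as the thing people approve in a pull request, with the Suricata syntax as a build artefact.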

5. Per-environment policies. Prod policy: strict allowlist, default drop, alerts on every drop. Nonprod policy: broader allowlist, maybe alert-only on denials rather than drop, to let teams discover what they need. Attach each policy to its respective firewall.

6. Logging. Configure the firewall’s logging configuration: alert logs (a record for every match of a rule with an alert or drop action) to CloudWatch Logs for the SOC; flow logs (per-connection metadata, like VPC Flow Logs but from the firewall’s perspective) to Firehose → S3 for long-term retention. Network Firewall also has a TLS log type, but it reports TLS inspection events and so only applies when decryption is enabled, which this design avoids. Prod usually logs drops only, to control volume at scale.

7. Delegate administration. Use AWS Firewall Manager if the policy needs to apply across many accounts in the organisation. Firewall Manager can deploy a Network Firewall policy to every in-scope account and prevent local changes. Without FMS, the central team just owns the egress VPC and workload teams route to it.
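The per-environment policies from steps 2 and 5 can be sketched as data. The Python below builds the policy document as a plain dict and does not call AWS. The field names and action strings follow the Network Firewall FirewallPolicy shape as we understand it, so check them against the API reference before relying on them; the ARNs, account id, and the build_firewall_policy helper are hypothetical.

```python
import json
from typing import Dict


def build_firewall_policy(rule_groups: Dict[str, int], enforce: bool) -> dict:
    """rule_groups maps a stateful rule group ARN to its STRICT_ORDER priority."""
    if enforce:
        # Prod: drop and alert on anything no rule passed.
        defaults = ["aws:drop_established", "aws:alert_established"]
    else:
        # Nonprod: alert only, so teams can discover what they need.
        defaults = ["aws:alert_established"]
    return {
        "StatelessDefaultActions": ["aws:forward_to_sfe"],
        "StatelessFragmentDefaultActions": ["aws:forward_to_sfe"],
        "StatefulEngineOptions": {"RuleOrder": "STRICT_ORDER"},
        "StatefulDefaultActions": defaults,
        "StatefulRuleGroupReferences": [
            {"ResourceArn": arn, "Priority": priority}
            for arn, priority in sorted(rule_groups.items(), key=lambda kv: kv[1])
        ],
    }


# Placeholder ARN prefix; region and account id are made up.
ARN = "arn:aws:network-firewall:eu-west-1:111122223333:stateful-rulegroup/"

prod_policy = build_firewall_policy(
    {ARN + "prod-tls-allowlist": 100, ARN + "prod-http-allowlist": 200},
    enforce=True,
)
nonprod_policy = build_firewall_policy(
    {ARN + "nonprod-tls-allowlist": 100},
    enforce=False,
)

if __name__ == "__main__":
    print(json.dumps(prod_policy, indent=2))
```

Keeping prod and nonprod as two calls to the same builder makes the difference between them reviewable at a glance: same structure, different rule groups and default actions.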

A worked packet

EC2 10.10.5.22 in the prod VPC calls git pull from github.com. The connection attempt:

  1. The VPC route table sends 0.0.0.0/0 to the firewall endpoint in the same AZ.
  2. The TCP SYN to 140.82.x.y:443 arrives at the firewall. Stateless engine: the destination is not RFC1918, so nothing to drop; stateless default is “forward to stateful.” Stream handed to stateful.
  3. Stateful engine: the SYN establishes a flow. No SNI yet (SNI is in the ClientHello, not the SYN); the handshake is allowed to proceed because the default-deny logic applies only to established flows (the policy’s drop-established default action), so connection-setup packets are not dropped before there is anything to match on.
  4. The ClientHello arrives with SNI: github.com. The stateful pass rule for github.com (sid 1000001, in the rule group at priority 100) matches on tls.sni. Action pass. The flow is permitted for its duration.
  5. Response bytes flow back through the firewall via connection tracking. No rules re-evaluated unless a new flow starts.
  6. Alert log: no entry (pass rules can optionally log; by default they don’t).

Same instance tries curl https://evil.example.com:

  1. SYN through stateless, forwarded to stateful.
  2. ClientHello arrives with SNI: evil.example.com. No allowlist rule matches. The catch-all drop ip any any at priority 900 matches.
  3. Action drop. The packets are silently discarded and the client eventually times out (a reject action would send a TCP reset instead). Alert log captures: timestamp, source 10.10.5.22, destination 1.2.3.4:443, protocol, rule sid 1009999, message default deny, SNI evil.example.com.
  4. The SOC dashboard lights up.

What’s worth remembering

  1. Network Firewall sits on route tables. Route 0.0.0.0/0 from workload subnets to the firewall endpoint in the same AZ, then from the firewall subnet on to the NAT Gateway. Per-AZ endpoints, per-AZ routing.
  2. Stateless runs first, stateful runs second. Stateless drops the obviously bad and forwards the rest to stateful. Stateful does connection tracking and rule matching (including SNI).
  3. STRICT_ORDER for default-deny. Rule groups are evaluated in priority order, lowest number first, and rules within a group in the order they are written; the first matching pass or drop decides the flow. The last rule should be a catch-all drop.
  4. SNI inspection happens on the ClientHello. Stateful sees the first client TLS record, matches against the domain allowlist, permits or drops the flow. No decryption; just reading the unencrypted SNI field.
  5. Stateful domain lists are the friendly form. Either a list of domains with target types (TLS_SNI, HTTP_HOST) and an allowlist/denylist flag, or hand-written Suricata rules; both compile to the same engine. Use whichever is easier to read and review.
  6. Per-environment policies via multiple Firewalls. Prod firewall is strict, nonprod firewall is permissive. Policies are first-class resources reused across firewalls.
  7. Logs tell the story. Alert logs catch rule matches, including the SNI of a dropped TLS flow; flow logs catch per-connection metadata; TLS logs only apply when TLS inspection is enabled. Pick destinations (CWL for alerts, S3 for bulk) based on volume and audience.
  8. DNS Firewall is complementary. It blocks the lookup; Network Firewall blocks the stream. For defence in depth, run both: DNS Firewall cheaply stops lookups for domains that aren’t on the list, Network Firewall catches clients that hardcode IPs or bring their own DNS.

Security groups allow by IP; NACLs allow by IP; both fail the “allow by hostname” test. Network Firewall reads the SNI off the TLS ClientHello and matches against a domain allowlist, with explicit rule priorities and a default-deny catch-all. The work isn’t picking a firewall product, it’s spelling out the allowlist, getting the rule order right, and routing the workload traffic through the endpoint so the engine can see it.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.