Choosing Between Security Groups, NACLs, and Network Firewall

January 27, 2027 · 14 min read

Solutions Architect Pro · SAP-C02 · part of The Exam Room

The situation

The VPC landscape has grown: a web tier, an application tier, a data tier, a shared-services segment, and a handful of third-party connectivity paths. Traffic policies live in three places and the team isn’t sure which is authoritative:

  • Security groups attached to ENIs, configured by each workload team in each account.
  • Network ACLs on subnets, configured once by the networking team when the VPC was built and barely touched since.
  • AWS Network Firewall in the egress VPC, configured by the security team, currently allowing most outbound and inspecting SNI.

The audit question is simple: “Show me the rule that prevents a compromised web-tier host from reaching the payroll database.” Three teams point at each other’s tools. Nobody is quite sure.

Before the next audit we need a clear answer: which layer carries which policy, which layers are the safety net, and how do we make this legible to a new engineer in their first week.

What actually matters

The core trade in network controls is specificity in exchange for placement cost. Fine-grained rules (this ENI can reach that ENI on port 5432) are precise but numerous; coarse rules (this subnet can reach that subnet) are terse but blunt. Each layer sits at a different level of specificity, with different failure modes and different operational costs.

The first thing to ask is: where does the policy actually belong? A policy about which workloads can talk to which other workloads is a security-group-shaped policy: specific to the ENI, stateful, close to the workload. A policy about what goes in and out of a subnet is a NACL-shaped policy. A policy about what leaves the VPC for the internet is a firewall-shaped policy. Putting a policy in the wrong layer rarely fails outright; it just means the policy ends up either too loose (and something slips through) or too rigid (and legitimate traffic breaks).

The second is: stateful or stateless? Security groups are stateful: allow an outbound request and the response is allowed back automatically. NACLs are stateless: allowing a request out does not allow the response in; both directions must be permitted explicitly, and the ephemeral port range has to be right. Misunderstanding this is the most common NACL failure.

The third is: allow or deny? Security groups are allow-only: you add rules to permit traffic, and anything not permitted is denied. NACLs are allow-or-deny: you can explicitly deny specific CIDRs (good for blocking known bad IPs). Network Firewall can be either, depending on rule group type.

The fourth is: identity-aware or address-aware? Security groups can reference other security groups as sources and destinations. That’s identity: “anything tagged as web-tier can reach anything tagged as app-tier on port 8080.” NACLs only understand CIDRs. Network Firewall understands 5-tuple plus domain (via SNI) but not security groups.

The fifth is: visibility. When traffic fails, which layer logged the failure? VPC Flow Logs catch security-group and NACL decisions at the ENI level (accept/reject). Network Firewall has its own alert and flow logs. The layer whose logs are easiest to query is the layer most teams end up trusting.

What we’ll filter on

  1. Stateful / stateless: does the return path need an explicit rule?
  2. Allow only / allow + deny: can the layer explicitly block traffic?
  3. Identity-aware: can the rule reference workload identity (SG ID, tag)?
  4. Scope of attachment: per ENI, per subnet, per VPC, per firewall endpoint?
  5. Rule limits: how many rules before we hit a quota?

The control landscape

  1. Security Groups. Stateful, allow-only, identity-aware (via SG references), attached to ENIs. A security group is a set of permitted traffic rules; traffic matching any inbound rule is allowed in, traffic not matching any inbound rule is denied. Outbound defaults to all allowed (though this can be restricted). Return traffic for allowed flows is automatic. Up to 60 inbound and 60 outbound rules per SG by default, quota-increasable; up to 5 SGs per ENI (raisable to 16).

  2. Network ACLs. Stateless, allow + deny, CIDR-based, attached to subnets. Rules are numbered; the lowest-numbered matching rule wins. The default NACL allows all; custom NACLs deny all until rules are added. Because NACLs are stateless, every allowed flow needs rules in both directions, including the ephemeral-port range for responses (typically 1024-65535).

  3. AWS Network Firewall. Stateful (Suricata-compatible) and stateless rule groups, allow + deny, supports 5-tuple and domain-based rules, attached to VPC endpoints (one endpoint per AZ). Sits on the packet path via route table redirect. Priced per firewall endpoint per hour plus per GB processed.

  4. Route tables + no-NAT. Not a traffic-filtering tool, but the absence of a route is the simplest deny: if there’s no path from subnet A to subnet B, traffic cannot flow regardless of security groups. Often overlooked; often the answer to “how do we ensure the payroll subnet can’t reach the internet” (no default route, only a VPC endpoint).

  5. AWS PrivateLink. A deliberate allow path through a VPC endpoint service. Inverts the usual “public by default, restrict” model into “no path by default, create one endpoint per consumer.” The cleanest control for “this VPC can call this service, no others.”

  6. WAF. HTTP/S layer 7 filtering on ALBs, CloudFront, API Gateway. Not a VPC control; mentioned because the audit sometimes conflates it with Network Firewall. WAF sees the HTTP headers and body; Network Firewall sees the packets.

Side by side

| Control | Stateful | Allow + deny | Identity-aware | Scope | Rule limit |
|---|---|---|---|---|---|
| Security Group | ✓ | Allow-only | ✓ (SG refs) | ENI | ~60 per SG |
| NACL | ✗ | ✓ | ✗ (CIDR only) | Subnet | 40 per NACL |
| Network Firewall | ✓ (stateful rules) | ✓ | ✗ | Firewall endpoint | ~30k rules |
| Route table | N/A | Implicit deny (no route) | ✗ | Subnet | ~50 routes |
| PrivateLink | | Allow-only | ✓ (endpoint policy) | Endpoint | Small rule surface |
| WAF | | ✓ | ✓ (request attributes) | Layer 7 target | Thousands |

Reading by policy type:

  • Which workload can reach which workload (web -> app, app -> data, never web -> data): security groups with SG references. Identity-aware, per-ENI, stateful. The right layer for every east-west policy.
  • Known-bad IP blocks (sanctioned countries, threat-intel feeds): NACLs or a Network Firewall stateless rule. Both are cheap to add and cheap to remove.
  • Outbound to the internet (domain allow-lists, SNI inspection): Network Firewall in the egress VPC. Centralised, inspection-capable.
  • Cross-VPC traffic (spokes talking to spokes, or not): route tables on the TGW plus security groups on the destinations. Controlled at the transit layer plus the workload layer.

The layered diagram

[Figure: three nested layers around the EC2 instance payroll-db.prod (SG: sg-payroll-db). Network Firewall (VPC-wide, egress VPC) decides what leaves the VPC for the internet; the NACL (subnet) denies known-bad CIDRs and allows RFC1918; the security group (ENI) decides who can talk to the workload, with inbound tcp/5432 from sg-payroll-app, tcp/22 from sg-bastion, and everything else denied. Rule order: 1. firewall (allowed domains), 2. NACL (subnet allow-list), 3. SG (ENI allow-list); the default at each layer is deny. Example flows: legitimate app-tier traffic passes; web-tier traffic is dropped at the SG; a known-bad IP is dropped at the NACL; an unsanctioned domain is dropped at the firewall.]
Three layers, each carrying a different class of policy. Security groups carry identity; NACLs carry subnet-coarse bans; Network Firewall carries egress inspection. Default-deny at each layer.

The picks in depth

Security groups are the authoritative east-west policy. Every workload tier has its own security group. Rules reference other security groups, not CIDRs:

sg-payroll-db:
  inbound:
    tcp/5432 from sg-payroll-app
    tcp/22   from sg-bastion
  outbound:
    (none needed -- stateful return traffic is automatic)

sg-payroll-app:
  inbound:
    tcp/8443 from sg-payroll-alb
  outbound:
    tcp/5432 to sg-payroll-db
    tcp/443  to 0.0.0.0/0  (for egress to the firewall)

This is the identity layer. A new instance that is not a member of sg-payroll-app cannot reach the database; adding sg-payroll-app membership grants access; revoking it denies it. The rules read like “who” not “what IP”, which matches how operators think about the system.
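As a rough illustration, the sg-payroll-db inbound rules above could be written as the `IpPermissions` structure that boto3's `authorize_security_group_ingress` accepts. The group IDs are hypothetical placeholders and the helper `sg_ref_rule` is a name invented for this sketch; it is not the team's actual tooling.

```python
from typing import Any

# Hypothetical security group IDs, for illustration only.
SG_PAYROLL_DB = "sg-0aaaaaaaaaaaaaaa1"
SG_PAYROLL_APP = "sg-0bbbbbbbbbbbbbbb2"
SG_BASTION = "sg-0ccccccccccccccc3"


def sg_ref_rule(port: int, source_sg: str, description: str) -> dict[str, Any]:
    """Build one TCP ingress permission whose source is another security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # Referencing a group instead of a CIDR is what makes the rule identity-aware.
        "UserIdGroupPairs": [{"GroupId": source_sg, "Description": description}],
    }


payroll_db_ingress = [
    sg_ref_rule(5432, SG_PAYROLL_APP, "Postgres from payroll app tier"),
    sg_ref_rule(22, SG_BASTION, "SSH from bastion"),
]

# With credentials configured, the rules could be applied with:
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId=SG_PAYROLL_DB, IpPermissions=payroll_db_ingress)
```

No `IpRanges` key appears anywhere in the structure, which is the point: membership of the source group, not an address, decides who gets in.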

One security group per workload-tier-purpose pair, not one per instance: sg-payroll-db, sg-payroll-app, and sg-payroll-alb are three groups for three roles. Instances scale horizontally by joining the right group.

NACLs are the subnet-coarse safety net. Each subnet has one NACL with a small, stable rule set:

  • Allow RFC1918 inbound (other VPCs, VPN, on-prem).
  • Allow HTTPS and ephemeral ports inbound/outbound for general egress.
  • Deny the current threat-intel CIDR list (updated by a scheduled Lambda that pulls from a feed).
  • Deny sanctioned-country CIDRs if regulatory.

NACL rules are numbered; deny rules get low numbers (evaluated first), allow rules get higher numbers. The default deny at the bottom of every NACL is the catch-all.
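The ordering rule is easy to model. The sketch below is a simplified evaluator, not AWS's implementation: it matches only on source CIDR and destination port, and the rule numbers and CIDRs are made-up examples.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class NaclRule:
    number: int
    action: str  # "allow" or "deny"
    cidr: str
    port_from: int
    port_to: int


def evaluate(rules: list[NaclRule], src_ip: str, dst_port: int) -> str:
    """Return the action of the lowest-numbered matching rule, else the implicit deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        in_cidr = ip_address(src_ip) in ip_network(rule.cidr)
        in_ports = rule.port_from <= dst_port <= rule.port_to
        if in_cidr and in_ports:
            return rule.action
    return "deny"  # the catch-all rule at the bottom of every NACL


inbound = [
    NaclRule(100, "deny", "203.0.113.0/24", 0, 65535),  # example threat-intel CIDR
    NaclRule(200, "allow", "10.0.0.0/8", 0, 65535),     # one of the RFC1918 blocks
    NaclRule(300, "allow", "0.0.0.0/0", 1024, 65535),   # ephemeral return traffic
]

print(evaluate(inbound, "203.0.113.9", 44321))  # deny: rule 100 matches before rule 300
print(evaluate(inbound, "10.1.2.3", 5432))      # allow: rule 200
print(evaluate(inbound, "198.51.100.7", 22))    # deny: nothing matches
```

If the deny sat at number 400 instead of 100, the first packet would be allowed by rule 300, which is why the bans take the low numbers.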

NACLs are stateless. Every rule has an ingress and an egress counterpart. The ephemeral port range for return traffic is typically 1024-65535, and forgetting this is the single most common “why is NACL blocking my traffic” symptom.
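One way to avoid forgetting the return path is to generate the two entries together. The sketch below builds parameter dictionaries in the shape boto3's `create_network_acl_entry` expects; the ACL ID, the rule number, and the helper names are hypothetical, and it only covers an outbound TCP flow.

```python
from typing import Any

EPHEMERAL = (1024, 65535)  # wide range that covers common client OS defaults


def entry(acl_id: str, number: int, egress: bool, cidr: str,
          ports: tuple[int, int]) -> dict[str, Any]:
    """One allow entry shaped like the arguments to create_network_acl_entry."""
    return {
        "NetworkAclId": acl_id,
        "RuleNumber": number,
        "Protocol": "6",  # TCP, as a protocol-number string
        "RuleAction": "allow",
        "Egress": egress,
        "CidrBlock": cidr,
        "PortRange": {"From": ports[0], "To": ports[1]},
    }


def outbound_flow_pair(acl_id: str, number: int, cidr: str,
                       port: int) -> list[dict[str, Any]]:
    """Allow an outbound TCP flow plus the inbound responses it produces."""
    return [
        entry(acl_id, number, True, cidr, (port, port)),  # request leaves the subnet
        entry(acl_id, number, False, cidr, EPHEMERAL),    # response lands on an ephemeral port
    ]


pair = outbound_flow_pair("acl-0123456789abcdef0", 300, "0.0.0.0/0", 443)

# Each dict could then be passed to boto3.client("ec2").create_network_acl_entry(**params).
```

Inbound and outbound entries are numbered independently, so reusing the same rule number for both halves keeps the pair easy to spot in the console.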

The team doesn’t add per-workload rules to NACLs. The NACL exists for the subnet-level bans that apply to everyone. Workload identity lives in the security group.

Network Firewall is the egress inspection and policy layer. It runs in the egress VPC (see the centralised-egress post), not in individual workload VPCs. Rule groups:

  • Stateful domain allow-list for *.stripe.com, api.github.com, package mirrors. SNI inspection; no decryption.
  • Stateless IP deny-list for threat-intel feeds, updated hourly.
  • Stateful geo-blocking for sanctioned countries via GeoIP rule groups.
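For the domain allow-list, a sketch of the request that boto3's `create_rule_group` takes for a stateful domain-list rule group might look like this. The rule group name and capacity are assumptions, and the helper functions are invented for illustration. Domain-list targets express a wildcard with a leading dot, so `*.stripe.com` is written `.stripe.com`.

```python
from typing import Any


def to_target(domain: str) -> str:
    """Convert '*.example.com' to the leading-dot form used by domain-list targets."""
    return domain[1:] if domain.startswith("*.") else domain


def domain_allowlist_params(name: str, domains: list[str],
                            capacity: int) -> dict[str, Any]:
    """Arguments for a stateful rule group that allows only the listed domains."""
    return {
        "RuleGroupName": name,
        "Type": "STATEFUL",
        "Capacity": capacity,  # assumed value; size it for the real list
        "RuleGroup": {
            "RulesSource": {
                "RulesSourceList": {
                    "Targets": [to_target(d) for d in domains],
                    "TargetTypes": ["TLS_SNI", "HTTP_HOST"],
                    "GeneratedRulesType": "ALLOWLIST",
                }
            }
        },
    }


params = domain_allowlist_params(
    "egress-domain-allowlist",  # hypothetical name
    ["*.stripe.com", "api.github.com"],
    capacity=100,
)

# With credentials configured:
#   import boto3
#   boto3.client("network-firewall").create_rule_group(**params)
```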

The firewall does not replace security groups. It doesn’t see east-west traffic at all; it sits on the egress path. An auditor asking “what stops an EC2 instance in one VPC from talking to an EC2 instance in another VPC” does not get “Network Firewall” as the answer; they get “the security group on the destination instance, and the absence of a route if the two VPCs aren’t peered.”

Route tables are the “no path” answer. The payroll subnet has no default route to the internet. It has:

  • Local routes (VPC CIDR) for intra-VPC traffic.
  • A route to the TGW for the shared-services CIDR block.
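  • Interface endpoints for Secrets Manager, KMS, and ECR, reached over the local route (an interface endpoint is an ENI inside the VPC, so it needs no route entry of its own).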

It does not have a 0.0.0.0/0 route. A compromised process on a payroll instance cannot reach evilsite.com because there is no route. Security groups and firewalls never see the packet, because the packet has nowhere to go. The strongest deny is the missing route.
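A small model of the lookup makes the “no path” argument concrete. The CIDRs and the TGW ID below are hypothetical stand-ins for the payroll subnet’s routes, and the function is a simplified longest-prefix match, not AWS's implementation.

```python
from ipaddress import ip_address, ip_network
from typing import Optional

# Hypothetical CIDRs and target IDs standing in for the payroll subnet's route table.
PAYROLL_ROUTES = {
    "10.20.0.0/16": "local",
    "10.0.0.0/16": "tgw-0123456789abcdef0",  # shared-services block via the TGW
}


def next_hop(routes: dict[str, str], destination: str) -> Optional[str]:
    """Longest-prefix match; None means there is nowhere to send the packet."""
    dest = ip_address(destination)
    matches = [ip_network(cidr) for cidr in routes if dest in ip_network(cidr)]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[str(best)]


print(next_hop(PAYROLL_ROUTES, "10.20.4.17"))     # local
print(next_hop(PAYROLL_ROUTES, "10.0.8.8"))       # tgw-0123456789abcdef0
print(next_hop(PAYROLL_ROUTES, "93.184.216.34"))  # None: no 0.0.0.0/0 route
```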

PrivateLink for deliberate VPC-to-VPC paths. The analytics VPC needs read access to the payroll VPC’s data-export endpoint. Rather than peering the VPCs (which would make every IP in payroll routable from analytics and leave security groups as the only barrier), we publish a VPC endpoint service in payroll: com.amazonaws.vpce.eu-west-1.payroll.exports. Analytics creates an interface endpoint to it. The only path between the two VPCs is the endpoint service’s NLB; the rest of the payroll VPC is not reachable from analytics. Endpoint policies on each side further constrain who can invoke what.

A worked east-west trace

An attacker compromises a web-tier EC2 instance. They attempt psql -h payroll-db.internal.

  1. DNS resolves payroll-db.internal to the database’s private IP.
  2. The web-tier instance generates a packet to the database’s IP on port 5432.
  3. The web-tier subnet’s route table has a route to the payroll VPC via the TGW, so the packet leaves the web-tier ENI.
  4. The packet reaches the TGW through the web-tier attachment. The TGW route table associated with that attachment has a route to the payroll attachment. Packet forwarded.
  5. The packet arrives at the payroll subnet’s NACL. RFC1918 inbound is allowed. Packet continues.
  6. The packet reaches the payroll database’s ENI. Security group sg-payroll-db evaluates: inbound rules allow 5432 only from sg-payroll-app. The web-tier instance is in sg-web, not sg-payroll-app. Dropped.

The audit question, “what stops the web tier reaching the database”, has one primary answer (the security group on the database) and one secondary (the TGW route tables can also be configured to prevent cross-attachment paths). The NACL and the firewall aren’t involved in this decision.
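The trace can be replayed as a chain of checks that reports the first layer to reject the packet. This is a toy model under the assumptions of the trace (the TGW forwards, the NACL allows RFC1918, the security group holds the rules shown earlier); the example source IPs and the function names are invented for the sketch.

```python
from ipaddress import ip_address, ip_network
from typing import Callable, NamedTuple, Optional


class Packet(NamedTuple):
    src_ip: str
    src_groups: frozenset[str]
    dst_port: int


RFC1918 = [ip_network(c) for c in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
# Inbound rules of sg-payroll-db: port -> the source group allowed on that port.
PAYROLL_DB_INGRESS = {5432: "sg-payroll-app", 22: "sg-bastion"}


def tgw_allows(pkt: Packet) -> bool:
    return True  # in the trace, the TGW route table forwards web-tier traffic


def nacl_allows(pkt: Packet) -> bool:
    return any(ip_address(pkt.src_ip) in net for net in RFC1918)


def sg_allows(pkt: Packet) -> bool:
    return PAYROLL_DB_INGRESS.get(pkt.dst_port) in pkt.src_groups


LAYERS: list[tuple[str, Callable[[Packet], bool]]] = [
    ("tgw-route-table", tgw_allows),
    ("nacl", nacl_allows),
    ("security-group", sg_allows),
]


def dropped_at(pkt: Packet) -> Optional[str]:
    """Name of the first layer that rejects the packet, or None if it is delivered."""
    for name, allows in LAYERS:
        if not allows(pkt):
            return name
    return None


web = Packet("10.10.1.5", frozenset({"sg-web"}), 5432)
app = Packet("10.20.2.7", frozenset({"sg-payroll-app"}), 5432)
print(dropped_at(web))  # security-group
print(dropped_at(app))  # None
```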

VPC Flow Logs on the payroll database’s ENI record the rejected flow with REJECT and the source IP. Athena queries over the flow logs can answer “has anything in the web-tier ever attempted to connect to the database?” in seconds.
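The same question can be asked of raw records without Athena. The sketch below parses the default flow log format (version, account, interface, source and destination address and port, protocol, packets, bytes, start, end, action, log status); the two sample records are invented for illustration, not real log data.

```python
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def parse(line: str) -> dict[str, str]:
    """Split one default-format flow log record into named fields."""
    return dict(zip(FIELDS, line.split()))


def rejected_sources(lines: list[str], dst_ip: str, dst_port: int) -> list[str]:
    """Source IPs of rejected flows aimed at one destination IP and port."""
    hits = []
    for record in map(parse, lines):
        if (record.get("action") == "REJECT"
                and record.get("dstaddr") == dst_ip
                and record.get("dstport") == str(dst_port)):
            hits.append(record["srcaddr"])
    return hits


# Invented sample records in the default format.
SAMPLE = [
    "2 123456789012 eni-0a1b2c3d4e5f67890 10.10.1.5 10.20.3.9 49152 5432 6 3 180 1767225600 1767225660 REJECT OK",
    "2 123456789012 eni-0a1b2c3d4e5f67890 10.20.2.7 10.20.3.9 50122 5432 6 12 4096 1767225600 1767225660 ACCEPT OK",
]

print(rejected_sources(SAMPLE, "10.20.3.9", 5432))  # ['10.10.1.5']
```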

A worked egress trace

A compromised process on the app tier attempts to exfiltrate data to evilsite.com.

  1. App-tier SG’s outbound rule allows tcp/443 to 0.0.0.0/0. Packet leaves the ENI.
  2. Subnet route table sends the packet to the TGW (centralised egress pattern).
  3. TGW forwards to the egress VPC; egress VPC route table sends packet to the firewall endpoint.
  4. Network Firewall’s stateful SNI inspection reads evilsite.com from the TLS ClientHello. Not in the allow-list. Dropped, alert logged.

The security group allowed the packet, because “tcp/443 to 0.0.0.0/0” is what the app needs to reach Stripe, GitHub, and the package mirrors. The firewall is the layer that carries the domain-specific policy, because the security group doesn’t understand domains. Three layers, three jobs; each catches what the others can’t.
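The firewall’s decision in step 4 amounts to a name match on the SNI. The sketch below is a deliberately simplified model, not Suricata's or Network Firewall's actual matching: it assumes a leading dot means “any subdomain” and everything else is an exact match.

```python
ALLOWED = [".stripe.com", "api.github.com"]


def sni_allowed(server_name: str, allowed: list[str]) -> bool:
    """True if the TLS server name matches an allow-list entry."""
    name = server_name.lower().rstrip(".")
    for target in allowed:
        if target.startswith("."):
            # Leading dot: any subdomain of the target (assumption of this sketch).
            if name.endswith(target):
                return True
        elif name == target:
            return True
    return False


print(sni_allowed("api.stripe.com", ALLOWED))           # True
print(sni_allowed("evilsite.com", ALLOWED))             # False
print(sni_allowed("stripe.com.evilsite.com", ALLOWED))  # False
```

Matching on the suffix with its leading dot, rather than on a substring, is what keeps look-alike names such as the last example out.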

What’s worth remembering

  1. Security groups are the identity layer for east-west. Reference other SGs by ID, not CIDRs. Stateful, per-ENI, readable as “who can talk to this workload”.
  2. NACLs are the subnet-coarse safety net. Cheap deny for known-bad IPs, sanctioned regions, and “should never see this traffic” rules. Stateless: always think in pairs.
  3. Network Firewall sits on egress paths. Domain allow-lists, SNI inspection, geo-blocks. Not an east-west control; not a replacement for security groups.
  4. Route tables are the strongest deny. No route means no path; no path means no packet. For workloads that should never egress, the missing default route is the first control, not the last.
  5. PrivateLink is the opposite pattern. Instead of “allow by default and filter”, PrivateLink starts with “no path by default and deliberately create one”. Best for inter-VPC service-to-service.
  6. Each layer’s logs answer a different question. VPC Flow Logs for accept/reject per ENI; Network Firewall logs for egress decisions with domain; WAF logs for HTTP-layer decisions. An audit needs all three.
  7. Stateful vs stateless is the NACL trap. Every allowed flow needs rules in both directions, including the ephemeral port range. Testing NACL changes requires exercising the return path explicitly.
  8. Default-deny at every layer, allow by addition. New workload, new security group; new subnet, new NACL with the standard rule set; new egress domain, new firewall rule. No layer ever starts “allow all then deny exceptions”; that inverts the audit burden.

Three layers, three jobs. Security groups carry the who; NACLs carry the where; Network Firewall carries the what-domain. Each layer has a default of deny, and each layer has a single clear policy type. The auditor’s question has a clear answer: the layer that carries the policy in question, backed by the others as the safety net.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.