Choosing Between EFS and the FSx Family for Shared Storage

September 21, 2026 · 16 min read

Solutions Architect · SAA-C03 · part of The Exam Room

The situation

Four workloads, one AWS account, four different filesystem profiles.

  • Linux web farm, 40 instances behind an ALB, Ubuntu 24.04, serving a PHP app that writes user-uploaded images to /var/www/uploads. Everyone needs to read everyone’s writes; file sizes 100 KB to 20 MB; roughly 200 IOPS on average, 2,000 IOPS peaks.
  • Windows app tier, 12 instances running a .NET application that expects \\fileserver\shared with ACL-based access keyed off Active Directory groups. Identity comes from the existing on-prem AD forest, reached through AWS Managed Microsoft AD. File sizes 1 KB to 500 MB; mostly reads.
  • Genomics pipeline, 80 TB of reference data (FASTA, BAM, indexed databases). Thousands of Lambda invocations per run, each reading 10 to 200 MB of files at a time. Throughput is the metric that matters: the pipeline’s p95 per-invocation runtime is dominated by the read phase.
  • Mac creative team, 6 users on macOS, each working on Final Cut projects stored on a shared volume. Currently NFS-mounted from an on-prem NAS; they’d like the same experience on AWS.

“Just use S3” is the answer for some workloads and wrong for others. The PHP app expects POSIX semantics with fopen and directory listings; Lambda expects to be able to mount something without per-invocation copying; the Windows app expects SMB with AD integration. The filesystem protocol is not an implementation detail, it’s the contract.

What actually matters

Start by asking what protocol the workload actually speaks.

The Linux web farm speaks NFS. Linux mounts NFS in the kernel, so any language that does ordinary file I/O works against it and the PHP app does not need to change. fopen("/var/www/uploads/abc.jpg", "w") works as it would on a local disk, subject to the usual NFS semantics around locking and caching. Whatever the team picks for this workload has to be a managed service that speaks NFS.

The Windows app speaks SMB, with AD-based ACLs. Linux-world NFS won’t do; the app expects net use Z: \\fileserver\shared to work and fileserver$ to be a computer account in a domain. The chosen service has to expose a Windows file share, joined to either a managed directory or an on-prem AD via trust.

The genomics pipeline needs high aggregate throughput more than it needs POSIX. General-purpose managed NFS exists, but per-client throughput on most managed-NFS services is modest. What the pipeline actually wants is a high-throughput parallel filesystem backed by an object store. This category is designed for HPC and analytics: per-client throughput is in the hundreds of MB/s, and the durable copy of the dataset lives in cheap object storage and is lazily loaded into the fast filesystem on first access.

The Mac team speaks NFS or SMB. macOS has a built-in NFS client; whatever the Linux team uses for NFS will work for the Mac team too. SMB would also work, but NFS is what the team’s workflow already uses, and changing filesystem protocol is a bigger disruption than migrating the mount point.

Now the second question: performance envelope. General-purpose managed NFS scales throughput either automatically with filesystem size, or via a provisioned-throughput knob. Parallel filesystems scale throughput with provisioned storage at much higher rates, orders of magnitude beyond general-purpose NFS, at a cost. Managed Windows file shares scale with a throughput capacity configured at creation. The right envelope depends on the workload, not the cheapest option per GB.

Third: durability and replication. Most managed file services offer both single-AZ and multi-AZ deployment types; the multi-AZ variants have automatic failover but typically cost roughly 2x. For a production filesystem holding the only copy of user uploads, multi-AZ is non-negotiable; for a scratch filesystem that loads from an object store on each run, single-AZ is fine because the durable copy lives elsewhere.

Fourth: cost shape. Per-GB pricing varies across services by a factor of two or three; per-throughput pricing varies more. The price per GB is less interesting than the price of the capacity and performance the workload actually uses: on services that tie throughput to provisioned storage, you can end up paying for storage you don’t need in order to get performance you do need.

Fifth: where does the data live when no client is mounted? Most managed file services hold the data permanently. Some parallel-filesystem variants treat the filesystem as a scratch cache in front of object storage: the durable copy lives in the bucket and the filesystem is just the fast layer.

What we’ll filter on

  1. Protocol. NFSv4, SMB 2/3, Lustre, or something else?
  2. Client OS compatibility. Which platforms can mount it natively?
  3. Performance envelope. What read/write throughput per client and in aggregate?
  4. Durability / AZ scope. Single-AZ or multi-AZ; ephemeral or persistent?
  5. Integration. AD join, IAM, S3 backing, VPC-only?
  6. Cost shape. Per-GB, provisioned throughput, Lustre-tier?

The shared-storage landscape

  1. Amazon EFS. Managed NFSv4.1 and v4.0. Mount from Linux via nfs4 or the efs-utils helper (which adds TLS-in-transit and IAM auth). Storage classes: Standard (multi-AZ), One Zone (single-AZ), Infrequent Access and Archive lifecycle tiers that move cold files to cheaper storage automatically. Throughput modes: Bursting (scales with filesystem size), Elastic (auto-scaling throughput up to 10 GB/s read), Provisioned (pay for a fixed MB/s). Performance modes: General Purpose (default, lower latency, up to 35,000 IOPS) or Max I/O (higher latency but parallel scaling beyond that). Native integration with Lambda via filesystem configs. Lambda mounts EFS at a path inside the invocation.

  2. FSx for Windows File Server. Managed SMB 2.x/3.x on EBS-backed storage. Joins an AWS Managed Microsoft AD, an on-prem AD via trust, or a self-managed AD. Supports ACLs, DFS Namespaces, shadow copies, and Windows-style file locking. Single-AZ 1 and 2 and Multi-AZ deployment types; the Multi-AZ type keeps a synchronous mirror in a second AZ and fails over via DNS update.

  3. FSx for Lustre. Managed Lustre parallel filesystem. Scratch 2 (ephemeral, high throughput, cheap) or Persistent 1/2 (durable, SSD or HDD with SSD cache). Optional S3 link: bucket objects appear as filesystem entries, are read lazily on first access, and changed files are written back to the bucket on demand or automatically. Scales to 1 TB/s+ aggregate on very large filesystems; per-client throughput in the hundreds of MB/s.

  4. FSx for NetApp ONTAP. Managed ONTAP. Multi-protocol (NFS, SMB, iSCSI), SnapMirror for cross-Region replication, FlexClone for instant copies, tiering between SSD and capacity pool. The answer when a team is migrating from an on-prem NetApp and wants the tooling to work unchanged.

  5. FSx for OpenZFS. Managed ZFS over NFS. POSIX semantics, snapshots and clones, user quotas, and compression. The answer for Linux workloads that specifically want ZFS features rather than generic NFS.

  6. S3. Object storage, not a filesystem. Access via the S3 API or, for some workloads, via s3fs or Mountpoint for Amazon S3 (a FUSE-based file client AWS publishes). Mountpoint provides filesystem-like access for read-heavy workloads with only partial POSIX semantics (sequential writes to new files, no in-place modification), adequate for the genomics pipeline’s read-only case, not adequate for the web farm’s random writes.

  7. EBS gp3 + manual replication. The self-managed answer, useful as a reminder: a single EBS volume is single-AZ and single-attach (multi-attach variants exist for io1/io2 but with cluster-aware-filesystem caveats). Shared storage across a fleet is not what EBS is for; reach for EFS or FSx first.
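
The Lambda integration mentioned in the EFS entry is configured per function. The following is a minimal sketch, assuming the function is already attached to the VPC and an access point already exists; the function name, the ARN, and the /mnt/ref mount path are hypothetical placeholders. Lambda requires the local mount path to start with /mnt/.

```shell
# Attach an EFS access point to a Lambda function (sketch, not a full deployment).
# $1 = function name, $2 = access point ARN; both are placeholders supplied by the caller.
attach_efs_to_lambda() {
  local function_name=$1 access_point_arn=$2
  aws lambda update-function-configuration \
    --function-name "$function_name" \
    --file-system-configs "Arn=${access_point_arn},LocalMountPath=/mnt/ref"
}

# Example call (hypothetical identifiers):
# attach_efs_to_lambda genomics-reader \
#   arn:aws:elasticfilesystem:eu-west-1:111122223333:access-point/fsap-0123456789abcdef0
```

Inside the handler the filesystem then shows up as an ordinary directory at /mnt/ref, so no per-invocation copy is needed.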

Side by side

Service | Protocol | Client OS | AZ scope | AD integration | Typical fit
EFS | NFSv4.1 | Linux, macOS | Multi-AZ (Standard) / Single-AZ (One Zone) | POSIX UIDs, IAM (efs-utils) | Linux web farm, Lambda with modest I/O
FSx for Windows | SMB 2/3 | Windows, macOS | Single or Multi-AZ | ✓ (AD native) | Windows app tier
FSx for Lustre | Lustre | Linux | Single-AZ (Scratch and Persistent) | POSIX UIDs | HPC, genomics, big analytics
FSx for ONTAP | NFS + SMB + iSCSI | All | Single or Multi-AZ | ✓ (for SMB) | NetApp migrations
FSx for OpenZFS | NFSv3/v4 | Linux | Single or Multi-AZ | POSIX UIDs | ZFS-specific features
S3 + Mountpoint | S3 API / FUSE | Linux | Regional | IAM | Read-mostly large datasets

Reading the table by workload:

  • Linux web farm. EFS Standard with Elastic throughput, mounted on all 40 instances via efs-utils. Regional (multi-AZ) redundancy, handles the IOPS, and throughput scales automatically.
  • Windows app tier. FSx for Windows Multi-AZ, joined to the Managed Microsoft AD. ACL-based access works unchanged from the on-prem design.
  • Genomics pipeline. FSx for Lustre Scratch 2 linked to the S3 reference bucket; created at run start and deleted at run end, or provisioned as a Persistent deployment instead if the dataset changes frequently.
  • Mac team. EFS Standard, same filesystem pattern as the web farm. If they need SMB specifically (Finder-friendly behaviour), FSx for ONTAP offers the same share over both protocols simultaneously.
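
The protocol-first reading above can be sketched as a small lookup. This is only an illustration of the first filter, not a complete selection tool; the function name and the protocol keywords are made up for the example.

```shell
# Hypothetical helper: map the protocol a workload speaks to the first-choice service.
pick_service() {
  case "$1" in
    nfs)     echo "EFS" ;;
    smb)     echo "FSx for Windows File Server" ;;
    lustre)  echo "FSx for Lustre" ;;
    nfs+smb) echo "FSx for NetApp ONTAP" ;;
    *)       echo "unknown protocol: $1" >&2; return 1 ;;
  esac
}

pick_service smb   # prints: FSx for Windows File Server
```

Performance envelope, AZ scope, and AD integration still have to confirm the pick; the lookup only narrows the field.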

Protocols to services, visually

[Figure: workload × service fit matrix. Rows: Linux web farm, Windows app tier, Genomics pipeline, Mac creative team. Columns: EFS (NFSv4.1), FSx for Windows (SMB 2/3), FSx for Lustre, FSx for ONTAP (NFS + SMB), S3 + Mountpoint (S3 API). Legend: green = native fit, amber = valid alternative, grey = wrong tool. Protocol is the first filter; performance envelope and AD integration finish the decision.]
The matrix reads as "does the workload's protocol match what this service speaks natively?", and then "does the performance envelope fit?"

The picks in depth

Linux web farm → EFS Standard with Elastic throughput. The filesystem is created with aws efs create-file-system --performance-mode generalPurpose --throughput-mode elastic. Each instance mounts via amazon-efs-utils:

# /etc/fstab
fs-0abc1234.efs.eu-west-1.amazonaws.com:/ /var/www/uploads efs _netdev,tls,iam 0 0

The tls option encrypts data in transit; iam authenticates the mount with the instance’s IAM role. An EFS access point (aws efs create-access-point) pins the mount to /uploads under a specific POSIX UID/GID, so the PHP runtime never needs write access to the filesystem root. Lifecycle management moves files not accessed for 30 days to Infrequent Access (around $0.025/GB-month) and after 90 days to Archive (around $0.008/GB-month), with transparent read-back on access; exact prices vary by Region.
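
The access point and the lifecycle policy described above can each be set up with one CLI call. This is a minimal sketch, assuming the PHP runtime runs as www-data (UID/GID 33 on stock Ubuntu, which you should verify on your image); the function names are hypothetical and the filesystem ID is whatever create-file-system returned.

```shell
# Create an access point rooted at /uploads and owned by one POSIX identity (sketch).
# $1 = file system ID, $2 = UID, $3 = GID.
create_uploads_access_point() {
  local fs_id=$1 uid=$2 gid=$3
  aws efs create-access-point \
    --file-system-id "$fs_id" \
    --posix-user "Uid=${uid},Gid=${gid}" \
    --root-directory "Path=/uploads,CreationInfo={OwnerUid=${uid},OwnerGid=${gid},Permissions=0755}"
}

# Tier cold files: Infrequent Access after 30 days, Archive after 90 days.
# $1 = file system ID.
set_uploads_lifecycle() {
  local fs_id=$1
  aws efs put-lifecycle-configuration \
    --file-system-id "$fs_id" \
    --lifecycle-policies "TransitionToIA=AFTER_30_DAYS" "TransitionToArchive=AFTER_90_DAYS"
}

# Example calls (placeholder ID):
# create_uploads_access_point fs-0abc1234 33 33
# set_uploads_lifecycle fs-0abc1234
```

To mount through the access point, the fstab options would also carry accesspoint= followed by the ID that create-access-point returns.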

Windows app tier → FSx for Windows File Server Multi-AZ. Created as 2 TiB SSD, 128 MB/s throughput capacity, joined to the Managed Microsoft AD (corp.example.local). The file share is \\fs-0abc1234.corp.example.local\share. ACLs are set from any domain-joined Windows client using standard tooling (icacls, Explorer Security tab). DFS Namespaces can layer a friendlier path (\\corp.example.local\apps) that resolves to the FSx share.

Backup goes through AWS Backup daily; FSx’s built-in automatic backups also run daily, in the backup window configured at creation, with retention configurable up to 90 days.

Genomics pipeline → FSx for Lustre Scratch 2 linked to S3. Created at run start:

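# ImportPath and ExportPath are top-level Lustre settings and must point at the same bucket.
# Quoting the shorthand keeps it a single argument across the line continuation.
$ aws fsx create-file-system \
    --file-system-type LUSTRE \
    --storage-capacity 4800 \
    --subnet-ids subnet-0abc... \
    --lustre-configuration \
      "DeploymentType=SCRATCH_2,ImportPath=s3://ref-genomes/,ExportPath=s3://ref-genomes/results/"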

Throughput scales with storage: Scratch 2 delivers ~200 MB/s per TiB of provisioned storage, so the 4,800 GiB requested above (about 4.7 TiB) gives roughly 940 MB/s of baseline aggregate throughput, plenty for a parallel read phase. Objects in S3 appear as filesystem entries as soon as the filesystem is created; the first read of each file imports the object’s data into Lustre. Deleting the filesystem at run end stops the charges; S3 holds the durable copy.
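
The sizing arithmetic is easy to check in the shell. This is a minimal sketch using the ~200 MB/s per TiB figure quoted above and integer maths; the function name is hypothetical, and the input is in GiB because that is the unit --storage-capacity takes.

```shell
# Baseline aggregate throughput for Scratch 2 at ~200 MB/s per TiB.
# $1 = provisioned capacity in GiB.
scratch2_baseline_mbps() {
  local capacity_gib=$1
  echo $(( capacity_gib * 200 / 1024 ))
}

scratch2_baseline_mbps 4800   # prints 937
```

The result is 937 MB/s because 4,800 GiB is about 4.69 TiB, slightly less than a decimal reading of the capacity would suggest.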

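Lambda cannot mount this filesystem: Lambda’s file system configuration accepts only EFS access points, and the Lustre client is a kernel module that needs an EC2 host. Pairing Lustre with this pipeline therefore means running the read phase on EC2 workers in an Auto Scaling group, which also provide more throughput per dollar than Lambda at this scale. If the compute has to stay on Lambda, the realistic options are EFS with Elastic throughput or reading the objects straight from S3.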
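
For the EC2 worker route, the mount itself is a single command once the Lustre client is installed on the instance. The sketch below only assembles the mount source string; the DNS name and mount name shown in the usage comment are placeholders, and the real mount name comes from LustreConfiguration.MountName in the aws fsx describe-file-systems output.

```shell
# Build the mount source for an FSx for Lustre filesystem (sketch).
# $1 = filesystem DNS name, $2 = mount name reported by describe-file-systems.
lustre_mount_source() {
  printf '%s@tcp:/%s\n' "$1" "$2"
}

# On an EC2 worker with the Lustre client installed (placeholders throughout):
# sudo mkdir -p /mnt/ref
# sudo mount -t lustre -o relatime,flock \
#   "$(lustre_mount_source fs-0abc1234.fsx.eu-west-1.amazonaws.com abcdefgh)" /mnt/ref
```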

Mac creative team → EFS Standard, same filesystem, different access points. EFS mount targets are VPC-scoped, so the Macs first need a network path into the VPC, for example through AWS Client VPN. From there macOS can mount EFS with its built-in NFS client: mount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576 fs-0abc....efs.eu-west-1.amazonaws.com:/ /Volumes/projects. An access point with POSIX UIDs that map to each user’s uid:gid keeps file ownership consistent, but access points are mounted through the amazon-efs-utils helper with TLS rather than a plain NFS mount. If Finder’s NFS quirks are a problem, FSx for ONTAP with an SMB share is the alternative.

A worked migration: the Windows tier

We are migrating the Windows app tier from an on-prem file server to FSx for Windows. The on-prem share is \\dc01\apps with 1.2 TB of data and AD-scoped ACLs. AWS Managed Microsoft AD is already set up in the corp.example.local domain.

1. Create FSx for Windows Multi-AZ, joined to corp.example.local,
   throughput 128 MB/s, 2 TiB SSD. Wait ~30 min.
   $ aws fsx create-file-system --file-system-type WINDOWS ...

2. Pre-stage ACL-preserving copy from on-prem to FSx using
   AWS DataSync with an on-prem agent. DataSync preserves ACLs,
   timestamps, and ADS (alternate data streams).
   DataSync task: source = \\dc01\apps, dest = \\fs-...\share
   Scheduled incremental every 4 hours during migration window.

3. Cut over: stop writes at source, run final DataSync increment,
   flip the DFS Namespace target from \\dc01\apps to \\fs-...\share.
   Clients pick up the new target on next reconnect because DFS
   handles the referral.

4. Verify: icacls on a sample path, check domain groups resolve
   correctly, run the app's smoke tests.
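
Step 1 leaves the flags out. The following is a hedged sketch of what the call could look like for the sizing stated in that step (Multi-AZ, 2 TiB SSD, 128 MB/s); the function name is hypothetical, the directory and subnet IDs are supplied by the caller, and security groups are left at the VPC default, which a real deployment would normally set explicitly.

```shell
# Create the Multi-AZ FSx for Windows filesystem from step 1 (sketch).
# $1 = Managed Microsoft AD directory ID, $2 = preferred subnet, $3 = standby subnet.
create_windows_fs() {
  local directory_id=$1 preferred_subnet=$2 standby_subnet=$3
  aws fsx create-file-system \
    --file-system-type WINDOWS \
    --storage-type SSD \
    --storage-capacity 2048 \
    --subnet-ids "$preferred_subnet" "$standby_subnet" \
    --windows-configuration \
      "ActiveDirectoryId=${directory_id},DeploymentType=MULTI_AZ_1,PreferredSubnetId=${preferred_subnet},ThroughputCapacity=128"
}

# Example call (placeholder IDs):
# create_windows_fs d-0123456789 subnet-0aaa1111 subnet-0bbb2222
```

The two subnets have to sit in different Availability Zones; the preferred one hosts the active file server.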

The app tier sees \\corp.example.local\apps\config exactly as before; the fact that the share is now on FSx in eu-west-1 is invisible. Multi-AZ failover is automatic: the FSx DNS name always resolves to whichever AZ holds the active endpoint.
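
Steps 2 and 3 of the migration can also be driven from the CLI. This is a minimal sketch, assuming the two DataSync locations (the on-prem SMB share and the FSx share) have already been created and their ARNs are known; the function names and the task name are hypothetical.

```shell
# Recurring incremental copy every 4 hours, carrying owner and DACL (sketch).
# $1 = source location ARN, $2 = destination location ARN.
create_migration_task() {
  local source_arn=$1 dest_arn=$2
  aws datasync create-task \
    --name apps-share-migration \
    --source-location-arn "$source_arn" \
    --destination-location-arn "$dest_arn" \
    --schedule "ScheduleExpression=rate(4 hours)" \
    --options "SecurityDescriptorCopyFlags=OWNER_DACL"
}

# At cutover, once writes have stopped at the source, run one last increment.
# $1 = task ARN returned by create-task.
run_final_increment() {
  aws datasync start-task-execution --task-arn "$1"
}
```

Each execution copies only what changed since the previous run, which is why the final pass during the cutover window is short.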

What’s worth remembering

  1. Protocol is the first filter. NFS → EFS. SMB → FSx for Windows. Lustre → FSx for Lustre. Multi-protocol → FSx for ONTAP. The workload’s existing protocol choice drives the service choice more than performance or price.
  2. EFS is the Linux default. Regional, multi-AZ, NFSv4.1, scales with use, mounts on Lambda and EC2 alike. Elastic throughput mode removes the “did I size this right?” question for most workloads.
  3. FSx for Windows is the Windows default. Managed SMB share joined to AD, with ACLs, DFS, shadow copies, the things a Windows app expects to exist. Multi-AZ for production.
  4. FSx for Lustre is for high-throughput, wide-fanout workloads. Scratch for ephemeral runs against an S3 backing store; Persistent for long-lived HPC. Throughput scales with provisioned storage; S3 integration makes the dataset durable without keeping the Lustre filesystem running.
  5. FSx for ONTAP / OpenZFS are the specialist picks. ONTAP for NetApp-shaped workflows and dual-protocol shares; OpenZFS for teams that want ZFS features over NFS. Don’t reach for them without a specific reason.
  6. S3 is not a filesystem, except when Mountpoint-for-S3 is enough. Read-heavy, sequential, large-object workloads work with Mountpoint; random-write POSIX applications do not. Don’t bolt S3 into a fopen path.
  7. Single-AZ vs Multi-AZ is a durability decision. Production-path filesystems want Multi-AZ; scratch, ephemeral, and S3-backed filesystems can tolerate Single-AZ for half the cost.
  8. DataSync moves filesystems in and out of AWS. Cross-protocol transfers work (NFS to SMB, SMB to NFS, on-prem NAS to FSx), though ACLs and ownership are preserved only between locations with similar metadata structures, such as an SMB share to FSx for Windows. Reach for DataSync whenever the migration is “move files” rather than “rewrite app”.

Four workloads, four filesystem shapes, and the answer is not EFS-for-everything. The protocol the application speaks picks the service; the performance and durability envelopes narrow the variant within it.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.