KMS or CloudHSM for FIPS 140-2 Level 3

March 08, 2027 · 14 min read

Solutions Architect Pro · SAP-C02 · part of The Exam Room

The situation

A fintech processes card transactions on behalf of merchants. The regulator (PCI DSS plus a jurisdiction-specific equivalent) requires:

  • Cardholder data encrypted at rest with auditable key management.
  • Signing keys for transaction messages stored in hardware validated to FIPS 140-2 Level 3 (tamper-resistant, tamper-evident).
  • Key material never leaves the hardware module unwrapped.
  • Customer-controlled key policies: the cloud provider cannot access plaintext keys or use them without explicit authorisation.
  • Quorum / multi-operator control for key administrative operations.
  • Audit trail of every key operation, retained for seven years.

The engineering team has proposed KMS with customer-managed keys. The security team has pushed for CloudHSM. The finance team has asked what CloudHSM costs and whether there’s a middle ground.

What actually matters

The core trade in key management is control in exchange for operational weight. A fully managed key service is cheap, convenient, and auditable; a customer-operated HSM gives more control over the hardware and software boundaries at the cost of running the cluster. The PCI and FIPS requirements push toward customer-administered hardware for signing keys; the operational overhead pushes back.

The first thing to ask is: what does FIPS 140-2 Level 3 actually require? Level 3 adds to Level 2 the requirement for identity-based authentication of operators, strong enclosure (tamper-resistant with tamper-response), and trusted channels for key input/output. The module must zero its keys on tamper detection. Hardware-backed key services from cloud providers typically use Level 3 modules under the covers, but the module is operated by the provider, not the customer. Whether that counts as “Level 3” for the regulator depends on whether the requirement is about the module’s certification or about who administers it.

The second is: does the regulator accept “the cloud provider operates Level 3 hardware”? Many do. PCI DSS itself doesn’t require Level 3 for all key types; it requires appropriate key management and auditable processes. Regulators that mandate Level 3 specifically for certain high-value key classes (signing keys used to produce legal financial statements, for example) often require customer-administered or on-premises HSMs, because the requirement is that the customer, not the cloud provider, administer the module.

The third is: which keys need which treatment? A single-tier “everything on customer-administered hardware” is expensive and usually overkill. Tier by use:

  • Data encryption keys for data at rest: a managed key service with envelope encryption is usually sufficient for encryption-at-rest requirements.
  • Key encryption keys for wrapping: same tier, with separation per data domain.
  • Signing keys for regulated transaction messages: customer-administered single-tenant hardware, with key lifecycle owned by the customer.
  • TLS private keys for internet-facing endpoints: certificate-manager-backed for public endpoints; higher-grade hardware for specific regulated use cases.
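The tiering above can be written down as a small lookup so that every new key request lands in a defined tier. This is a minimal sketch: the class names and tier labels are hypothetical shorthand for the four bullets, not an AWS construct.

```python
# Hypothetical mapping of key classes to tiers, mirroring the list above.
KEY_TIERS = {
    "data_encryption": "managed-kms",
    "key_encryption": "managed-kms",
    "regulated_signing": "customer-hsm",
    "public_tls": "certificate-manager",
}


def tier_for(key_class: str) -> str:
    """Return the tier for a key class; unknown classes fail loudly."""
    try:
        return KEY_TIERS[key_class]
    except KeyError:
        raise ValueError(f"no tier defined for key class {key_class!r}") from None
```

Failing on an unknown class, rather than defaulting to the cheapest tier, keeps an unclassified key from quietly landing outside the regulated boundary.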

The fourth is: quorum control. A managed key service can express resource policies and conditions, but typically doesn’t natively support m-of-n operator quorum on administrative operations. A customer-administered HSM does: a key-change operation can require m of n operators to approve before it executes.

The fifth is: operational weight. Running an HSM cluster is a commitment. Multiple nodes for availability; each node is a per-hour rental that adds up to thousands per month; client libraries to install in every application that calls the cluster; certificates, CRLs, and backups to manage. Not something the platform team takes on lightly.
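To put a number on "thousands per month", here is a rough calculation. It assumes the ~$1.60 per node-hour figure quoted later in this post and a 730-hour average month; actual pricing varies by region and HSM type.

```python
HOURS_PER_MONTH = 730  # assumption: average month (8,760 hours / 12)


def monthly_hsm_cost(nodes: int, hourly_rate_usd: float) -> float:
    """Steady-state rental for an HSM cluster, excluding data transfer and staff time."""
    return round(nodes * hourly_rate_usd * HOURS_PER_MONTH, 2)


two_nodes = monthly_hsm_cost(2, 1.60)    # 2336.0
three_nodes = monthly_hsm_cost(3, 1.60)  # 3504.0
```

The rental is the visible part of the bill; the client libraries, backups, and operator time listed above are not in this figure.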

What we’ll filter on

  1. Hardware certification: what FIPS 140-2 level, and what validation scope?
  2. Single-tenant hardware: is the module dedicated to your use?
  3. Customer-administered keys: do you control key lifecycle, rotation, destruction?
  4. Quorum operations: m-of-n for administrative changes?
  5. Operational overhead: how much work is steady-state operation?

The key-management landscape

  1. AWS KMS with AWS-owned key. AWS-managed, AWS-owned. Used by integrated services (S3 default encryption, SQS, etc.). No customer control. Cheapest; least flexible.

  2. AWS KMS with customer-managed key (CMK). AWS-managed HSMs, but customer controls the key policy, rotation, grants, and can schedule deletion. Most common. FIPS 140-2 Level 3 hardware under the covers, but AWS administers the modules.

  3. AWS KMS with imported key material. Customer generates key material outside AWS (e.g., on an on-prem HSM) and imports it into KMS. KMS uses it; customer retains the original. Useful when a regulator requires “key born outside the cloud.”

  4. AWS KMS with custom key store (CKS) backed by CloudHSM. KMS interface with CloudHSM as the backing store. Customer owns the cluster; KMS API surfaces the keys. Hybrid of the two worlds. KMS integrations still work, but the key material lives on customer-administered single-tenant hardware.

  5. AWS CloudHSM. Single-tenant HSM cluster, FIPS 140-2 Level 3, PKCS#11 / JCE / OpenSSL APIs. Customer manages users, quorum, key lifecycle. Requires client libraries in each application. Most control; most overhead.

  6. On-premises HSM with AWS integration. Thales, Gemalto, etc. on-prem, accessed from AWS via Direct Connect or integrated via CloudHSM-import. Appropriate when the organisation already owns on-prem HSMs.

Side by side

Option | FIPS certification | Single-tenant | Customer-administered | Quorum | Operational overhead
--- | --- | --- | --- | --- | ---
KMS AWS-owned | Level 3 module, AWS admin | Multi-tenant | | | None
KMS CMK | Level 3 module, AWS admin | Multi-tenant | Policy only | | Minimal
KMS imported | Level 3 module, AWS admin | Multi-tenant | ✓ (material origin) | | Moderate (rotation)
KMS + Custom Key Store | Level 3, customer-admin HSM | Single-tenant (CloudHSM) | | Via HSM quorum | High (runs CloudHSM)
CloudHSM | Level 3 | | ✓ (full) | | High
On-prem HSM | Per device | | | Per device | Owns the box

For the signing-keys requirement, where the regulator specifies customer-administered Level 3, CloudHSM is the direct fit. For the data-encryption-at-rest requirement, KMS CMKs are enough. KMS Custom Key Store backed by CloudHSM is a middle path that lets regulated signing use the KMS API while the material lives in CloudHSM.

The tiered architecture

Applications

  • S3 buckets (cardholder data): SSE-KMS with CMK; envelope encryption.
  • RDS + EBS: encrypted with CMK per database; volume keys wrapped by CMK.
  • ALB / CloudFront TLS: ACM-managed public certs; no customer key material.
  • Transaction signer service: PKCS#11 client -> CloudHSM; signs every settlement message.

AWS KMS (customer-managed keys)

  • CMK: data-at-rest-prod. Key policy: app role, audit role. Rotation: yearly automatic.
  • CMK: tls-private-ca. ACM private CA backend for internal mTLS.
  • Custom Key Store (backed by CloudHSM): KMS API surface; material lives on CloudHSM; used by services that require KMS integration + Level 3.

CloudHSM cluster (FIPS 140-2 Level 3)

  • HSM node 1 (AZ-a): single-tenant hardware; CO, CU, AU users.
  • HSM node 2 (AZ-b): automatic replication; 2-of-2 quorum for CO ops.
  • CloudHSM client (on ECS / EC2): PKCS#11, OpenSSL dynamic engine; connects via mutual TLS to cluster.

Audit and retention

  • CloudTrail, KMS: Encrypt, Decrypt, GenerateDataKey events.
  • CloudHSM: audit log streamed to CWL.
  • Central logging bucket: 7-year retention + Glacier; Object Lock Compliance mode.
  • Quorum operator workflow: CO user changes require 2-of-3 via signed CloudHSM CLI tokens.

KMS CMKs for broad encryption at rest; ACM for public TLS; CloudHSM single-tenant for regulated signing; Custom Key Store as the bridge where KMS integration is needed with CloudHSM-backed material.

The picks in depth

KMS customer-managed keys for data-at-rest. Separate CMK per data domain (payments, customers, audit), each with:

  • Key policy granting the application role Encrypt/Decrypt/GenerateDataKey only.
  • Automatic rotation (yearly). KMS keeps old material for decrypting older ciphertexts transparently.
  • Aliases for each CMK so applications reference alias/payments-prod-data rather than the key ID directly, which simplifies rotation and environment promotion.
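A key policy along these lines might look like the sketch below. The account ID and role names are placeholders, and the administrative action list is an illustrative subset rather than a complete policy; a real key policy also needs whatever statements your account-level governance requires.

```python
import json

ACCOUNT_ID = "111122223333"  # placeholder account, not from the post


def build_key_policy(admin_role: str, app_role: str) -> dict:
    """Key policy with one administrative statement and one usage statement."""
    def arn(role: str) -> str:
        return f"arn:aws:iam::{ACCOUNT_ID}:role/{role}"

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "KeyAdministration",
                "Effect": "Allow",
                "Principal": {"AWS": arn(admin_role)},
                "Action": [
                    "kms:DescribeKey",
                    "kms:PutKeyPolicy",
                    "kms:EnableKeyRotation",
                    "kms:ScheduleKeyDeletion",
                ],
                "Resource": "*",
            },
            {
                "Sid": "KeyUse",
                "Effect": "Allow",
                "Principal": {"AWS": arn(app_role)},
                "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical role names for the payments data domain.
    print(json.dumps(build_key_policy("kms-admin", "payments-app"), indent=2))
```

In a key policy, "Resource": "*" refers to the key the policy is attached to, so the two statements stay scoped to this one CMK.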

S3 buckets use SSE-KMS with the bucket-appropriate CMK. RDS is encrypted with the RDS-specific CMK. EBS volumes in the payments VPC use the data CMK. Envelope encryption means the actual data is encrypted with data keys (DEKs), which are themselves encrypted with the CMK. The CMK only ever encrypts small DEKs, not the data directly; the bulk encryption happens in the service’s data plane, and KMS is called only to generate or unwrap a DEK. For S3 that is one KMS call per object by default; enabling S3 Bucket Keys lets S3 reuse a bucket-level key and cuts the call volume.
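The envelope structure is easier to see in code. The sketch below is a toy: ToyKms is a hypothetical stand-in for the key service, and the SHA-256 keystream stands in for AES-GCM and is not secure. Only the shape (DEK encrypts data, CMK wraps DEK, wrapped DEK is stored with the ciphertext) carries over to the real services.

```python
import hashlib
import secrets


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). Stand-in for AES-GCM; NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


class ToyKms:
    """Stand-in for the key service: the CMK never leaves this object."""

    def __init__(self) -> None:
        self._cmk = secrets.token_bytes(32)

    def generate_data_key(self) -> tuple[bytes, bytes]:
        """Return (plaintext DEK, wrapped DEK)."""
        dek = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        return dek, nonce + _xor_stream(self._cmk, nonce, dek)

    def decrypt(self, wrapped: bytes) -> bytes:
        """Unwrap a DEK that was wrapped by this CMK."""
        nonce, body = wrapped[:16], wrapped[16:]
        return _xor_stream(self._cmk, nonce, body)


def encrypt_object(kms: ToyKms, plaintext: bytes) -> dict:
    """Encrypt data under a fresh DEK and keep only the wrapped DEK."""
    dek, wrapped = kms.generate_data_key()
    nonce = secrets.token_bytes(16)
    return {"wrapped_dek": wrapped, "nonce": nonce,
            "ciphertext": _xor_stream(dek, nonce, plaintext)}


def decrypt_object(kms: ToyKms, obj: dict) -> bytes:
    """Unwrap the DEK through the key service, then decrypt locally."""
    dek = kms.decrypt(obj["wrapped_dek"])
    return _xor_stream(dek, obj["nonce"], obj["ciphertext"])
```

The key service only ever handles 32-byte DEKs here, however large the object is, which is the property the paragraph above relies on.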

CloudHSM cluster for signing keys. Two HSM nodes in two AZs (the minimum for availability; a third adds capacity and resilience). Quorum is a property of the CO user set rather than the node count. Client VPC with PKCS#11 client libraries installed on the signing application’s Fargate tasks.

User types:

  • Crypto Officer (CO): administrative users who create/delete keys and other users. Quorum (2-of-3 or 3-of-5) required for CO operations.
  • Crypto User (CU): application users who use keys for encrypt/decrypt/sign/verify.
  • Appliance User (AU): a limited-permission user present on every HSM, used by AWS CloudHSM to clone and synchronise HSMs across the cluster.

The signing application authenticates to the HSM as a CU, loads the signing key by label, and calls C_Sign. The key never leaves the HSM. The signature comes back; the message with signature is what leaves the cluster.

Quorum: the cluster can be configured with a minimum approver count (shown here illustratively as CO_QUORUM_VALUE=2), meaning any CO-level operation requires two CO-signed tokens. The deployment process for a key rotation uses a CI pipeline that collects two operators’ signatures (via offline tokens stored on hardware keys) and submits the signed operation to the HSM.
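The pipeline-side check can be sketched as follows. This is not the CloudHSM quorum protocol: HMAC stands in for the operators' hardware-token signatures, and the function names and operator registry are hypothetical. It only illustrates the rule that m distinct registered operators must approve the same operation.

```python
import hashlib
import hmac


def sign_approval(operator_key: bytes, operation: bytes) -> bytes:
    """Stand-in for an operator signing with a hardware token (HMAC here)."""
    return hmac.new(operator_key, operation, hashlib.sha256).digest()


def quorum_met(operation: bytes, approvals: dict, operator_keys: dict, m: int) -> bool:
    """True when at least m distinct registered operators approved this exact operation."""
    valid = {
        name
        for name, signature in approvals.items()
        if name in operator_keys
        and hmac.compare_digest(signature, sign_approval(operator_keys[name], operation))
    }
    return len(valid) >= m
```

Keying approvals by operator name means one operator submitting twice still counts once, and binding each signature to the operation bytes stops an approval for one change being replayed against another.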

KMS Custom Key Store. For services that require KMS integration (e.g., S3 SSE-KMS, Secrets Manager at rest) but where the regulatory posture requires single-tenant hardware, a CKS backed by CloudHSM bridges the gap. The CMK’s material lives in CloudHSM; the KMS API is the surface the integrations use. Operational overhead is the CloudHSM cluster plus the CKS linkage.

Not every key needs CKS, only the ones where a specific regulator demands Level 3 but the service only speaks KMS. The default is “KMS regular” for regulatory-tolerable, “CloudHSM direct” for regulatory-required-Level-3-signing, “CKS” for the overlap.

ACM for public TLS. ACM-managed certificates (AWS generates and stores the private key). For public endpoints, the private key never leaves ACM; deployment to ALB/CloudFront is automatic. For private CA scenarios, ACM Private CA can be backed by KMS or CloudHSM.

Audit. Every KMS API call is logged to CloudTrail. KMS operations, including Decrypt and GenerateDataKey, are recorded as management events; a trail can be configured to exclude KMS events as high-volume, so confirm the audit trail keeps them. CloudHSM’s own audit log is delivered to CloudWatch Logs. Seven-year retention via S3 Object Lock in the central logging bucket. Audit queries via Athena (covered in the log aggregation post).
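For the retention piece, a default-retention rule for the logging bucket could be expressed like this. The dictionary follows the shape of the S3 Object Lock configuration; the function name is hypothetical, and the bucket must have Object Lock enabled for the rule to apply.

```python
import json


def object_lock_config(years: int = 7) -> dict:
    """Default retention rule in Compliance mode for the central logging bucket."""
    if years < 1:
        raise ValueError("retention must be at least one year")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": years}},
    }


if __name__ == "__main__":
    print(json.dumps(object_lock_config(), indent=2))
```

Compliance mode is the relevant choice for a regulator-facing archive because retention on a locked object version cannot be shortened or removed until it expires.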

Operational disciplines.

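  • Key aliases, not key IDs. Applications refer to aliases. Automatic rotation changes the key material behind a key ID without changing the ID; a manual rotation or an environment promotion changes the key, and the alias is repointed.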
  • Separation of duties. KMS key administrators cannot use keys; key users cannot administer keys. Enforced via IAM policies that grant kms:CreateKey/PutKeyPolicy to the admin role and kms:Encrypt/Decrypt to the app role, never both.
  • Key access boundary via grants. For temporary access (e.g., cross-account data sharing), use kms:CreateGrant with a specific retiring principal, and retire or revoke the grant as soon as the access is no longer needed; grants do not expire on their own. Avoid permanent cross-account Allow in key policies.
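The separation-of-duties rule lends itself to a simple lint in CI. A minimal sketch, assuming role permissions have already been flattened into sets of action names; it does not expand wildcards such as kms:*, which a real check would need to handle.

```python
ADMIN_ACTIONS = {"kms:CreateKey", "kms:PutKeyPolicy", "kms:ScheduleKeyDeletion"}
USE_ACTIONS = {"kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"}


def violates_separation(role_actions: dict[str, set[str]]) -> list[str]:
    """Return the roles that hold both an administrative and a usage action."""
    return sorted(
        role
        for role, actions in role_actions.items()
        if actions & ADMIN_ACTIONS and actions & USE_ACTIONS
    )
```

For example, a role holding both kms:PutKeyPolicy and kms:Decrypt would be returned, while a role holding only usage actions would not.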

A worked signing operation

The transaction signer service processes a settlement batch:

  1. Batch arrives via SQS. Signer service reads it.
  2. For each transaction message, signer opens a PKCS#11 session to the CloudHSM cluster, authenticating as the signer-prod CU.
  3. Finds the signing key by label prod-settlement-key-2028, verifies it’s RSA-2048 as expected.
  4. Calls C_Sign with the message hash. HSM computes the signature inside the module.
  5. Signature returned to the signer, which attaches it to the message and sends downstream.
  6. Audit log entry on the HSM records: session ID, user ID, operation type, key handle, timestamp. Log shipped to CWL.
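The six steps can be sketched without an HSM by substituting a fake session. Everything here is a stand-in: FakeHsmSession is hypothetical, HMAC replaces the RSA signature that C_Sign would produce, and a real implementation would go through a PKCS#11 library against the cluster.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass, field


@dataclass
class FakeHsmSession:
    """Stand-in for a PKCS#11 session; key material stays inside this object."""
    user: str
    keys: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def find_key(self, label: str) -> str:
        """Look a key up by label; a real session returns an opaque handle."""
        if label not in self.keys:
            raise KeyError(f"no key labelled {label!r}")
        return label

    def sign(self, handle: str, digest: bytes) -> bytes:
        """Sign inside the 'module' and record an audit entry."""
        signature = hmac.new(self.keys[handle], digest, hashlib.sha256).digest()
        self.audit_log.append(
            {"user": self.user, "op": "sign", "key": handle, "ts": time.time()}
        )
        return signature


def sign_message(session: FakeHsmSession, label: str, message: bytes) -> dict:
    """Hash the message, sign the hash in the session, attach the signature."""
    handle = session.find_key(label)
    digest = hashlib.sha256(message).digest()
    return {"message": message, "signature": session.sign(handle, digest)}
```

The caller only ever holds the label, the digest, and the returned signature, which mirrors the property that matters in the real flow: the signing key is used but never read.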

Latency per sign: ~5-10ms (HSM op + network). Signer processes ~200 txns/sec across the cluster. CPU utilisation of the HSM is tracked; scaling up is “add a third HSM node” (AWS provisions within hours).
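Those two figures imply how many signing operations are in flight at once. By Little's law, average concurrency is the arrival rate times the time each operation spends in the system; the sketch below applies it to the numbers above and assumes they are steady-state averages.

```python
def in_flight_signs(rate_per_sec: float, latency_ms: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return rate_per_sec * latency_ms / 1000.0


low = in_flight_signs(200, 5)    # 1.0
high = in_flight_signs(200, 10)  # 2.0
```

One to two concurrent operations on average is a small number, so session pool sizing is more likely to be driven by bursts and failover than by the steady-state rate.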

A worked KMS decrypt for data at rest

A checkout Lambda reads a cardholder record from S3:

  1. Lambda’s IAM role has s3:GetObject on the bucket. The requested object is stored encrypted with SSE-KMS under CMK alias/payments-prod-data.
  2. S3 calls KMS to decrypt the object’s DEK (the DEK was encrypted with the CMK when the object was stored).
  3. KMS evaluates the CMK’s key policy: Lambda’s IAM role has kms:Decrypt via the policy. Pass.
  4. KMS returns the plaintext DEK to S3. S3 decrypts the object in memory, returns plaintext to Lambda.
  5. CloudTrail records the KMS Decrypt call with Lambda’s principal, source IP, and request details.

Lambda never sees the DEK; the CMK never exits KMS; the key policy gates every use of the CMK, on top of the s3:GetObject permission.
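The CloudTrail record from step 5 is what an auditor will ask for. A minimal sketch of filtering already-parsed CloudTrail records for Decrypt calls by one principal; the field names follow the CloudTrail record format, while the function name and the sample ARN are hypothetical.

```python
def decrypts_by_principal(events: list[dict], principal_arn: str) -> list[dict]:
    """Keep only KMS Decrypt events made by the given principal."""
    return [
        event
        for event in events
        if event.get("eventSource") == "kms.amazonaws.com"
        and event.get("eventName") == "Decrypt"
        and event.get("userIdentity", {}).get("arn") == principal_arn
    ]


# Hypothetical assumed-role session ARN for the checkout Lambda.
CHECKOUT_ARN = "arn:aws:sts::111122223333:assumed-role/checkout-role/checkout"
```

For a Lambda, the identity in the record is the assumed-role session rather than the IAM role ARN itself, so queries usually match on the session form shown here.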

What’s worth remembering

  1. KMS CMKs live on FIPS 140-2 Level 3 hardware that AWS administers. For many regulated workloads this is sufficient. Check the specific regulator’s language on customer administration.
  2. CloudHSM is single-tenant, customer-administered, Level 3. When the regulator requires both customer administration and Level 3, CloudHSM is the answer.
  3. Custom Key Store bridges KMS API with CloudHSM material. For services that integrate with KMS but the key material has to live on customer-administered hardware.
  4. Quorum operations live in CloudHSM. KMS doesn’t natively support m-of-n operator quorum. Regulators that require quorum administration push toward CloudHSM.
  5. Envelope encryption keeps per-operation KMS cost low. DEKs encrypt the data, CMKs encrypt the DEKs. KMS is called to wrap or unwrap a DEK, never to encrypt the bulk data; S3 Bucket Keys reduce the call volume further.
  6. Operational overhead of CloudHSM is real. Two HSM nodes at ~$1.60/hour each; client libraries on workloads; cluster backup and CRL management. Don’t use it for keys KMS can handle.
  7. Separate key admin from key use. IAM policies grant kms:CreateKey to admins, kms:Encrypt/Decrypt to workloads. A single role holding both is a separation-of-duties failure.
  8. Audit every key operation. CloudTrail for KMS, CloudWatch Logs for CloudHSM audit, seven-year retention with Object Lock. The auditor asks “what signed this”; you answer with a timestamp, a key label, and a CU identity.

Keys that never leave, with an operational story the team can sustain. Tiered by what actually needs Level 3 and customer administration, priced to match the regulatory value. The regulator’s letter gets a one-paragraph answer: data-at-rest by KMS CMK, signing by CloudHSM, audit in the central bucket, seven-year retention. Each sentence is cheap to defend.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.