The situation
A PCI-scoped workload has been running on AWS for two years with encryption everywhere: EBS volumes encrypted with KMS CMKs, RDS clusters encrypted with KMS CMKs, S3 buckets using SSE-KMS, Secrets Manager secrets encrypted by KMS. About 40 customer-managed keys across the estate; keys rotated annually; key policies granting specific IAM roles decrypt access. CloudTrail records every Decrypt call.
New regulatory constraint from a banking partnership:
- Keys protecting the cardholder environment must be FIPS 140-3 Level 3 validated (KMS HSMs are Level 3, but the constraint also requires…)
- The customer must be the sole operator of the HSM cluster, with no shared tenancy.
- Access to key-use must be mediated by a quorum or by a named operator with a signed policy.
- The key-use audit must record the operator identity, not just the IAM role.
- The existing KMS integrations (EBS, RDS, S3, Secrets Manager) must keep working so the workload doesn’t need to change.
The team’s candidate answers are:
- Stick with KMS as-is and argue the interpretation.
- Switch to CloudHSM-backed cryptographic operations, accepting the loss of KMS service integrations.
- Use KMS custom key stores backed by CloudHSM, keeping the KMS surface while moving key material to the customer’s HSM.
Which is correct?
What actually matters
The question turns on three layers of key handling on AWS: where the key material lives, who can issue operations on it, and which service interfaces can consume the key without application changes.
The first, where the material lives, determines the compliance story. The default AWS path keeps master-key material inside FIPS 140-3 Level 3 HSMs operated by AWS. The material never leaves the HSM, and AWS employees can’t extract it. This meets most “key material in hardware” requirements, but some regulated sectors interpret “sole operator” more strictly: they want the HSM to be theirs, not shared. Meeting that interpretation requires a single-tenant cluster where the customer is the only party with the equivalent of administrator credentials.
The second, who can issue operations, determines the access story. The managed surface gates operations by IAM and key policy: “this role can call Decrypt on this key”, with the role’s identity captured in CloudTrail. A direct-to-HSM surface gates operations by the HSM’s own user database, authenticated via cryptographic libraries using customer-issued credentials; the audit trail lives on the HSM, not in CloudTrail.
The third, what services can consume the key, determines how much application change is needed to switch providers. Many AWS services integrate with the managed key surface natively; you specify a key ARN and the service encrypts/decrypts transparently. Going direct to an HSM means every application uses a cryptographic library explicitly, and the AWS services that only know how to talk to the managed surface can’t help you.
The interesting middle option keeps the managed service surface (so storage and secrets services keep working against a key ARN) but moves the underlying material to a single-tenant HSM the customer controls. The managed service delegates the cryptographic operation to the customer’s HSM; the master-key material never exists inside the AWS-controlled HSM fleet.
What we’ll filter on
- Tenancy. Shared AWS-controlled HSMs, or customer-dedicated cluster?
- Master-key material custody. AWS, customer, or customer-via-KMS?
- AWS service integration. Transparent via KMS, or direct via PKCS#11?
- Audit trail. CloudTrail, HSM logs, or both?
- Operational burden. AWS-managed, customer-managed cluster, or hybrid?
The key-management landscape
1. AWS KMS (AWS-managed CMKs). Keys AWS creates and manages on your behalf. Rotated annually by AWS. Zero configuration. Useful as the free-tier floor; not flexible (no custom policy, no disable, no deletion).
2. AWS KMS (customer-managed CMKs). Keys you create and control within KMS. Full key policy, rotation settings (annual or on-demand), disable/re-enable, scheduled deletion with 7-30 day pending window. Master-key material lives in AWS-operated FIPS 140-3 Level 3 HSMs; AWS cannot extract it. No upfront cost, but each key is charged per month and usage is charged per request. The default for most AWS encryption needs.
3. AWS KMS (imported key material). A CMK where the key material is created by the customer off-AWS and imported into KMS. The material is still inside AWS-operated HSMs at rest but you retain a separate copy, can delete the KMS-held version, and can re-import later. Mitigates the “AWS controls the material” concern partially without moving off KMS. Imported keys can be set to expire; they do not auto-rotate.
4. AWS CloudHSM. Single-tenant, single-customer hardware HSM cluster in your VPC. FIPS 140-3 Level 3 validated. You manage the HSM user credentials, key creation, key deletion, backup, and cluster membership. Access via PKCS#11, Java Cryptography Extension, OpenSSL dynamic engine, Windows CNG/KSP. AWS handles hardware, power, connectivity, patching; you handle every logical operation. Priced per-HSM-per-hour ($1.50/hr at list).
5. AWS KMS custom key store (CloudHSM-backed). KMS surface, CloudHSM backend. Create a KMS CMK whose key material lives in your CloudHSM cluster. Applications and AWS services call KMS as normal; KMS proxies the cryptographic operation to CloudHSM transparently. Combines KMS integration (S3, EBS, etc.) with CloudHSM custody.
6. AWS KMS custom key store (external HSM). KMS surface, external-HSM backend (via XKS, External Key Store). Key material lives in an HSM outside AWS entirely (on-prem, third-party cloud), accessed over HTTPS via a customer-operated XKS Proxy. Stricter than CloudHSM-backed custom key stores because AWS never has access to the HSM at all. Requires operating an HTTPS proxy and accepting the additional latency and availability considerations.
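To put the per-HSM-per-hour pricing in context, a rough monthly figure can be worked out from the list price quoted above. This is a back-of-envelope sketch only: the 730-hour month is an averaging assumption, the function name is illustrative, and actual pricing varies by region.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year divided by 12


def monthly_cluster_cost(hsm_count: int, hourly_rate_usd: float) -> float:
    """Approximate monthly cost of a cluster billed per HSM per hour."""
    return hsm_count * hourly_rate_usd * HOURS_PER_MONTH


if __name__ == "__main__":
    # Two HSMs at the $1.50/hr list figure quoted in the text.
    print(f"${monthly_cluster_cost(2, 1.50):,.2f} per month")  # $2,190.00 per month
```

The fixed cost of the cluster is independent of how many keys it holds, which is why it tends to be reserved for the keys that genuinely need single-tenant custody.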
Side by side
| Option | Tenancy | Key material | AWS service fit | Audit | Operational burden |
|---|---|---|---|---|---|
| KMS AWS-managed | Shared | AWS | ✓ | CloudTrail | None |
| KMS customer-managed | Shared | AWS (opaque) | ✓ | CloudTrail | Low (key policies) |
| KMS imported material | Shared | Customer-origin, held by AWS | ✓ | CloudTrail | Medium (rotation manual) |
| CloudHSM (direct) | Dedicated | Customer | ✗ (PKCS#11 only) | HSM logs | High |
| KMS custom key store + CloudHSM | Dedicated for keys | Customer | ✓ | CloudTrail + HSM logs | Medium-high |
| KMS external key store (XKS) | External | Customer (off-AWS) | ✓ (with latency) | CloudTrail + XKS proxy | High |
Reading the table: for most workloads, customer-managed KMS CMKs are sufficient. Specific regulatory constraints that require single-tenant HSM custody push to CloudHSM or XKS. Custom key stores are the path that preserves AWS service integration.
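One way to read the table is as a small decision function. The sketch below is illustrative rather than authoritative: the `Requirements` fields and the `pick_option` name are invented for this example, and the branching simply encodes the table rows, not any AWS-provided logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    single_tenant_hsm: bool        # must the customer be the sole HSM operator?
    material_off_aws: bool         # must key material live outside AWS entirely?
    aws_service_integration: bool  # must EBS/RDS/S3/Secrets Manager keep using a key ARN?


def pick_option(req: Requirements) -> str:
    """Map requirements onto a row of the comparison table."""
    if req.material_off_aws:
        return "KMS external key store (XKS)"
    if req.single_tenant_hsm:
        if req.aws_service_integration:
            return "KMS custom key store + CloudHSM"
        return "CloudHSM (direct)"
    # No custody constraint: the default for most workloads.
    return "KMS customer-managed"


if __name__ == "__main__":
    # The banking-partnership scenario: dedicated HSM, services must keep working.
    print(pick_option(Requirements(True, False, True)))
```

For the scenario in this article, the function lands on the CloudHSM-backed custom key store, which is the row that satisfies both the tenancy and the integration constraints.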
The picks in depth
Customer-managed KMS CMKs for the general estate. For the 95% of encryption needs that don’t have regulatory single-tenant HSM constraints, customer-managed CMKs are correct. Enable auto-rotation (annual). Scope access through the key policy: the key policy and IAM policy together determine access, but the key policy is the primary authority (an IAM policy alone isn’t sufficient unless the key policy permits it). Use grants for temporary delegated access (e.g. to an EC2 instance making encryption calls during a job). CloudTrail records kms:Decrypt as a management event, so every key use is captured as long as KMS events aren’t excluded from the trail. Cost is a few dollars per key per month plus usage.
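A key policy along these lines is one way to express "this role can call Decrypt on this key". It is a minimal sketch: the account ID, role ARN, statement IDs, and the `build_key_policy` helper are all placeholders invented for illustration, and a real policy would usually add key-administrator statements and conditions.

```python
import json


def build_key_policy(account_id: str, decrypt_role_arn: str) -> str:
    """Return a key policy document as JSON text."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Delegates to IAM: without a statement like this, IAM
                # policies in the account cannot grant access to the key.
                "Sid": "EnableIamPoliciesViaAccountRoot",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "kms:*",
                "Resource": "*",
            },
            {
                # Grants key use directly to one named role.
                "Sid": "AllowDecryptForWorkloadRole",
                "Effect": "Allow",
                "Principal": {"AWS": decrypt_role_arn},
                "Action": ["kms:Decrypt", "kms:DescribeKey"],
                "Resource": "*",
            },
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Hypothetical account ID and role name.
    print(build_key_policy("111122223333",
                           "arn:aws:iam::111122223333:role/checkout-service"))
```

In a key policy, `"Resource": "*"` refers to the key the policy is attached to, so the scoping comes from the principal and the action list.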
CloudHSM-backed custom key store for the PCI CMKs. Create a CloudHSM cluster in the security VPC, two HSMs across two AZs (minimum for redundancy). Create the custom key store via KMS, connecting to the cluster. Create CMKs in the custom key store with Origin=AWS_CLOUDHSM; these look and behave like normal CMKs to S3, EBS, RDS, etc. but the key material lives in the CloudHSM cluster. The CMKs are used for the PCI-scoped encryption: the bucket holding cardholder data, the RDS cluster, the Secrets Manager secrets that hold DB passwords.
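The create-connect-create sequence might be scripted roughly as follows. This is a sketch under assumptions: `kms` is expected to be a KMS client such as the one boto3 provides (it is passed in rather than imported here), the function name, polling interval, and all argument values are illustrative, and the cluster must already be active with the `kmsuser` crypto user that KMS requires.

```python
import time


def create_cloudhsm_backed_key(kms, *, store_name, cluster_id, trust_anchor_pem,
                               kmsuser_password, description,
                               poll_seconds=15.0, max_polls=80):
    """Create a custom key store, connect it, and create a CMK inside it."""
    store_id = kms.create_custom_key_store(
        CustomKeyStoreName=store_name,
        CloudHsmClusterId=cluster_id,
        TrustAnchorCertificate=trust_anchor_pem,
        KeyStorePassword=kmsuser_password,
    )["CustomKeyStoreId"]

    # Connecting is asynchronous, so poll until the store reports CONNECTED.
    kms.connect_custom_key_store(CustomKeyStoreId=store_id)
    for _ in range(max_polls):
        store = kms.describe_custom_key_stores(
            CustomKeyStoreId=store_id)["CustomKeyStores"][0]
        state = store["ConnectionState"]
        if state == "CONNECTED":
            break
        if state == "FAILED":
            raise RuntimeError(
                f"connection failed: {store.get('ConnectionErrorCode')}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("custom key store did not connect in time")

    # Key material for this CMK is generated in the CloudHSM cluster.
    key = kms.create_key(
        Origin="AWS_CLOUDHSM",
        CustomKeyStoreId=store_id,
        Description=description,
        KeyUsage="ENCRYPT_DECRYPT",
        KeySpec="SYMMETRIC_DEFAULT",
    )
    return key["KeyMetadata"]["Arn"]
```

Passing the client in keeps the sketch testable with a stub; the returned ARN is what S3, EBS, RDS, and Secrets Manager are then pointed at.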
CloudHSM direct for the legacy Java client. The legacy checkout service uses Java’s JCE API to sign transactions. The JCE provider for CloudHSM lets the client use the HSM directly via Signature.getInstance("SHA256withRSA", "Cavium") or similar (signing goes through Signature rather than Cipher, and the provider name depends on the client SDK version). No KMS involvement; the client authenticates to the HSM as a CRYPTO USER via a per-application credential. This path is used only where the application specifically needs direct JCE- or PKCS#11-style integration; for everything else, route through KMS custom key stores.
Rotation for custom key stores. Automatic key rotation is not supported for custom key store CMKs; KMS can’t rotate key material it doesn’t hold. Rotation is operational: create a new CMK, re-encrypt data under it, retire the old CMK. Secrets Manager and S3 Bucket Keys make this less painful than it sounds; plan a yearly or two-yearly rotation cycle.
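Because KMS will not rotate these keys, something has to track when each one is due. A minimal sketch of that bookkeeping follows, assuming a hypothetical inventory that maps key aliases to creation dates and a 365-day policy; the alias names and the `keys_due_for_rotation` helper are illustrative only.

```python
from datetime import date, timedelta
from typing import Dict, List


def keys_due_for_rotation(created_on: Dict[str, date], today: date,
                          max_age_days: int = 365) -> List[str]:
    """Return aliases whose key is at least max_age_days old, sorted by name."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(alias for alias, created in created_on.items()
                  if created <= cutoff)


if __name__ == "__main__":
    inventory = {
        "alias/pci-cardholder-data-v1": date(2027, 6, 1),
        "alias/pci-db-secrets-v1": date(2028, 3, 15),
    }
    # Only the first key is more than a year old on this date.
    print(keys_due_for_rotation(inventory, today=date(2028, 7, 1)))
```

Versioned names such as `-v1` and `-v2` make the create-new, re-encrypt, retire-old cycle easier to follow in audit records.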
A worked PCI migration
The PCI-scoped workload is running on KMS customer-managed CMKs. The migration to CloudHSM-backed custom key store:
- Provision the CloudHSM cluster in the security VPC; activate; create CRYPTO OFFICER credentials (stored in a hardware token, not in Secrets Manager; the clue is in the name).
- Create a KMS custom key store, connect to the cluster. Validate connectivity.
- Create a new CMK in the custom key store: pci-cardholder-data-v1.
- Update the S3 bucket’s default encryption to use the new CMK. Existing objects remain encrypted under the old CMK; new writes use the new.
- Re-encrypt existing objects: a batch process lists every object, copies it in place (which triggers re-encryption under the new bucket default key). Takes several hours for a multi-TB bucket; run it as an S3 Batch Operations job.
- Do the same for the RDS snapshot chain and Secrets Manager secrets in PCI scope.
- Disable the old CMK rather than deleting it, so that it can be re-enabled if an older snapshot has to be restored. Once the snapshot-restore window has passed, schedule deletion with the 30-day pending window.
- Update AWS Config rules and Security Hub findings to verify all PCI-tagged resources reference the new CMK.
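Two of the steps above lend themselves to a short sketch: the bucket default-encryption setting and the final check that nothing in scope still points at the old key. The first function builds the configuration structure that S3's PutBucketEncryption call accepts; the second works on a hypothetical inventory format (the `id`, `tags`, `scope`, and `kms_key_arn` fields are invented for this example and do not correspond to any AWS output).

```python
def default_encryption_config(cmk_arn: str) -> dict:
    """Bucket default encryption: SSE-KMS under the given CMK, with Bucket Keys."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": cmk_arn,
                },
                # Bucket Keys reduce the number of calls made to KMS.
                "BucketKeyEnabled": True,
            }
        ]
    }


def resources_not_on_key(inventory: list, new_cmk_arn: str) -> list:
    """Return IDs of PCI-tagged resources that do not reference the new CMK."""
    return [
        item["id"]
        for item in inventory
        if item.get("tags", {}).get("scope") == "pci"
        and item.get("kms_key_arn") != new_cmk_arn
    ]


if __name__ == "__main__":
    new_arn = "arn:aws:kms:eu-west-1:111122223333:key/new"  # placeholder ARN
    inventory = [
        {"id": "s3-cardholder", "tags": {"scope": "pci"}, "kms_key_arn": new_arn},
        {"id": "rds-cluster-1", "tags": {"scope": "pci"},
         "kms_key_arn": "arn:aws:kms:eu-west-1:111122223333:key/old"},
        {"id": "s3-marketing", "tags": {}, "kms_key_arn": None},
    ]
    print(resources_not_on_key(inventory, new_arn))  # ['rds-cluster-1']
```

An empty result from the second function is the condition to meet before the old CMK is disabled.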
The migration touches several services but doesn’t touch application code. The applications call kms:Decrypt on the key ARN and don’t know that the ARN now points to a custom key store. The audit artefacts for the auditor: the CloudHSM cluster creation records, the CRYPTO OFFICER initialisation ceremony log, the CMK creation CloudTrail events, the S3 Batch re-encryption job results, and the disable-and-schedule-delete records for the old CMKs.
A worked incident and audit
The auditor asks: for the cardholder-data bucket, show every decrypt operation in Q3, by IAM identity and time. The answer is a CloudTrail query:
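-- Every Decrypt call against the cardholder-data CMK in Q3, oldest first.
SELECT eventTime, userIdentity.arn, element_at(requestParameters, 'encryptionContext') AS encryptionContext
FROM <cloudtrail-lake-eds>
WHERE eventSource = 'kms.amazonaws.com'
AND eventName = 'Decrypt'
AND element_at(resources, 1).arn = 'arn:aws:kms:eu-west-1:…:key/<cmk-id>'
-- Half-open range so that all of 30 September is included.
AND eventTime >= '2028-07-01 00:00:00'
AND eventTime < '2028-10-01 00:00:00'
ORDER BY eventTime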
Because the custom key store preserves the KMS surface, the audit is the same as it would be for any other CMK: CloudTrail has the full record, and the IAM identity that invoked Decrypt is captured. Separately, the CloudHSM cluster’s own audit logs (delivered to CloudWatch Logs) record HSM-side activity such as logins and user and key management, including the sessions KMS opens against the cluster. Two audit trails, cross-referenceable by timestamp.
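Cross-referencing the two trails by timestamp might look like the sketch below. The CloudTrail side uses the `eventTime` field as it appears in exported records; the HSM side is a simplifying assumption, with each entry reduced to a hypothetical dict carrying a `time` field in the same format (real CloudHSM log lines would need their own parser). The five-second tolerance is likewise illustrative.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2028-07-14T09:30:05Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def correlate(cloudtrail_events, hsm_entries, tolerance_seconds=5):
    """Pair each CloudTrail event with HSM entries within the tolerance window."""
    hsm_sorted = sorted(hsm_entries, key=lambda e: parse_ts(e["time"]))
    hsm_times = [parse_ts(e["time"]) for e in hsm_sorted]
    window = timedelta(seconds=tolerance_seconds)
    pairs = []
    for event in sorted(cloudtrail_events,
                        key=lambda e: parse_ts(e["eventTime"])):
        t = parse_ts(event["eventTime"])
        # Binary search for the slice of HSM entries inside [t - w, t + w].
        lo = bisect_left(hsm_times, t - window)
        hi = bisect_right(hsm_times, t + window)
        pairs.append((event, hsm_sorted[lo:hi]))
    return pairs


if __name__ == "__main__":
    events = [{"eventTime": "2028-07-14T09:30:05Z", "eventName": "Decrypt"}]
    hsm = [{"time": "2028-07-14T09:30:03Z"}, {"time": "2028-07-14T09:31:00Z"}]
    for event, matches in correlate(events, hsm):
        print(event["eventTime"], len(matches))
```

A CloudTrail event with no HSM-side entry nearby, or the reverse, is the kind of discrepancy worth investigating before an audit rather than during one.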
What’s worth remembering
- Customer-managed KMS CMKs are the default. Key material inside AWS-operated FIPS 140-3 Level 3 HSMs; full key policy, rotation, disable/delete; every API call in CloudTrail.
- CloudHSM is single-tenant, customer-operated. You hold the CRYPTO OFFICER credentials, you manage users and keys, AWS manages the hardware.
- CloudHSM does not integrate with AWS services natively. Applications use PKCS#11, JCE, OpenSSL, or CNG libraries directly.
- KMS custom key stores bridge the two. KMS surface for service integration, CloudHSM backend for key custody.
- Custom key store CMKs do not auto-rotate. Rotation is operational (create new CMK, re-encrypt, retire old).
- External key stores (XKS) go further. Key material lives in an HSM outside AWS, accessed via a customer-operated HTTPS proxy.
- Imported key material is a middle ground. Customer-originated key material held in KMS HSMs; can be deleted and re-imported, does not auto-rotate.
- Start with “where must the material physically live?” Most workloads don’t care; some regulations do; the tool falls out of the answer.
Three places a key can live: AWS-operated shared HSM, customer-dedicated CloudHSM, or entirely off-AWS. Three service surfaces: KMS, CloudHSM direct, KMS custom key store. The combination that fits the workload is the one that answers both the compliance question and the integration question at the same time.