The situation
The internal CA problem:
- A 2018-era Windows CA is about to expire; nobody wants to renew it.
- 120 microservices each need leaf certs for mTLS; these are currently issued manually every 90 days by a rotating on-call.
- External-facing TLS is separate (public ACM certificates, domain-validated through Route 53) and is fine.
- Compliance requires that key material for the root CA lives in hardware and has restricted access.
- Leaf certificates should have short lifetimes (<= 30 days) to limit the blast radius of compromise.
- Revocation must be possible and visible to clients; CRL or OCSP.
The target architecture:
- A root CA with a long-lived key (10-20 years), rarely used, kept offline except for signing subordinates.
- Subordinate CAs issued under the root, each with its own scope (e.g. per environment or per business unit).
- Leaf certificates issued from the subordinates to workloads with short lifetimes and automatic renewal.
- Revocation visible to consumers via CRL distribution points in S3.
AWS Certificate Manager Private CA (ACM PCA) supports this shape natively. The design choices are the subject of this post.
What actually matters
A CA hierarchy is a trust pyramid. At the bottom are the leaf certificates that workloads present; they’re numerous, short-lived, and get compromised or replaced routinely. Above them are subordinate CAs that sign the leaves; they’re fewer, longer-lived, and compromising one means every leaf it signed has to be revoked or expire. At the top is the root CA whose signature is the entire basis of trust; compromising it means the whole PKI collapses.
The design work is deciding how many tiers the pyramid has, how the tiers relate, and what scopes each tier covers. Two considerations drive most of it.
Blast radius. A single CA signing all leaves means one compromise equals full reset. Multiple subordinate CAs with scoped responsibilities (say, one for production, one for dev, one per business unit) mean a compromise of a subordinate only revokes the leaves it signed. This argues for multiple subordinates.
Operational cost. A managed CA carries a flat monthly fee per CA plus per-cert issuance. Ten subordinates means ten monthly fees. The right number of subordinates is “enough to bound blast radius, not so many that the bill dominates design decisions.”
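To make the trade-off concrete, the flat fees can be tallied in a few lines of shell. This is a rough sketch: the $400 figure is the one quoted in this post, the short-lived fee of $50 is an assumed placeholder, and per-certificate issuance charges are left out entirely. Check current pricing before relying on the numbers.

```shell
#!/bin/sh
# Rough monthly flat-fee estimate for a CA hierarchy (per-certificate fees excluded).
# Arguments: <general-purpose CA count> <short-lived CA count>
# GP_FEE is the ~$400/month figure from this post; SL_FEE is an assumed placeholder.
GP_FEE=${GP_FEE:-400}
SL_FEE=${SL_FEE:-50}

hierarchy_monthly_fee() {
    gp_count=$1
    sl_count=$2
    echo $(( gp_count * GP_FEE + sl_count * SL_FEE ))
}

# One general-purpose root plus three short-lived subordinates.
hierarchy_monthly_fee 1 3
# Four general-purpose CAs (root plus three subordinates) for comparison.
hierarchy_monthly_fee 4 0
```

Running the same function for ten or twenty subordinates shows quickly where the flat fee starts to outweigh the blast-radius benefit.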
The other big decisions:
Key algorithm and length. RSA 4096 is the classical choice; ECDSA P-384 is smaller and faster on modern hardware. The root’s algorithm has to outlast the root (20 years is a long time for cryptography), so erring conservative is reasonable. The subordinates can match the root or differ.
Template type. Templates at each tier are the mechanism by which the path-length constraint is enforced. A subordinate that can only sign leaves (path length 0), not further subordinates, prevents whoever operates that subordinate from spawning their own sub-hierarchy.
Revocation mechanism. CRL (publish a signed list of revoked serial numbers at a known URL, refreshed periodically) or OCSP (respond to live queries for a single cert’s status). CRL is cheaper and simpler; OCSP is more timely. A useful managed CA supports both.
CA tier. Managed services often distinguish a general-purpose tier (any cert validity, up to the CA’s own validity) from a short-lived tier (cert validity capped at a few days, cheaper per CA). For a service mesh where every cert rotates daily, the short-lived tier fits; for a traditional 90-day leaf cert, the general-purpose tier is right.
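The tier choice reduces to a comparison against the short-lived cap. A minimal sketch, assuming the 7-day cap that ACM PCA applies to its short-lived tier and the `--usage-mode` values accepted by `aws acm-pca create-certificate-authority`; the function name is hypothetical.

```shell
#!/bin/sh
# Pick a CA usage mode from the longest leaf validity (in days) the CA must issue.
# SHORT_LIVED_CAP_DAYS reflects the 7-day cap on the short-lived tier.
SHORT_LIVED_CAP_DAYS=7

usage_mode_for() {
    max_leaf_days=$1
    if [ "$max_leaf_days" -le "$SHORT_LIVED_CAP_DAYS" ]; then
        echo "SHORT_LIVED_CERTIFICATE"
    else
        echo "GENERAL_PURPOSE"
    fi
}

usage_mode_for 1    # mesh certs rotating daily
usage_mode_for 90   # traditional 90-day leaf
```

The result could be passed straight through as `--usage-mode "$(usage_mode_for 3)"` when creating a CA.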
What we’ll filter on
- Hierarchy depth. How many tiers?
- Key algorithm. RSA or ECDSA, what length?
- Template enforcement. What can each tier sign?
- Revocation. CRL, OCSP, or both?
- CA tier. General-purpose or short-lived?
The private-CA landscape
1. ACM Private CA (general-purpose). Managed CA service. Creates root or subordinate CAs with configurable key algorithm, validity, and templates. Issues certificates up to the CA’s own validity minus a grace period. CRL to an S3 bucket with a CloudFront-facing CDP (CRL Distribution Point); optional OCSP responder. Priced ~$400/month per CA in addition to per-issued-certificate fees.
2. ACM Private CA (short-lived). Cheaper CA tier, cert validity capped at 7 days. Designed for service-mesh and ephemeral-workload use cases where certs rotate frequently.
3. ACM (public). Managed public certificates. ACM issues public TLS certs with DNS or email validation, integrated with ALB, CloudFront, API Gateway. Free. Separate service from Private CA; the two share the ACM branding but serve different use cases.
4. ACM Private CA (external signing of the root). ACM PCA supports creating a subordinate CA whose certificate is signed by an external CA (e.g. the customer’s on-prem root). This keeps the root outside AWS; the subordinate lives in ACM PCA.
5. Roll-your-own CA (HashiCorp Vault, step-ca, cfssl). Non-AWS alternatives. Cheaper per-CA cost; higher operational burden; used by orgs with cross-cloud or on-prem-heavy estates.
6. AWS IAM Roles Anywhere trust anchors. Not a general CA service but uses ACM PCA (or customer-provided) certificates as the trust anchor for workload authentication outside AWS. A related-but-separate use case.
Side by side
| Option | Hierarchy levels | Key options | Templates | Revocation | Cost shape |
|---|---|---|---|---|---|
| ACM PCA general-purpose | Root, Sub, Leaf | RSA 2048/4096, ECDSA P-256/P-384 | Many | CRL + OCSP | ~$400/mo + per-cert |
| ACM PCA short-lived | Sub, Leaf (≤7d) | RSA 2048/4096, ECDSA P-256/P-384 | Few | CRL + OCSP | Cheaper/mo + per-cert |
| ACM public | Leaf only | Managed | N/A | AWS-managed | Free |
| External root + ACM sub | Sub (AWS), Root (external) | Customer’s root choice | Many | CRL + OCSP | ~$400/mo + per-cert |
| Roll-your-own | Any | Full flexibility | Custom | Custom | Operational |
Reading the table: for an AWS-native service mesh, ACM PCA is the default. Root tier general-purpose; subordinates short-lived if the cert rotation is frequent, general-purpose if not.
A two-tier hierarchy with scoped subordinates
The picks in depth
Root CA: general-purpose, RSA 4096, 20 years, disabled between signings. Create the root CA in an isolated AWS account owned by the security team. Use the RootCACertificate/V1 template for the self-signed root certificate issued during creation. RSA 4096 errs on the side of conservative cryptography for a 20-year artefact. Mark the CA DISABLED as soon as it has finished signing its subordinates; re-enable it only when needed to sign a new subordinate or renew one. The account’s SCP restricts acm-pca:IssueCertificate and related actions to a tightly scoped RootCaOperator role, so even a broader account compromise cannot silently issue rogue certificates.
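The root setup can be sketched with the AWS CLI as follows. This is an illustration under assumptions, not a complete runbook: the Subject values and file names are hypothetical, the helper functions are only defined (nothing calls AWS when the snippet is sourced), and the step that issues and imports the root’s self-signed certificate is omitted.

```shell
#!/bin/sh
# Root CA configuration: RSA 4096 key, SHA-512 signatures.
# The Subject values are hypothetical placeholders.
cat > root-ca-config.json <<'EOF'
{
  "KeyAlgorithm": "RSA_4096",
  "SigningAlgorithm": "SHA512WITHRSA",
  "Subject": {
    "Organization": "Example Corp",
    "CommonName": "Example Corp Internal Root CA"
  }
}
EOF

# Create the root CA; prints the new CA's ARN.
create_root_ca() {
    aws acm-pca create-certificate-authority \
        --certificate-authority-type ROOT \
        --usage-mode GENERAL_PURPOSE \
        --certificate-authority-configuration file://root-ca-config.json \
        --query CertificateAuthorityArn --output text
}

# Toggle the root between signings. $1 = CA ARN, $2 = ACTIVE or DISABLED.
set_root_status() {
    aws acm-pca update-certificate-authority \
        --certificate-authority-arn "$1" \
        --status "$2"
}
```

A signing session would then be bracketed by `set_root_status "$ROOT_ARN" ACTIVE` and `set_root_status "$ROOT_ARN" DISABLED`, run under the RootCaOperator role.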
Subordinate CAs: short-lived tier, ECDSA P-384, 5 years, PathLen0. Three subordinates: production, development, and IoT. Short-lived tier because mesh certs rotate every few days; PathLen0 so the subordinates cannot spawn their own sub-subordinates. ECDSA P-384 keys and signatures are much smaller than RSA 4096 and signing is faster, at a comparable security margin, which keeps the certificate chain sent during the TLS handshake small for latency-sensitive services. Separate subordinates per scope mean that a compromise of, say, the dev subordinate requires revoking only dev leaves.
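A sketch of creating one subordinate and having the root sign it, assuming the AWS CLI and the AWS-provided PathLen0 subordinate template. The Subject values and file names are hypothetical, and the functions are only defined here, not invoked. Because the root lives in its own account, the CSR export and the signing call would run under different credentials in practice.

```shell
#!/bin/sh
# Subordinate CA configuration: ECDSA P-384. Subject values are hypothetical.
cat > prod-sub-config.json <<'EOF'
{
  "KeyAlgorithm": "EC_secp384r1",
  "SigningAlgorithm": "SHA384WITHECDSA",
  "Subject": {
    "Organization": "Example Corp",
    "CommonName": "Example Corp Production Issuing CA"
  }
}
EOF

# AWS-provided template that sets the path-length constraint to 0.
SUB_TEMPLATE_ARN="arn:aws:acm-pca:::template/SubordinateCACertificate_PathLen0/V1"

# Create the subordinate in short-lived mode; prints its ARN.
create_subordinate() {
    aws acm-pca create-certificate-authority \
        --certificate-authority-type SUBORDINATE \
        --usage-mode SHORT_LIVED_CERTIFICATE \
        --certificate-authority-configuration file://prod-sub-config.json \
        --query CertificateAuthorityArn --output text
}

# Sign the subordinate's CSR with the root. $1 = root ARN, $2 = subordinate ARN.
# The root key is RSA, so the signing algorithm is an RSA one.
sign_subordinate() {
    aws acm-pca get-certificate-authority-csr \
        --certificate-authority-arn "$2" --output text > sub.csr
    aws acm-pca issue-certificate \
        --certificate-authority-arn "$1" \
        --csr fileb://sub.csr \
        --signing-algorithm SHA512WITHRSA \
        --template-arn "$SUB_TEMPLATE_ARN" \
        --validity Value=5,Type=YEARS \
        --query CertificateArn --output text
}
```

The remaining steps, fetching the signed certificate with `get-certificate` and installing it with `import-certificate-authority-certificate`, follow the same pattern.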
Leaf issuance: ACM integration for AWS-native resources, direct IssueCertificate calls for everything else. ALBs, API Gateway, CloudFront, and App Runner can all request a cert from a private CA via ACM (aws acm request-certificate --certificate-authority-arn <subordinate-arn>); ACM handles renewal automatically. For the service mesh pods, an Envoy proxy or istio-csr submits a CSR to acm-pca:IssueCertificate directly via a per-pod IAM role (bound via IRSA on EKS), caches the cert for its lifetime, and renews on schedule. For IoT devices, AWS IoT Core’s just-in-time registration flow uses the subordinate to sign device-presented CSRs.
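The per-pod IAM role mentioned above might carry a policy along these lines. This is a sketch: the account ID, the CA identifier, and the file name are hypothetical, and the template condition is an optional extra that assumes the `acm-pca:TemplateArn` condition key.

```shell
#!/bin/sh
# Hypothetical account ID and CA identifier; substitute the real subordinate ARN.
PROD_SUB_ARN="arn:aws:acm-pca:eu-west-1:111122223333:certificate-authority/EXAMPLE-CA-ID"

# Identity policy for a production workload role: issue and fetch leaf
# certificates from the production subordinate only, end-entity templates only.
cat > prod-issuer-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "IssueLeafFromProdSub",
      "Effect": "Allow",
      "Action": "acm-pca:IssueCertificate",
      "Resource": "${PROD_SUB_ARN}",
      "Condition": {
        "StringLike": {
          "acm-pca:TemplateArn": "arn:aws:acm-pca:::template/EndEntityCertificate/V*"
        }
      }
    },
    {
      "Sid": "FetchIssuedCert",
      "Effect": "Allow",
      "Action": "acm-pca:GetCertificate",
      "Resource": "${PROD_SUB_ARN}"
    }
  ]
}
EOF
```

Scoping `Resource` to a single subordinate is what stops a dev workload from issuing production certificates; the template condition additionally stops a workload role from requesting a CA certificate.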
Revocation: CRL to an S3 bucket with a CloudFront CDP. Configure each subordinate’s CRL to publish to a known S3 bucket, fronted by a CloudFront distribution whose URL is in the certificate’s CRL Distribution Points extension. CRL refresh interval is 24 hours by default; for high-security environments shorten to 1 hour. OCSP can be enabled additionally for real-time checks, at extra cost. Avoid a bootstrapping chicken-and-egg at the CRL endpoint: if the CDP is served over HTTPS, the endpoint should present a certificate from a different CA (typically ACM public) so that clients can validate the CRL server without first needing the CRL it serves.
A worked issuance
A new microservice orders-api deploys in the production EKS cluster. Its Pod has a service account bound via IRSA to an IAM role with acm-pca:IssueCertificate on the production subordinate’s ARN. On startup, istio-csr calls:
aws acm-pca issue-certificate \
    --certificate-authority-arn arn:aws:acm-pca:eu-west-1:…:certificate-authority/prod-sub \
    --csr fileb://csr.pem \
    --signing-algorithm SHA384WITHECDSA \
    --validity Value=3,Type=DAYS \
    --template-arn arn:aws:acm-pca:::template/EndEntityCertificate/V1 \
    --idempotency-token <uuid>
The subordinate signs the CSR and returns the certificate ARN; istio-csr then retrieves the certificate via get-certificate. The cert is installed into the pod’s Envoy sidecar and is valid for 3 days. A cronjob in the mesh control plane renews every pod’s cert at 50% of its lifetime. Total issuance latency: sub-second.
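The renew-at-half-lifetime rule is simple arithmetic on the certificate’s validity window. A minimal sketch, assuming the notBefore and notAfter times are already available as Unix timestamps; the function and variable names are hypothetical.

```shell
#!/bin/sh
# Decide whether a certificate has passed a given fraction of its lifetime.
# Arguments are Unix timestamps: <not_before> <not_after> <now>.
# RENEW_AT_PERCENT=50 matches the 50% renewal point described above.
RENEW_AT_PERCENT=${RENEW_AT_PERCENT:-50}

needs_renewal() {
    not_before=$1
    not_after=$2
    now=$3
    lifetime=$(( not_after - not_before ))
    renew_at=$(( not_before + lifetime * RENEW_AT_PERCENT / 100 ))
    [ "$now" -ge "$renew_at" ]
}

# A 3-day certificate issued at t=0 (259200 seconds of validity).
# After one day, before the midpoint:
if needs_renewal 0 259200 86400; then echo "renew"; else echo "wait"; fi
# After a day and a half, at the midpoint:
if needs_renewal 0 259200 129600; then echo "renew"; else echo "wait"; fi
```

Renewing at the midpoint leaves a day and a half of slack on a 3-day cert, which is the margin available if the CA or the renewal job is briefly unavailable.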
A worked revocation
A security finding: a container image in production had a credential embedded, and pods running that image may have leaked it. Revocation decision: revoke every cert issued to the affected service in the last 30 days.
aws acm-pca revoke-certificate \
    --certificate-authority-arn <prod-sub-arn> \
    --certificate-serial 00:11:22:33:… \
    --revocation-reason KEY_COMPROMISE
The subordinate adds the serial to its CRL; on the next CRL refresh (minutes, given the hourly interval), the published CRL in S3 contains the revoked serial. Consumers pulling the CRL and validating peer certs against it will reject any peer presenting the revoked serial. Where OCSP is enabled, clients query the status of each certificate individually rather than relying on a cached list, so the revocation usually reaches them sooner.
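Revoking every cert issued to a service means repeating that call once per serial. A small sketch of the loop, assuming the serials have already been collected into a list (for example from an audit report or issuance logs, which is outside this snippet). The `AWS_CMD` override is a hypothetical convenience for dry runs, and the serials and ARN shown are made up.

```shell
#!/bin/sh
# Revoke every serial listed on stdin (one per line) against a single CA.
# AWS_CMD defaults to the real CLI; set AWS_CMD=echo for a dry run.
# $1 = CA ARN, $2 = revocation reason (defaults to KEY_COMPROMISE).
revoke_serials() {
    ca_arn=$1
    reason=${2:-KEY_COMPROMISE}
    while IFS= read -r serial; do
        [ -z "$serial" ] && continue   # skip blank lines
        ${AWS_CMD:-aws} acm-pca revoke-certificate \
            --certificate-authority-arn "$ca_arn" \
            --certificate-serial "$serial" \
            --revocation-reason "$reason"
    done
}

# Dry run with two made-up serials: prints the commands instead of calling AWS.
printf '%s\n' 'aa:bb:cc:01' 'aa:bb:cc:02' |
    AWS_CMD=echo revoke_serials "arn:aws:acm-pca:eu-west-1:111122223333:certificate-authority/EXAMPLE-CA-ID"
```

Reviewing the dry-run output before switching to the real CLI is a cheap guard, since a revocation cannot be undone.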
What’s worth remembering
- Design for blast radius. Multiple scoped subordinates limit the reach of a compromise; PathLen0 prevents subordinate proliferation.
- Root CA should rarely do anything. Disabled by default, enabled only to sign subordinates or renew itself.
- Subordinate CA tier choice matters. Short-lived tier for service mesh certs (≤7-day validity); general-purpose for longer-lived leaves.
- ACM is the integration point for AWS-native resources. aws acm request-certificate --certificate-authority-arn lets ALBs, CloudFront, API Gateway, and App Runner consume certs from Private CA; ACM auto-renews.
- CRL distribution points go in S3 + CloudFront. Serve the CRL over a different trust chain so the CRL-server cert validates without bootstrapping issues.
- Per-subordinate IAM scoping. Each consumer has acm-pca:IssueCertificate only on the subordinate they’re entitled to issue from; no dev service can accidentally issue a prod cert.
- External root + AWS subordinate is a real option. For orgs with an existing on-prem root, ACM PCA can run just the subordinates while the root stays off-AWS.
- Short leaf validity is the primary containment lever. 3-day certs with automated renewal mean that the certificate for a leaked private key expires before the key can be widely exploited.
A CA hierarchy is a pyramid of compromise containment. The root rarely signs; the subordinates scope the blast radius; the leaves are disposable. ACM PCA builds it; the design is choosing the shape.