The situation
A platform team supports:
- AWS EC2: ~400 instances across dev, staging, prod in eu-west-1. All have the SSM agent and the AmazonSSMManagedInstanceCore policy.
- On-premises: ~200 servers in a London data centre. Mix of RHEL 8, RHEL 9, Amazon Linux 2 running on ESXi, and Windows Server 2019/2022. Connected via Direct Connect to the AWS private-network side.
- Existing patch tooling on-prem: Red Hat Satellite for RHEL, WSUS for Windows. Both working; both running on a different cadence than AWS.
- Audit finding: the compliance baseline requires the same patch approval criteria and the same retention of patch evidence across all environments. Currently, evidence is in three places (Satellite, WSUS, AWS Patch Manager) and the auditor has to reconcile manually.
The asks:
- Single patch baseline source. Approval rules defined once, applied everywhere.
- Single compliance report. One query answers “which servers are current on security patches” across on-prem and cloud.
- Maintenance windows per environment respecting data-centre change freezes as well as cloud schedules.
- Graceful degradation when the Direct Connect link drops; a partial patch run shouldn’t corrupt state.
- Gradual migration path. The team wants to move on-prem Satellite/WSUS usage to SSM over 18 months, not a flag day.
What actually matters
Before reaching for a tool, it’s worth being precise about what unification across the cloud/on-prem boundary actually demands.
The first thing worth thinking about is what “single baseline” really means. The audit finding is about approval rules, not commands. The team needs one document that says “security patches Critical and Important auto-approved after 7 days,” and a way for that document to be the source of truth for both fleets. Today there are three (Satellite, WSUS, AWS Patch Manager) and the auditor reconciles by hand. Whatever pattern is chosen, the win is collapsing three definitions into one.
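To make "one document" concrete, here is a minimal sketch that keeps the approval rule in a single Python structure and renders it into request parameters for the Patch Manager `create_patch_baseline` API, once per OS family. The baseline names and the `POLICY` structure are illustrative assumptions; the filter keys reflect how Patch Manager distinguishes RHEL and Windows severities, and are worth checking against the API reference before use.

```python
from typing import Any

# One policy definition, taken from the example rule in the text:
# Critical and Important security patches auto-approve after 7 days.
POLICY = {"severities": ["Critical", "Important"], "approve_after_days": 7}

# Patch Manager uses different filter keys and classification values per OS family.
OS_FILTERS = {
    "REDHAT_ENTERPRISE_LINUX": {"classification": "Security", "severity_key": "SEVERITY"},
    "WINDOWS": {"classification": "SecurityUpdates", "severity_key": "MSRC_SEVERITY"},
}


def baseline_request(name: str, operating_system: str,
                     policy: dict[str, Any] = POLICY) -> dict[str, Any]:
    """Build keyword arguments for ssm.create_patch_baseline from the shared policy."""
    filters = OS_FILTERS[operating_system]
    return {
        "Name": name,  # hypothetical baseline name
        "OperatingSystem": operating_system,
        "ApprovalRules": {"PatchRules": [{
            "PatchFilterGroup": {"PatchFilters": [
                {"Key": "CLASSIFICATION", "Values": [filters["classification"]]},
                {"Key": filters["severity_key"], "Values": list(policy["severities"])},
            ]},
            "ApproveAfterDays": policy["approve_after_days"],
        }]},
    }


# With boto3 installed and credentials configured, the call would look like:
#   boto3.client("ssm").create_patch_baseline(
#       **baseline_request("prod-linux", "REDHAT_ENTERPRISE_LINUX"))
```

Changing the 7-day delay then means editing `POLICY` once and regenerating every baseline, rather than touching three tools.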
The second is node identity. An on-prem server has no EC2 metadata, no IAM instance profile, no i-xxxx identifier. To be addressed by a cloud control plane, it needs some installed agent and some registered identity. The mechanics of that registration (credential bootstrap, identity lifecycle, what happens when a server is rebuilt) shape the operational story far more than the patch baseline does. Servers don’t show up by accident.
The third is connectivity assumptions. Cloud-side patching assumes the control plane can reach the node and the node can report status back. Over Direct Connect, the path is reliable but not redundant; over the public internet (NATed outbound), the path is redundant but routes traffic where the network team would rather it didn’t. A patch run that starts and then loses its control-plane connection needs to fail safely (not leave a server half-patched), and the link assumptions are part of that.
The fourth is change-window discipline. Cloud and on-prem data centres operate on different change calendars: AWS happily patches at 03:00 UTC any night of the week, but the data centre has a freeze for the week before quarterly close and a weekly Sunday-only change window. A single shared schedule punishes both sides; the design has to express “same policy, different windows” cleanly, including the case where the on-prem freeze means a baseline-approved patch waits a week longer than its AWS twin.
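The "waits a week longer" case can be worked through with a small sketch. The dates and the freeze range below are made up for illustration, and the function is a simplification of a change calendar, not how any scheduler is actually implemented.

```python
from datetime import date, timedelta

# A freeze is an inclusive (start, end) date range.
Freeze = tuple[date, date]


def in_freeze(day: date, freezes: list[Freeze]) -> bool:
    """True if the day falls inside any freeze range."""
    return any(start <= day <= end for start, end in freezes)


def next_allowed_run(approved_on: date, weekday: int, freezes: list[Freeze]) -> date:
    """First date on or after approved_on that is the given weekday
    (Monday=0 ... Sunday=6) and is not inside a freeze."""
    day = approved_on
    while day.weekday() != weekday or in_freeze(day, freezes):
        day += timedelta(days=1)
    return day


# Hypothetical example: a patch becomes approved on Thursday 2025-03-27,
# and the data centre is frozen for the week of 2025-03-24 to 2025-03-30.
quarter_close_freeze = [(date(2025, 3, 24), date(2025, 3, 30))]
aws_run = next_allowed_run(date(2025, 3, 27), weekday=5, freezes=[])
onprem_run = next_allowed_run(date(2025, 3, 27), weekday=6, freezes=quarter_close_freeze)
print(aws_run, onprem_run)  # 2025-03-29 2025-04-06
```

The policy input (the approval date) is identical for both fleets; only the calendar differs, which is the separation the design needs to preserve.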
The fifth is reboot policy. The cloud side typically lets the patcher reboot when the patch demands it. On-prem change processes often require reboots to be separate, approved events. The patch lifecycle needs a clean separation between “install patches” and “reboot,” with each able to run on its own window.
The sixth is evidence and compliance reporting. The auditor wants one query that answers “which servers are current on security patches” across both fleets. That means patch state has to be reported from both sides into one store, with consistent fields (node id, baseline id, scan time, missing-patch list, install-time). Anything less ends with the auditor reconciling spreadsheets again.
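As a sketch of what "consistent fields" could look like, the record below carries exactly the fields listed above, and one function answers the auditor's question for both fleets. The class name, the one-day freshness threshold, and the sample node IDs are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class PatchEvidence:
    node_id: str                      # i-... for EC2, mi-... for hybrid nodes
    baseline_id: str
    scan_time: datetime
    missing_patches: tuple[str, ...] = ()
    install_time: Optional[datetime] = None


def current_nodes(records: list[PatchEvidence], now: datetime,
                  max_scan_age: timedelta = timedelta(days=1)) -> list[str]:
    """Nodes with no missing patches and a scan recent enough to trust."""
    return sorted(
        r.node_id for r in records
        if not r.missing_patches and now - r.scan_time <= max_scan_age
    )


now = datetime(2025, 6, 1, 12, 0, tzinfo=timezone.utc)
records = [
    PatchEvidence("i-0aaa", "pb-linux", now - timedelta(hours=2)),
    PatchEvidence("mi-0bbb", "pb-linux", now - timedelta(hours=3)),
    PatchEvidence("mi-0ccc", "pb-linux", now - timedelta(hours=1), ("patch-1",)),
    PatchEvidence("i-0ddd", "pb-linux", now - timedelta(days=3)),
]
print(current_nodes(records, now))  # ['i-0aaa', 'mi-0bbb']
```

Nothing in the query distinguishes cloud from on-prem; the node ID prefix is the only trace of where the server lives.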
And finally, migration shape. The team isn’t decommissioning Satellite and WSUS on day one; they want an 18-month parallel-run. Whatever path is chosen has to coexist with the existing tools without double-applying patches or contradicting their approval state.
What we’ll filter on
Filtering:
- Unified baselines across cloud and on-prem.
- Environment-specific maintenance windows.
- Single compliance dataset exportable for audit.
- Resilient to intermittent connectivity (hybrid nodes with flaky links).
- Incremental migration path from existing tools.
The hybrid-patching landscape
1. Separate tools per environment. Status quo. Fails unified baselines and single reporting.
2. SSM Patch Manager on EC2, keep Satellite/WSUS for on-prem. Partial solution; approval rules diverge between systems; audit still has to reconcile two sources.
3. SSM Patch Manager for both EC2 and hybrid, shared baselines. Single source of truth; hybrid nodes use the activations flow. Runs alongside Satellite/WSUS during migration; replaces them once confident.
4. SSM Patch Manager + Red Hat Satellite integration. SSM can use Satellite as the patch source for RHEL nodes, either through the alternative patch source repositories defined in a custom baseline or simply because nodes registered to Satellite already have their package repositories pointing at it. Useful when Satellite’s private Red Hat repository is the approved source and on-prem nodes can’t reach the public cdn.redhat.com.
5. Fleet Manager-only approach. SSM Fleet Manager gives the console view of patches, inventory, and session access; relies on Patch Manager underneath. Same capability, different UI.
Side by side
| Option | Unified baselines | Env-specific windows | Single compliance dataset | Connectivity-resilient | Incremental migration |
|---|---|---|---|---|---|
| Separate tools | ✗ | ✓ | ✗ | ✓ | ✓ |
| EC2-only Patch Manager | ✗ (cloud only) | ✓ | ✗ | ✓ | ✓ |
| Patch Manager (EC2 + hybrid) | ✓ | ✓ | ✓ | Partial (see below) | ✓ |
| + Satellite integration | ✓ | ✓ | ✓ | Partial | ✓ |
| Fleet Manager console | ✓ | ✓ | ✓ | Partial | ✓ |
SSM Patch Manager with hybrid activations is the sustainable pick. Satellite integration for RHEL is complementary.
How hybrid activations sit alongside EC2
The pick in depth
Activations by environment and location. Rather than one activation covering all on-prem servers, the team creates four:
- onprem-prod-linux: tags on registration Patch Group=prod-linux, Environment=prod, Location=onprem, OSFamily=linux
- onprem-prod-windows: tags on registration Patch Group=prod-windows, Environment=prod, Location=onprem, OSFamily=windows
- onprem-nonprod-linux: equivalent for non-prod
- onprem-nonprod-windows: equivalent for non-prod
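Activation codes are short-lived: one expires 24 hours after creation unless an expiration date is set, and that date can be at most 30 days out. Issuing a fresh activation is therefore part of the provisioning routine rather than a one-off task, while servers that have already registered keep their mi-* identity after the activation expires. When a new on-prem server registers, it uses the activation ID and code for its environment and family; tags apply automatically; the server appears in the right patch group with the right baseline.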
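The four activations differ only in their tags, so they can be generated from one function. This is a sketch of the request parameters for the SSM `create_activation` API; the IAM role name and registration limit are placeholders, and the non-prod tag values are assumed to mirror the prod ones.

```python
from typing import Any


def activation_request(environment: str, os_family: str,
                       iam_role: str, registration_limit: int) -> dict[str, Any]:
    """Build keyword arguments for ssm.create_activation for one environment/OS pair."""
    group = f"{environment}-{os_family}"
    return {
        "Description": f"onprem-{group}",
        "DefaultInstanceName": f"onprem-{group}",
        "IamRole": iam_role,                    # hypothetical role name supplied by the caller
        "RegistrationLimit": registration_limit,
        "Tags": [
            {"Key": "Patch Group", "Value": group},
            {"Key": "Environment", "Value": environment},
            {"Key": "Location", "Value": "onprem"},
            {"Key": "OSFamily", "Value": os_family},
        ],
    }


requests = [
    activation_request(env, os_family, iam_role="SSMHybridNodeRole", registration_limit=100)
    for env in ("prod", "nonprod")
    for os_family in ("linux", "windows")
]
# With boto3 installed and credentials configured:
#   resp = boto3.client("ssm").create_activation(**requests[0])
#   resp["ActivationId"] and resp["ActivationCode"] are what the on-prem agent registers with.
```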
Baselines (same as EC2). The four baselines from the cloud side (prod-linux, nonprod-linux, prod-windows, nonprod-windows) apply to both EC2 and hybrid nodes via the Patch Group tag. The baselines’ approval rules don’t need to know the node is on-prem; SSM handles the package-manager call (yum, apt, RPM via Satellite if configured, or Windows Update).
Separate maintenance windows per environment and location. Four maintenance windows:
- MW-prod-aws: Saturday 02:00 UTC, targets tag:Environment=prod AND tag:Location=cloud.
- MW-prod-onprem: Sunday 22:00 local (21:00 UTC in summer), targets tag:Environment=prod AND tag:Location=onprem.
- MW-nonprod-aws: Tuesday 12:00 UTC.
- MW-nonprod-onprem: Tuesday 18:00 local.
On-prem windows follow the data centre’s change calendar (its approved change slots and the freeze before quarterly close) while cloud windows run on the cloud schedule. The baselines are shared; the scheduling is independent.
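The four windows can be captured as data and rendered into parameters for `create_maintenance_window` and `register_target_with_maintenance_window`. This is a sketch under assumptions: the duration and cutoff values are placeholders, the non-prod targets are assumed to mirror the prod tag pattern, and "local" is expressed through the schedule timezone so daylight-saving shifts need no manual UTC arithmetic. How multiple tag keys combine in a target list should be confirmed against the API reference.

```python
from typing import Any

# name: (cron schedule, IANA timezone, Environment tag, Location tag)
WINDOWS = {
    "MW-prod-aws":       ("cron(0 2 ? * SAT *)",  "UTC",           "prod",    "cloud"),
    "MW-prod-onprem":    ("cron(0 22 ? * SUN *)", "Europe/London", "prod",    "onprem"),
    "MW-nonprod-aws":    ("cron(0 12 ? * TUE *)", "UTC",           "nonprod", "cloud"),
    "MW-nonprod-onprem": ("cron(0 18 ? * TUE *)", "Europe/London", "nonprod", "onprem"),
}


def window_request(name: str) -> dict[str, Any]:
    """Keyword arguments for ssm.create_maintenance_window."""
    schedule, tz, _, _ = WINDOWS[name]
    return {
        "Name": name,
        "Schedule": schedule,
        "ScheduleTimezone": tz,
        "Duration": 3,   # hours; placeholder value
        "Cutoff": 1,     # stop starting new tasks 1 hour before the end; placeholder
        "AllowUnassociatedTargets": False,
    }


def window_targets(name: str) -> list[dict[str, Any]]:
    """Target list selecting nodes by Environment and Location tags."""
    _, _, environment, location = WINDOWS[name]
    return [
        {"Key": "tag:Environment", "Values": [environment]},
        {"Key": "tag:Location", "Values": [location]},
    ]
```

Note that nothing here references a baseline: the window decides when, the Patch Group tag decides what.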
Satellite integration for RHEL. The on-prem RHEL servers pull packages from a Red Hat Satellite server, which mirrors the approved Red Hat content. The AWS-RunPatchBaseline document accepts the RebootOption parameter and the underlying patch engine uses the configured package manager; on RHEL with Satellite registered, yum update pulls from Satellite’s repositories. The Inspector-flagged CVEs in AWS Patch Manager’s compliance view reflect what Satellite’s repositories have made available, keeping both systems aligned.
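The RebootOption parameter is also what lets "install" and "reboot" run as separate events on-prem. A minimal sketch of the parameters a maintenance-window task might pass to AWS-RunPatchBaseline, assuming the Location tag value drives the choice (the function itself is hypothetical):

```python
def patch_task_parameters(location: str, operation: str = "Install") -> dict[str, list[str]]:
    """Parameters for the AWS-RunPatchBaseline document.

    Cloud nodes reboot when a patch requires it; on-prem nodes install
    without rebooting so the reboot can be a separately approved event.
    """
    if operation not in ("Scan", "Install"):
        raise ValueError(f"unsupported operation: {operation}")
    reboot = "NoReboot" if location == "onprem" else "RebootIfNeeded"
    return {"Operation": [operation], "RebootOption": [reboot]}


print(patch_task_parameters("onprem"))
# {'Operation': ['Install'], 'RebootOption': ['NoReboot']}
```

With NoReboot, patches that need a restart are reported as installed but pending reboot until the separately scheduled reboot happens, so the compliance view stays honest between the two events.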
Resilient to Direct Connect blips. A 30-second Direct Connect outage during a maintenance window: the SSM agent on the on-prem server retries the control channel; Patch Manager’s execution is tolerant of short disconnects because the agent buffers status locally and uploads when reconnected. A 30-minute outage: the maintenance window times out for that node; the node’s local dnf update (or equivalent) may have completed, but compliance state is stale until reconnection. The next daily scan picks it up and marks it COMPLIANT (or not).
A day-long outage (the Direct Connect link down for a full day) means compliance state is frozen; it’s not wrong, just stale. Alarms on “managed nodes that haven’t reported in 24 hours” detect the condition; the compliance report can exclude unreachable nodes or include them with an UNREACHABLE status, depending on audit preference.
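The stale-versus-wrong distinction can be encoded directly in the report. The sketch below overlays an UNREACHABLE status on nodes that have been silent for more than 24 hours; the function name and the shape of its inputs are assumptions, and in practice the last-seen time would come from the managed node's last ping timestamp.

```python
from datetime import datetime, timedelta, timezone

MAX_SILENCE = timedelta(hours=24)


def reported_status(last_compliance: str, last_seen: datetime, now: datetime) -> str:
    """Return the last known compliance status, or UNREACHABLE when the
    node has not reported within MAX_SILENCE and the status may be stale."""
    if now - last_seen > MAX_SILENCE:
        return "UNREACHABLE"
    return last_compliance


now = datetime(2025, 6, 2, 6, 0, tzinfo=timezone.utc)
print(reported_status("COMPLIANT", now - timedelta(minutes=30), now))  # COMPLIANT
print(reported_status("COMPLIANT", now - timedelta(hours=30), now))    # UNREACHABLE
```

Whether UNREACHABLE rows appear in the audit export or are filtered out is then a one-line reporting decision rather than a data problem.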
Migration path
The 18-month migration from Satellite/WSUS to SSM is incremental:
Months 1-3: Install SSM agent on all on-prem servers; register via activations; start compliance scans. Satellite/WSUS continue to handle installation. Effect: SSM has read-only compliance data on the on-prem fleet, visible in the same dashboards as EC2.
Months 4-9: Pilot a subset of on-prem Linux servers on SSM-driven installation. Start with non-prod; validate reboot timing, patch success rates, rollback processes. Keep WSUS/Satellite for the rest.
Months 10-15: Expand to prod Linux; begin pilot on Windows. Windows typically lags because WSUS approval workflows are more embedded in the on-prem change-management culture; parity requires moving the approval process into SSM or preserving an external approval gate via a maintenance window trigger.
Months 16-18: Complete Windows migration. Decommission WSUS and Satellite (or keep Satellite as the approved repository source for RHEL while SSM handles the execution and compliance).
At every stage, the compliance report pulls from the SSM side; Satellite and WSUS stop being evidence sources as soon as SSM has coverage.
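The coexistence rule, that exactly one tool installs patches on a given server at any time, can be written down as a small decision function. This is a simplified sketch of the phases above: the month boundaries come from the plan, while the `in_pilot` flag and the function itself are hypothetical.

```python
def ssm_operation(month: int, os_family: str, in_pilot: bool) -> str:
    """Which AWS-RunPatchBaseline operation SSM runs on an on-prem node.

    "Scan" means SSM only observes and Satellite/WSUS still installs;
    "Install" means SSM owns installation and the old tool must not.
    """
    if not 1 <= month <= 18:
        raise ValueError("month must be within the 18-month plan")
    if month <= 3:
        return "Scan"                      # observe only
    if month <= 9:
        return "Install" if os_family == "linux" and in_pilot else "Scan"
    if month <= 15:
        if os_family == "linux":
            return "Install"               # all Linux, including prod
        return "Install" if in_pilot else "Scan"   # Windows pilot only
    return "Install"                       # months 16-18: everything on SSM


print(ssm_operation(6, "linux", in_pilot=True))     # Install
print(ssm_operation(12, "windows", in_pilot=False)) # Scan
```

Driving both the SSM task parameters and the Satellite/WSUS host-group membership from the same function is one way to avoid double-applying patches during the parallel run.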
What’s worth remembering
- Hybrid activations turn on-prem servers into SSM managed nodes. The mi-* identifier is first-class; patch baselines, maintenance windows, Run Command, and Session Manager all work on it.
- Patch Group tags are the baseline binding. Same tag + same baseline regardless of node origin. Four baselines (two environments × two OS families) typically cover the common case.
- Separate maintenance windows per environment and location. Cloud schedules and data-centre schedules rarely align; one window per combination keeps them independent.
- Compliance data goes through Resource Data Sync. Same S3 destination for both fleets; Athena queries answer “who’s non-compliant” without caring which environment.
- Direct Connect + VPC endpoints is the secure path. ssm, ssmmessages, ec2messages endpoints deployed in a shared-services VPC; on-prem traffic never touches the public internet.
- Activations have a TTL. 24 hours by default, 30 days at most; create a fresh activation whenever new servers need to register. New registrations after expiry fail; existing mi-* identities persist.
- Advanced instance tier for scale. Standard tier caps at 1,000 hybrid-activated nodes per account per Region; advanced tier raises the ceiling and unlocks some features (notably Session Manager for hybrid nodes). Switch once; forget.
- Incremental migration beats flag-day. Start with SSM as the compliance observer while Satellite/WSUS still do installation; move installation to SSM a tier at a time; retire old tooling once parity is demonstrated.
The on-prem fleet stops being a separate compliance story; it becomes another patch group in the same SSM account. The auditor’s reconciliation across three systems becomes one Athena query. The Direct Connect link is the network detail that makes the architecture boring, which is the goal: patching shouldn’t care whether the server’s IP is in the VPC or the data centre rack.