Operationalising Inspector Findings

November 15, 2028 · 14 min read

The situation

A security team inherits an estate: eight accounts, four Regions, a mix of Lambda functions, ECS and EKS workloads on container images, and ~200 EC2 instances across two auto-scaling groups. Inspector was enabled months ago by an enthusiastic contractor. The dashboard is a sea of findings and nobody is quite sure what to do with them.

The concrete issues:

Findings volume. Inspector has produced ~4,800 findings across the estate. Some are critical, most are high or medium, a handful are informational. The team cannot remediate 4,800 things; they need prioritisation.
Remediation ownership. Findings are attached to resources, but resources are owned by product teams who don’t read the security console. Routing findings to the team that can actually fix them is unsolved.
Base-image rot. Container images built six months ago are now showing critical findings against packages that weren’t vulnerable at build time. Rebuild cadence is the fix, but the current pipeline rebuilds only when product code changes, not when base images update.
Lambda coverage. Inspector’s Lambda scanning was enabled late; there’s a gap where old Lambda versions were never scanned. The team is unsure whether to scan every version or only the current one.
Cost. Inspector’s charges scale with how many resources it covers and how often images are re-scanned, and somebody has asked whether the bill is justified.

All of this is really one question: how does Inspector actually work, and how does a team operationalise its output?

What actually matters

A vulnerability scanner is a simple idea with subtle operational failure modes. Every scanner walks an inventory, compares software versions against a CVE database, and emits findings. Where they differ is what they can walk, how continuously they walk it, and how much of the scan result is signal vs noise.

The first thing to think about is coverage breadth. A scanner that only sees EC2 is useless if the workload is on Lambda; one that only sees container images in a registry misses the running containers that have been live for weeks. The relevant question is which compute surfaces a scanner walks, and on what trigger.

The second is continuity vs. point-in-time. A one-off scan tells you what was vulnerable yesterday. Continuous scanning, where new CVE disclosures automatically re-evaluate existing resources, tells you what’s vulnerable right now. This is a big deal. It means “the function I wrote in 2025 is suddenly a finding in 2028 because a new CVE landed against its crypto library” is a thing that just happens when continuous re-evaluation is the default.
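To make the difference concrete, here is a toy sketch of continuous re-evaluation. Everything in it (the inventory, the advisory record, the function names, the placeholder CVE ID) is hypothetical and says nothing about Inspector’s internals; the only point is that a stored inventory gets re-matched whenever the advisory feed changes, without touching the resource again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    cve: str
    package: str
    fixed_in: tuple  # first non-vulnerable version, e.g. (43, 0, 0)


def parse_version(text):
    """Turn '42.0.1' into (42, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def evaluate(inventory, advisories):
    """Return (resource, cve) pairs where an installed version predates the fix."""
    findings = set()
    for resource, packages in inventory.items():
        for advisory in advisories:
            installed = packages.get(advisory.package)
            if installed and parse_version(installed) < advisory.fixed_in:
                findings.add((resource, advisory.cve))
    return findings


# Inventory captured once, at deploy time (hypothetical data).
inventory = {"fn-report-export": {"cryptography": "42.0.1"}}

feed = []
before = evaluate(inventory, feed)  # no advisories yet, so no findings

# A new advisory lands later; the inventory itself has not changed.
feed.append(Advisory("CVE-XXXX-0001", "cryptography", (43, 0, 0)))
after = evaluate(inventory, feed)  # the old function is now a finding
```

A point-in-time scanner only ever runs the first evaluate call; a continuous one re-runs it every time the feed changes.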

The third is signal prioritisation. 4,800 findings is not a scanner problem; it’s an operationalisation problem. Every finding has a severity and a base CVSS score, but the signal that actually matters combines CVSS with exploitability context and the network reachability of the affected resource. A critical RCE on a service that only listens on a loopback interface is not the same as a critical RCE on a public-facing ALB target. The scanner can supply some of that context; the rest is what the team owning the workload knows.
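One way to picture that combination is a small ranking function. This is a sketch under made-up assumptions: the weights and the two boolean inputs are illustrative, and this is not how Inspector computes its own score.

```python
# Illustrative weights only; not Inspector's scoring model.
SEVERITY_WEIGHT = {"CRITICAL": 4, "HIGH": 3, "MEDIUM": 2, "LOW": 1, "INFORMATIONAL": 0}


def priority(severity, internet_reachable, exploit_known):
    """Rank a finding: base severity, bumped by reachability and exploitability."""
    score = SEVERITY_WEIGHT[severity]
    if internet_reachable:
        score += 2
    if exploit_known:
        score += 1
    return score


# Same severity, very different urgency.
loopback_rce = priority("CRITICAL", internet_reachable=False, exploit_known=False)
public_rce = priority("CRITICAL", internet_reachable=True, exploit_known=True)

# A reachable, exploitable HIGH can outrank an unreachable CRITICAL.
public_high = priority("HIGH", internet_reachable=True, exploit_known=True)
```

The exact weights matter less than the ordering they produce: context can move a finding further than its raw severity label does.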

The fourth is operational plumbing. Findings have to flow into the org’s posture and event pipelines. The routing from “finding lands” to “engineer sees a ticket” has to be built; without it, findings decay in a dashboard forever. Tag-based routing (send findings on resources tagged Team=payments to the payments Jira project) is the usual shape.

The fifth is scan depth. Package scanning (compare installed package versions against the CVE feed) is the floor. Deep code scanning (look for vulnerable code patterns beyond package versions, like hardcoded secrets, insecure defaults) is a separate layer with its own cost, and not always worth it for every workload.

What we’ll filter on

Coverage. EC2, ECR containers, Lambda functions, or all three?
Continuity. Re-scan on new CVE disclosure, or point-in-time?
Triggers. What causes a scan, and how often?
Severity signal. Raw CVSS, an exploitability-adjusted risk score, or network-reachability context?
Routing. How does a finding become a ticket for the right team?

The vulnerability-scanning landscape

1. Amazon Inspector. Managed vulnerability scanner covering EC2, ECR images, and Lambda functions. Enabled per-account, per-Region, or org-wide via a delegated administrator. Agent-based for EC2 (uses the SSM Agent and the Amazon Inspector plugin; no separate install if SSM Agent is current). Agentless for ECR (reads image manifests at push; re-scans continuously) and Lambda (reads function code and layer content from the Lambda control plane). Findings stream to Security Hub and EventBridge.

2. ECR native scanning (basic). ECR’s built-in scanning: free, at-push only, and historically built on the open-source Clair scanner. Good as a floor; limited against anything modern. Inspector replaces it when enabled; the same ECR API returns Inspector’s results.

3. Third-party scanners. Snyk, Prisma, Qualys, Tenable. Marketplace-integrated or standalone. Priced per-node or per-scan. Useful when the organisation has a cross-cloud estate and wants one scanner everywhere, or when a specific capability (SBOM depth, SCA on source code) matters more than AWS-native integration.

4. Systems Manager Patch Manager. Not a CVE scanner in the Inspector sense. Patch Manager applies OS patches on a schedule. Complementary: Inspector tells you what’s vulnerable, Patch Manager is one of the remediation mechanisms for the EC2 findings.

5. GuardDuty Malware Protection. Scans EBS volumes for known malware when GuardDuty raises a suspicious finding against the instance. Different problem: “is there a backdoor installed” vs. “is there a known CVE in the shipped software.”

Side by side

Option	EC2	Containers	Lambda	Continuous	Routing
Amazon Inspector	✓ (SSM Agent)	✓ (ECR)	✓ (functions + layers)	✓	Security Hub + EventBridge
ECR basic scan	✗	✓ (at-push only)	✗	✗	ECR API
Third-party	✓	✓	±	±	Vendor console
SSM Patch Manager	(applies patches)	✗	✗	(schedule)	SSM console
GuardDuty Malware	✓ (EBS scan)	✗	✗	Event-triggered	GD finding

Reading the table: Inspector is the AWS-native answer and the only one that’s continuous across the three workload shapes the team actually runs. Everything else is a floor that Inspector supersedes (ECR basic scanning), a remediation mechanism (Patch Manager), or an adjacent problem (Malware Protection, third-party SCA).

Inspector’s three scanning paths

Three scanning paths, one findings stream. Every finding is continuously re-evaluated as new CVEs land.

The picks in depth

EC2 scanning via SSM Agent. Inspector’s EC2 scanning depends on the SSM Agent being present, current, and able to reach the SSM endpoints (which means AmazonSSMManagedInstanceCore on the instance profile, and either public internet egress or VPC endpoints for ssm, ssmmessages, and ec2messages). Once the agent is there, Inspector enrolls the instance automatically; no per-instance configuration. Scans are continuous, but they work from the package inventory that SSM collects on an interval (ideally at least daily), so a finding is only as fresh as the last inventory run. The Inspector plugin reads the installed-package list and sends it to the Inspector service, which compares it against the CVE feed server-side. This matters: the instance doesn’t download the CVE feed, and Inspector doesn’t SSH in. The scan footprint on the instance is tiny.
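“No agent, no scan” is worth checking rather than assuming. The sketch below filters coverage records for instances that are not being actively scanned. The record shape (resourceId, resourceType, scanStatus with statusCode and reason) is my assumption about what Inspector’s coverage listing returns, and the sample data is invented; verify both against the API before relying on it.

```python
def unscanned_instances(covered_resources):
    """Return (instance_id, reason) for EC2 instances that are not actively scanned."""
    gaps = []
    for resource in covered_resources:
        if resource.get("resourceType") != "AWS_EC2_INSTANCE":
            continue  # ECR and Lambda coverage is out of scope for this check
        status = resource.get("scanStatus", {})
        if status.get("statusCode") != "ACTIVE":
            gaps.append((resource["resourceId"], status.get("reason", "UNKNOWN")))
    return gaps


# Hypothetical records, shaped like the assumed coverage output.
sample = [
    {"resourceId": "i-0aaa", "resourceType": "AWS_EC2_INSTANCE",
     "scanStatus": {"statusCode": "ACTIVE", "reason": "SUCCESSFUL"}},
    {"resourceId": "i-0bbb", "resourceType": "AWS_EC2_INSTANCE",
     "scanStatus": {"statusCode": "INACTIVE", "reason": "UNMANAGED_EC2_INSTANCE"}},
    {"resourceId": "my-function", "resourceType": "AWS_LAMBDA_FUNCTION",
     "scanStatus": {"statusCode": "ACTIVE", "reason": "SUCCESSFUL"}},
]

gaps = unscanned_instances(sample)
```

In practice the records would come from paginating the coverage listing per account and Region (for example through the boto3 inspector2 client); an instance that shows up here usually has a missing instance profile permission or no route to the SSM endpoints.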

ECR scanning at push and continuous. Enhanced scanning is a registry-level setting, with rules that select repositories by filter (aws ecr put-registry-scanning-configuration --scan-type ENHANCED --rules '[{"repositoryFilters":[{"filter":"*","filterType":"WILDCARD"}], "scanFrequency":"CONTINUOUS_SCAN"}]'). At-push scanning happens on every docker push; continuous re-scanning re-evaluates every image in the matched repositories against the latest CVE feed. This is the scanner that catches base-image rot: an image that was fine six months ago can develop critical findings without ever being rebuilt. The finding is tied to the image digest, not the tag, so latest moving to a new digest doesn’t lose the finding on the old digest. ECR lifecycle rules should be configured to remove digests that haven’t been pulled in N days so the scanning cost doesn’t grow unbounded.
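The same registry configuration can be applied from Python. This is a minimal sketch that mirrors the CLI call above; the helper names are mine, and the client is passed in (expected to be a boto3 ECR client) so the request can be inspected without touching AWS.

```python
def scanning_config(repo_filter="*"):
    """Build the request for enhanced, continuous scanning of matching repositories."""
    return {
        "scanType": "ENHANCED",
        "rules": [
            {
                "scanFrequency": "CONTINUOUS_SCAN",
                "repositoryFilters": [
                    {"filter": repo_filter, "filterType": "WILDCARD"}
                ],
            }
        ],
    }


def apply_scanning_config(ecr_client, repo_filter="*"):
    """Send the configuration; ecr_client is assumed to be boto3.client('ecr')."""
    return ecr_client.put_registry_scanning_configuration(**scanning_config(repo_filter))


# The request body for the catch-all rule used in the CLI example.
request = scanning_config()
```

Narrowing repo_filter (a hypothetical "prod-*", say) is one lever on cost: repositories that don’t match any rule are not continuously re-scanned.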

Lambda scanning: package and optional code. Two tiers: Standard (SCA against the function’s zip or container image, and its layers) and Code Scanning (deep scan of the function’s source for hardcoded secrets, vulnerable patterns, overly permissive IAM policy references). Standard scans are the default when Lambda scanning is enabled; code scanning is opt-in. Coverage is the function’s $LATEST by default, but versions can be scanned too (each scanned version incurs a scan cost). For most teams, $LATEST is sufficient: old versions are not in production unless aliased, and if they are aliased, make them scannable explicitly.
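Answering “which versions are actually in production?” comes down to reading the aliases. The sketch below assumes alias records shaped like the Lambda alias listing (FunctionVersion, plus optional RoutingConfig.AdditionalVersionWeights for weighted traffic shifting); the sample aliases are invented.

```python
def versions_in_service(aliases):
    """Collect every version an alias can route traffic to, plus $LATEST."""
    versions = {"$LATEST"}
    for alias in aliases:
        versions.add(alias["FunctionVersion"])
        # Weighted aliases can send a share of traffic to a second version.
        weights = alias.get("RoutingConfig", {}).get("AdditionalVersionWeights", {})
        versions.update(weights)
    return versions


# Hypothetical aliases for one function.
aliases = [
    {"Name": "prod", "FunctionVersion": "12",
     "RoutingConfig": {"AdditionalVersionWeights": {"13": 0.1}}},
    {"Name": "staging", "FunctionVersion": "$LATEST"},
]

in_service = versions_in_service(aliases)
```

Anything outside this set is a version nothing routes to, which is the argument for not paying to scan it.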

Findings routing via EventBridge + tags. Inspector findings fire as Inspector2 Finding events on the default event bus. A rule with a filter pattern on detail.severity routes HIGH and CRITICAL findings to different targets; a rule matching the Team tag under detail.resources can dispatch to per-team Jira queues via Lambda. (EventBridge patterns have no array-index syntax; a pattern on detail.resources.tags.Team matches if any resource in the array carries the tag.) Tag-based routing requires two points of discipline: resources must be tagged consistently, and the tag must be propagated onto the finding. Inspector preserves the resource’s tags on the finding by default; the pipeline only needs to read them.
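The two rules described here can be written down as event patterns. This is a sketch, not a tested deployment: the source and detail-type values are the ones Inspector events are generally documented to carry, while the helper name is mine. Note that an EventBridge pattern descends into arrays without an index, so the tag match is written against resources rather than resources[0].

```python
import json

# Rule 1: route by severity.
HIGH_AND_CRITICAL = {
    "source": ["aws.inspector2"],
    "detail-type": ["Inspector2 Finding"],
    "detail": {"severity": ["HIGH", "CRITICAL"]},
}


def team_pattern(team):
    """Rule 2: match findings whose resource carries Team=<team>."""
    return {
        "source": ["aws.inspector2"],
        "detail-type": ["Inspector2 Finding"],
        "detail": {"resources": {"tags": {"Team": [team]}}},
    }


# EventBridge takes the pattern as a JSON string.
payments_pattern_json = json.dumps(team_pattern("payments"))
```

Each serialised pattern would be handed to the rule definition (the EventPattern field when creating a rule), with the per-team Lambda or queue as the target.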

A worked remediation loop

Tuesday, 09:30. Inspector emits a CRITICAL finding on a Lambda function prod-payments-process in the 111122223333 account: CVE-2024-7777 in cryptography version 42.0.1, fixed in 43.0.0.

EventBridge rule matches (detail.severity IN ["HIGH","CRITICAL"]) and invokes a Lambda that reads the finding, extracts resources[0].tags.Team (value: payments), and calls the Jira API to open a ticket in the PAY project. The ticket carries a 14-day SLA (the org’s CRITICAL SLA) and links to the finding’s ARN.

09:35. Payments team’s on-call sees the ticket, runs a local test: the function’s requirements.txt pins cryptography==42.0.1. Bumps to 43.0.0, runs tests, deploys via the normal pipeline.

10:15. New Lambda version deploys. Inspector’s continuous Lambda scanner re-evaluates within the next scan cycle (~15 minutes), and the finding transitions to CLOSED. An EventBridge rule on the status change updates the Jira ticket to Resolved; if the status is still ACTIVE after the SLA, a second rule escalates.

The loop closes without a human reading the finding twice. The remediation is the engineer’s work; the routing, escalation, and closure are pipeline work.
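The pipeline half of that loop is small enough to sketch. The event fields used (severity, title, findingArn, resources[].tags) are my assumption about the finding event shape; the tag-to-project map and the HIGH SLA are placeholders, and the payload is a plain dict, not a real Jira request.

```python
from datetime import datetime, timedelta, timezone

SLA_DAYS = {"CRITICAL": 14, "HIGH": 30}  # 14 is from the example; 30 is a placeholder
PROJECT_BY_TEAM = {"payments": "PAY"}    # hypothetical tag-to-project map


def ticket_for(event, now=None):
    """Build a ticket payload from a finding event, or None if it can't be routed."""
    now = now or datetime.now(timezone.utc)
    detail = event["detail"]
    resources = detail.get("resources") or [{}]
    team = (resources[0].get("tags") or {}).get("Team")
    project = PROJECT_BY_TEAM.get(team)
    sla = SLA_DAYS.get(detail.get("severity"))
    if project is None or sla is None:
        return None  # untagged resource or low severity: leave for manual triage
    return {
        "project": project,
        "summary": f"[{detail['severity']}] {detail['title']}",
        "finding_arn": detail["findingArn"],
        "due": (now + timedelta(days=sla)).date().isoformat(),
    }


def next_action(status, past_sla):
    """Decide what a status-change or timer event should do to the ticket."""
    if status == "CLOSED":
        return "resolve"
    return "escalate" if past_sla else "wait"
```

Returning None for unroutable findings is a deliberate choice in this sketch: a finding with no Team tag should surface as a tagging gap, not land silently in a default queue.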

A worked prioritisation triage

The 4,800-findings backlog is attacked with a single query in Security Hub:

Severity = CRITICAL, ProductFields.aws/inspector/FindingRiskScore > 70, and Resources[0].Details.AwsEc2Instance.IamInstanceProfileArn or Lambda equivalent contains a role with AdministratorAccess → 18 findings. Top of the queue.
Severity = CRITICAL, network-reachable from 0.0.0.0/0 (Inspector publishes InspectorScore with network reachability context) → 47 findings. Next.
Severity = HIGH, on production-tagged resources (tags.Environment = production), fixAvailable = YES → 412 findings. Sprint backlog material.
Severity = MEDIUM or below on non-production → 3,600 findings. Suppressed with a suppression rule expiring in 90 days, re-reviewed quarterly.

The reduction is 4,800 → 65 needing immediate attention, with everything else visible but deprioritised. The risk-score filter is doing most of the work; 4,800 findings by raw severity is a mountain, but by risk context it’s a small pile.
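The four filters read naturally as an ordered set of rules. The sketch below applies them to findings that have already been flattened into simple dicts; the field names (risk_score, admin_role, internet_reachable, and so on) are hypothetical stand-ins for the Security Hub fields named above, and the sample findings are invented.

```python
from collections import Counter


def triage(finding):
    """Assign a finding to the first bucket whose rule it satisfies."""
    sev = finding["severity"]
    env = finding.get("environment")
    # Threshold of 70 is taken from the first filter in the text.
    if sev == "CRITICAL" and finding.get("admin_role") and finding.get("risk_score", 0) > 70:
        return "top-of-queue"
    if sev == "CRITICAL" and finding.get("internet_reachable"):
        return "next"
    if sev == "HIGH" and env == "production" and finding.get("fix_available"):
        return "sprint-backlog"
    if sev in {"MEDIUM", "LOW", "INFORMATIONAL"} and env != "production":
        return "suppress-90d"
    return "unclassified"  # matched no rule; needs a human decision


# Hypothetical, already-flattened findings.
findings = [
    {"severity": "CRITICAL", "risk_score": 85, "admin_role": True},
    {"severity": "CRITICAL", "internet_reachable": True},
    {"severity": "HIGH", "environment": "production", "fix_available": True},
    {"severity": "MEDIUM", "environment": "dev"},
    {"severity": "HIGH", "environment": "dev"},
]

buckets = Counter(triage(f) for f in findings)
```

The explicit unclassified bucket is worth keeping: the four filters don’t partition the backlog, and whatever falls between them should be counted, not lost.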

What’s worth remembering

Inspector covers EC2, ECR images, and Lambda functions. Enable all three from the same delegated admin account for org-wide coverage.
EC2 scanning is SSM-based. The SSM Agent must be present, current, and able to reach SSM endpoints. No agent, no scan.
ECR and Lambda scanning are agentless. Inspector reads image manifests and function packages from AWS control planes; no instrumentation on the workload itself.
Continuous scanning is the headline feature. New CVE disclosures re-evaluate existing resources automatically; “this library became vulnerable yesterday” is a finding that appears without action.
Inspector risk score is a better filter than raw CVSS. It combines severity with exploitability and network-reachability context; filter on it for triage.
Findings emit to Security Hub (ASFF) and EventBridge. Routing to teams is tag-based; resource tags propagate onto findings automatically.
Lambda code scanning is an opt-in tier beyond SCA. It catches hardcoded secrets and insecure patterns, at extra per-function cost.
ECR lifecycle rules bound the scan cost. Images not pulled in N days should be pruned; otherwise continuous scanning cost grows unbounded as images accumulate.

Inspector is the scanner that’s already on. The work is less “enabling the scan” and more “closing the loop from finding to ticket to fix to verification.” The 4,800-findings problem is always a routing problem, and the pipeline that solves it is the actual deliverable.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.