How to Turn Structured Logs Into CloudWatch Metrics

October 04, 2027 · 17 min read

The situation

Acme’s checkout service logs everything in structured JSON. A typical log line:

{
  "timestamp": "2027-06-15T09:14:33.012Z",
  "level": "INFO",
  "service": "checkout",
  "requestId": "req-abc123",
  "route": "/api/checkout",
  "status": 200,
  "latencyMs": 142,
  "customerId": "c-0xyz",
  "region": "eu-west-1"
}

Error paths log level: "ERROR" with additional context. External dependency calls log a dependency field. Feature-flag evaluations log flag and flagResult. The schema is consistent, version-controlled, and reviewed in PRs.

The operational problem: there are no CloudWatch metrics for this service. When the team wants to see “error rate by route over the last day,” they open Logs Insights, write a query, and wait 30 seconds for it to return. The on-call dashboard is seven of these queries side by side and is too slow to be useful during an incident. Attempting to build a metric alarm on “error rate > 1%” from Logs Insights isn’t possible; alarms need a metric.

Three ways to fix this.

Instrument the application. Add a metrics library (OpenTelemetry, the AWS SDK’s PutMetricData, or a lower-level client) that emits metrics directly. Good shape; requires code changes in every service, a deployment, and potentially a library standardisation exercise across the org.

CloudWatch agent with custom metrics. Install the agent, configure it to receive metrics from a local StatsD or collectd listener (or to read them from a file), and let it publish. It works, but it is another piece of infrastructure per service, and the agent configuration becomes another thing to manage.

Metric filters on the log group. CloudWatch Logs metric filters scan log events as they arrive, pattern-match, and emit a CloudWatch metric. No code changes, no agent, no new infrastructure. The metric is derived from the logs already being written.

For a service that already logs cleanly structured JSON, metric filters are the shortest path. The question is whether the data they can produce is rich enough for the operational needs, and where the trade-offs sit.

What actually matters

Before reaching for a specific mechanism, it’s worth being explicit about where metrics should come from and what the trade-offs look like across that decision.

The first thing worth thinking about is what the metric is for. Metrics are not the same shape as logs: a metric is a small numeric timeseries with bounded cardinality that powers dashboards and alarms in seconds, while a log is an open-ended structured record that powers ad-hoc investigation. The operational dashboard and the pager want metrics; the post-incident review and the customer-specific question want logs. A solution that produces metrics for “error rate by route” is doing the right job for that question and the wrong job for “why did customer 0xyz get a 500 at 14:03 yesterday?”. Both needs are real; both belong somewhere. Don’t try to make the metric serve the log’s job.

The second is latency from emission to alarm. A pager rule needs metrics that arrive within a minute or two of the underlying event; a dashboard needs renders measured in seconds, not “wait for the Logs Insights query to come back.” Anything in the path that batches, scans, or queries at read time is too slow for these jobs. The right shape is “emission cheap at write time, read instantly at query time”, which means writing the metric, not deriving it on demand.

The third is cost shape. Three places to pay: at the point of emission (per API call, per ingest event), per unique metric timeseries (each dimension-value combination is its own series), and per query/alarm. Each mechanism shifts cost across those three. PutMetricData is cheap per timeseries but expensive per call at volume; agents amortise the per-call cost across many emissions but cost per host; deriving metrics from log lines is cheap per emission but pays for every log line scanned. The right answer depends on where the volume is.

The fourth is cardinality discipline. A metric dimensioned by customerId for a million customers is a million billable timeseries and a dashboard nobody can read. Whatever mechanism is chosen has to make it easy to keep dimensions to a small, bounded set (route, region, status code) and push high-cardinality drill-downs back into log searches. The mechanism that makes the cheap thing easy and the expensive thing hard is the one that survives a year in production without a surprise bill.

The fifth is whether code changes are on the table. Adding metric instrumentation to every service is a real piece of work: a library choice, a deployment, a code review across every team. If the data the operator wants is already being written somewhere (like a structured log line), then deriving the metric from that existing stream avoids the code change entirely. The strongest argument for log-derived metrics is the presence of already-structured logs; without that, it’s a second-best choice.

The sixth is who owns the source of truth. If the metric exists because of an emission call in application code, the application owns the metric definition; if the metric exists because of a filter on a log group, the platform team (or whoever owns the log group) can define metrics without touching the application. That’s a power and a danger: a platform team can answer operational questions without app-team involvement, but the app team can break the metric by changing the log schema without realising. The chosen mechanism implies a contract between teams; agreeing the contract is the work, not the YAML.
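
One way to make that contract visible is a test in the application repository that fails when a field a metric filter depends on goes missing or changes type. The following is a minimal sketch, not an established Acme practice: FILTER_CONTRACT and contract_violations are hypothetical names, and the field list is taken from the sample log line at the top of this post.

```python
import json

# Hypothetical contract: the fields the metric filters read, and the type each must have.
FILTER_CONTRACT = {
    "level": str,
    "route": str,
    "region": str,
    "latencyMs": (int, float),
}

def contract_violations(log_line: str) -> list[str]:
    """Return readable problems; an empty list means the line honours the contract."""
    event = json.loads(log_line)
    problems = []
    for field, expected in FILTER_CONTRACT.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(event[field], bool) or not isinstance(event[field], expected):
            problems.append(f"wrong type for {field}: {type(event[field]).__name__}")
    return problems

sample = json.dumps({
    "level": "INFO", "route": "/api/checkout", "region": "eu-west-1",
    "latencyMs": 142, "status": 200,
})
# Same event after a well-meaning rename of "route" to "path".
renamed = json.dumps({
    "level": "INFO", "path": "/api/checkout", "region": "eu-west-1",
    "latencyMs": 142,
})

print(contract_violations(sample))   # []
print(contract_violations(renamed))  # ['missing field: route']
```

Run against a captured log line in CI, a check like this turns a silent metric outage into a failed build on the PR that renamed the field.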

The seventh is backfill. Some mechanisms apply only to events from the moment of creation onward; others can be replayed against historical data. For a brand-new service, this doesn’t matter; for an existing service, it determines whether the first week of charts is real or sparse. Knowing this in advance is the difference between “the dashboard is ready” and “wait seven days for the dashboard to populate.”

What we’ll filter on

No code changes: does the service need to be modified?
No new infrastructure: does it require an agent, sidecar, or collector?
Low latency: how quickly does an emission appear as a metric?
Dimension support: can values from the log event become metric dimensions?
Structured-log friendly: does it parse JSON without custom regex?
Cost shape: per-metric cost, per-emission cost, or flat?

The metric-source landscape

Metric filters on log group. No code, no agent. Filter runs at ingest; metric emitted within seconds. Up to 3 dimensions per filter, JSON or text pattern matching. Costs: the filter itself is not billed separately on current pricing; you pay the log ingestion you were already paying, plus the standard custom-metric cost per unique dimension combination.
EMF (Embedded Metric Format). A log shape that CloudWatch recognises and auto-extracts into metrics. The application writes a specially-formatted JSON log line; the CloudWatch Logs service notices the _aws.CloudWatchMetrics stanza and emits the named metrics directly. Requires code changes (to format logs as EMF) but no explicit PutMetricData call, and supports up to 100 metrics per log event, with up to 30 dimensions per dimension set. Very cheap at volume compared to per-call PutMetricData.
PutMetricData API from the application. Direct SDK call from code. Total control; costs scale with API calls (and PutMetricData is not free at volume); requires code and library changes.
CloudWatch agent with StatsD or CollectD. Agent installed on each instance or container, receiving metrics on a local listener and publishing them. Standard for EC2; more friction for Lambda and Fargate.
OpenTelemetry / ADOT. Open-source instrumentation with an AWS Distro. Metrics, traces, logs in one library. Multi-cloud portable; most expressive; biggest engineering investment.
Third-party observability (Datadog, New Relic, Honeycomb). Outside AWS, different cost model, different features. Adjacent.

Side by side

Option	No code changes	No new infra	Low latency	Dimension support	JSON-friendly	Cost shape
Metric filters	✓	✓	Seconds	3 per filter	✓	Per-metric (on top of log ingestion)
EMF	✗ (but minimal)	✓	Seconds	Up to 30 per set (via `Dimensions` in the `_aws` stanza)	✓ (required)	Per-metric (efficient)
PutMetricData	✗	✓	Seconds	✓	n/a	Per-API call
CloudWatch agent	✓ (for scraped)	Agent needed	Seconds	✓	via metric source	Per-metric
OpenTelemetry	✗	Collector needed	Seconds	✓	n/a	Per-metric + collector
Third-party	Varies	Agent needed	Varies	✓	n/a	Subscription

For Acme’s case (structured JSON logs already being written, no agent overhead desired, metrics needed yesterday), metric filters are the immediate win. EMF is the upgrade path when the team wants richer metric support without paying per-API-call for PutMetricData at high volume.

From log line to dashboard

A single log event is evaluated by each of the log group's metric filters in turn. Matching events emit a value with extracted dimensions; CloudWatch Metrics stores them in the named namespace; dashboards and alarms read the metric without touching the log group.
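
To make that flow concrete, here is a small in-memory model of it. This is a sketch of the idea rather than of CloudWatch's implementation: the FILTERS list and evaluate function are hypothetical stand-ins, the lambdas replace real filter patterns, and the sketch assumes every matching event carries the dimension fields.

```python
import json
from collections import defaultdict

# Hypothetical stand-ins for two metric filters attached to one log group.
FILTERS = [
    {
        "metric": "Errors",
        "matches": lambda e: e.get("level") == "ERROR",  # like { $.level = "ERROR" }
        "value": lambda e: 1,                            # count each match
        "dimensions": ["route", "region"],
    },
    {
        "metric": "Latency",
        "matches": lambda e: "latencyMs" in e,           # like { $.latencyMs = * }
        "value": lambda e: e["latencyMs"],               # take the number from the event
        "dimensions": ["route"],
    },
]

def evaluate(log_lines, filters=FILTERS):
    """Run every filter over every event; group datapoints by (metric, dimensions)."""
    series = defaultdict(list)
    for line in log_lines:
        event = json.loads(line)
        for f in filters:
            if not f["matches"](event):
                continue
            # Assumes the dimension fields exist on every matching event.
            dims = tuple((d, event[d]) for d in f["dimensions"])
            series[(f["metric"], dims)].append(f["value"](event))
    return dict(series)

lines = [
    '{"level":"INFO","route":"/api/checkout","region":"eu-west-1","latencyMs":142}',
    '{"level":"ERROR","route":"/api/checkout","region":"eu-west-1","latencyMs":900}',
    '{"level":"ERROR","route":"/api/cart","region":"eu-west-1","latencyMs":35}',
]

for key, datapoints in evaluate(lines).items():
    print(key, datapoints)
```

Three log lines produce four series: two Errors series (one per route and region pair) and two Latency series (one per route). Each distinct dimension combination is its own timeseries, which is the detail that drives the cardinality discussion later on.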

The filter in depth

Creating a metric filter is one API call per filter:

aws logs put-metric-filter \
  --log-group-name /aws/ecs/checkout-service \
  --filter-name errors-by-route \
  --filter-pattern '{ $.level = "ERROR" }' \
  --metric-transformations '[
    {
      "metricName": "Errors",
      "metricNamespace": "acme/checkout",
      "metricValue": "1",
      "unit": "Count",
      "dimensions": {
        "route":  "$.route",
        "region": "$.region"
      }
    }
  ]'

Three fields earn their keep. metricValue: "1" means “count each matching event”; metricValue: "$.latencyMs" would take the numeric value from the log event. unit: "Count" tags the metric for display. dimensions turns one log line into a metric emission dimensioned by the extracted fields. A fourth field, defaultValue, emits a fallback value (typically 0) for periods in which log events arrive but none match. CloudWatch Logs does not accept it on a filter that also declares dimensions, which is why it is absent here. A dimensioned metric therefore has gaps when nothing matches, and an alarm of “Errors > 10” handles those gaps with treatMissingData: notBreaching.

A metric filter that emits a numeric value (latency) is subtly different from one that counts events (errors):

aws logs put-metric-filter \
  --log-group-name /aws/ecs/checkout-service \
  --filter-name latency \
  --filter-pattern '{ $.latencyMs = * }' \
  --metric-transformations '[{
    "metricName": "Latency",
    "metricNamespace": "acme/checkout",
    "metricValue": "$.latencyMs",
    "unit": "Milliseconds",
    "dimensions": { "route": "$.route" }
  }]'

This emits one datapoint per matching log event, with the value being the latencyMs field. The reader (dashboard or alarm) picks the statistic: Average for mean latency, p99 for tail latency, Maximum for worst-case, SampleCount for “how many requests.” One metric, many useful statistics.
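
The difference between those statistics is easiest to see on a concrete period of data. The sketch below uses invented latencies and a nearest-rank percentile; CloudWatch's own percentile calculation may differ in detail, so treat the p99 here as an approximation of the idea, not of the service.

```python
import math

def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(latencies_ms):
    """Compute the statistics a dashboard or alarm might request for one period."""
    return {
        "SampleCount": len(latencies_ms),
        "Average": sum(latencies_ms) / len(latencies_ms),
        "Maximum": max(latencies_ms),
        "p99": nearest_rank_percentile(latencies_ms, 99),
    }

# Hypothetical period: 99 requests at 100 ms and a single 2000 ms outlier.
period = [100] * 99 + [2000]

print(summarize(period))
# {'SampleCount': 100, 'Average': 119.0, 'Maximum': 2000, 'p99': 100}
```

The same hundred datapoints give three different pictures: the average barely moves, the p99 ignores a one-in-a-hundred outlier, and the maximum is nothing but the outlier. Which one belongs on the pager depends on the question being asked.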

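Alarms on metric-filter-derived metrics are identical to alarms on any other metric. This one fires when a route logs more than 10 errors a minute for two consecutive minutes; a true error rate would need a second request-count metric and metric math: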

aws cloudwatch put-metric-alarm \
  --alarm-name checkout-error-rate-high \
  --namespace acme/checkout \
  --metric-name Errors \
  --dimensions Name=route,Value=/api/checkout Name=region,Value=eu-west-1 \
  --statistic Sum \
  --period 60 \
  --evaluation-periods 2 \
  --datapoints-to-alarm 2 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --treat-missing-data notBreaching \
  --alarm-actions arn:aws:sns:eu-west-1:111122223333:pager
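
The interplay of --evaluation-periods, --datapoints-to-alarm, and --treat-missing-data is easier to reason about with a toy model. The function below is a deliberately simplified sketch: alarm_state is a hypothetical name, and the real service looks at a wider evaluation range when datapoints are missing. The defaults mirror the command above.

```python
def alarm_state(sums, threshold=10, evaluation_periods=2, datapoints_to_alarm=2):
    """Simplified M-out-of-N evaluation over the most recent periods.

    `sums` holds one Sum per period, oldest first. None means the period has
    no datapoint, which this sketch treats as not breaching.
    """
    recent = sums[-evaluation_periods:]
    breaching = sum(1 for s in recent if s is not None and s > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"

print(alarm_state([3, 12, 15]))    # ALARM: the last two periods both exceed 10
print(alarm_state([3, None, 12]))  # OK: the gap counts as not breaching
print(alarm_state([12, 10]))       # OK: 10 is not greater than 10
```

Because a dimensioned metric has gaps whenever nothing matched, the not-breaching treatment is what keeps a quiet route from sitting in INSUFFICIENT_DATA.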

Dimension discipline

The dimension decision is where cost and query usefulness collide. A metric with a customerId dimension and a million customers produces a million unique metric timeseries. Each is billed; the dashboard has a million lines.

Two rules that survive contact with reality:

Never use unbounded-cardinality fields as dimensions. customerId, orderId, requestId, sessionId are not dimensions. They belong in log searches, not metrics.

Three dimensions or fewer per metric. The metric-filter limit is three (CloudWatch metrics in general allow more, but a filter does not); hitting it is already a signal the metric is doing too much. route + region + status is a reasonable high-water mark.

For the “error rate by customer” question, the answer isn’t a customer-dimensioned metric; it’s a Logs Insights query, run ad-hoc, or a log subscription to a downstream analytics pipeline.
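
The arithmetic behind the first rule is a simple product. The sketch below uses invented counts (12 routes, 3 regions, 5 status values) and a placeholder flat price; real custom-metric pricing is tiered and varies by region, so the cost figure shows the shape of the problem and nothing more.

```python
from math import prod

def series_count(distinct_values):
    """Worst-case number of timeseries: product of distinct values per dimension."""
    return prod(distinct_values.values())

def monthly_cost(series, price_per_series):
    """Flat-rate estimate only; actual pricing is tiered and region-specific."""
    return series * price_per_series

# Hypothetical cardinalities for illustration.
bounded = {"route": 12, "region": 3, "status": 5}
unbounded = {"route": 12, "customerId": 1_000_000}

PRICE = 0.30  # placeholder price per series per month, not a quoted AWS rate

print(series_count(bounded))    # 180
print(series_count(unbounded))  # 12000000
print(round(monthly_cost(series_count(bounded), PRICE), 2))
print(round(monthly_cost(series_count(unbounded), PRICE), 2))
```

Swapping two bounded dimensions for one unbounded one moves the series count from hundreds to millions, and the bill moves with it.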

When EMF is the upgrade

Embedded Metric Format is the next step when:

The service already writes structured logs, and a small code change to add the _aws stanza is cheaper than maintaining a growing collection of metric filters.
You want to emit many metrics per log event. EMF supports up to 100 metrics in a single log line, which pays for itself at high event volumes.
You want to record several samples of the same metric in one log event. EMF accepts an array of values for a metric, and CloudWatch records each value as a sample, so percentile statistics work across them.

An EMF log line looks like:

{
  "_aws": {
    "Timestamp": 1717000000000,
    "CloudWatchMetrics": [{
      "Namespace": "acme/checkout",
      "Dimensions": [["route", "region"]],
      "Metrics": [
        {"Name": "Latency", "Unit": "Milliseconds"},
        {"Name": "Errors", "Unit": "Count"}
      ]
    }]
  },
  "route": "/api/checkout",
  "region": "eu-west-1",
  "Latency": 142,
  "Errors": 0,
  "requestId": "req-abc123",
  "customerId": "c-0xyz"
}

CloudWatch Logs notices the _aws.CloudWatchMetrics stanza, extracts the metrics with their dimensions, and emits. The log event itself retains the full payload (including high-cardinality fields like requestId) for log searches; only the named metrics and dimensions become metric timeseries.
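
In application code, producing that shape is a small helper around the existing logger. The function below is a minimal sketch, assuming the resulting line reaches CloudWatch Logs unmodified; emf_line and its parameters are hypothetical names, not part of any AWS library.

```python
import json
import time

def emf_line(namespace, dimensions, metrics, extra=None, now_ms=None):
    """Build one EMF log line as a JSON string.

    dimensions: dict of dimension name -> value (keep these low-cardinality)
    metrics:    dict of metric name -> (value, unit)
    extra:      high-cardinality context that stays in the log only
    """
    record = {
        "_aws": {
            # EMF timestamps are milliseconds since the Unix epoch.
            "Timestamp": now_ms if now_ms is not None else int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],  # one dimension set
                "Metrics": [{"Name": n, "Unit": u} for n, (_, u) in metrics.items()],
            }],
        },
    }
    record.update(dimensions)                               # dimension values
    record.update({n: v for n, (v, _) in metrics.items()})  # metric values
    record.update(extra or {})                              # log-only fields
    return json.dumps(record)

line = emf_line(
    "acme/checkout",
    {"route": "/api/checkout", "region": "eu-west-1"},
    {"Latency": (142, "Milliseconds"), "Errors": (0, "Count")},
    extra={"requestId": "req-abc123", "customerId": "c-0xyz"},
)
print(line)
```

Only the names listed under Dimensions and Metrics become timeseries. requestId and customerId ride along in the log event, which keeps the high-cardinality detail searchable without turning it into billable metrics.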

Metric filters are the unchanged-code answer. EMF is the small-code-change answer. PutMetricData is the full-SDK answer. All three coexist.

A worked migration

Acme’s checkout team goes through the first three steps in consecutive weeks, then extends the approach to a second service a month later.

Week 1: deploy three metric filters (Errors, Latency, Http5xx) across the checkout-service log group. No code changes. Metrics start flowing within minutes.

Week 2: build the dashboard on the new metrics. Seven graphs, each reading Errors or Latency with appropriate statistics and dimensions. The dashboard renders in under a second. The Logs Insights queries in the old dashboard stay for cases that genuinely need log-level detail.

Week 3: add three metric alarms: high error rate, p99 latency over 1000ms, 5xx spike. All three wired to the on-call pager via SNS. The previous approach (manually watching the Logs Insights dashboard) is retired.

Month 2: for a second service that logs ten distinct metrics per request, the team moves that service to EMF rather than maintaining ten metric filters. Three weeks of library work; after that, every service using the library gets structured logs and emitted metrics in one code change.

What’s worth remembering

Metric filters derive metrics from log events. No code changes, no agent, no new infrastructure. Pattern matching on structured JSON (or unstructured text); transformation emits a CloudWatch metric per match.
Four knobs per filter: pattern, value, default, dimensions. Count by setting value to 1; measure by setting value to a JSON selector. Default value fills quiet periods with a known number, but it cannot be combined with dimensions; dimensioned metrics lean on treatMissingData instead.
Three dimensions maximum per filter. Cost discipline: never use unbounded-cardinality fields as dimensions. customerId, requestId, orderId are log fields, not metric dimensions.
The reader picks the statistic. One numeric-valued metric (latency emitted per request) serves Average, p99, Maximum, Minimum, SampleCount; pick at dashboard or alarm time.
Metric filters don’t backfill. Filter sees events from creation time onward. First-week charts will be incomplete; plan accordingly.
EMF is the next step when filters get cumbersome. One log event, up to 100 metrics, efficient at volume. Small code change; keeps the log-as-truth model.
Subscription filters and metric filters coexist. Same log group can have both; different limits, different purposes. Metric filters produce metrics; subscription filters forward events to destinations.
Logs Insights stays as the ad-hoc query layer. High-cardinality questions (error rate per customer) belong in log search, not in a high-cardinality metric that will blow up the bill.

A single PutMetricFilter call per metric turns a structured-log service into a metrics-emitting service without touching the code. The dashboard goes from seven slow Logs Insights queries to seven fast metric graphs. Alarms that were impossible to write become one-line put-metric-alarm calls. The logs still exist for the deep-investigation cases; the metrics exist for the steady-state operational view. Same data, two shapes, each at the layer that fits.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.