amazon-cloudwatch amazon-kinesis cloudwatch-alarms

CloudWatch Alarm Based on a Kinesis Metric Not Triggered When Value Is Over the Threshold

Issue Description

We have an AWS CloudWatch alarm whose monitored metric has very clearly gone over the threshold line shown in the metric graph, but the alarm didn't trigger.

What is going on here? How can the metric stay over the threshold for far longer than the alarm's period and evaluation periods without the alarm triggering?

[Screenshots: the alarm configuration and its empty alarm history]

Solution

If we look at the settings for the alarm, there are two things worth noting.

The first is that the alarm is in the Insufficient data (INSUFFICIENT_DATA) state, even though the metric graph shows a continuous line of datapoints.

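The second is that the alarm is configured with Seconds as its unit, while the metric graph shows milliseconds. Sure enough, if we fetch the recent datapoints for the Lambda IteratorAge metric with the AWS CLI (gdate is GNU date as installed by coreutils on macOS; on Linux, plain date works the same way):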

    aws cloudwatch get-metric-statistics \
        --namespace "AWS/Lambda" \
        --metric-name "IteratorAge" \
        --dimensions Name=FunctionName,Value=prod-pipeline-rules-exec \
        --statistics Maximum \
        --start-time $(gdate -u -d '20 minutes ago' +%Y-%m-%dT%TZ) \
        --end-time $(gdate -u +%Y-%m-%dT%TZ) \
        --period 60 \
        --region <region>
    [
        {
            "Timestamp": "2019-12-18T01:43:00Z",
            "Maximum": 2327.0,
            "Unit": "Milliseconds"
        },
        {
            "Timestamp": "2019-12-18T01:25:00Z",
            "Maximum": 2188.0,
            "Unit": "Milliseconds"
        },
        {
            "Timestamp": "2019-12-18T01:34:00Z",
            "Maximum": 2459.0,
            "Unit": "Milliseconds"
        }
    ]

Each returned datapoint is published with a unit of Milliseconds, not the Seconds the alarm expects.
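To catch this kind of mismatch without eyeballing the console, you can compare the unit configured on the alarm with the units the datapoints are actually published in. This is a minimal bash sketch under a few assumptions: the helper names (alarm_unit, published_units, units_match) are hypothetical, the alarm name you pass in is your own, and date is GNU date (use gdate on macOS, as in the command above). It relies on the AWS CLI's --query and --output text options; with text output, an alarm with no unit set reports None.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting an alarm/metric unit mismatch.

# Print the unit configured on an alarm. With --output text a missing unit
# prints "None" (note: a non-existent alarm also prints "None").
alarm_unit() {
  aws cloudwatch describe-alarms --alarm-names "$1" \
    --query 'MetricAlarms[0].Unit' --output text
}

# Print the distinct units of the last 20 minutes of IteratorAge datapoints
# for a Lambda function (assumes GNU date; use gdate on macOS).
published_units() {
  aws cloudwatch get-metric-statistics \
    --namespace "AWS/Lambda" --metric-name "IteratorAge" \
    --dimensions Name=FunctionName,Value="$1" \
    --statistics Maximum --period 60 \
    --start-time "$(date -u -d '20 minutes ago' +%Y-%m-%dT%TZ)" \
    --end-time "$(date -u +%Y-%m-%dT%TZ)" \
    --query 'Datapoints[].Unit' --output text | tr '\t' '\n' | sort -u
}

# Succeed if the alarm's unit is unset or appears among the published units.
# Usage: units_match EXPECTED_UNIT PUBLISHED_UNIT...
units_match() {
  local expected="$1"
  shift
  # No unit on the alarm: CloudWatch evaluates whatever unit was published.
  if [ -z "$expected" ] || [ "$expected" = "None" ]; then
    return 0
  fi
  local u
  for u in "$@"; do
    if [ "$u" = "$expected" ]; then
      return 0
    fi
  done
  return 1
}

# Example (alarm name is a placeholder):
# units_match "$(alarm_unit my-iterator-age-alarm)" \
#   $(published_units prod-pipeline-rules-exec) || echo "unit mismatch" >&2
```

For the alarm in this post, units_match Seconds Milliseconds fails, which is exactly the mismatch described above.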

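Unfortunately, when an alarm specifies a unit, CloudWatch only evaluates datapoints published with that exact unit; it does not convert between Seconds and Milliseconds. Every IteratorAge datapoint is therefore ignored, the alarm sees no data at all (hence INSUFFICIENT_DATA despite the continuous graph), and it never triggers.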
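One way to fix the alarm is to re-create it with a unit that matches how IteratorAge is published, expressing the threshold in milliseconds as well, since CloudWatch will not convert it for you. The sketch below assumes the Maximum statistic, a 60-second period and 5 evaluation periods; those values, the helper names, and the alarm name in the usage comment are placeholders, not the original alarm's settings. put-metric-alarm creates the alarm, or overwrites an existing alarm with the same name.

```shell
#!/usr/bin/env bash
# Sketch: (re)create an IteratorAge alarm whose unit matches the metric.
# Period, evaluation periods and statistic below are assumed placeholders.

# IteratorAge is published in Milliseconds, so convert a whole-second threshold.
seconds_to_ms() {
  echo $(( $1 * 1000 ))
}

# Usage: put_iterator_age_alarm ALARM_NAME FUNCTION_NAME THRESHOLD_SECONDS
put_iterator_age_alarm() {
  local alarm_name="$1" function_name="$2" threshold_seconds="$3"
  aws cloudwatch put-metric-alarm \
    --alarm-name "$alarm_name" \
    --namespace "AWS/Lambda" \
    --metric-name "IteratorAge" \
    --dimensions Name=FunctionName,Value="$function_name" \
    --statistic Maximum \
    --period 60 \
    --evaluation-periods 5 \
    --threshold "$(seconds_to_ms "$threshold_seconds")" \
    --comparison-operator GreaterThanThreshold \
    --unit Milliseconds
}

# Example (hypothetical alarm name, 60-second threshold -> 60000 ms):
# put_iterator_age_alarm iterator-age-high prod-pipeline-rules-exec 60
```

Alternatively, according to the PutMetricAlarm documentation, leaving --unit off entirely lets CloudWatch evaluate the metric in whatever unit it was published with, which behaves as expected as long as the metric is only ever published in one unit.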