amazon-cloudwatch amazon-kinesis cloudwatch-alarms

CloudWatch Alarm Based on a Kinesis Metric Not Triggered When the Value Is Over the Threshold


Issue Description

We have an AWS CloudWatch alarm whose monitored metric has very clearly gone over the threshold line shown in the metric graph, but the alarm didn't trigger.


What is going on here? How can a metric stay above the threshold for far longer than the alarm's period and evaluation time without the alarm triggering?

[Screenshot: alarm configuration and empty history]


Solution

  • If we look at the settings for the alarm, two things stand out.


    The first is that the alarm is in the Insufficient Data state even though the graph shows a continuous line of data.

    The second is that the alarm is configured with Seconds as its unit, while the graph above shows milliseconds. Indeed, if we list recent data points for the iterator age:

    aws cloudwatch get-metric-statistics --namespace "AWS/Lambda" --metric-name "IteratorAge" --dimensions Name=FunctionName,Value=prod-pipeline-rules-exec --statistics Maximum --start-time $(gdate -u -d '20 minutes ago' +%Y-%m-%dT%TZ) --end-time $(gdate -u +%Y-%m-%dT%TZ) --period 60 --region <region>
        [
            {
                "Timestamp": "2019-12-18T01:43:00Z",
                "Maximum": 2327.0,
                "Unit": "Milliseconds"
            },
            {
                "Timestamp": "2019-12-18T01:25:00Z",
                "Maximum": 2188.0,
                "Unit": "Milliseconds"
            },
            {
                "Timestamp": "2019-12-18T01:34:00Z",
                "Maximum": 2459.0,
                "Unit": "Milliseconds"
            }
        ]
    

    The units are in Milliseconds.

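    Unfortunately, CloudWatch only evaluates data points whose unit matches the unit configured on the alarm. A unit mismatch is treated as missing data, so the alarm sits in Insufficient Data and never triggers. Setting the alarm's unit to Milliseconds (with the threshold converted accordingly), or leaving the unit unset, lets the alarm see the data.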
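
One way to correct the alarm is to recreate it with the unit the metric is actually published in. The sketch below only prints an `aws cloudwatch put-metric-alarm` command rather than running it. The alarm name `iterator-age-high`, the 30-second threshold, and the single evaluation period are hypothetical, since the real alarm settings are only visible in the screenshot. Keep in mind that `put-metric-alarm` overwrites the existing alarm configuration, so any alarm actions would need to be added to the command as well.

```shell
# Convert a whole number of seconds to milliseconds.
seconds_to_ms() {
  echo $(( $1 * 1000 ))
}

# Print (not run) a put-metric-alarm command whose unit matches the metric.
# Arguments: alarm name, Lambda function name, threshold in seconds.
build_alarm_command() {
  printf 'aws cloudwatch put-metric-alarm --alarm-name %s --namespace AWS/Lambda --metric-name IteratorAge --dimensions Name=FunctionName,Value=%s --statistic Maximum --period 60 --evaluation-periods 1 --threshold %s --comparison-operator GreaterThanThreshold --unit Milliseconds\n' \
    "$1" "$2" "$(seconds_to_ms "$3")"
}

# Hypothetical alarm name and threshold (30 s expressed in milliseconds).
build_alarm_command iterator-age-high prod-pipeline-rules-exec 30
```

Printing the command first makes it easy to review the converted threshold and add `--region` or `--alarm-actions` before running it. Dropping `--unit` altogether should also work, since the alarm then evaluates the data points regardless of their unit.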