Search code examples
amazon-web-servicesamazon-cloudwatchcloudwatch-alarms

AWS CloudWatch auto-reset (OK) alarm on trigger


I am deploying an AutoScalingGroup with AutoScalingPolicies (ScaleUp & ScaleDown) triggered by CloudWatch Alarm (CPU > 70%, CPU < 10%).

AutoScaling is working great but...Once the AutoScalingGroup has reached the minimal number of instance (2), the CPU < 10% alarm stays in ALARM STATE for hours...days...without resetting to OK STATE.

Because the CPUs utilization stay under 10%, I know it is normal the alarm never back to OK STATE.

I know it exists some AlarmActions like:

arn:aws:automate:${AWS::Region}:ec2:recover (for EC2)

I searched for similar Cloudwatch actions, did not find anything.

I have a custom solution: using a Lambda to change the Alarm State to OK but I would like to know if a smarter/easier solution exists.

Does anybody know how to do that?

Thanks.


Solution

  • Sounds like what you need is the ability to aggregate alarms with an AND clause. Alarm if CPU < 10% AND instance_count > 2. Unfortunately CloudWatch does not allow you to combine alarms like that directly.

    The current solution to this problem is to use Metric Math to create a metric that meets your criteria and then alarm on that.

    https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Create-alarm-on-metric-math-expression.html

    Here's the list of available functions:

    https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html#metric-math-syntax

    You will have to work out the Math to see if this is possible for you.

    CPU+10+(-10*CEIL((instance_count-2)/<MAX_ALLOWED_INSTANCE_COUNT>))