I am deploying an AutoScalingGroup with AutoScalingPolicies (ScaleUp & ScaleDown) triggered by CloudWatch alarms (CPU > 70%, CPU < 10%).
AutoScaling is working great, but once the AutoScalingGroup has reached its minimum number of instances (2), the CPU < 10% alarm stays in the ALARM state for hours, even days, without resetting to the OK state.
Because CPU utilization stays under 10%, I understand it is normal that the alarm never goes back to the OK state.
I know that some AlarmActions exist, such as:
arn:aws:automate:${AWS::Region}:ec2:recover
(for EC2).
I searched for similar CloudWatch actions but did not find anything.
I have a custom solution, using a Lambda to change the alarm state to OK, but I would like to know if a smarter or easier solution exists.
Does anybody know how to do that?
Thanks.
Sounds like what you need is the ability to aggregate alarms with an AND clause: alarm if CPU < 10% AND instance_count > 2. Unfortunately, CloudWatch does not allow you to combine alarms like that directly.
The current solution to this problem is to use Metric Math to create a metric that meets your criteria and then alarm on that.
Here's the list of available functions:
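You will have to work out the math to see if this is possible for you. For example:
CPU+10+(-10*CEIL((instance_count-2)/<MAX_ALLOWED_INSTANCE_COUNT>))
With exactly 2 instances the CEIL term is 0, so the metric is CPU+10 and can never drop below 10. With more than 2 instances the CEIL term is 1 and the metric equals CPU, so an alarm on "metric < 10" only fires when there is still an instance to remove.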
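To sanity-check the expression before wiring it into an alarm, it can help to evaluate it offline. The following is only a rough Python sketch of the same arithmetic, not CloudWatch code: the function names, the default minimum of 2, and the maximum of 10 are assumptions chosen for illustration.

```python
import math


def gated_cpu(cpu, instance_count, min_instances=2, max_instances=10):
    """Mirror CPU+10+(-10*CEIL((instance_count-min)/max)) in plain Python.

    min_instances and max_instances are hypothetical values; use the
    limits configured on your own group.
    """
    # gate is 0 at the minimum size and 1 for any size above it
    # (as long as instance_count does not exceed max_instances).
    gate = math.ceil((instance_count - min_instances) / max_instances)
    return cpu + 10 - 10 * gate


def scale_down_alarm(cpu, instance_count, threshold=10):
    """Return True when an alarm on 'gated metric < threshold' would fire."""
    return gated_cpu(cpu, instance_count) < threshold


if __name__ == "__main__":
    for cpu, count in [(5, 2), (5, 3), (50, 4)]:
        print(cpu, count, gated_cpu(cpu, count), scale_down_alarm(cpu, count))
```

At the minimum size the gated value is shifted up by 10 and stays at or above the threshold, so the alarm can return to OK; above the minimum size the value is just the CPU reading.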