Trying to get an alert when a GCE VM is in a down state by creating an Alerting Policy.
Metric: compute.googleapis.com/instance/uptime
Resource: VM instance
The condition is configured to trigger an alert when this metric is absent for 3 minutes.
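For reference, the condition I configured is roughly equivalent to the sketch below using the google-cloud-monitoring Python client library (the project ID and display names are placeholders, not my actual values):

from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

client = monitoring_v3.AlertPolicyServiceClient()

# Metric-absence condition: fire when instance/uptime stops reporting for 3 minutes.
condition = monitoring_v3.AlertPolicy.Condition(
    display_name="instance/uptime absent for 3 minutes",
    condition_absent=monitoring_v3.AlertPolicy.Condition.MetricAbsence(
        filter=(
            'metric.type = "compute.googleapis.com/instance/uptime" '
            'AND resource.type = "gce_instance"'
        ),
        duration=duration_pb2.Duration(seconds=180),
        aggregations=[
            monitoring_v3.Aggregation(
                alignment_period=duration_pb2.Duration(seconds=60),
                per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_RATE,
            )
        ],
    ),
)

policy = monitoring_v3.AlertPolicy(
    display_name="GCE VM down (uptime absent)",  # placeholder name
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[condition],
)

# Notification channels would be attached separately.
client.create_alert_policy(name="projects/my-project-id", alert_policy=policy)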
To simulate this behavior, I stopped the VM, but it is not triggering an alert; meanwhile, no data is visible in the alerting policy's graph.
I have attached the trigger configuration.
None of these metrics give reliable alerts when the VM is in a stopped state, whether you use compute.googleapis.com/instance/uptime, the monitoring agent's uptime, or the CPU utilization metrics, until you create the alerting policy with MQL (Monitoring Query Language).
"metrics associated with TERMINATED or DELETED Google Cloud resources are not considered for metric-absence policies. This means you can't use metric-absence policies to test for TERMINATED or DELETED Google Cloud VMs." https://cloud.google.com/monitoring/alerts/types-of-conditions#metric-absence
So, as per the above statement, we cannot use a metric-absence policy for a stopped VM, since it goes to the TERMINATED state after it has been stopped for some time. The reason is that the instance's stop time is only calculated once it returns to the running state.
But when you configure the same condition with MQL, using the same set of metrics, the metric-absence policy works without any issues.
Sample:
Instead of configuring the condition by selecting a resource & metric, go to the Query Editor and type the query below to get an alert when the development-environment VM has not been in the running state for 3 minutes.
fetch gce_instance
| metric 'compute.googleapis.com/instance/uptime'
| filter (metadata.user_labels.env == 'dev')
| group_by 1m, [value_uptime_aggregate: aggregate(value.uptime)]
| every 1m
| absent_for 180s
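If you prefer to create the same MQL-based policy outside the console (for example, from a script), here is a rough sketch using the google-cloud-monitoring Python client library; the project ID and display names are placeholders:

from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

client = monitoring_v3.AlertPolicyServiceClient()

mql_query = """
fetch gce_instance
| metric 'compute.googleapis.com/instance/uptime'
| filter (metadata.user_labels.env == 'dev')
| group_by 1m, [value_uptime_aggregate: aggregate(value.uptime)]
| every 1m
| absent_for 180s
"""

condition = monitoring_v3.AlertPolicy.Condition(
    display_name="dev VM uptime absent for 3 minutes (MQL)",
    condition_monitoring_query_language=(
        monitoring_v3.AlertPolicy.Condition.MonitoringQueryLanguageCondition(
            query=mql_query,
            # absent_for in the query already encodes the 3-minute window,
            # so no additional violation duration is set here.
            duration=duration_pb2.Duration(seconds=0),
        )
    ),
)

policy = monitoring_v3.AlertPolicy(
    display_name="Dev VM not running (MQL)",  # placeholder name
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[condition],
)

client.create_alert_policy(name="projects/my-project-id", alert_policy=policy)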
Not sure whether this is a bug or not, but it is a limitation when the alerting condition is configured in the traditional way, and it can be resolved by leveraging MQL.