google-cloud-platform, monitoring, stackdriver

GCP incident won't resolve


I have a service, and I want to know how many errors it throws. So I've created a metric and an alert based on that metric.

The metric is a counter, and it filters out all the unneeded logs, leaving only the relevant ones.

The alert uses the metric with an aggregator of type 'count' and an aligner of type 'delta', which results in a value of '1' when the metric catches any errors. The condition for the alert checks whether the most recent value is above 0.99.
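For reference, here is a rough sketch of how a condition like this might look when written out in the shape of the Cloud Monitoring API's alert policy JSON, expressed as a Python dict. The field and enum names (`conditionThreshold`, `ALIGN_DELTA`, `REDUCE_COUNT`, `COMPARISON_GT`) come from that API; the metric name `service_errors`, the resource type, the 60-second alignment period, and the mapping of "most recent value" to a zero duration are assumptions made for illustration, not values taken from the actual policy.

```python
# Sketch of the alert condition described above, in the JSON shape used by
# the Cloud Monitoring API (AlertPolicy.conditions[]).
# Hypothetical values: metric name, resource type, alignment period, duration.
condition = {
    "displayName": "Service errors detected",  # hypothetical display name
    "conditionThreshold": {
        # User-defined log-based metrics live under logging.googleapis.com/user/
        "filter": (
            'metric.type="logging.googleapis.com/user/service_errors" '
            'AND resource.type="k8s_container"'  # placeholder resource type
        ),
        "aggregations": [
            {
                "alignmentPeriod": "60s",              # assumed window size
                "perSeriesAligner": "ALIGN_DELTA",     # aligner of type 'delta'
                "crossSeriesReducer": "REDUCE_COUNT",  # aggregator of type 'count'
            }
        ],
        "comparison": "COMPARISON_GT",
        "thresholdValue": 0.99,
        "duration": "0s",  # assumed equivalent of "most recent value"
    },
}

aggregation = condition["conditionThreshold"]["aggregations"][0]
print(aggregation["perSeriesAligner"], aggregation["crossSeriesReducer"])
```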

After that alert fired an incident, it just won't close. I went to the summary page, and it shows that for some reason the condition is still being met (at least that is what I understand from the red lines that keep growing), even though the last errors were thrown a few hours ago.

[Image: incident summary]

In the picture you can see the red lines, which indicate the duration of the incident, and below them, in the graph, three small points where an error was detected. The first one caused the incident to fire.

Any help on how to make the incident resolve? Thanks!


Solution

  • I was able to fix the problem by setting the aggregator to 'sum' instead of 'count'.
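A likely explanation for the difference is that a 'count' aggregator reports how many time series (or points) exist in each window, not what their values are, so it stays at 1 even when the delta is 0, while 'sum' adds up the values themselves. The sketch below is a simplified local model of that behavior, not Cloud Monitoring's actual implementation; the series name, the sample values, and the assumption that the metric keeps writing zero-valued points between errors are all made up for illustration.

```python
from typing import Dict, List

# Delta-aligned values per alignment window for one time series.
# Hypothetical data: errors occur in windows 1, 4 and 6, zeros otherwise.
aligned: Dict[str, List[int]] = {
    "my-service": [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
}
THRESHOLD = 0.99
WINDOWS = 10


def reduce_count(series: Dict[str, List[int]], i: int) -> int:
    """Count the series that report a point in window i, ignoring its value."""
    return sum(1 for points in series.values() if i < len(points))


def reduce_sum(series: Dict[str, List[int]], i: int) -> int:
    """Add up the values that the series report in window i."""
    return sum(points[i] for points in series.values() if i < len(points))


count_values = [reduce_count(aligned, i) for i in range(WINDOWS)]
sum_values = [reduce_sum(aligned, i) for i in range(WINDOWS)]

# The condition is met whenever the reduced value is above the threshold.
count_firing = [value > THRESHOLD for value in count_values]
sum_firing = [value > THRESHOLD for value in sum_values]

print("count:", count_values, "-> condition always met:", all(count_firing))
print("sum:  ", sum_values, "-> met in last window:", sum_firing[-1])
```

In this model the 'count' condition is met in every window, so the incident never has a chance to close, whereas the 'sum' condition is only met in the windows that actually contain an error.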