I have some metrics like this:
restarts{service="foo-1"}
restarts{service="foo-2"}
restarts{service="bar-1"}
restarts{service="bar-2"}
restarts{service="bar-3"}
I'm trying to use Alertmanager to trigger an alert when the total number of restarts across all instances of a service exceeds a threshold.
The only approach that comes to mind is to create a separate rule for each of foo and bar, using a query like this:
sum(restarts{service=~"bar-.*"}) > 10
But I have too many services to write a rule for each of them.
Is there a way to get the total restarts of each service in a single query?
You could use label_replace in your query, like this:
label_replace(restarts, "servicegroup", "$1", "service", "(.+)-.+")
Then you can group the results with sum by (servicegroup) to get one total per service.
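Putting the two steps together, the full alert expression might look like the sketch below. It reuses the threshold of 10 from the question and assumes every service label follows the name-number pattern shown there. A series whose service label does not match the regex is returned by label_replace unchanged, so it would end up in a group with an empty servicegroup label.

```promql
# Derive a "servicegroup" label from "service" (e.g. "bar-2" -> "bar"),
# sum the restarts per group, and keep only groups above the threshold.
sum by (servicegroup) (
  label_replace(restarts, "servicegroup", "$1", "service", "(.+)-.+")
) > 10
```

Because the result carries the servicegroup label, a single rule built on this expression should produce one alert per service group.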