
Use variable inside Alertmanager Promql query


I have some metrics like this:

restarts{service="foo-1"}
restarts{service="foo-2"}
restarts{service="bar-1"}
restarts{service="bar-2"}
restarts{service="bar-3"}

I'm trying to use Alertmanager to trigger an alert when the total number of restarts across all instances of a service exceeds a threshold.

The only approach that comes to mind is to create a separate rule for each of foo and bar, using a query like this:

sum(restarts{service=~"bar-.*"}) > 10

But I have too many services to write a rule for each of them.

Is there any way to get the restarts of each service in a single query?


Solution

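  • You could use label_replace in your query to derive a new label from the service name:

    label_replace(restarts, "servicegroup", "$1", "service", "(.+)-.+")

    This copies the part of the service label before the last hyphen into a new servicegroup label, so foo-1 and foo-2 both get servicegroup="foo". Then you can group the results with sum by (servicegroup) to get one total per service.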
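
    Putting the two steps together, the full alert expression might look like the sketch below. It assumes the threshold of 10 from the question and that every service label follows the name-number pattern shown there; servicegroup is just the label name chosen above, not anything built in.

    # One result per service group (foo, bar, ...) whose summed restarts exceed 10.
    # Assumption: every "service" value looks like "<group>-<suffix>".
    sum by (servicegroup) (
      label_replace(restarts, "servicegroup", "$1", "service", "(.+)-.+")
    ) > 10

    A series whose service label does not match the regex is passed through by label_replace unchanged, without a servicegroup label, so any such series would be summed together under an empty servicegroup. Also note that an expression like this belongs in a Prometheus alerting rule; Alertmanager only receives and routes the alerts that Prometheus fires.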