Tags: prometheus, prometheus-alertmanager

Is there a way to set an alert for multiple ENUM metrics with similar names?


I'm trying to handle multiple (around 500) metrics with similar names like:

INSTANCE03{INSTANCE03="Dead"} == 1
INSTANCE05{INSTANCE05="Dead"} == 1
INSTANCE07{INSTANCE07="Dead"} == 1

Each of them is specified as an Enum which exposes its status like this:

INSTANCE03{INSTANCE03="Dead"} == 1
INSTANCE03{INSTANCE03="Alive"} == 0

Is there a way to create an alert for the status switching from Alive to Dead for all of those metrics in some concise way, e.g. using a regex on the __name__ value?
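
What I have in mind is roughly this shape of expression, using a regex matcher on the __name__ label (just a sketch of the idea, not a working rule, since I don't see how to also restrict it to the "Dead" state when that label name differs per metric):

{__name__=~"INSTANCE[0-9]+", instance="127.0.0.1:8888", job="prometheus"} == 1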

Alerting works if I specify one metric instance per rule, but that is not a clean approach for so many metrics.

Below is my alert_rules.yml:

groups:
 - name: example
   rules:
   - alert: InstanceDown
     expr: INSTANCE03{INSTANCE03="Dead",instance="127.0.0.1:8888",job="prometheus"} == 1
     for: 15s
     annotations:
       summary: "Instance is down."
       description: "Instance down for 15 seconds. Please check mentioned instance."

Solution

  • You could use a labelmap action in metric_relabel_configs to fix up these label and metric names; a sketch of such a config follows below.

    As Alin says though, fixing the source of the metrics is best: a gauge with a 0/1 value per instance would be simplest.
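
    A minimal sketch of what that relabelling could look like, assuming the job and target from the question; the "state" and "enum_instance" label names and the "instance_status" metric name are made up for this example:

    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets: ['127.0.0.1:8888']
        metric_relabel_configs:
          # Copy the value of any label named like INSTANCE03
          # (e.g. INSTANCE03="Dead") into a common "state" label.
          - action: labelmap
            regex: 'INSTANCE\d+'
            replacement: 'state'
          # Drop the original per-instance labels.
          - action: labeldrop
            regex: 'INSTANCE\d+'
          # Keep the original metric name as an "enum_instance" label.
          - source_labels: [__name__]
            regex: '(INSTANCE\d+)'
            target_label: enum_instance
            replacement: '$1'
          # Rename every INSTANCE<NN> metric to one common name.
          - source_labels: [__name__]
            regex: 'INSTANCE\d+'
            target_label: __name__
            replacement: 'instance_status'

    With the series normalised like that, a single rule covers all instances (again a sketch, reusing the names chosen above):

    groups:
     - name: example
       rules:
       - alert: InstanceDown
         expr: instance_status{state="Dead"} == 1
         for: 15s
         annotations:
           summary: "Instance {{ $labels.enum_instance }} is down."
           description: "Instance down for 15 seconds. Please check mentioned instance."

    If you can change the exporter itself, a plain gauge in the exposition format along these lines (hypothetical metric and label names) avoids the relabelling entirely:

    # HELP instance_up Whether the instance is alive (1) or dead (0).
    # TYPE instance_up gauge
    instance_up{instance_name="INSTANCE03"} 1
    instance_up{instance_name="INSTANCE05"} 0

    The whole fleet can then be alerted on with a single expression such as instance_up == 0.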