prometheus, prometheus-alertmanager, server-monitoring

Verify a certain metric across different instances in Prometheus alert rules


I have multiple targets in Prometheus that expose multiple metrics. I need to compare the values of a certain metric across multiple instances and trigger an alert in case the values are not equal to each other.

metric_name: treds_load_peer_db_doc_cnt

values log:

treds_load_peer_db_doc_cnt{instance="com.peer0",ip="192.168.191.2",job="prod"} 2136589
treds_load_peer_db_doc_cnt{instance="com.peer1",ip="10.121.81.38",job="prod"} 2136590
treds_load_peer_db_doc_cnt{instance="com.peer2",ip="10.121.1.57",job="prod"} 2136590

Here's the query I'm currently using:

treds_load_peer_db_doc_cnt{instance="com.peer0"} != ignoring(instance,ip) treds_load_peer_db_doc_cnt{instance="com.peer1"}

It works, but it messes up all the labels. Is there a way to check the metric on all targets at once and alert in case of a mismatch?


Solution

  • I'd do something like:

    max without(instance,ip)(treds_load_peer_db_doc_cnt) != min without(instance,ip)(treds_load_peer_db_doc_cnt)

    which will generate an alert if the values aren't all the same across instances.
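
As a rough sketch, the max/min comparison could be wired into an alerting rule along these lines. The group name, alert name, `for` duration, severity label, and annotation text are illustrative assumptions, not something from the question. Adjust them to your setup.

```yaml
groups:
  - name: peer-db-consistency          # hypothetical group name
    rules:
      - alert: PeerDbDocCountMismatch  # hypothetical alert name
        # Aggregating away instance and ip leaves one series per remaining
        # label set (here: job). If the largest and smallest values differ,
        # at least one instance disagrees with the others.
        expr: |
          max without(instance, ip) (treds_load_peer_db_doc_cnt)
            != min without(instance, ip) (treds_load_peer_db_doc_cnt)
        for: 5m                        # assumed grace period to ride out brief lag
        labels:
          severity: warning            # assumed label for routing
        annotations:
          summary: "treds_load_peer_db_doc_cnt differs across instances of job {{ $labels.job }}"
```

Because `instance` and `ip` are aggregated away, the resulting alert carries only the shared labels (such as `job`), so it will not say which instance is the odd one out.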