I'm trying to create an alert in Prometheus that notifies me when both of two hosts are unavailable. I have two metrics:
probe_success{instance="host1"}
probe_success{instance="host2"}
Each returns 1 if the host is available and 0 if it is unavailable. How can I combine these two metrics into one expression?
I tried using on(): probe_success{instance="host1"} == 0 and on() probe_success{instance="host2"} == 0
It works, but it only returns the series from the first expression (and keeps only left-hand series), so the email only has labels from host1 and it looks as if only one host has a problem.
I also tried sum(probe_success{instance=~"host1|host2"}) == 0
but here the result has no labels at all, because sum() without by() aggregates every label away.
Is it possible to create an expression with logic like this:
if((probe_success{instance="host1"}==0) and (probe_success{instance="host2"}==0)){
notifyMe("host1","host2")
}
In Prometheus, you can create a rule that fires one alert for each unavailable host, but only when none of the hosts is up:
- alert: MissingAllHosts
  expr: (probe_success == 0) unless on() (sum(probe_success) != 0)
  ...
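For reference, here is a sketch of a complete rule file built around that expression. It narrows both selectors to the two hosts from the question so that other probes don't affect the sum. The group name, the `for` duration, the `severity` label and the annotation text are all hypothetical choices, not part of the original answer.

```yaml
groups:
  - name: host-availability          # hypothetical group name
    rules:
      - alert: MissingAllHosts
        # Left side: one series per host whose probe failed (keeps the instance label).
        # Right side: a single label-less series that exists while any host is up.
        # "unless on()" matches on no labels, so any right-hand result drops every
        # left-hand series; the alerts only remain when both hosts are down.
        expr: |
          (probe_success{instance=~"host1|host2"} == 0)
          unless on()
          (sum(probe_success{instance=~"host1|host2"}) != 0)
        for: 1m                      # assumed: ignore a single failed probe
        labels:
          severity: critical         # hypothetical label
        annotations:
          summary: "{{ $labels.instance }} is unreachable (all monitored hosts are down)"
```

Each firing alert keeps its own instance label, which is what lets Alertmanager list the hosts individually.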
Then use Alertmanager's grouping to send a single notification:
routes:
- match:
    alertname: 'MissingAllHosts'
  group_by: ['alertname']
  group_wait: 1m
  group_interval: 10s
Because all of these alerts share the same alertname, they end up in one group, so the resulting notification should contain the list of every host that is down.
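A fuller alertmanager.yml sketch might nest that route under the root route and put the host list into the email subject. The SMTP relay, sender, receiver names and addresses are placeholders. It also uses the newer `matchers` syntax, which as far as I know replaced `match` starting with Alertmanager 0.22; on older versions keep `match` as above.

```yaml
global:
  smtp_smarthost: 'smtp.example.com:587'   # hypothetical SMTP relay
  smtp_from: 'alertmanager@example.com'    # hypothetical sender

route:
  receiver: default-email                  # hypothetical fallback receiver
  routes:
    - matchers:
        - alertname = "MissingAllHosts"
      receiver: ops-email                  # hypothetical receiver for this alert
      group_by: ['alertname']              # one group, hence one notification
      group_wait: 1m
      group_interval: 10s

receivers:
  - name: default-email
    email_configs:
      - to: 'oncall@example.com'           # placeholder address
  - name: ops-email
    email_configs:
      - to: 'ops@example.com'              # placeholder address
        headers:
          # Iterate over the firing alerts in the group and print each instance label.
          Subject: 'All hosts down: {{ range .Alerts.Firing }}{{ .Labels.instance }} {{ end }}'
```

With both hosts down, the subject would then name host1 and host2 in a single email.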