prometheus-alertmanager

Prometheus alert if two expressions are true


I'm trying to create an alert in Prometheus that should notify me if two hosts are unavailable. I have two metrics:

probe_success{instance="host1"}
probe_success{instance="host2"} 

Each returns 1 if the host is available and 0 if it is unavailable. How can I combine these two metrics into one expression?

I tried using on(): probe_success{instance="host1"} == 0 and on() probe_success{instance="host2"} == 0. It works, but it only returns the left-hand side of the expression, so the email only has the labels from host1 and it looks as if only one host has a problem.

I also tried sum(probe_success{instance=~"host1|host2"}) == 0, but then the result has no labels at all.

Is there a way to write an expression with logic like this?

if ((probe_success{instance="host1"} == 0) and (probe_success{instance="host2"} == 0)) {
    notifyMe("host1", "host2")
}

Solution

  • In Prometheus, you can create a rule that fires an alert for each unavailable host, but only when none of the hosts is up:

    - alert: MissingAllHosts
      expr: (probe_success == 0) UNLESS ON() ( sum(probe_success) != 0 )
      ...
    

    Then use Alertmanager's grouping to send a single notification:

    routes:
      - match:
          alertname: 'MissingAllHosts'
        group_by: ['alertname']
        group_wait: 1m
        group_interval: 10s
    

    The resulting notification should then list every host that is down.
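
    Note that the expression in the rule matches every probe_success series, while the question concerns only host1 and host2. A minimal sketch of the same rule restricted to those two instances could look like the following. The group name, the `for` duration, and the annotation text are assumptions added for illustration and are not part of the original answer.

    ```yaml
    groups:
      - name: host-availability        # hypothetical group name
        rules:
          - alert: MissingAllHosts
            # Left side: one series per monitored host that is down.
            # Right side: non-empty only if at least one of the two hosts is up.
            # "unless on()" drops the whole left side whenever the right side
            # has any result, so alerts fire only when both hosts are down.
            expr: |
              (probe_success{instance=~"host1|host2"} == 0)
              unless on()
              (sum(probe_success{instance=~"host1|host2"}) != 0)
            for: 1m                    # assumed value; tune to your probe interval
            annotations:
              summary: "{{ $labels.instance }} is down and no other monitored host is up"
    ```

    Because the left-hand side keeps its labels, each firing alert still carries its own instance label, which is what lets the grouped notification show both hosts.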