How to inhibit alerts outside business hours with prometheus alertmanager?


Our application relies on data sources that are only active during business hours. We have alerts set up in Prometheus to notify us when streams dry up. However, we don't want to get "false" alerts outside business hours.

I followed this post to set up a "fake alert" that fires outside business hours and is supposed to inhibit all other alerts.

The setup looks like below. In prometheus:

rules:

# This special alert will be used to inhibit all other alerts outside business hours
- alert: QuietHours
  expr: day_of_week() == 6 or day_of_week() == 0 or europe_amsterdam_hour >= 18 or europe_amsterdam_hour <= 7
  for: 1m
  labels:
    notification: page
    severity: critical
  annotations:
    description: 'This alert fires during quiet hours. It should be blackholed by Alertmanager.'

europe_amsterdam_hour is defined as a recording rule and left out of this sample for conciseness.

In alertmanager:

routes:
# ensure to forward to blackhole receiver during quiet hours
- match:
    alertname: QuietHours
  receiver: blackhole

inhibit_rules:
- source_match:
    alertname: QuietHours
  target_match_re:
    alertname: '[^(QuietHours)]'

I verified that the logic for triggering the QuietHours alert works: it fires after business hours and resolves during business hours. However, the inhibition part doesn't seem to work, because I still receive other alerts while QuietHours is active. I cannot find a good resource with a detailed explanation of the inhibition config.

Any ideas what I am doing wrong?


Solution

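  • The problem is the target regex. '[^(QuietHours)]' is a negated character class, not a negated word: it matches a single character that is none of (, ), Q, u, i, e, t, H, o, r or s. Alertmanager anchors its regexes, so the rule only targets alerts whose name is exactly one such character, which none of your alerts are. There is also no need to exclude QuietHours in the first place, as the inhibit_rule documentation explains: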

    To prevent an alert from inhibiting itself, an alert that matches both the target and the source side of a rule cannot be inhibited by alerts for which the same is true (including itself).

    If you keep target_match_re, the regex should simply match the names of the alerts related to your data sources.

    It is easier, though, to add a label that identifies the alerts to inhibit and match on that label instead of on alertname:

    inhibit_rules:
    - source_match:
        alertname: QuietHours
      target_match:
        component: 'data_source'
    

    That way, any new alert that carries the label is inhibited whenever QuietHours is firing, without touching the Alertmanager config again.
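
    On the Prometheus side, each data-source alert then needs that label. A minimal sketch, assuming a rule file of your own: the group name, alert name, metric and durations below are hypothetical placeholders, not taken from the question. Only the component label has to match the target_match of the inhibit rule.

    ```yaml
    groups:
      - name: data_source_alerts          # hypothetical group name
        rules:
          - alert: StreamDriedUp          # hypothetical alert name
            # hypothetical metric: no messages received over the last 10 minutes
            expr: rate(stream_messages_received_total[10m]) == 0
            for: 10m
            labels:
              severity: critical
              # must equal the value used in target_match of the inhibit rule
              component: data_source
            annotations:
              description: 'No messages received from the stream for 10 minutes.'
    ```

    QuietHours itself does not carry the component label, so it never matches the target side of the rule and keeps being routed to the blackhole receiver as before.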