
How to silence Prometheus Alertmanager using config files?


I'm using the official stable/prometheus-operator chart to deploy Prometheus with helm.

It's working well so far, except for the annoying CPUThrottlingHigh alert that is firing for many pods (including Prometheus's own config-reloader containers). This alert is currently under discussion, and I want to silence its notifications for now.

The Alertmanager has a silence feature, but it is web-based:

Silences are a straightforward way to simply mute alerts for a given time. Silences are configured in the web interface of the Alertmanager.

Is there a way to mute notifications from CPUThrottlingHigh using a config file?


Solution

  • Well, I managed to get it working by configuring a hackish inhibit_rule:

    inhibit_rules:
    - target_match:
        alertname: 'CPUThrottlingHigh'
      source_match:
        alertname: 'DeadMansSwitch'
      equal: ['prometheus']
    

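    An inhibit rule mutes the target alert while the source alert is firing and both carry the same values for the labels listed in equal. The DeadMansSwitch is, by design, an "always firing" alert shipped with prometheus-operator, and the prometheus label is a common label for all alerts, so CPUThrottlingHigh ends up inhibited permanently. It stinks, but it works.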

    Pros:

    • This can be done via the config file (using the alertmanager.config helm parameter).
    • The CPUThrottlingHigh alert is still present on Prometheus for analysis.
    • The CPUThrottlingHigh alert only shows up in the Alertmanager UI if the "Inhibited" box is checked.
    • No annoying notifications on my receivers.
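
    For reference, here is a rough sketch of how the rule might sit in the chart's values file, assuming the Alertmanager configuration goes under the alertmanager.config key mentioned above. The receiver name is a placeholder, not something from my setup; merge the inhibit_rules section into whatever route and receivers you already have.

    ```yaml
    # values.yaml passed to helm for stable/prometheus-operator (sketch only)
    alertmanager:
      config:
        route:
          receiver: 'default-receiver'   # placeholder receiver name
        receivers:
        - name: 'default-receiver'       # placeholder; add your real notifier config here
        inhibit_rules:
        # Mute CPUThrottlingHigh while the always-firing DeadMansSwitch is active
        # and both alerts share the same 'prometheus' label value.
        - source_match:
            alertname: 'DeadMansSwitch'
          target_match:
            alertname: 'CPUThrottlingHigh'
          equal: ['prometheus']
    ```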

    Cons:

    • Any change to DeadMansSwitch or to the prometheus label design will break this (though the only consequence is that the alerts start firing again).

    Update: My Cons became real...

    The DeadMansSwitch alertname just changed in stable/prometheus-operator 4.0.0. If you are using this version (or above), the new alertname is Watchdog.
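
    On 4.0.0 or later, the same rule would presumably look like the sketch below. It is the original rule with only the source alertname swapped, and it assumes the prometheus label is still common to both alerts.

    ```yaml
    inhibit_rules:
    # Same hack as before, with Watchdog as the always-firing source alert.
    - source_match:
        alertname: 'Watchdog'
      target_match:
        alertname: 'CPUThrottlingHigh'
      equal: ['prometheus']
    ```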