prometheus prometheus-alertmanager

Create a Prometheus alert that flips on and off every minute


I would like to create a Prometheus alert that fires every minute, then resolves itself and sends a resolved notification. What I am seeing instead is that the alert stays firing and never becomes resolved.

This is the rule file:

groups:
- name: example
  rules:
  - alert: 'flipping rule'
    expr: minute() % 2
    for: 30s

This is the Prometheus configuration:

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - 192.168.8.158:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "prom-rule.yaml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
    relabel_configs:
      - source_labels: [branch]
        regex: HEAD
        action: drop
  - job_name: "nginx-exporter"
    static_configs:
      - targets: ["192.168.8.158:9113"]
  - job_name: "node-exporter"
    static_configs:
      - targets: ["localhost:9100"]
    metric_relabel_configs:
      - regex: 'node_arp_entries'
        source_labels: [__name__]
        action: keep
      - regex: 'node_boot_time_seconds'
        source_labels: [__name__]
        action: keep
  - job_name: "cadvior"
    static_configs:
      - targets: ["localhost:9999"]

These screenshots show that the alert stays active instead of flipping on and off every minute as I would expect.


Solution

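  • An alerting rule is active for every sample its expression returns, regardless of the sample's value. The expression minute() % 2 always returns a sample (with value 0 or 1), so the alert never resolves. Adding an explicit comparison to the expression filters the result, so nothing is returned on odd minutes and the alert can resolve: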

    expr: minute() % 2 == 0
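
For context, the full rule file with this change applied might look like the sketch below. It reuses the group name, alert name, and for duration from the question. The comments are added for illustration, and the choice of == 0 (active on even minutes) rather than == 1 is arbitrary.

```yaml
groups:
- name: example
  rules:
  - alert: 'flipping rule'
    # minute() % 2 alone always returns a sample (value 0 or 1), and any
    # returned sample keeps the alert active. The comparison filters the
    # result so that it is empty on odd minutes, which lets the alert resolve.
    expr: minute() % 2 == 0
    # The alert is pending for roughly the first 30s of each even minute,
    # then firing until the expression stops returning a sample.
    for: 30s
```

Whether Alertmanager delivers a notification for each flip also depends on its route settings (for example group_wait and group_interval), which the question does not show.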