Tags: prometheus, prometheus-alertmanager

Prometheus rule to fire an alert if a condition persists for a period of time


I have this metric. In its working (intended) state it looks like one of the following; both are acceptable states:

 1. flight_api_calls_to_mq_total{status="", region="EMEA"} 40.0  
 2. flight_api_calls_to_mq_total{status="arrived", region="US"} 10.0

I want to fire an alert when the status label of this metric changes to transit and stays in that state for more than an hour, for example:

    flight_api_calls_to_mq_total{status="transit", region="US"} 20.0

I can think of something like this: increase(flight_api_calls_to_mq_total{status="transit"}[1h]) > 50. Is this the right approach, or can it be done better? Thanks.


Solution

  • You can use an alerting rule whose expression returns all transit series, with the for field set to one hour.

      - alert: TransitTakesTooLong
        expr: flight_api_calls_to_mq_total{status="transit"}
        for: 1h
        labels:
          severity: page
        annotations:
          summary: Transit takes more than 1h to finish
    

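    This creates an alert once the query has returned the series continuously for one hour. Until then, the alert stays in the pending state. Unlike increase(...[1h]) > 50, which depends on how many calls were counted in the window, this condition depends only on how long a transit series keeps being returned.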
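Prometheus loads alerting rules from rule files in which every rule sits inside a named group, so the snippet above is normally placed under a groups/rules wrapper. A minimal sketch of a complete rule file follows. The group name flight-api is a hypothetical placeholder, and the file is assumed to be listed under rule_files in prometheus.yml. The summary uses the standard $labels template variable so that each alert reports its own region.

```yaml
groups:
  - name: flight-api            # hypothetical group name
    rules:
      - alert: TransitTakesTooLong
        # Returns every series whose status label is "transit"; the alert
        # only moves from pending to firing once the same series has been
        # returned on every evaluation for a continuous hour.
        expr: flight_api_calls_to_mq_total{status="transit"}
        for: 1h
        labels:
          severity: page
        annotations:
          # $labels exposes the labels of the series that triggered the alert
          summary: Transit in {{ $labels.region }} takes more than 1h to finish
```

Before reloading Prometheus, the file can be validated with promtool check rules followed by the file path.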