I have a metric whose working (intended) state looks like one of the following; both are acceptable:
1. flight_api_calls_to_mq_total{status="", region="EMEA"} 40.0
2. flight_api_calls_to_mq_total{status="arrived", region="US"} 10.0
I want to fire an alert when the status label of this metric changes to transit and stays in that state for more than an hour, for example:
flight_api_calls_to_mq_total{status="transit", region="US"} 20.0
I can think of something like this:
increase(flight_api_calls_to_mq_total{status="transit"}[1h]) > 50
I would like to hear whether this is the right approach or whether it can be done better. Thanks.
You can use an alerting rule whose expression returns all series with status="transit", with the for field set to one hour.
- alert: TransitTakesTooLong
  expr: flight_api_calls_to_mq_total{status="transit"}
  for: 1h
  labels:
    severity: page
  annotations:
    summary: Transit takes more than 1h to finish
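This fires an alert if the query keeps returning a series for one hour. If the series is missing at any rule evaluation during that hour, the pending alert is dropped and the hour starts over.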
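For context, here is a minimal sketch of how the rule above might sit in a complete rules file. The group name flight_api_alerts and the use of {{ $labels.region }} in the summary are assumptions added for illustration; they are not part of the original rule.

```yaml
groups:
  - name: flight_api_alerts        # hypothetical group name
    rules:
      - alert: TransitTakesTooLong
        # Returns one series per label set that currently has status="transit".
        expr: flight_api_calls_to_mq_total{status="transit"}
        # The series must be present at every evaluation for 1h before firing.
        for: 1h
        labels:
          severity: page
        annotations:
          # Assumption: the region label is included so the page says where.
          summary: "Transit in region {{ $labels.region }} takes more than 1h to finish"
```

A file like this can be checked with promtool check rules <file> before Prometheus loads it. Unlike the increase(...) > 50 idea from the question, this expression only tests that a transit series exists; it does not look at how much the counter grew.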