Is there a way to create an alert for "If upsBasicOutputStatus != 2 AND (upsBasicOutputStatus != 4 over the last 1 minute)" within Prometheus alerts?
I have a Prometheus instance scraping SNMP data from a range of UPSs. As part of this I have also set up alerting in Prometheus to fire the moment a UPS state changes to "On Battery". We want the alert to fire the moment it happens rather than wait for another scrape to occur.
upsBasicOutputStatus != 2
Sadly, this has the side effect of alerting when a self-test takes place every two weeks. Adding the exclusion to the expression was simple:
upsBasicOutputStatus != 2 and upsAdvTestDiagnosticsResults != 4
This works some of the time. Sadly, it seems that the "On Battery" status lasts longer than the "Test in Progress" status, so an alert is fired when the test ends but the UPS is still on battery.
I would rather not extend the for: duration, as that would delay an actual alert going out. Although we have a PCNS system in place to shut down racks, in my experience having someone on hand for the critical systems is needed just in case it fails, which has happened.
Full alert rule
- alert: UPSState
  expr: upsBasicOutputStatus != 2 and upsAdvTestDiagnosticsResults != 4  # Not online and not in self-test
  labels:
    severity: "critical"
  annotations:
    summary: "UPS {{ $labels.instance }} is no longer online"
    description: "UPS has entered the state {{ $value }}"
    dashboard: "d/FBsdas/?orgId=1&refresh=10s&var-datasource={{ $labels.source }}&var-ups={{ $labels.instance }}"
After trying the suggested rule from @markalex, the unless / last_over_time expression seems to shift the data points later by ~5 seconds, which then triggers the alert.
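You can modify your expression in two steps:
1. Replace "and metric != 4" with "unless metric == 4".
2. Wrap the right-hand side in last_over_time, so that a self-test seen at any point in the last minute still suppresses the alert.
Additionally, since the selector inside last_over_time is not a simple vector selector (it contains the == 4 comparison), you need to use subquery syntax: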
upsBasicOutputStatus != 2 unless last_over_time((upsAdvTestDiagnosticsResults == 4)[1m:])
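Dropped into the full rule from the question, the expression might look like the sketch below. The alert name, labels and annotations are copied from the question, and the 1m window comes from the question title. It assumes both metrics carry identical label sets for a given UPS, because unless matches series on their labels. It has not been tested against real UPS data.

```yaml
- alert: UPSState
  # Fire when the UPS is not online, unless a self-test (value 4)
  # was observed at any evaluation step within the last minute.
  # The empty resolution in [1m:] falls back to the global evaluation interval.
  expr: >-
    upsBasicOutputStatus != 2
    unless last_over_time((upsAdvTestDiagnosticsResults == 4)[1m:])
  # No "for:" clause, so the alert fires on the first evaluation that matches.
  labels:
    severity: "critical"
  annotations:
    summary: "UPS {{ $labels.instance }} is no longer online"
    description: "UPS has entered the state {{ $value }}"
    dashboard: "d/FBsdas/?orgId=1&refresh=10s&var-datasource={{ $labels.source }}&var-ups={{ $labels.instance }}"
```

If the "On Battery" status regularly outlasts the self-test by more than a minute, the window in the subquery would need to be widened to match. A wider window also suppresses a genuine outage for that long after every self-test.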