prometheus promql

Sum the number of seconds a value has been in a given state in Prometheus query language


I have metric data being pulled from Telegraf into Prometheus, and I built a dashboard on the Prometheus metrics. I am trying to find a query that gives me the downtime percentage. The formula I use is: Downtime percentage = (No. of seconds the status has been failure / Total no. of seconds in a day) * 100

My metric data looks something like this:

Query: test_jobevent_status{logname="123_abc",instance="job123"}
Output: 0 (success) or 1 (failure)

So the downtime is the number of seconds test_jobevent_status is 1. Our scrape interval is 15s, so it is fine to treat the last scraped state as holding for every second within those 15 seconds.

Could someone please help me write a query that sums the seconds (or minutes) during which the job event's status was failing?

FWIW, summarize, sumSeries and group were helpful for doing this in Graphite, but I'm not sure what the equivalent is in Prometheus.


Solution

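  • Try the following query. Because test_jobevent_status is 0 on success and 1 on failure, avg_over_time over [1d] is the fraction of the last day's samples that were failures, so the expression below returns the percentage of the day spent in the success state: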

    100-100*avg_over_time(test_jobevent_status{logname="123_abc",instance="job123"}[1d])
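
  • If you want the failing share or the failing seconds directly, here is a rough sketch under the question's own assumptions (0 = success, 1 = failure, one sample every 15s, no missed scrapes). Since the metric is only ever 0 or 1, sum_over_time counts the failing samples, and multiplying that count by the scrape interval approximates the failing seconds. The 15 and the [1d] window come from the question; adjust them if your setup differs:

    # percentage of the last day spent in the failing state
    100*avg_over_time(test_jobevent_status{logname="123_abc",instance="job123"}[1d])

    # approximate number of failing seconds in the last day (failing samples * 15s scrape interval)
    15*sum_over_time(test_jobevent_status{logname="123_abc",instance="job123"}[1d])

    Divide the second result by 60 to get minutes. If scrapes are sometimes missed, the sample count undercounts, and 86400*avg_over_time(...[1d]) is probably the safer estimate because it scales the failing fraction to a full day instead of relying on every sample being present.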