prometheus, grafana

Grafana / Prometheus : Detect Time of Spike Back to Zero


I am monitoring a load pattern in Grafana / Prometheus that looks like this: [image]

What I want to do is to track the time of day at which the spike comes back down to zero.

The spike typically does not reoccur within a 24 hour period. Can you suggest how this data point could be captured in this scenario?

I want to trigger an alert if the zero point doesn't occur before a given time of day, and also to track the zero time day over day historically.


Solution

  • You can get the timestamp of the point in time you described with a query like this:

    timestamp(my_metric == 0) 
     and my_metric offset 1m > 0
     and max_over_time(my_metric[8h])>500000
    

    It checks that the current value is 0, that the value one minute ago was greater than 0, and that the maximum value over the last 8 hours was more than 500,000.

    If all conditions are met, the timestamp is returned. Otherwise, nothing is returned.

    Be sure to change the 1m used in offset to match your evaluation_interval.

    If you create a recording rule with this expression, the recorded metric will have values only when the conditions are met. To use its value over an extended period of time (for example, on a panel), you'll need a query like this:

    last_over_time(last_spike:my_metric:timestamp[1w])
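
    As a rough sketch, a recording rule along these lines could produce the last_spike:my_metric:timestamp series used in the query above. The group name and the 1m interval are assumptions for illustration; the interval should match whatever you use in offset.

    ```yaml
    groups:
      - name: spike_tracking        # hypothetical group name
        interval: 1m                # assumed; keep equal to the offset below
        rules:
          # Has a sample only at the evaluation where my_metric drops to 0
          - record: last_spike:my_metric:timestamp
            expr: |
              timestamp(my_metric == 0)
                and my_metric offset 1m > 0
                and max_over_time(my_metric[8h]) > 500000
    ```

    Because the rule yields a sample only at the moment of the drop, the recorded series is sparse, which is why last_over_time is needed when reading it back.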
    

    I don't know of a clean way to convert the timestamp to the number of seconds since the last midnight. Most likely a simple timestamp(my_metric == 0) % (24*60*60) and ... should work, but I can't guarantee that there are no nuances regarding time in that case. Note that the result is relative to midnight UTC, not local time.

    If it works, the point about needing last_over_time still stands.
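
    To tie this back to the original question, here is one possible sketch of rules for the time-of-day value and the deadline alert. It assumes the recorded series is named last_spike:my_metric:timestamp, that times are in UTC, and that the deadline is 06:00; the second record name, the alert name, the deadline, and the lookback windows are all hypothetical and untested against real data.

    ```yaml
    groups:
      - name: spike_time_of_day     # hypothetical group name
        rules:
          # Seconds since midnight UTC of the most recent drop to zero
          - record: last_spike:my_metric:seconds_since_midnight_utc  # hypothetical name
            expr: last_over_time(last_spike:my_metric:timestamp[1w]) % (24*60*60)

          # Fires when the hour is 06 UTC or later and the most recent
          # recorded drop happened before today's midnight UTC.
          # Returns nothing (no alert) if no drop was recorded in the last 2 days.
          - alert: SpikeNotBackToZero                                # hypothetical name
            expr: |
              (
                last_over_time(last_spike:my_metric:timestamp[2d])
                  < (time() - time() % (24*60*60))
              )
              and on() (hour() >= 6)
            labels:
              severity: warning
            annotations:
              summary: "my_metric has not dropped back to zero by 06:00 UTC today"
    ```

    The alert compares timestamps rather than using a fixed lookback window, so a drop shortly after midnight still counts as today's. If your day boundary is not UTC midnight, the modulo arithmetic would need an offset.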