Grafana to alert if Prometheus/Pushgateway have old data


I have a few clients that push their metrics to Pushgateway, which is then scraped by Prometheus. Finally, I use Grafana for dashboards. Not too exotic a setup, I guess.

What puzzles me is that when one of the clients stops working and no longer pushes its metrics, the Pushgateway keeps serving the last values it received to Prometheus, and Grafana will happily display a horizontal line.

I would prefer to receive an alert when the metrics are too old. How can I accomplish that?


Solution

  • Prometheus provides the current time via time(), which returns the number of seconds since January 1, 1970 UTC. The Pushgateway exposes a push_time_seconds metric for every group of pushed metrics (typically one per job), which holds the time of the last successful push, also in seconds since January 1, 1970 UTC.

    So the query

    time() - push_time_seconds
    

    will show you the age in seconds of the last push for every exported_job you have. From there it is easy to filter and alert when the value exceeds a defined threshold. For jobs expected to run once a day (so their metrics should never be older than 24 hours), I set the threshold to 25 hours (90000 seconds) in Grafana, and it works like a charm.
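
    As a rough sketch of the filtering step, the comparison can also be placed directly in the query, so that only stale jobs return a series. The 90000-second limit is the daily-job threshold mentioned above and is an assumption about your schedule; adjust it to how often your jobs are expected to push.

    ```promql
    # Returns one series per pushed group whose last push is older than 25 hours (90000 s).
    # Groups that pushed recently return nothing, so any result signals stale data.
    time() - push_time_seconds > 90000
    ```

    With this form, an alert condition only needs to check whether the query returns any data, and the labels on the returned series (such as exported_job) identify which client went quiet.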