
How to set a retention time for Pushgateway for metrics to expire?


I'm using Pushgateway with Prometheus and everything works, but after a couple of weeks Pushgateway collapses. Looking into it, there are tons of metrics that are no longer used, and deleting them manually is practically impossible.

Is there a way to expire Pushgateway metrics with a TTL or some other retention setting, such as by size or by time, or maybe both?

NOTE: On the Prometheus mailing list I read that a lot of people have been asking for something like this for a year or more, and the only answer so far is that this is not the Promethean way to do it. Really? Come on, if this is a real pain for a lot of people, maybe there should be a better way (even if it's not the Promethean way).


Solution

  • If you want to remove the metrics of a group once they become too old (for whatever definition of too old suits you), you can use the push_time_seconds metric, which the Pushgateway defines automatically for every group. Its value is the Unix timestamp of the last push to that group:

    push_time_seconds{instance="foo",job="bar",try="longtime"} 1.598280005888635e+09
    

    With this information, you can write a script that fetches this metric and uses its value to identify stale groups (here {instance="foo",job="bar",try="longtime"}). The API lets you delete all metrics belonging to such a group:

     curl -X DELETE http://pushgateway:9091/metrics/job/bar/instance/foo/try/longtime
    

    This can be done in a few lines of Bash or Python.
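    As a rough illustration of the approach above, here is a minimal Python sketch. It assumes the Pushgateway is reachable at http://pushgateway:9091 as in the curl example; the one-week threshold and the function names (parse_push_times, group_path, stale_paths, prune) are made up for this example. Label values containing a slash are sent in the @base64 form described in the Pushgateway README, and empty label values such as instance="" are simply left out of the path, which is an assumption that may not fit every setup. Escaped characters inside label values are not unescaped.

```python
import base64
import re
import time
import urllib.request
from urllib.parse import quote

PUSHGATEWAY = "http://pushgateway:9091"  # host taken from the curl example
MAX_AGE_SECONDS = 7 * 24 * 3600          # hypothetical threshold: one week

# Matches exposition lines such as: push_time_seconds{job="bar",...} 1.59e+09
SAMPLE_RE = re.compile(r'^push_time_seconds\{(.*)\}\s+(\S+)\s*$')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_push_times(metrics_text):
    """Return a list of (labels, push_time) pairs, one per pushed group."""
    groups = []
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line)
        if match:
            labels = dict(LABEL_RE.findall(match.group(1)))
            groups.append((labels, float(match.group(2))))
    return groups


def _segment(name, value):
    """Encode one label as URL path segments (base64 form if it has a slash)."""
    if "/" in value:
        encoded = base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")
        return [name + "@base64", encoded]
    return [name, quote(value, safe="")]


def group_path(labels):
    """Build the /metrics/job/... path for a group; job must come first."""
    parts = ["metrics"] + _segment("job", labels["job"])
    for name in sorted(labels):
        # Skip job (already added) and empty values such as instance="".
        if name != "job" and labels[name]:
            parts += _segment(name, labels[name])
    return "/" + "/".join(parts)


def stale_paths(metrics_text, max_age, now):
    """Return the delete paths of all groups last pushed more than max_age ago."""
    return [
        group_path(labels)
        for labels, pushed in parse_push_times(metrics_text)
        if now - pushed > max_age
    ]


def prune():
    """Fetch /metrics and send one DELETE request per stale group."""
    with urllib.request.urlopen(PUSHGATEWAY + "/metrics") as resp:
        text = resp.read().decode("utf-8")
    for path in stale_paths(text, MAX_AGE_SECONDS, time.time()):
        req = urllib.request.Request(PUSHGATEWAY + path, method="DELETE")
        urllib.request.urlopen(req)
        print("deleted", path)
```

    Calling prune() from a cron job (or a loop with a sleep) would give you a crude TTL. Before letting it delete anything, it is probably worth printing the result of stale_paths() first to check that the paths match the groups you expect.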