Tags: kubernetes, prometheus, statsd

Prometheus statsd-exporter - how to tag status code in request duration metric (histogram)


I have set up statsd-exporter to collect metrics from a gunicorn web server. My goal is to filter the request duration metric so that it only covers successful requests (non-5xx); however, statsd-exporter gives me no way to tag the status code on the duration metric. Can anyone suggest a way to add the status code to the request duration metric, or a way to filter only successful request durations in Prometheus?

In particular, I want to export a histogram of successful request durations from statsd-exporter to Prometheus.


Solution

  • To export a histogram of successful request durations from the gunicorn web server to Prometheus, you need to add that functionality to the gunicorn source code.

    First, look at the code that sends the statsd metrics (gunicorn/instrument/statsd.py). You should see this piece of code:

    status = resp.status
    ...
    self.histogram("gunicorn.request.duration", duration_in_ms)
    

    Change it to something like this:

    self.histogram("gunicorn.request.duration.%d" % status, duration_in_ms)
    

    From then on, the durations are exported under metric names that include the status code, such as gunicorn_request_duration_200 or gunicorn_request_duration_404.
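    To make the effect of the patch concrete, here is a small standalone sketch (plain Python, no gunicorn import) of the datagram the patched line would send and the name statsd_exporter would derive from it when no mapping is configured. It assumes, as in gunicorn's statsd logger, that histogram values are sent as statsd timers (the |ms type), and it approximates the exporter's default naming by replacing every character that is not valid in a Prometheus metric name with an underscore. Both helper names are hypothetical.

```python
import re


def duration_datagram(status: int, duration_in_ms: float, prefix: str = "") -> str:
    """Build the statsd line the patched histogram() call would emit.

    Assumption: gunicorn sends histogram values as statsd timers ("|ms").
    """
    name = "gunicorn.request.duration.%d" % status
    return "{0}{1}:{2}|ms".format(prefix, name, duration_in_ms)


def default_prometheus_name(statsd_name: str) -> str:
    """Approximate statsd_exporter's default naming for unmapped metrics.

    Simplification: every character outside [a-zA-Z0-9_] becomes "_".
    """
    return re.sub(r"[^a-zA-Z0-9_]", "_", statsd_name)


if __name__ == "__main__":
    line = duration_datagram(200, 12.5)
    print(line)  # gunicorn.request.duration.200:12.5|ms
    statsd_name = line.split(":", 1)[0]
    print(default_prometheus_name(statsd_name))  # gunicorn_request_duration_200
```

    The status code simply becomes one more dot-separated component of the statsd name, which is what the mapping in the next step relies on.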

    You can go a step further and move the status code into a label by adding a mapping like the one below to your statsd_exporter configuration:

    mappings:
      - match: gunicorn.request.duration.*
        name: "gunicorn_http_request_duration"
        labels:
          status: "$1"
          job: "gunicorn_request_duration"
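    The mapping relies on glob matching: the * matches one dot-separated component of the statsd name, and whatever it captured is substituted for $1 in the label values. The sketch below imitates that single rule in plain Python so you can check which names map and which labels result. It is a simplified model of statsd_exporter's glob matcher (assuming * never spans a dot), not its actual implementation, and apply_mapping is a hypothetical name.

```python
import re
from typing import Dict, Optional, Tuple

# The single mapping rule from the YAML above, expressed as data.
MATCH = "gunicorn.request.duration.*"
NAME = "gunicorn_http_request_duration"
LABELS = {"status": "$1", "job": "gunicorn_request_duration"}


def glob_to_regex(glob: str) -> "re.Pattern[str]":
    # Escape the literal parts; each * captures one component without dots.
    parts = [re.escape(p) for p in glob.split("*")]
    return re.compile("^" + "([^.]+)".join(parts) + "$")


def apply_mapping(statsd_name: str) -> Optional[Tuple[str, Dict[str, str]]]:
    m = glob_to_regex(MATCH).match(statsd_name)
    if m is None:
        return None  # unmapped: the exporter would fall back to default naming
    labels = {}
    for key, template in LABELS.items():
        value = template
        # Replace $1, $2, ... with the corresponding captured component.
        for i, captured in enumerate(m.groups(), start=1):
            value = value.replace("$%d" % i, captured)
        labels[key] = value
    return NAME, labels


if __name__ == "__main__":
    print(apply_mapping("gunicorn.request.duration.200"))
    print(apply_mapping("gunicorn.request.duration"))  # None
```

    Under this model the unpatched gunicorn.request.duration name does not match the rule, so the status label only appears once the patch is in place. Also keep in mind that, unless the scrape configuration sets honor_labels: true, Prometheus renames a job label exposed by the exporter to exported_job.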
    

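    Your metrics will then look like the output below. Note that statsd_exporter exports timers as summaries by default; to get a real histogram (with _bucket series), add observer_type: histogram to the mapping (older releases call this option timer_type).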

    # HELP gunicorn_http_request_duration Metric autogenerated by statsd_exporter.
    # TYPE gunicorn_http_request_duration summary
    gunicorn_http_request_duration{job="gunicorn_request_duration",status="200",quantile="0.5"} 2.4610000000000002e-06
    gunicorn_http_request_duration{job="gunicorn_request_duration",status="200",quantile="0.9"} 2.4610000000000002e-06
    gunicorn_http_request_duration{job="gunicorn_request_duration",status="200",quantile="0.99"} 2.4610000000000002e-06
    gunicorn_http_request_duration_sum{job="gunicorn_request_duration",status="200"} 2.4610000000000002e-06
    gunicorn_http_request_duration_count{job="gunicorn_request_duration",status="200"} 1
    gunicorn_http_request_duration{job="gunicorn_request_duration",status="404",quantile="0.5"} 3.056e-06
    gunicorn_http_request_duration{job="gunicorn_request_duration",status="404",quantile="0.9"} 3.056e-06
    gunicorn_http_request_duration{job="gunicorn_request_duration",status="404",quantile="0.99"} 3.056e-06
    gunicorn_http_request_duration_sum{job="gunicorn_request_duration",status="404"} 3.056e-06
    gunicorn_http_request_duration_count{job="gunicorn_request_duration",status="404"} 1
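    Because a summary only carries precomputed quantiles plus _sum and _count, the average duration for each status is _sum / _count. The sketch below parses the _sum and _count lines from the output above with a deliberately naive regular expression (it assumes simple label values and no timestamps, which holds here) and computes that average per status. It only illustrates how the series relate; it is not a substitute for a real exposition-format parser, and mean_duration_by_status is a hypothetical name.

```python
import re
from typing import Dict

# The _sum and _count lines from the exporter output shown above.
EXPOSITION = """\
gunicorn_http_request_duration_sum{job="gunicorn_request_duration",status="200"} 2.4610000000000002e-06
gunicorn_http_request_duration_count{job="gunicorn_request_duration",status="200"} 1
gunicorn_http_request_duration_sum{job="gunicorn_request_duration",status="404"} 3.056e-06
gunicorn_http_request_duration_count{job="gunicorn_request_duration",status="404"} 1
"""

# name{labels} value  (naive: no timestamps, no escaped quotes in labels)
LINE_RE = re.compile(r"^(\w+)\{([^}]*)\}\s+(\S+)$")


def mean_duration_by_status(text: str) -> Dict[str, float]:
    sums: Dict[str, float] = {}
    counts: Dict[str, float] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip comments and anything unexpected
        name, raw_labels, value = m.groups()
        labels = dict(re.findall(r'(\w+)="([^"]*)"', raw_labels))
        status = labels.get("status", "")
        if name.endswith("_sum"):
            sums[status] = float(value)
        elif name.endswith("_count"):
            counts[status] = float(value)
        # quantile lines match neither suffix and are ignored
    return {s: sums[s] / counts[s] for s in sums if counts.get(s)}


if __name__ == "__main__":
    means = mean_duration_by_status(EXPOSITION)
    successful = {s: v for s, v in means.items() if not s.startswith("5")}
    print(successful)
```

    Unlike histogram buckets, summary quantiles cannot be meaningfully aggregated across instances, which is another reason to consider the histogram observer type.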
    

    To query everything except the 5xx series in Prometheus, you can then run:

    gunicorn_http_request_duration{status=~"[^5].*"}
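    Prometheus label matchers use RE2 syntax and are fully anchored, so status=~"[^5].*" keeps every status whose first character is not 5. Python's re.fullmatch has the same anchoring, which makes it a quick way to sanity-check such a pattern offline. The sketch below compares it with the negative matcher status!~"5..", assuming three-digit status values as in the output above; the function names are hypothetical.

```python
import re

STATUSES = ["200", "201", "301", "404", "500", "503"]


def keep_with_positive(status: str) -> bool:
    # Mirrors status=~"[^5].*" (Prometheus anchors the regex at both ends).
    return re.fullmatch(r"[^5].*", status) is not None


def keep_with_negative(status: str) -> bool:
    # Mirrors status!~"5..": drop anything that fully matches a 5xx code.
    return re.fullmatch(r"5..", status) is None


if __name__ == "__main__":
    for s in STATUSES:
        print(s, keep_with_positive(s), keep_with_negative(s))
```

    If you switch the mapping to observer_type: histogram, the same matcher can be applied to the bucket series, for example histogram_quantile(0.95, sum by (le) (rate(gunicorn_http_request_duration_bucket{status!~"5.."}[5m]))) for a 95th percentile of successful request durations.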
    

    Let me know if it was helpful.