Tags: prometheus, promql, micrometer

PromQL: What is rate() function meant for?


I have a question regarding PromQL's query function rate() and how to use it properly. In my application, I have a thread running, and I use Micrometer's Timer to monitor the thread's runtime. A Timer gives you a counter with the suffix _count and another counter, holding the sum of the seconds spent, with the suffix _sum, for example my_metric_count and my_metric_sum.

My raw data looks like this (scrape interval 30 s, range vector 5m):

[image of the raw data not available]

According to the docs (https://prometheus.io/docs/prometheus/latest/querying/functions/#rate), rate() calculates the per-second average rate of increase of the time series in the range vector (which is 5m here).

Now my question is: why would I want that? The relative change of my execution runtime seems pretty useless to me. In fact, just using sum/count looks more useful as it gives me the avg absolute duration for each moment in time. At the same time, and this is what confused me, in the docs I find

To calculate the average request duration during the last 5 minutes from a histogram or summary called http_request_duration_seconds, use the following expression:

rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])

Source: https://prometheus.io/docs/practices/histograms/

But as I understand the docs, this expression would calculate the per-second average rate of increase of the request duration, i.e. not how long a request takes on average, but how much the request duration has changed on average in the last 5 minutes.


Solution

  • The rate(m[d]) function calculates the increase of a counter metric m over the lookbehind window d given in square brackets and then divides that increase by d. The calculation is performed independently for each matching time series. For example, suppose there are http_requests_total metrics with a url label:

    http_requests_total{url="/foo"}
    http_requests_total{url="/bar"}
    

    If they have the following values at time t0:

    http_requests_total{url="/foo"} 123
    http_requests_total{url="/bar"} 456
    

    ... and the following values at time t0 + 5 minutes:

    http_requests_total{url="/foo"} 345
    http_requests_total{url="/bar"} 789
    

    Then rate(http_requests_total[5m]) at time t0 + 5 minutes is calculated in the following way:

    1. Calculate the increase of each series between t0 and t0 + 5 minutes:
    increase(http_requests_total{url="/foo"}[5m]) = 345 - 123 = 222
    increase(http_requests_total{url="/bar"}[5m]) = 789 - 456 = 333
    
    2. Divide each increase by 5 minutes expressed in seconds (5 * 60s = 300s):
    rate(http_requests_total{url="/foo"}[5m]) = 222 / 300 = 0.74
    rate(http_requests_total{url="/bar"}[5m]) = 333 / 300 = 1.11
    

    So the end result of rate(http_requests_total[5m]) is the average number of requests per second (rps) over the last 5 minutes, calculated individually for each time series named http_requests_total.
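The two steps above can be sketched in a few lines of Python. This is a simplified illustration, not how any query engine is actually implemented: it uses only the first and last sample in the window and ignores counter resets and any handling of window boundaries. The function names and the timer values at the end are hypothetical; they are there to show that, when rate() of a _sum counter is divided by rate() of a _count counter, the window length cancels and what remains is the average duration per event during the window.

```python
WINDOW_SECONDS = 5 * 60  # the [5m] lookbehind window, in seconds


def increase(first_value, last_value):
    """Growth of a counter between the first and last sample in the window."""
    return last_value - first_value


def rate(first_value, last_value, window_seconds=WINDOW_SECONDS):
    """Average per-second growth of a counter over the window."""
    return increase(first_value, last_value) / window_seconds


# Values from the example above.
foo_rate = rate(123, 345)  # 222 / 300
bar_rate = rate(456, 789)  # 333 / 300
print(foo_rate, bar_rate)

# Hypothetical timer counters: 12 seconds spent across 48 executions
# during the window.
sum_rate = rate(100.0, 112.0)   # seconds of runtime added per second
count_rate = rate(400, 448)     # executions per second
avg_duration = sum_rate / count_rate  # same as 12 / 48 seconds per execution
print(avg_duration)
```

Because both rates share the same divisor, the ratio equals increase(sum) / increase(count), i.e. the average duration of the executions that happened inside the window, not a rate of change of the duration.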

    A few notes:

    • Both rate() and increase() correctly handle counter resets, i.e. cases where the counter drops back to zero.

    • Sometimes Prometheus returns unexpected results from rate() and increase() because of its chosen data model. See this issue. This is addressed in VictoriaMetrics, a Prometheus-like monitoring system I work on; see this comment and this article.

    • Some PromQL-compatible query engines such as MetricsQL allow omitting the lookbehind window in square brackets when using the rate() function, so rate(http_requests_total) is a valid MetricsQL query. In this case the engine automatically adds a [$__interval] lookbehind window before executing the query. See these docs for more details.
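To make the note on counter resets more concrete, here is a minimal Python sketch of one way an increase can be computed across a reset. It assumes that any drop in the value means the counter restarted from zero and then counted up to the new value. The function name and the sample values are hypothetical, and real query engines may differ in the details.

```python
def increase_with_resets(samples):
    """Total growth of a counter across samples, tolerating resets.

    Assumption: a sample lower than its predecessor means the counter
    was reset to zero and has since counted up to the new value.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # Reset detected: count the growth from zero to the new value.
            total += current
    return total


# Hypothetical samples with a reset between 160 and 10.
samples = [100, 130, 160, 10, 40]
print(increase_with_resets(samples))  # 30 + 30 + 10 + 30 -> prints 100.0
```

A naive last-minus-first calculation on the same samples would give 40 - 100 = -60, which is why reset handling matters for counters.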