prometheus promql

How does Prometheus subquery notation work?


I am confused about how subquery notation in Prometheus (like [5m:1m]) aggregates data. I know that when I use a range selector such as [5m], it aggregates every data point in each 5-minute window. But when I plot graphs for [5m:1m], it shows an entirely different set of data, and I do not know how it got there. The queries I am using are:

sum_over_time(metric_name{labelName="label"}[5m])
sum_over_time(metric_name{labelName="label"}[5m:1m])

How are these two different, and how do they calculate data?


Solution

  • General use for subqueries

    Subqueries are an inline substitute for recording rules. They behave very similarly, but without actually producing a stored metric.

    So an expression like sum_over_time( (<some_query>) [5m:1m]) is equivalent to sum_over_time(instance:some_query_result:resolution[5m]) with evaluation_interval set to 1m and the recording rule

    - record: instance:some_query_result:resolution
      expr:   <some_query>
    

    in place, where <some_query> is any query returning instant vector results.

    Subqueries in PromQL are often used as a substitute for range selectors over something other than a vector selector. For example, you can use max_over_time(my_metric[2m]), but you cannot use max_over_time( (my_metric + my_other_metric) [2m]): you need to use something like max_over_time( (my_metric + my_other_metric) [2m:1s]) instead.

    Of course, a subquery can also be used with a simple vector selector. In that case it behaves like a range selector, but with an explicit resolution.

    Resolution

    Queries with a resolution are executed as follows: the timeframe is divided into blocks of resolution length, and the expression inside the subquery is evaluated at the end of each block. The results of these evaluations are then gathered according to the range provided.

    So in your example, sum_over_time(metric_name{labelName="label"}[5m:1m]), the value of metric_name{labelName="label"} is taken at each minute and put into a range of 5 minutes' length. This results in 5 samples, one for each minute, being put into the range vector and then summed by sum_over_time. By contrast, the plain [5m] range selector sums every raw sample scraped in that window, which is why the two graphs differ.
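    To make the difference concrete, here is a toy Python model of the two selections. It is only a sketch, not how Prometheus is implemented: the function names, the 15-second scrape interval, and the constant value of 1 are made up for illustration. It also assumes windows are left-open and right-closed, which is my understanding of recent Prometheus versions; older versions may include the left boundary and can return one more sample.

```python
# Toy model of how a range selector and a subquery pick their samples.
# Assumption: windows are left-open, right-closed, i.e. (t - range, t].

def subquery_timestamps(eval_time, range_s, step_s):
    """Step-aligned timestamps (multiples of step_s) in (eval_time - range_s, eval_time]."""
    first = ((eval_time - range_s) // step_s + 1) * step_s
    return list(range(first, eval_time + 1, step_s))

def instant_value(samples, ts, lookback_s=300):
    """Value of the latest sample at or before ts, within the lookback window."""
    candidates = [(t, v) for t, v in samples if ts - lookback_s < t <= ts]
    return max(candidates)[1] if candidates else None

def sum_over_range(samples, eval_time, range_s):
    """Like sum_over_time(metric[range]): sums every raw sample in the window."""
    return sum(v for t, v in samples if eval_time - range_s < t <= eval_time)

def sum_over_subquery(samples, eval_time, range_s, step_s):
    """Like sum_over_time(metric[range:step]): sums one instant value per step."""
    stamps = subquery_timestamps(eval_time, range_s, step_s)
    values = [instant_value(samples, ts) for ts in stamps]
    return sum(v for v in values if v is not None)

# Hypothetical series: value 1, scraped every 15 seconds (timestamps in seconds).
samples = [(t, 1) for t in range(0, 1201, 15)]

print(subquery_timestamps(1000, 300, 60))         # [720, 780, 840, 900, 960]
print(sum_over_range(samples, 1000, 300))         # 20 raw samples summed
print(sum_over_subquery(samples, 1000, 300, 60))  # 5 per-minute samples summed
```

    With a 15-second scrape interval the [5m] window holds 20 raw samples, while [5m:1m] holds only 5 resampled ones, so the two sums differ by roughly the ratio of resolution to scrape interval.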

    A few important notes:

    1. Evaluation is carried out at "round" positions relative to the resolution. So if your resolution is 1m, evaluation will take place at every :00 second; if 1d, every day at midnight (UTC); and if 17m, every 17 minutes, with the alignment point being 0 in epoch time¹.
    2. There is no limitation on the relation between range and resolution. You are allowed to use a query like sum_over_time(up[1h:1d]), and it will produce interesting but still meaningful results.
    3. Some functions, like rate and delta, expect at least two samples to be present in the provided range vector. Thus queries like rate(metric[1m:1m]) will not yield any results, and something like rate(metric[1m:57s]) will result in a discontinuous graph.

    ¹ I am not sure whether this is officially guaranteed, as I haven't seen it in the documentation, but this is how Prometheus actually carries out resolution (and evaluation of recording rules), and I don't see any reason for this to change in the future.
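    The three notes above can be checked with a small Python sketch that only counts how many step-aligned timestamps fall into the window at a given evaluation time. The helper names are hypothetical, and it again assumes a left-open, right-closed window (t - range, t]; with a closed left boundary the counts can be one higher at exactly aligned evaluation times.

```python
# Counts subquery samples per evaluation time under an assumed (t - range, t] window.

def subquery_timestamps(eval_time, range_s, step_s):
    """Multiples of step_s (aligned to epoch 0) in (eval_time - range_s, eval_time]."""
    first = ((eval_time - range_s) // step_s + 1) * step_s
    return list(range(first, eval_time + 1, step_s))

def samples_per_eval(range_s, step_s, eval_times):
    """Number of samples the subquery would hand to the outer function at each time."""
    return [len(subquery_timestamps(t, range_s, step_s)) for t in eval_times]

# Note 1: with a 17m resolution, every timestamp is a multiple of 17 minutes since epoch.
step_17m = 17 * 60
aligned = subquery_timestamps(1_700_000_000, 3600, step_17m)
print(all(ts % step_17m == 0 for ts in aligned))  # True

# Note 2: up[1h:1d] holds one sample only during the hour after midnight UTC.
day = 86400
print(len(subquery_timestamps(5 * day + 1800, 3600, day)))  # 1
print(len(subquery_timestamps(5 * day + 7200, 3600, day)))  # 0

# Note 3: rate() needs at least two samples in the window.
eval_times = range(570, 627)  # 57 consecutive seconds, one full 57s cycle
one_min = samples_per_eval(60, 60, eval_times)
odd_step = samples_per_eval(60, 57, eval_times)
print(set(one_min))                     # {1}: never enough for rate()
print(sum(n >= 2 for n in odd_step))    # 3: only 3 of 57 seconds have two samples
```

    In this model [1m:1m] never collects two samples, and [1m:57s] does so only for a few seconds out of every 57, which matches the empty and gappy graphs described in note 3.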