prometheus grafana promql

Is there a possibility to execute a range function over an aggregation in PromQL?


Usually, a PromQL query that combines a range function with an aggregation is constructed like this:

sum by(label) (increase(metric{label="label"}[1h]))

with sum being the aggregator and increase being the range function.

However, the metric that I'm trying to query only ever returns 1, at different times and with different label values. Summed up, it's basically an ever-increasing counter. The (stacked) graph in Grafana without any functions applied to it looks like this:

[graph]

So applying a range function directly to this metric is useless, because each series never changes (it remains 1). I would like to sum by(label) first, and then execute increase over the result. It would look something like this:

increase((sum by(label) (metric{label=~".*"}))[1h])

Naturally, I tried doing this by looking at the documentation and experimenting with the syntax, to no avail. I even tried using two separate queries, which also wasn't successful. I also tried the new query 'Builder' in Grafana 9 to see if it's possible there, without success.

So, does anybody have a suggestion on how to apply a range function to an aggregation by(label)? Unfortunately, I can't directly change the data in Prometheus, and have to rely on the result of the query.


Solution

  • Prometheus Subqueries

    "It would look like something along the lines of this: increase((sum by(label) (metric{label=~".*"}))[1h])"

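    The problem with this query is that it attaches a range selector ([1h]) to the result of an aggregation. A range selector can only be attached directly to a metric selector; the aggregation returns an instant vector, while increase accepts only a range vector (see increase and range vs. instant vectors).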

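    To get a range vector from an expression, the expression has to be evaluated at several points in time, which is possible with either recording rules or subqueries (note the colon after the range, which turns the range selector into a subquery; with no resolution given after it, the default evaluation interval is used):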

    increase((sum by(label) (metric{label=~".*"}))[1h:])
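
To make the idea concrete, here is a minimal Python sketch of what a subquery does conceptually: it evaluates the inner instant expression at regular steps over the range and hands the resulting points to the range function. The sample data, the 15-minute step, and the helper names (instant_sum, subquery, simple_increase) are assumptions made for illustration. simple_increase is only last-minus-first and ignores the extrapolation and counter-reset handling that the real increase performs.

```python
# Toy model of increase((sum(metric))[1h:15m]) over hypothetical data.
# Each series is a list of (timestamp_seconds, value) samples, sorted by time.
series = {
    "a": [(0, 1), (900, 1), (1800, 1), (2700, 1), (3600, 1)],
    "b": [(1800, 1), (2700, 1), (3600, 1)],
    "c": [(3600, 1)],
}

def instant_sum(series, t, lookback=300):
    """Inner expression: sum the latest sample of every series at time t."""
    total = 0
    for samples in series.values():
        # Only samples inside the (assumed) lookback window count as "current".
        recent = [v for ts, v in samples if t - lookback <= ts <= t]
        if recent:
            total += recent[-1]
    return total

def subquery(series, end, range_s, step_s):
    """Evaluate the inner expression at every step to build a range of points."""
    return [(t, instant_sum(series, t)) for t in range(end - range_s, end + 1, step_s)]

def simple_increase(points):
    """Simplified increase: last value minus first value (no extrapolation, no resets)."""
    return points[-1][1] - points[0][1]

points = subquery(series, end=3600, range_s=3600, step_s=900)
result = simple_increase(points)
```

In this toy data the summed value grows from 1 to 3 over the hour, so the simplified increase is 2, whereas every individual series stays at 1 and has no increase of its own.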

    Count Aggregation

    The query above uses the sum aggregation, but since you don't care about the sample value (it is always 1), consider using count instead (see aggregation operators).
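
A small Python sketch of why count is the safer choice here. The label values and sample values are hypothetical; the point is only that sum and count agree while every value is 1, and that count keeps meaning "number of series" if a value ever differs.

```python
# Hypothetical instant vector: one sample per series, every value is 1.
samples = {"blue": 1, "red": 1, "green": 1}
total_by_sum = sum(samples.values())   # what the sum aggregation would return
total_by_count = len(samples)          # what the count aggregation would return

# If one series ever reported something other than 1, sum would drift
# while count would still report the number of series.
odd_samples = {"blue": 1, "red": 5, "green": 1}
odd_sum = sum(odd_samples.values())
odd_count = len(odd_samples)
```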

    Wrong Data Model

    There is a problem with your data model that prevents you from properly applying counter functions (such as increase).

    You have:

    12:00
    {label="blue"} 1
    
    12:30
    {label="blue"} 1
    {label="red"} 1
    
    13:00
    {label="blue"} 1
    {label="red"} 1
    {label="green"} 1
    

    While it should be:

    12:00
    {label="blue"} 1
    
    12:30
    {label="blue"} 2
    {label="red"} 1
    
    13:00
    {label="blue"} 3
    {label="red"} 2
    {label="green"} 1
    

    The current approach has many drawbacks. To name only a few:

    • waste of memory resources (N time series with a value of 1 instead of one time series with a value of N)
    • waste of computation resources (additional query to "get a counter")
    • the result is not resistant to node restarts, since aggregation is applied before the counter function (increase) (see "rate then sum")
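
The last point can be illustrated with a small Python sketch. It assumes two hypothetical instances exposing real counters, one of which restarts in the middle of the window, and a simplified increase that treats any drop in value as a restart from zero (which is conceptually how counter resets are handled) but skips extrapolation.

```python
def increase(values):
    """Simplified counter increase with reset handling, no extrapolation."""
    total = 0
    for prev, cur in zip(values, values[1:]):
        # A drop means the counter restarted from zero, so the whole
        # current value counts as new growth.
        total += (cur - prev) if cur >= prev else cur
    return total

# Hypothetical counters sampled at the same four points in time.
instance_a = [10, 20, 30, 40]
instance_b = [100, 110, 5, 15]   # restarted between the 2nd and 3rd sample

# "increase then sum": reset handling sees each restart in isolation.
increase_then_sum = increase(instance_a) + increase(instance_b)

# "sum then increase": the restart of one instance looks like a reset
# of the whole aggregate, so the result is distorted.
summed = [a + b for a, b in zip(instance_a, instance_b)]
sum_then_increase = increase(summed)
```

Here the per-series approach yields 30 + 25 = 55, while aggregating first yields 75, because the drop from 130 to 35 in the summed series is misread as the entire aggregate restarting from zero.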