prometheus grafana promql

Prometheus graph for counter over time


I've been looking into Prometheus and Grafana for the past couple of days and have some trouble wrapping my head around how exactly they work.

What I have at the moment is a counter that increases over time (i.e. the counter can only go up). I am trying to make a line graph that shows how this value increases over a specified time range.

For this I am using the following query:

sum(increase(my_metric[$__range]))

I have also set the query type to both Range and Instant in the query options.

What I don't understand is this: if I set a large range (i.e. from the moment I started collecting this metric, which in this case is 4 months ago), the graph looks fine to me:

[Graph: counter over 4 months]

However, if I set a short time range (i.e. 24 hours) the line fluctuates:

[Graph: counter over 24 hours]

So my question is: do I have the right approach for what I am trying to do?
Also, why does the short time range fluctuate when the counter only goes up?
My assumption is that the increase() function makes some estimates that produce that line. But in that case, does the "Instant" value (the green dot), which is 24, still mean that the counter increased by 24 over that period of time?


Solution

  • Counter 101

    why does the short time range fluctuate when the counter only goes up?

    A counter does indeed only go up. But that is of little use on its own: if a value only goes up, why would you care about its exact value? Today it's 1, tomorrow it's 42, and in a week it's 100500. What we really want to know is how quickly a counter is increasing over time (if it's 42 after the first day, why is it 100500 after just a week and not near 294?).
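    A minimal Python sketch of that idea, using hypothetical daily readings (only 1, 42 and 100500 come from the example above; the values in between are invented): the raw counter never decreases, yet the per-day increase, which is the number you actually want, jumps up and down.

```python
# Hypothetical daily counter readings for day 0 .. day 7.
# Only 1, 42 and 100500 come from the example; the rest are made up.
readings = [1, 42, 60, 500, 520, 9000, 9100, 100500]


def per_step_increase(samples):
    """Return how much the counter grew between each pair of consecutive readings."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


daily = per_step_increase(readings)
print(daily)  # [41, 18, 440, 20, 8480, 100, 91400]

# Where a steady "42 per day" pace would have taken the counter after a week.
steady_guess = readings[1] * 7
print(steady_guess)  # 294
```

    The counter list is sorted, the list of daily increases is not, and the increases add up to the total growth over the week.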

    If you really care about the actual value itself, use a Gauge instead, which can also go down.

    increase()

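    The increase() function in PromQL does exactly what you want with counters: it takes the samples of a metric within a time window and calculates how much the value increased over that window. (It is equivalent to rate(), the per-second increase, multiplied by the window length in seconds.)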
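    To show roughly where non-integer results come from, here is a simplified Python sketch of an increase()-like calculation. It is loosely modeled on how Prometheus handles counter resets and extrapolation to the window edges, but it is an illustration under stated assumptions, not the actual Prometheus implementation; the function name simple_increase and the (timestamp, value) sample format are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # hypothetical format: (timestamp in seconds, counter value)


def simple_increase(samples: List[Sample], range_start: float, range_end: float) -> float:
    """Simplified, increase()-like estimate over [range_start, range_end].

    Assumptions (a sketch, not the exact Prometheus source):
    - any drop in value is treated as a counter reset to zero;
    - the raw increase is extrapolated towards the window edges, which is
      why the result can be non-integer even for an integer counter.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples in the window")

    # 1. Raw increase, compensating for counter resets.
    raw = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        raw += cur - prev if cur >= prev else cur  # after a reset, count from zero

    # 2. Distances between the first/last samples and the window edges.
    first_ts, first_val = samples[0]
    last_ts, _ = samples[-1]
    sampled = last_ts - first_ts
    avg_gap = sampled / (len(samples) - 1)
    to_start = first_ts - range_start
    to_end = range_end - last_ts

    # Extrapolate all the way to an edge only if it is close to a sample;
    # otherwise extrapolate by half an average sample interval.
    threshold = avg_gap * 1.1
    if to_start >= threshold:
        to_start = avg_gap / 2
    if to_end >= threshold:
        to_end = avg_gap / 2

    # A counter can't go below zero, so don't extrapolate the start past that point.
    if raw > 0 and first_val >= 0:
        to_zero = sampled * (first_val / raw)
        to_start = min(to_start, to_zero)

    return raw * (sampled + to_start + to_end) / sampled


# Samples every 15 s; the window starts 5 s before the first and ends 7 s after the last.
print(round(simple_increase([(5, 10), (20, 11), (35, 12), (50, 13)], 0, 57), 2))  # 3.8
```

    Because the raw increase of 3 is stretched to cover the gaps between the first/last samples and the window edges, the result is about 3.8 rather than a whole number.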

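    Each point on a range graph is increase() evaluated over a window of $__range ending at that point, so the window slides forward with time. Could a counter increase by 10 during the first minute and by 3 during the next? Sure, and as the window slides, older increments drop out of it while new ones come in. That's why the increase graph goes up and down ("fluctuates"). With a small range, like the 24 hours on the 2nd graph, this is visible. With a large range, like the 4 months on the 1st graph, it's not: the window reaches back to when collection started, so each point (at 08/14, at 08/20, ..., at 11/30) answers "how much has my_metric increased since the start?", which only gets bigger.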
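    The sliding-window effect can be reproduced in a few lines of Python. This is an idealized sketch: the per-minute increments are hypothetical, there is exactly one sample per minute, extrapolation is ignored, and a 3-minute window versus a window longer than the whole history stand in for the 24-hour and 4-month ranges.

```python
from itertools import accumulate

# Hypothetical amount the counter grew during each minute.
increments = [10, 3, 0, 7, 1, 12, 0, 2]

# The counter itself is the running total, so it never goes down.
counter = list(accumulate(increments))


def window_increase(incs, end, width):
    """Increase over the `width` minutes ending at minute `end` (inclusive)."""
    start = max(0, end - width + 1)
    return sum(incs[start:end + 1])


# One graph point per minute, each looking back over its own window.
short = [window_increase(increments, i, 3) for i in range(len(increments))]
wide = [window_increase(increments, i, 1000) for i in range(len(increments))]

print(counter)  # [10, 13, 13, 20, 21, 33, 33, 35]
print(short)    # [10, 13, 13, 10, 8, 20, 13, 14]  goes up and down
print(wide)     # [10, 13, 13, 20, 21, 33, 33, 35]  identical to the counter
```

    The short window forgets old increments as it moves, while the wide window always covers the whole history and therefore just reproduces the running total.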

    The details of exactly how it's calculated, and nuances such as why increase() might return a floating-point number, are already described elsewhere; see, for example, the answers at this post.

    Aggregations

    So my question is: do I have the right approach for what I am trying to do?

    Since you don't use labels, the query can be simplified:

    sum(increase(my_metric[$__range]))  -->  increase(my_metric[$__range])

    As an aggregation operator, sum is especially useful with the optional without and by clauses.

    For example, if my_metric is reported by several machines, each identified by an instance label, and you want to aggregate that label away in the query (i.e. you don't care about each instance's result but want them as a whole), you might use:

    sum without (instance) (increase(my_metric[$__range]))
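    What sum without (instance) does to a set of series can be sketched in Python as grouping by every label except the dropped ones and adding up the values. The series, the values, the extra job label and the helper name sum_without are all hypothetical; the extra label only shows that without keeps every label it doesn't name. Dropping all labels corresponds to a plain sum(...), which collapses everything into a single series.

```python
from collections import defaultdict

# Hypothetical results of increase(my_metric[$__range]), one per series.
# The metric name is omitted, since increase() drops it from its output.
series = [
    ({"instance": "host-a:9100", "job": "app"}, 24.0),
    ({"instance": "host-b:9100", "job": "app"}, 10.5),
    ({"instance": "host-c:9100", "job": "batch"}, 3.0),
]


def sum_without(results, drop):
    """Mimic `sum without (<drop>) (...)`: add up series that share all remaining labels."""
    groups = defaultdict(float)
    for labels, value in results:
        # The group key is the label set minus the dropped labels.
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        groups[key] += value
    return dict(groups)


print(sum_without(series, {"instance"}))
# {(('job', 'app'),): 34.5, (('job', 'batch'),): 3.0}
```

    Here the two app instances are merged into one series while the batch series stays separate; with only an instance label, as in the question, everything would end up in a single series.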