I've been looking into Prometheus and Grafana for the past couple of days and have some trouble wrapping my head around how they work exactly.
What I have at the moment is a counter which increases over time (i.e. the counter can only go up). I am trying to draw a line graph to show how this value increases over a specified time range.
For this I am using the following query:
sum(increase(my_metric[$__range]))
I have also set the query type to Both in the query options, so that it shows both Range and Instant results.
What I don't understand is that if I set a large time range (i.e. from the moment I started collecting this metric, which in this case is 4 months ago) the graph looks fine to me:
However, if I set a short time range (i.e. 24 hours) the line fluctuates:
So my question is do I have the right approach for what I am trying to do?
Also, why does the short time range fluctuate when the counter only goes up?
My assumption is that the increase() function does some estimation, which results in that line. But in that case, does the "Instant" value (green dot), which is 24, still mean that the counter increased by 24 in that period of time?
Why does the short time range fluctuate when the counter only goes up?
A counter indeed only goes up. But this is of little use on its own: if a value only goes up, why do you care about the exact value? Today it's 1, tomorrow it's 42, and in a week it's 100500. What we really want to know is how quickly a counter is increasing over time (if its value is 42 after the first day, why is it 100500 after just a week and not near 294?).
If you really care about the actual value, a gauge should be used instead, which can also go down.
The increase() function in PromQL does exactly what you want to do with counters: it takes the samples of a metric over a time window and calculates how much the value increased within that window.
Could a counter be increased by 10 during the first minute, and by 3 during the next minute? Sure, and that's why the increase() graph goes up and down ("fluctuates").
With a small range interval, as on the 2nd graph, this is visible. With a large range interval, as on the 1st graph, it is not: how much did my_metric increase over the preceding 4-month period at 08/14? At 08/20? ... At 11/30? The answer is always bigger and bigger, since the range is so large that it covers the whole history of the metric.
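To make the difference between the two graphs concrete, here is a rough Python sketch. It is not how Prometheus implements increase(): it simply takes the last sample minus the first sample in the window and ignores the extrapolation and counter-reset handling that the real function performs. The per-minute increments and the helper name window_increase are made up for illustration.

```python
# Simplified model of increase(): last sample minus first sample in the window.
# Real PromQL increase() also extrapolates to the window edges and handles
# counter resets; both are deliberately ignored in this sketch.

def window_increase(samples, end, window):
    """Increase of a counter over the `window` samples ending at index `end`."""
    start = max(0, end - window)
    return samples[end] - samples[start]

# Hypothetical per-minute increments: 10 in the first minute, 3 in the next, ...
increments = [10, 3, 0, 7, 1, 12, 2]

# Build the counter from the increments: it only ever goes up.
counter = [0]
for step in increments:
    counter.append(counter[-1] + step)

# Short window (1 sample back): each point is the increase in the last minute.
short = [window_increase(counter, i, 1) for i in range(1, len(counter))]

# Long window (covers the whole history): each point is the increase since
# the very first sample.
long = [window_increase(counter, i, len(counter)) for i in range(1, len(counter))]

print(counter)  # the raw counter values
print(short)    # goes up and down, like the 24-hour graph
print(long)     # never decreases, like the 4-month graph
```

The counter itself is the same in both cases; only the window over which the increase is measured changes.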
The details of how exactly it is calculated, and the nuances (such as why increase() might return a floating-point number), have already been described; for example, see the answers at this post.
So my question is do I have the right approach for what I am trying to do?
Since you don't use labels, the query might be simplified:
sum(increase(my_metric[$__range])) ---> increase(my_metric[$__range])
sum as an aggregation operator is especially useful with the optional without and by clauses.
For example, if my_metric is reported by several machines, each labeled with an instance label, and you want to aggregate that label away in a query (i.e. you don't care about each instance's results but want them as a whole), you might use:
sum without (instance) (increase(my_metric[$__range]))
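As a rough illustration of what that aggregation does, the following Python sketch groups series by all of their labels except the dropped one and sums the values in each group. The series, label values, and the helper name sum_without are hypothetical; this only mimics the grouping idea, not Prometheus itself.

```python
from collections import defaultdict

def sum_without(series, drop_label):
    """Group series by their labels minus `drop_label` and sum the values."""
    totals = defaultdict(float)
    for labels, value in series:
        # The grouping key is every label except the one being aggregated away.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != drop_label))
        totals[key] += value
    return dict(totals)

# Hypothetical per-series results, one value per reporting machine.
per_series = [
    ({"instance": "host-a", "job": "api"}, 10.0),
    ({"instance": "host-b", "job": "api"}, 5.0),
]

combined = sum_without(per_series, "instance")
print(combined)
```

Both machines collapse into a single series that keeps only the job label, with the two increases added together.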