Given a gauge called gin_in_flight_requests
We have two queries in Prometheus:
green line:
sum(avg_over_time(gin_in_flight_requests{app="my-service",cluster="prod", url="/api/v1/url1"}[1m]))
yellow line:
sum(gin_in_flight_requests{app="my-service",cluster="prod", url="/api/v1/url1"})
At 14:35 the green line has a higher peak than every individual point of the yellow (sum) line. How can the sum of averages over time produce a higher result than the maximum of the sum itself?
The graph was made with Grafana 9 Explore.
If a time series selector isn't wrapped in any rollup function, Prometheus by default wraps it in the last_over_time() rollup function with a 5-minute lookbehind window in square brackets. So the query
sum(gin_in_flight_requests{app="my-service",cluster="prod", url="/api/v1/url1"})
is automatically converted into the following query before execution:
sum(
  last_over_time(
    gin_in_flight_requests{app="my-service",cluster="prod", url="/api/v1/url1"}[5m]
  )
)
See these docs for more details.
In other words, this query takes into account only a subset of the raw samples: the last raw sample just before each point displayed on the graph. It ignores the remaining raw samples, so it may return values smaller than the sum(avg_over_time(...)) query, which averages every raw sample in its 1-minute window. If you want to take the maximum raw samples into account, use the max_over_time function.
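To make the effect concrete, here is a small Python sketch that imitates both queries on made-up data. The sample values, the 15-second scrape interval, the 60-second graph step and the helper names are all assumptions for illustration; this is not how Prometheus is implemented, only the arithmetic behind the explanation above.

```python
# Hypothetical raw samples as (timestamp_seconds, value), scraped every 15s.
# A short burst of in-flight requests happens between two graph points.
series_a = [(0, 1), (15, 9), (30, 8), (45, 7), (60, 1)]
series_b = [(0, 2), (15, 10), (30, 9), (45, 8), (60, 2)]
all_series = (series_a, series_b)


def window(samples, t, lookbehind):
    """Values of raw samples with a timestamp in (t - lookbehind, t]."""
    return [v for ts, v in samples if t - lookbehind < ts <= t]


def last_sample(samples, t, lookbehind=300):
    """Imitates a bare selector: only the newest sample before t is used."""
    vals = window(samples, t, lookbehind)
    return vals[-1] if vals else None


def avg_over_time(samples, t, lookbehind=60):
    """Imitates avg_over_time(...[1m]): every sample in the window counts."""
    vals = window(samples, t, lookbehind)
    return sum(vals) / len(vals) if vals else None


# Points drawn on the graph (assumed step of 60s).
eval_points = [0, 60]

sum_last = [sum(last_sample(s, t) for s in all_series) for t in eval_points]
sum_avg = [sum(avg_over_time(s, t) for s in all_series) for t in eval_points]

print(sum_last)  # [3, 3]
print(sum_avg)   # [3.0, 13.5]
```

The plain sum never sees the burst because both graph points land on quiet samples, while the averaged query includes the burst samples that fall between the points.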
P.S. If you want to capture all the raw sample maximums and minimums on the selected time range in Grafana, use max_over_time() and min_over_time() queries with the $__interval lookbehind window in square brackets:
sum(max_over_time(...[$__interval]))
and
sum(min_over_time(...[$__interval]))
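A quick Python sketch of what these two queries compute for a single graph point, again on made-up samples with hypothetical names and an assumed 60-second interval. One thing worth noticing: when the per-series peaks do not happen at the same moment, sum(max_over_time(...)) is an upper bound on the real peak of the sum rather than the exact peak.

```python
# Hypothetical raw samples as (timestamp_seconds, value) for two series.
series_a = [(15, 9), (30, 2), (45, 3), (60, 1)]
series_b = [(15, 2), (30, 10), (45, 3), (60, 2)]
all_series = (series_a, series_b)

interval = 60  # stands in for Grafana's $__interval (assumed value)
t = 60         # the graph point being evaluated


def window(samples, t, lookbehind):
    """Values of raw samples with a timestamp in (t - lookbehind, t]."""
    return [v for ts, v in samples if t - lookbehind < ts <= t]


# Imitates sum(max_over_time(...[$__interval])) and the min variant.
sum_max = sum(max(window(s, t, interval)) for s in all_series)
sum_min = sum(min(window(s, t, interval)) for s in all_series)

# The sum of both series at each scrape, for comparison.
true_sums = [a + b for (_, a), (_, b) in zip(series_a, series_b)]

print(sum_max, sum_min)                # 19 3
print(max(true_sums), min(true_sums))  # 12 3
```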
P.P.S. FYI, an alternative Prometheus-like monitoring solution I work on, VictoriaMetrics, provides a rollup function that simultaneously returns min, max and avg values on the selected time range. It can be used instead of three separate queries with the min_over_time(), max_over_time() and avg_over_time() functions.