prometheus metrics promql grafana-loki

Which PromQL function do I use to find the highest and lowest response time attained in a given time interval?


I would like to visualize the highest and the lowest response times attained by an API in the last x minutes, but I am not sure which PromQL function I should be using.

Currently I can scrape metrics and visualize the average response time (minute by minute) using:

sum(rate(request_duration_seconds_sum[1m]))/sum(rate(request_duration_seconds_count[1m]))

and the corresponding Stat panel:

[screenshot: avg]
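To make the arithmetic behind the average query concrete, here is a minimal Python sketch of what dividing the rate of the `_sum` counter by the rate of the `_count` counter amounts to. The scrape values are hypothetical, and the sketch ignores counter resets and the extrapolation that `rate()` performs at window edges.

```python
def average_latency(sum_start, sum_end, count_start, count_end, window_seconds):
    """Mean request duration over a window, from two scrapes of each counter."""
    # Per-second increase of the accumulated duration and of the request count.
    rate_sum = (sum_end - sum_start) / window_seconds
    rate_count = (count_end - count_start) / window_seconds
    if rate_count == 0:
        # No requests in the window: the average is undefined.
        return float("nan")
    return rate_sum / rate_count


# Hypothetical scrapes 60 s apart: 4 new requests totalling 0.155 s.
avg = average_latency(1.200, 1.355, 40, 44, 60)
print(f"{avg * 1000:.2f} ms")
```

With four requests of 7 ms, 92 ms, 6 ms and 50 ms the result is about 38.75 ms. The individual extremes cannot be recovered from these two counters alone, which is why the average query cannot be adapted to give a minimum or maximum.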

Now I want to design a similar Stat panel that shows the highest response time recorded within the last minute. For example, if the requests in the last minute took 7ms, 92ms, 6ms and 50ms, I want a panel that shows the highest response time attained, i.e. 92ms. Conversely, the lowest response time Stat panel should show 6ms.

In my client instrumentation I have configured a counter, a gauge and a histogram as below:

public MetricReporter(ILogger<MetricReporter> logger)
{
    _logger = logger ?? throw new ArgumentNullException(nameof(logger));

    _requestCounter = Metrics.CreateCounter("total_requests", "The total number of requests serviced by this API.");
    _requestGauge = Metrics.CreateGauge("total_requests_gauge", "The total number of requests serviced by this API.");

    _responseTimeHistogram = Metrics.CreateHistogram("request_duration_seconds",
        "The duration in seconds between the response to a request.", new HistogramConfiguration
        {
            Buckets = new[] { 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10 },
            LabelNames = new[] { "status_code", "method", "path" }
        });
}

For this use case I cannot seem to find a working example that shows these calculations when using a counter.

I have tried using max_over_time with a subquery as given here, but from my research I gather that the resulting calculation will be inaccurate (see comment here).

As per the Prometheus documentation, the functions min_over_time(), max_over_time(), avg_over_time(), etc. only make sense with gauge metrics.

Should I be using a gauge instead and, if so, how?

What am I missing?
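One way to reason about the gauge option is to simulate it. The Python sketch below assumes a hypothetical gauge that is set to the duration of the most recent request and is scraped every 15 seconds; the event timestamps and the scrape interval are made up for illustration. It mimics what max_over_time() and min_over_time() would see, namely only the values present at scrape time.

```python
def scrape_gauge(events, scrape_times):
    """Values a 'last request duration' gauge would expose at each scrape.

    events: list of (timestamp_seconds, duration_seconds), sorted by time.
    """
    samples = []
    for t in scrape_times:
        # The gauge holds whatever the latest request before the scrape set.
        seen = [duration for ts, duration in events if ts <= t]
        if seen:
            samples.append(seen[-1])
    return samples


# Hypothetical requests within one minute (time in s, duration in s).
events = [(2, 0.007), (10, 0.092), (14, 0.006), (40, 0.050)]
samples = scrape_gauge(events, scrape_times=[15, 30, 45, 60])

scraped_max = max(samples)            # what max_over_time would report
true_max = max(d for _, d in events)  # the real slowest request
print(samples, scraped_max, true_max)
```

In this run the 92 ms request is overwritten before the first scrape, so the scraped maximum is 50 ms. A gauge of this kind is therefore only exact if the application itself tracks the running minimum and maximum between scrapes.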

UPDATE

I have added a new panel that uses the histogram quantile queries below, but the resulting values are not correct. (I made requests within a 1 minute interval; the maximum was 25ms, on the first request, and another random one took 3ms.)

histogram_quantile(1, increase(request_duration_seconds_bucket[1m]))

and this

histogram_quantile(0, increase(request_duration_seconds_bucket[1m]))

[screenshot: quantile]
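A likely reason these values look wrong is that histogram_quantile() does not see individual observations. It only sees cumulative bucket counts and interpolates linearly inside the bucket that contains the requested rank. The Python sketch below is a simplified re-implementation of that idea for quantiles between 0 and 1, using the bucket bounds from the instrumentation above; it is not the Prometheus source code. Note also that, because the histogram has status_code, method and path labels, the queries above return one estimate per label combination unless the buckets are first aggregated, for example with sum by (le).

```python
import math

# Upper bounds from the HistogramConfiguration above, plus the implicit +Inf bucket.
BOUNDS = [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, math.inf]


def cumulative_counts(observations, bounds=BOUNDS):
    """Number of observations less than or equal to each upper bound."""
    return [sum(1 for o in observations if o <= b) for b in bounds]


def estimate_quantile(q, counts, bounds=BOUNDS):
    """Simplified bucket interpolation for 0 <= q <= 1."""
    total = counts[-1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    b = next(i for i, c in enumerate(counts) if c >= rank)
    if b == len(bounds) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return bounds[-2]
    start = bounds[b - 1] if b > 0 else 0.0
    prev = counts[b - 1] if b > 0 else 0
    in_bucket = counts[b] - prev
    if in_bucket == 0:
        return math.nan
    return start + (bounds[b] - start) * (rank - prev) / in_bucket


counts = cumulative_counts([0.007, 0.092, 0.006, 0.050])
estimated_max = estimate_quantile(1, counts)
estimated_min = estimate_quantile(0, counts)
print(estimated_max, estimated_min)
```

For the 7 ms, 92 ms, 6 ms and 50 ms example, the estimated maximum is the upper bound of the bucket holding the slowest request (100 ms, not 92 ms), and the estimated minimum is the lower edge of the first bucket (0, not 6 ms). Finer buckets narrow the error but do not remove it.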


Solution

  • If you use the graph panel in Grafana instead of the Stat panel, the min, max and avg values of the plotted series are available out of the box. You can see them in the legend at the bottom right-hand corner.

    Update: adding queries and a screenshot.

    Here is my query:

    rate(http_server_requests_seconds_sum{job="",method="",namespace="",uri=""}[5m])
    /
    rate(http_server_requests_seconds_count{job="",method="",namespace="",uri=""}[5m])
    

    Please see below:

    [screenshot]

    To enable the min/max values in the legend, check the legend properties while editing the graph, as shown below:

    [screenshot]
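One caveat worth keeping in mind with this approach: the legend's min and max summarise the plotted series, and the series here is a windowed average. The Python sketch below uses hypothetical request durations grouped into three windows to show how the extremes of the averages can differ from the extremes of the individual requests.

```python
# Hypothetical request durations (seconds), grouped by query window.
windows = [
    [0.007, 0.092, 0.006, 0.050],
    [0.010, 0.012, 0.011],
    [0.030, 0.020],
]

# Each plotted point is the average duration within its window.
window_averages = [sum(w) / len(w) for w in windows]

# What a legend would show for the averaged series.
legend_max = max(window_averages)
legend_min = min(window_averages)

# The actual slowest and fastest individual requests.
true_max = max(d for w in windows for d in w)
true_min = min(d for w in windows for d in w)

print(legend_min, legend_max, true_min, true_max)
```

Here the legend would report a maximum of roughly 39 ms and a minimum of 11 ms, while the slowest and fastest single requests took 92 ms and 6 ms.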