How to understand Envoy latency using the `upstream_rq_time` metric?

I'm trying to understand how Envoy upstream_rq_time works. Essentially, I can see the following metrics in Grafana:

upstream_rq_time.p25
upstream_rq_time.p50
upstream_rq_time.p95

When I query any of the above metrics in Grafana, the result is not a single number but a data series. What confuses me is that when I choose the histogram graph in Grafana, the histograms look almost identical for each type (p25/p50/p95). I also don't understand how to get the p25/p50/p95 latency for the last hour. How would I get such a value using Prometheus?

Screenshots (images not included): upstream_rq_time.p25, upstream_rq_time.p50, upstream_rq_time.p95.

Solution

The metric upstream_rq_time measures the time (in milliseconds) elapsed from the point where the caller's entire request has been received by the Envoy HTTP router filter until the entire upstream response from the cluster has been received (source).

It is a histogram, as the documentation states and as you can see in the Prometheus metrics output (# TYPE envoy_cluster_external_upstream_rq_time histogram).

To get percentiles from a histogram in Prometheus, you compute quantiles over its buckets. The histogram_quantile function is probably what you are looking for.

To get the 25th, 50th and 95th percentiles over the last hour, you can use the following Prometheus queries:

histogram_quantile(0.25, sum(rate(envoy_cluster_upstream_rq_time_bucket[1h])) by (le))
histogram_quantile(0.5, sum(rate(envoy_cluster_upstream_rq_time_bucket[1h])) by (le))
histogram_quantile(0.95, sum(rate(envoy_cluster_upstream_rq_time_bucket[1h])) by (le))
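If you want the numbers outside Grafana, the same queries can be sent to the Prometheus HTTP API as instant queries. The following is a minimal sketch, not a definitive client: the base URL `http://localhost:9090` and the helper names (`quantile_query`, `parse_instant_vector`, `fetch_quantile`) are assumptions made for illustration, and it expects the standard `/api/v1/query` response shape.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable here; adjust for your setup.
PROMETHEUS_URL = "http://localhost:9090"


def quantile_query(q, metric="envoy_cluster_upstream_rq_time_bucket", window="1h"):
    """Build the PromQL expression for the q-quantile over the given window."""
    return f"histogram_quantile({q}, sum(rate({metric}[{window}])) by (le))"


def parse_instant_vector(payload):
    """Extract the sample values from an instant-query JSON response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    # Each sample is {"metric": {...}, "value": [timestamp, "value-as-string"]}.
    return [float(sample["value"][1]) for sample in payload["data"]["result"]]


def fetch_quantile(q, base_url=PROMETHEUS_URL):
    """Run the instant query against Prometheus (performs a network call)."""
    params = urllib.parse.urlencode({"query": quantile_query(q)})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{params}", timeout=10) as resp:
        return parse_instant_vector(json.load(resp))


if __name__ == "__main__":
    for q in (0.25, 0.5, 0.95):
        print(q, fetch_quantile(q))
```

Because the query aggregates with `sum(...) by (le)`, the result normally contains a single sample, so each call returns a one-element list: one latency value in milliseconds for the last hour.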

By default, the buckets used by Envoy are set in the code:

const ConstSupportedBuckets& HistogramSettingsImpl::defaultBuckets() {
  CONSTRUCT_ON_FIRST_USE(ConstSupportedBuckets,
                         {0.5, 1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000, 30000,
                          60000, 300000, 600000, 1800000, 3600000});
}

That means you can know how many requests took at most 500 ms, for example (they are counted in the bucket le=500), but you can't know how many requests took less than 300 ms or 499 ms, because no bucket boundary exists at those values. You can configure the buckets, though, by changing histogram_bucket_settings.
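To make the loss of detail concrete, here is a small sketch that sorts raw latencies into cumulative `le` buckets using the default bounds listed above. The function name `cumulative_buckets` and the sample latencies are made up for illustration; this is not Envoy's implementation.

```python
from bisect import bisect_left
from itertools import accumulate
import math

# Envoy's default bucket upper bounds in milliseconds, copied from the snippet above.
DEFAULT_BOUNDS_MS = [0.5, 1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000,
                     30000, 60000, 300000, 600000, 1800000, 3600000]


def cumulative_buckets(latencies_ms, bounds=DEFAULT_BOUNDS_MS):
    """Return {le: number of observations <= le}, ending with a +Inf bucket."""
    per_bucket = [0] * (len(bounds) + 1)  # the last slot is the +Inf bucket
    for value in latencies_ms:
        # bisect_left finds the first bound that is >= value,
        # i.e. the smallest "le" bucket the observation belongs to.
        per_bucket[bisect_left(bounds, value)] += 1
    return dict(zip(list(bounds) + [math.inf], accumulate(per_bucket)))


a = cumulative_buckets([120, 300, 499, 500, 501])
b = cumulative_buckets([120, 260, 260, 260, 501])
print(a[250], a[500], a[1000])  # 1 4 5
print(a == b)                   # True
```

Two quite different sets of latencies produce exactly the same bucket counts, which is why anything between 250 ms and 500 ms is indistinguishable once it has been recorded.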

Also, note that the histogram_quantile() function interpolates quantile values by assuming a linear distribution within a bucket, so the result is an estimate whose accuracy depends on how wide the bucket is.
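The interpolation can be sketched in a few lines. This is a simplified re-implementation for illustration only, assuming classic cumulative buckets that end with a +Inf bucket and a lowest bucket that starts at 0; it is not the Prometheus source code, and the bucket counts in the example are invented.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound
    and ending with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total  # how many observations lie at or below the quantile
    # First bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if math.isinf(upper):
        # The quantile falls into +Inf: report the highest finite bound.
        return buckets[-2][0]
    lower, below = buckets[index - 1] if index > 0 else (0.0, 0)
    # Linear interpolation between the bucket's lower and upper bound.
    return lower + (upper - lower) * (rank - below) / (count - below)


example = [(100, 10), (250, 50), (500, 90), (math.inf, 100)]
print(histogram_quantile(0.25, example))  # 156.25
print(histogram_quantile(0.5, example))   # 250.0
print(histogram_quantile(0.95, example))  # 500
```

In this example the 25th percentile lands in the 100 to 250 ms bucket and is reported as 156.25 ms, even though the real requests in that bucket could all have taken 101 ms or all 249 ms. The 95th percentile falls into the +Inf bucket, so it is reported as the highest finite bound.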