I need to join two PromQL queries to combine their results, but I've run into some trouble. We have two metrics, http_server_requests_total and http_server_request_duration_seconds_bucket, and I have two queries for them.
Endpoints with the highest number of requests:
topk(40,
sum(http_server_requests_total{app=~"web_api_service", response_status_code=~"2..|4..|5.."})
by (app, path, method, response_status_code)
)
Endpoints with the highest execution time:
histogram_quantile(
0.95,
sum by (le, app, path, method, response_status_code)
(rate(http_server_request_duration_seconds_bucket{
app=~"web_api_service",response_status_code=~"2..|4..|5.."
}[$__rate_interval]))
)
Now I want to combine the results of these two queries to examine the slowest endpoints with the highest number of requests. I've tried several methods from this article to join the queries:
The and operator between the two queries gives me the number of requests from the first query as the Value (I run these queries in the Grafana Explore tab since we have no direct access to the Prometheus server).
Using + on(app, path, method, response_status_code) or + on(app, path, method, response_status_code) group_left also gives me only the number of requests as the Value (and NaN as well).
Using + on(app, path, method, response_status_code, le) group_right returns slightly different results, but there are still no float le values of request duration from the second metric.
My questions are:
Is there something like ORDER BY first_metric, second_metric DESC in PromQL?

So you need to combine the results of two PromQL queries into one, e.g. a topk aggregation and a histogram_quantile() function, to get something like "the latency of the N most frequent requests".
The right way to combine metrics in PromQL is vector matching, which can be one-to-one or many-to-one depending on how the labels match.
The first query returns the N most frequent requests:
topk(40, sum(http_server_requests_total{}) by(app, path, method, response_status_code))
The second query returns the 95th-percentile latency:
histogram_quantile(
0.95,
sum(rate(http_server_request_duration_seconds_bucket{}[1m])) by(le, app, path, method, response_status_code)
)
But how do you combine them? Both vectors carry the same labels (both queries aggregate by() the same labels, and histogram_quantile() drops le from its result), so this is one-to-one vector matching.
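To make the matching rule concrete, here is a minimal Python sketch. It is a toy model, not how Prometheus is implemented: it assumes an instant vector can be represented as a dict from label set to sample value, and the helpers labels() and multiply_one_to_one() as well as the sample numbers are hypothetical.

```python
# Toy model of PromQL one-to-one vector matching (not the Prometheus implementation).
# An instant vector is modelled as a dict: frozen label set -> sample value.

def labels(**kv):
    """Build a hashable label set from keyword arguments."""
    return frozenset(kv.items())

def multiply_one_to_one(left, right):
    """Multiply samples whose label sets are identical on both sides.

    Series that exist on only one side are dropped, which mirrors what a
    binary operator does between two instant vectors.
    """
    return {ls: left[ls] * right[ls] for ls in left if ls in right}

# Hypothetical sample data: request counts and p95 latencies per endpoint.
requests = {
    labels(path="/users", method="GET"): 1200.0,
    labels(path="/orders", method="POST"): 300.0,
}
latency_p95 = {
    labels(path="/users", method="GET"): 0.25,
    labels(path="/health", method="GET"): 0.01,
}

# Only the label set present in both vectors survives the operation.
result = multiply_one_to_one(requests, latency_p95)
```

Only the GET /users series is present in both inputs, so it is the only one in the result; the unmatched series disappear rather than producing an error.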
The result value (95p latency) is provided by the 2nd query, so the trick here is to discard the 1st value. You could achieve this by making the 1st value equal to 1 and multiplying the 2nd value by it. How to make it 1? Any number raised to the power of 0 returns 1, and Prometheus does support arithmetic operations:
topk(40, sum(http_server_requests_total{}) by(app, path, method, response_status_code)) ^ 0
*
histogram_quantile(
0.95,
sum(rate(http_server_request_duration_seconds_bucket{}[1m])) by(le, app, path, method, response_status_code)
)
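The same idea can be sketched in Python as a rough model of what the query computes. This is an illustration under assumptions, not Prometheus code: vectors are plain dicts keyed by a (path, method) tuple, and topk(), latency_of_top_requests() and the sample values are hypothetical names and data.

```python
import heapq

def topk(k, vector):
    """Keep the k series with the largest values (toy version of topk)."""
    return dict(heapq.nlargest(k, vector.items(), key=lambda item: item[1]))

def latency_of_top_requests(k, requests, latency):
    """Model of `topk(k, requests) ^ 0 * latency` with one-to-one matching."""
    # Raising every value to the power of 0 turns it into 1.0, so only the
    # label sets of the top-k series carry over to the multiplication.
    ones = {ls: value ** 0 for ls, value in topk(k, requests).items()}
    # Multiplying by 1.0 keeps the latency value unchanged; series that are
    # missing on either side are dropped.
    return {ls: ones[ls] * latency[ls] for ls in ones if ls in latency}

# Hypothetical sample data keyed by (path, method).
requests = {
    ("/users", "GET"): 1200.0,
    ("/orders", "POST"): 300.0,
    ("/health", "GET"): 50.0,
}
latency = {
    ("/users", "GET"): 0.25,
    ("/orders", "POST"): 0.8,
    ("/health", "GET"): 0.01,
}

result = latency_of_top_requests(2, requests, latency)
```

Here the two busiest endpoints keep their latency values, while the low-traffic /health endpoint is filtered out because it never makes it into the top-k side of the operation.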
To get sorted results you could apply one of the sorting functions, sort() or sort_desc(), either to the final result or to one of the sub-results.
topk already returns its results in descending order, so you could just sort the final latency vector:
sort_desc(
  topk(40, sum(http_server_requests_total{}) by(app, path, method, response_status_code)) ^ 0
  *
  histogram_quantile(
    0.95,
    sum(rate(http_server_request_duration_seconds_bucket{}[1m])) by(le, app, path, method, response_status_code)
  )
)
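As a small illustration of the final sorting step, the sketch below orders a latency vector by value, largest first. It is only a model: the sort_desc() helper here is a hypothetical Python stand-in for the PromQL function of the same name, and the input dict contains made-up values.

```python
def sort_desc(vector):
    """Return (labels, value) pairs ordered by value, largest first."""
    return sorted(vector.items(), key=lambda item: item[1], reverse=True)

# Hypothetical latency vector for the most frequent endpoints.
latency_of_top = {
    ("/users", "GET"): 0.25,
    ("/orders", "POST"): 0.8,
}

# The slowest endpoint comes first in the ranked list.
ranked = sort_desc(latency_of_top)
```

The ordering is by latency only, so the request counts that selected these endpoints no longer influence their position.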