prometheus promql

How to join two Prometheus metrics to get CPU usage per user-defined pod label?


I have these two PromQL queries that I want to join on pod name.

One query gets recent CPU usage per pod:

(sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod))
//Format: TIMESTAMP, cpu_usage, pod

The other query filters pods and selects specific labels. I've updated my kube-state-metrics deployment to emit these labels:

kube_pod_labels{label_role=~'role1|role2',label_app='identifier1'}
//Format: TIMESTAMP, label_name, pod, ...
// This returns a ton of labels, but the only ones I care about are pod (for joining) and label_name (for my final result)

How can I join these queries so that I get a result in the format label_name, cpu_usage? (The order does not matter.)

My first attempt was something like this, using this guide (https://ypereirareis.github.io/blog/2020/02/21/how-to-join-prometheus-metrics-by-label-with-promql/)

(sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod)) * on(pod) group_left(label_name) kube_pod_labels{label_role=~'foo|bar',label_app='appvalue'}

This almost works, but it returns TIMESTAMP, label_name, pod. How can I force my query to include the CPU usage value from the first query and discard pod from the result, while still joining on pod? The goal is to create a recording rule that defines a new metric from a query on these two existing metrics. The goal is not to create a Grafana dashboard, though I am using one for testing.


Solution

  • PEBCAK. I was misreading the results of my query in Grafana. This query actually does exactly what I want:

    (sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod)) * on(pod) group_left(label_name) kube_pod_labels{label_role=~'foo|bar',label_app='appvalue'}
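
If pod should also be dropped from the output, as the question asks, one possible variant wraps the join in an outer aggregation. This is a sketch, not part of the original answer. It assumes that kube_pod_labels carries the constant value 1 (as kube-state-metrics exposes it), so the multiplication leaves the CPU rate unchanged. It also assumes that pod names are unique across namespaces; otherwise the on(pod) match may fail because it is no longer unique.

```promql
# Outer sum: collapse the per-pod series into one series per label_name.
sum by (label_name) (
  # Left side: CPU usage rate per pod over the last 5 minutes.
  sum by (pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
  # Join on pod and copy label_name from the right-hand side.
  * on (pod) group_left (label_name)
  # Right side: value is 1, so the product equals the CPU rate.
  kube_pod_labels{label_role=~"foo|bar", label_app="appvalue"}
)
```

The result has the shape label_name, cpu_usage with no pod label. The same expression could be used as the expr of a recording rule, with the name of the new metric given in the rule's record field.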