
Collecting labels and query results data from multiple metrics


I am learning and experimenting with Prometheus metrics and Grafana dashboards. The information is coming from a Kubernetes cluster.

I am struggling to combine information about pods that comes from multiple metrics. I believe the metrics involved are all related: they all belong to the kube_pod... "family".

Background

I've used a technique like the following, which works for the simple metric-to-metric case:

(metric_A) + on(<common_label>) group_left(<metric_B_label>, ...) (0 * metric_B)

This lets me attach a label that exists only on the right-hand side to the left-hand series, matched through a common label. The arithmetic is a no-op: multiplying the right side by 0 and adding it leaves the left-hand values unchanged. The on (or ignoring) modifier apparently has to be attached to a binary operator between the left and right sides. This seems really clunky, but it does work, and I know of no other way to achieve this.

Here's a concrete example:

(kube_pod_status_phase != 0) + on (pod) group_left (node) (0 * kube_pod_info)

The kube_pod_status_phase metric provides the phase of each pod (Running, Failed, etc.) along with the pod label, but it has no node information. The kube_pod_info metric has the node label and a matching pod label. The query above returns each pod with its current phase and the node it is associated with.
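To make the matching behaviour concrete, here is a minimal Python sketch of what a many-to-one join of the form `left + on(pod) group_left(node) (0 * right)` does conceptually. It is a toy model, not how Prometheus is implemented; the function name `join_group_left` and all sample series are invented for illustration.

```python
# Toy model of: left + on(pod) group_left(node) (0 * right)
# Each series is a (labels, value) pair. All sample data is invented.

def join_group_left(left, right, on, copy_labels):
    """Copy copy_labels from the one matching right-hand series onto each left-hand series."""
    index = {}
    for labels, value in right:
        key = tuple(labels[name] for name in on)
        if key in index:
            # The right side must be the "one" side of the many-to-one match.
            raise ValueError(f"duplicate right-hand series for {key}")
        index[key] = (labels, value)

    result = []
    for labels, value in left:
        key = tuple(labels[name] for name in on)
        if key not in index:
            continue  # left-hand series without a match are dropped
        right_labels, right_value = index[key]
        merged = dict(labels)
        for name in copy_labels:
            if name in right_labels:
                merged[name] = right_labels[name]
        # The right side is multiplied by 0, so the left value is unchanged.
        result.append((merged, value + 0 * right_value))
    return result


phase = [
    ({"pod": "pod_A_1", "phase": "Running"}, 1),
    ({"pod": "pod_C_1", "phase": "Pending"}, 1),
]
info = [
    ({"pod": "pod_A_1", "node": "node_1"}, 1),
    ({"pod": "pod_C_1", "node": "node_2"}, 1),
]

joined = join_group_left(phase, info, on=["pod"], copy_labels=["node"])
for labels, value in joined:
    print(labels, value)
```

Each output series keeps the left-hand labels and value and gains only the node label, which is why the add and multiply have no visible effect.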

Problem

My current task is to collect the following information:

Node     Pod       Status    Created                Age
node_1   pod_A_1   Running   mm/dd/yyyy hh:mm:ss    {x}d{y}h
node_1   pod_B_1   Running   mm/dd/yyyy hh:mm:ss    {x}d{y}h
node_2   pod_C_1   Pending   mm/dd/yyyy hh:mm:ss    {x}d{y}h
node_3   pod_A_2   Running   mm/dd/yyyy hh:mm:ss    {x}d{y}h
node_3   pod_B_2   Failed    mm/dd/yyyy hh:mm:ss    {x}d{y}h
...      ...       ...       ...                    ...

My plan is to get the status (phase) from the kube_pod_status_phase metric, the created date/time and the age from the kube_pod_start_time metric and include the corresponding node from the kube_pod_info metric. The age is calculated as time() - kube_pod_start_time.
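As a sketch of the arithmetic behind the last two columns, the Python below turns a start timestamp (Unix seconds, the kind of value kube_pod_start_time reports) into a created date/time and an age in the {x}d{y}h form shown in the table. The helper names `format_created` and `format_age` are hypothetical, and UTC is an assumption; in a dashboard the same formatting would normally be done by the panel's unit settings rather than by code.

```python
import time
from datetime import datetime, timezone


def format_created(start_ts):
    """Render a Unix timestamp as mm/dd/yyyy hh:mm:ss (UTC is assumed here)."""
    return datetime.fromtimestamp(start_ts, tz=timezone.utc).strftime("%m/%d/%Y %H:%M:%S")


def format_age(start_ts, now=None):
    """Render the age (now - start_ts) as {x}d{y}h, truncating minutes and seconds."""
    now = time.time() if now is None else now
    age_seconds = max(0, int(now - start_ts))
    days, remainder = divmod(age_seconds, 86400)  # 86400 seconds per day
    hours = remainder // 3600
    return f"{days}d{hours}h"


# Example with a fixed "now" so the result does not depend on the clock.
print(format_created(0))
print(format_age(0, now=2 * 86400 + 5 * 3600 + 59))
```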

Another complication is that the phase and node are labels in their respective metrics, while the created date/time and age are the values returned by the queries (i.e. they are not labels). This has caused me problems in several attempts.

I tried chaining queries together, but besides being incredibly ugly and complicated, the results I managed to get working did not include the query values (created date and age).

If anyone knows how this could be done, I would very much appreciate knowing about it.


Solution

  • I was able to find a method that does what I need, thanks to comments from https://stackoverflow.com/users/21363224/markalex that got me on the right track.

    I ended up creating 3 queries:

    (kube_pod_status_phase{namespace=~"runner.*"} != 0) + on (pod) group_left (node) (0 * kube_pod_info)
    
    time() - kube_pod_start_time{namespace=~"runner.*"}
    
    kube_pod_start_time{namespace=~"runner.*"}
    

    I then joined them with a "Join by field" transform on the pod label. Finally, I used an "Organize fields" transform to hide the columns I don't care about and to reorder the rest.
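The effect of the two transforms can be sketched in Python as follows. This is only a conceptual model of joining three result tables on the pod field and then selecting and ordering columns; it is not Grafana code, and the function names, the column names `age_seconds` and `start_time`, and the sample rows are all invented for illustration. The sketch keeps every pod seen in any table, which may differ from the join mode chosen in the transform.

```python
def join_by_field(field, *tables):
    """Merge rows from several tables that share the same value of `field`."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row[field], {}).update(row)
    return list(merged.values())


def organize_fields(rows, columns):
    """Keep only the listed columns, in the listed order."""
    return [{name: row.get(name) for name in columns} for row in rows]


# One table per query; values are made up.
phase_and_node = [
    {"pod": "pod_A_1", "phase": "Running", "node": "node_1", "namespace": "runner-1"},
    {"pod": "pod_C_1", "phase": "Pending", "node": "node_2", "namespace": "runner-2"},
]
age = [
    {"pod": "pod_A_1", "age_seconds": 190800},
    {"pod": "pod_C_1", "age_seconds": 3600},
]
created = [
    {"pod": "pod_A_1", "start_time": 1700000000},
    {"pod": "pod_C_1", "start_time": 1700187200},
]

joined = join_by_field("pod", phase_and_node, age, created)
table = organize_fields(joined, ["node", "pod", "phase", "start_time", "age_seconds"])
for row in table:
    print(row)
```

Because the created time and age arrive as values rather than labels, each query contributes its own value column, and the join on pod is what lines them up in a single row.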