I am learning and experimenting with Prometheus metrics and Grafana dashboards. The information is coming from a Kubernetes cluster.
I am struggling to figure out how to combine information about pods that comes from multiple metrics. I believe the metrics involved are all related in some way: they are all in the `kube_pod...` "family".
I've used a technique like the following that works for a simple metric to metric case:
(metric_A) + on(<common_label>) group_left(<metric_B_label>, ...) (0 * metric_B)
This lets me attach a label from the right-hand side that is absent from the left-hand side, matching on a common label. No real arithmetic is intended: multiplying the right side by 0 and adding it leaves the left-hand value unchanged. The `on` (or `ignoring`) modifier apparently requires a binary operator between the left and right sides. This seems really clunky, but it does work and I know of no other way to achieve this.
Here's a concrete example:
(kube_pod_status_phase != 0) + on (pod) group_left (node) (0 * kube_pod_info)
The `kube_pod_status_phase` metric provides the `phase` (and, of course, the `pod`) of each pod (Running, Failed, etc.), but does not have the node information. The `kube_pod_info` metric has the `node` label and a matching `pod` label. The query above returns a collection of pods with their current phase and the node each one is associated with.
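To make the label join easier to reason about, here is a minimal Python sketch of what the query above does, run on made-up samples. The function name `join_group_left` and the sample data are hypothetical and exist only for illustration; this is not how Prometheus implements vector matching, just a toy model of it.

```python
# Toy instant vectors: each element is (labels, value). All data is made up.
phase_samples = [
    ({"pod": "pod_A_1", "phase": "Running"}, 1.0),
    ({"pod": "pod_A_1", "phase": "Failed"}, 0.0),
    ({"pod": "pod_C_1", "phase": "Pending"}, 1.0),
]
info_samples = [
    ({"pod": "pod_A_1", "node": "node_1"}, 1.0),
    ({"pod": "pod_C_1", "node": "node_2"}, 1.0),
]


def join_group_left(left, right, on, copy_labels):
    """Toy model of: left + on(<on>) group_left(<copy_labels>) (0 * right)."""
    # The right side is the "one" side: at most one series per match key.
    right_by_key = {}
    for labels, value in right:
        key = labels[on]
        if key in right_by_key:
            raise ValueError(f"duplicate right-hand series for {on}={key}")
        right_by_key[key] = (labels, value)

    joined = []
    for labels, value in left:
        match = right_by_key.get(labels[on])
        if match is None:
            continue  # left-hand series without a partner are dropped
        right_labels, right_value = match
        merged = dict(labels)
        for name in copy_labels:
            merged[name] = right_labels[name]  # copy the extra label across
        # Adding 0 * right_value leaves the left-hand value unchanged.
        joined.append((merged, value + 0 * right_value))
    return joined


# Equivalent of (kube_pod_status_phase != 0): keep only the active phase.
active = [(labels, value) for labels, value in phase_samples if value != 0]
result = join_group_left(active, info_samples, on="pod", copy_labels=["node"])

for labels, value in result:
    print(labels, value)
```

Each surviving sample keeps its own value and gains the `node` label, which is all the `+ ... (0 * ...)` construction is there to achieve.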
My current task is to collect the following information:
Node | Pod | Status | Created | Age |
---|---|---|---|---|
node_1 | pod_A_1 | Running | mm/dd/yyyy hh:mm:ss | {x}d{y}h |
node_1 | pod_B_1 | Running | mm/dd/yyyy hh:mm:ss | {x}d{y}h |
node_2 | pod_C_1 | Pending | mm/dd/yyyy hh:mm:ss | {x}d{y}h |
node_3 | pod_A_2 | Running | mm/dd/yyyy hh:mm:ss | {x}d{y}h |
node_3 | pod_B_2 | Failed | mm/dd/yyyy hh:mm:ss | {x}d{y}h |
... | ... | ... | ... | ... |
My plan is to get the status (phase) from the `kube_pod_status_phase` metric, the created date/time and the age from the `kube_pod_start_time` metric, and the corresponding node from the `kube_pod_info` metric. The age is calculated as `time() - kube_pod_start_time`.
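As a sanity check on the arithmetic, here is a small Python sketch that turns a start timestamp into the "Created" and "Age" columns in the formats shown in the table. The helper names `format_created` and `format_age` and the example timestamp are my own assumptions; in Grafana itself this formatting would normally be handled by field units rather than by code.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400
SECONDS_PER_HOUR = 3600


def format_created(start_time: float) -> str:
    """Render a Unix timestamp (seconds) as mm/dd/yyyy hh:mm:ss in UTC."""
    moment = datetime.fromtimestamp(start_time, tz=timezone.utc)
    return moment.strftime("%m/%d/%Y %H:%M:%S")


def format_age(start_time: float, now: float) -> str:
    """Render now - start_time as {x}d{y}h, dropping leftover minutes."""
    age_seconds = int(now - start_time)
    days, remainder = divmod(age_seconds, SECONDS_PER_DAY)
    hours = remainder // SECONDS_PER_HOUR
    return f"{days}d{hours}h"


# Hypothetical pod that started 2 days, 5 hours and 2 minutes before "now".
start = 1_700_000_000.0
now = start + 2 * SECONDS_PER_DAY + 5 * SECONDS_PER_HOUR + 120

created = format_created(start)
age = format_age(start, now)
print(created, age)
```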
Another detail that complicates this is that the `phase` and `node` are labels in their respective metrics, while the created date/time and age are the "result" of running the queries (i.e. they are not labels). This has been causing me problems in several attempts.
I tried seeing if I could somehow chain together queries, but in addition to being incredibly ugly and complicated, I couldn't get the result values (created date and age) to be included in the results that I managed to get to work.
If anyone knows how this could be done, I would very much appreciate knowing about it.
I was able to find a method that does what I need. Thanks to comments from https://stackoverflow.com/users/21363224/markalex that set me on the right track.
I ended up creating 3 queries:
(kube_pod_status_phase{namespace=~"runner.*"} != 0) + on (pod) group_left (node) (0 * kube_pod_info)
time() - kube_pod_start_time{namespace=~"runner.*"}
kube_pod_start_time{namespace=~"runner.*"}
I then joined them with a "Join by field" transform on the `pod` label. Finally, I used an "Organize fields" transform to hide the columns I don't care about and to reorder the rest.
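For anyone who wants to see the shape of the result outside Grafana, here is a rough Python sketch of the two transforms applied to made-up query results. The function `join_by_field`, the column names, and the sample rows are assumptions for illustration only; the real transforms have more options than this models.

```python
def join_by_field(field, *frames):
    """Merge lists of row dicts on `field`, keeping every key that appears."""
    rows = {}
    for frame in frames:
        for row in frame:
            # Rows sharing the same key are merged column by column.
            rows.setdefault(row[field], {}).update(row)
    return list(rows.values())


# Made-up results standing in for the three queries above.
status = [
    {"pod": "pod_A_1", "node": "node_1", "phase": "Running", "value": 1.0},
    {"pod": "pod_C_1", "node": "node_2", "phase": "Pending", "value": 1.0},
]
age = [
    {"pod": "pod_A_1", "age_seconds": 190920.0},
    {"pod": "pod_C_1", "age_seconds": 3600.0},
]
created = [
    {"pod": "pod_A_1", "created": 1700000000.0},
    {"pod": "pod_C_1", "created": 1700187320.0},
]

joined = join_by_field("pod", status, age, created)

# Rough stand-in for "Organize fields": pick the columns and their order.
columns = ["node", "pod", "phase", "created", "age_seconds"]
table = [[row.get(name) for name in columns] for row in joined]

for line in table:
    print(line)
```

The point of the sketch is that the values of the second and third queries end up as ordinary columns next to the labels from the first, which is what a single PromQL expression could not give me.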