This is really two questions in one - I think they are related.
What does the kube_pod_status_phase metric value represent?
When I view the kube_pod_status_phase metric in Prometheus, the metric value is always 0 or 1, but it's not clear to me what 0 and 1 mean. Let's use an example. The query below returns the value of this metric where the "phase" label equals "Running".
Query:
kube_pod_status_phase{phase="Running"}
Result: (sample)
kube_pod_status_phase{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="argocd", phase="Running", pod="argocd-server-6f8487c84d-5qqv7", service="prometheus-kube-state-metrics", uid="ee84e48d-0302-4f5a-9e81-f4f0d7d0223f"}
1
kube_pod_status_phase{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="default", phase="Running", pod="rapid7-monitor-799d9f9898-fst5q", service="prometheus-kube-state-metrics", uid="1561cd66-b5c4-48b9-83d0-11f4f1f0d5d9"}
1
kube_pod_status_phase{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="deploy", phase="Running", pod="clean-deploy-cronjob-28112310-ljws6", service="prometheus-kube-state-metrics", uid="5510f859-74ca-471f-9c50-c1b8976119f3"}
0
kube_pod_status_phase{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="deploy", phase="Running", pod="clean-deploy-cronjob-28113750-75m8v", service="prometheus-kube-state-metrics", uid="d63e5038-a8bb-4f88-bd77-82c66d183e1b"}
0
Why do some "running" pods have a value of 0, while others have a value of 1? Are the items with a value of 1 "currently" running (at the time the query was run) and the items with a value of 0 "had been" running, but are no longer?
There seem to be inconsistencies with what the kube_pod_status_phase metric produces between Prometheus and Grafana. Why?
If I use a slightly different version of the query above, I get different results between Prometheus and what is shown in Grafana.
Query:
kube_pod_status_phase{phase=~"Pending"} != 0
Result: (Prometheus)
empty query result
Result: (Grafana table view)
pod namespace phase
clean-deploy-cronjob-28115190-2rhv5 deploy Pending
If I go back to Prometheus and focus on that pod specifically:
Query:
kube_pod_status_phase{pod="clean-deploy-cronjob-28115190-2rhv5"}
Result:
kube_pod_status_phase{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="deploy", phase="Failed", pod="clean-deploy-cronjob-28115190-2rhv5", service="prometheus-kube-state-metrics", uid="4dd948f6-327b-4c00-abc9-57d16bd588d0"}
0
kube_pod_status_phase{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="deploy", phase="Pending", pod="clean-deploy-cronjob-28115190-2rhv5", service="prometheus-kube-state-metrics", uid="4dd948f6-327b-4c00-abc9-57d16bd588d0"}
0
kube_pod_status_phase{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="deploy", phase="Running", pod="clean-deploy-cronjob-28115190-2rhv5", service="prometheus-kube-state-metrics", uid="4dd948f6-327b-4c00-abc9-57d16bd588d0"}
0
kube_pod_status_phase{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="deploy", phase="Succeeded", pod="clean-deploy-cronjob-28115190-2rhv5", service="prometheus-kube-state-metrics", uid="4dd948f6-327b-4c00-abc9-57d16bd588d0"}
1
kube_pod_status_phase{container="kube-state-metrics", endpoint="http", instance="10.244.211.138:8080", job="kube-state-metrics", namespace="deploy", phase="Unknown", pod="clean-deploy-cronjob-28115190-2rhv5", service="prometheus-kube-state-metrics", uid="4dd948f6-327b-4c00-abc9-57d16bd588d0"}
0
Notice that the entry with phase "Running" has a value of 0, while the entry with a value of 1 has the phase "Succeeded". You could argue that the status changed during the period when I ran these queries. No, it has not. It has been showing these results for a long time.
This is just one example of strange inconsistencies I've seen between a query run in Prometheus vs. Grafana.
UPDATE:
I think I have gained some insight into the inconsistencies question. When I run the query in Prometheus it gives me the results as of "now" (a guess on my part). In Grafana, it takes into account the "time window" that's available in the dashboard header. When I dialed it back to "the last 5 minutes", the pending entry disappeared.
I see that there is an option at the dashboard level in Grafana to hide the time picker, which if set to hide, hides not only the time picker, but also the refresh period selector. If this option is used, I'm curious as to how often the dashboard is actually refreshed. Should I use this to effectively make Grafana only care about "now", instead of some time window into the past?
What does the kube_pod_status_phase metric value represent?
kube_pod_status_phase exposes one time series per pod for each value of the phase label: "Failed", "Pending", "Running", "Succeeded", and "Unknown".
For every pod, only one of those series has the value 1, which means the pod is in the corresponding phase. The series for the other phases have the value 0.
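To make the encoding concrete, here is a minimal Python sketch. It is not how kube-state-metrics is implemented; the `Sample` type and the `current_phase` helper are hypothetical names chosen for illustration, and the values are copied from the five series shown in the question for one pod.

```python
from typing import List, NamedTuple, Optional


class Sample(NamedTuple):
    """One kube_pod_status_phase series, reduced to the parts that matter here."""
    pod: str
    phase: str
    value: int


POD = "clean-deploy-cronjob-28115190-2rhv5"

# The five series reported for this pod in the question.
samples: List[Sample] = [
    Sample(POD, "Failed", 0),
    Sample(POD, "Pending", 0),
    Sample(POD, "Running", 0),
    Sample(POD, "Succeeded", 1),
    Sample(POD, "Unknown", 0),
]


def current_phase(samples: List[Sample], pod: str) -> Optional[str]:
    """Return the phase whose series has value 1 for the pod, or None."""
    active = [s.phase for s in samples if s.pod == pod and s.value == 1]
    return active[0] if len(active) == 1 else None


print(current_phase(samples, POD))  # Succeeded
```

The series with phase "Running" exists for this pod but has the value 0, so a query that selects only on the phase label returns it anyway. Filtering on the value as well, for example `kube_pod_status_phase{phase="Running"} == 1`, keeps only the pods that are in that phase.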
Why do some "running" pods have a value of 0, while others have a value of 1?
Remember that Prometheus is not a real-time solution. It only has values at the resolution of scrape_interval. Check the series for the other phases of the suspicious pods: the one with the value 1 is the phase recorded at the last scrape, and it's quite possible that the pod's state wasn't updated. Also, short-lived pods can produce all kinds of strange behavior in metrics.
There seem to be inconsistencies with what the kube_pod_status_phase metric produces between Prometheus and Grafana. Why?
Most likely your query in Grafana has the type "Range" or "Both", and in table mode it shows all values over the time range selected for the dashboard.
If you only want to see the latest values (as of the "To" value of the dashboard time range), go to the query options (under the query in panel edit mode) and set the type to "Instant".
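The difference between the two query types can be sketched in Python. This is a simplified model, not the actual code of Grafana or Prometheus: the timestamps and values are made up for illustration, and `instant_rows` and `range_rows` are hypothetical helpers that apply the `!= 0` filter either at a single point in time or at every step of a window.

```python
from typing import Dict, List, Tuple

# Hypothetical history for one pod: phase -> list of (timestamp_seconds, value).
# The pod is Pending at t=0 and Succeeded from t=60 onward.
History = Dict[str, List[Tuple[int, int]]]

history: History = {
    "Pending": [(0, 1), (60, 0), (120, 0)],
    "Succeeded": [(0, 0), (60, 1), (120, 1)],
}


def instant_rows(history: History, at: int) -> List[str]:
    """Phases whose value is non-zero at a single evaluation time."""
    return sorted(
        phase
        for phase, points in history.items()
        if any(ts == at and value != 0 for ts, value in points)
    )


def range_rows(history: History, start: int, end: int) -> List[str]:
    """Phases whose value is non-zero at any step within [start, end]."""
    return sorted(
        phase
        for phase, points in history.items()
        if any(start <= ts <= end and value != 0 for ts, value in points)
    )


print(instant_rows(history, at=120))           # ['Succeeded']
print(range_rows(history, start=0, end=120))   # ['Pending', 'Succeeded']
print(range_rows(history, start=60, end=120))  # ['Succeeded']
```

A window that still contains the moment the pod was Pending produces a "Pending" row, which is consistent with the observation in the question that narrowing the dashboard time range made the entry disappear.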
I see that there is an option at the dashboard level in Grafana to hide the time picker, which if set to hide, hides not only the time picker, but also the refresh period selector. If this option is used, I'm curious as to how often the dashboard is actually refreshed. Should I use this to effectively make Grafana only care about "now", instead of some time window into the past?
No. That option is meant for other uses, for example presentation mode.