I don't suppose anybody has a handy way of fixing labels so that you can do a group_left join between one series from kube_state_metrics and one from node-exporter? This is surprisingly difficult because, out of the box at least for me, they share no labels that can be pivoted on.
The way I see it, I have two options:
1. Somehow mess with the scrape targets so that I can put a common label on the series, which may or may not be possible. This stuff looks complicated, and in any case it probably increases my cardinality in ways I don't want.
2. Do a gross regex relabel to create my own common label to pivot on.
For option 2, I have the following close-ish labels to work with, which I can transform to get my pivot point:
instance="10.26.10.113:9100" < node-exporter
internal_ip="10.26.10.113" < kube_state_metrics
I'm running the following query as a test, with the following truncated output:
label_replace(node_cpu_seconds_total{mode="idle"}, "instance", "$1", "instance", "[^:]*")
node_cpu_seconds_total{instance="10.26.10.113:9100", mode="idle"} 4975537.86
I would have expected that regex to capture everything before the : and then replace the instance label with just that first capture group. It does not, however. If I change the destination label (the second argument of the query), the instance label disappears entirely.
label_replace(node_cpu_seconds_total{mode="idle"}, "internal_ip", "$1", "instance", "[^:]*")
node_cpu_seconds_total{mode="idle"} 4975537.86
What am I doing wrong? I believe the regex is valid RE2 regex syntax according to https://regoio.herokuapp.com/. As best as I can read the documentation, my syntax appears to be correct as well. https://prometheus.io/docs/prometheus/latest/querying/functions/#label_replace
I have no idea why, but stealing the regex here seems to work. I guess I need to provide capture groups for the entire source label, even if I don't care about the whole thing.
https://newbedev.com/relabel-instance-to-hostname-in-prometheus
label_replace(node_cpu_seconds_total{mode="idle"}, "internal_ip", "$1", "instance", "([^:]+)(:[0-9]+)?")
node_cpu_seconds_total{internal_ip="10.26.10.113", instance="10.26.10.113:9100", mode="idle"} 4975537.86
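For what it's worth, the likely reason the first attempts failed is that Prometheus fully anchors the label_replace regex, so it has to match the entire source value, and "[^:]*" also contains no capture group for $1 to refer to. When the regex does not match, the series is returned unchanged. Below is a rough Python sketch of that behavior. The function is a hypothetical emulation written for illustration (it is not Prometheus code), and Python's re module stands in for RE2, which should behave the same for patterns this simple.

```python
import re


def label_replace(labels, dst, replacement, src, regex):
    """Hypothetical emulation of PromQL label_replace on a single label set."""
    # Prometheus anchors the pattern, so it must match the WHOLE source value.
    match = re.fullmatch(regex, labels.get(src, ""))
    if match is None:
        return dict(labels)  # no match: the series comes back unchanged

    def expand_ref(ref):
        # "$N" expands to capture group N, or to "" if that group is missing.
        idx = int(ref.group(1))
        if idx > match.re.groups:
            return ""
        return match.group(idx) or ""

    value = re.sub(r"\$(\d+)", expand_ref, replacement)
    out = dict(labels)
    if value == "":
        out.pop(dst, None)  # an empty value means the label is absent
    else:
        out[dst] = value
    return out


labels = {"instance": "10.26.10.113:9100", "mode": "idle"}

# "[^:]*" cannot cover ":9100", so nothing matches and nothing changes.
print(label_replace(labels, "instance", "$1", "instance", "[^:]*"))

# The working regex covers the whole value and captures the IP as group 1.
print(label_replace(labels, "internal_ip", "$1", "instance", "([^:]+)(:[0-9]+)?"))
```

So the optional (:[0-9]+)? part is not there because the port matters, but because something has to consume the rest of the value for the anchored match to succeed.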
===
The instance labels on kube_state_metrics and node-exporter are different things by default! The node-exporter one refers to its binding address of hostip:9100 by default, whereas the kube-state-metrics one is podip:8080. Do not try to do a join on the series without first doing a label_replace! You will get an error:
found duplicate series for the match group {instance="10.2.2.45:8080"}
Full instructions for what I did, for the next poor soul who needs to join node-exporter and kube_state_metrics series on the node label:
instance=10.10.10.10:9100 < label laid on node-exporter series
internal_ip=10.10.10.10 < label laid on kube_state_metrics series
First relabel by regex to the common internal_ip label
label_replace(node_cpu_seconds_total{mode="idle"}, "internal_ip", "$1", "instance", "([^:]+)(:[0-9]+)?")
Then use group_left to add the node label to the series
label_replace(node_cpu_seconds_total{mode="idle"}, "internal_ip", "$1", "instance", "([^:]+)(:[0-9]+)?")
* on (internal_ip) group_left (node) kube_node_info{}
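To make the matching rules concrete, here is a minimal Python sketch of what a many-to-one join like the one above does, and why joining on instance fails with the duplicate series error. The function name, the node names (worker-1, worker-2) and the sample values are all made up for illustration; this is a simplified model, not how Prometheus implements it.

```python
def group_left_join(many, one, on, extra):
    """Sketch of `many * on(<on>) group_left(<extra>) one`.

    Both inputs are lists of (labels, value) pairs.
    """
    index = {}
    for labels, value in one:
        key = tuple(labels.get(name, "") for name in on)
        if key in index:
            # Models the "found duplicate series for the match group" error.
            raise ValueError(f"duplicate series for match group {dict(zip(on, key))}")
        index[key] = (labels, value)

    result = []
    for labels, value in many:
        key = tuple(labels.get(name, "") for name in on)
        if key not in index:
            continue  # series without a partner are dropped
        one_labels, one_value = index[key]
        merged = dict(labels)
        for name in extra:
            if name in one_labels:
                merged[name] = one_labels[name]  # copy the requested label over
        result.append((merged, value * one_value))
    return result


# Hypothetical sample data: two CPU series from one node-exporter target.
cpu = [
    ({"internal_ip": "10.10.10.10", "instance": "10.10.10.10:9100", "cpu": "0"}, 0.9),
    ({"internal_ip": "10.10.10.10", "instance": "10.10.10.10:9100", "cpu": "1"}, 0.8),
]
# Every kube_node_info series is scraped from the same kube-state-metrics pod,
# so they all share one instance value. Info metrics have the value 1.
nodes = [
    ({"internal_ip": "10.10.10.10", "instance": "10.2.2.45:8080", "node": "worker-1"}, 1.0),
    ({"internal_ip": "10.10.10.11", "instance": "10.2.2.45:8080", "node": "worker-2"}, 1.0),
]

joined = group_left_join(cpu, nodes, ["internal_ip"], ["node"])
for labels, value in joined:
    print(labels, value)
```

Calling group_left_join(cpu, nodes, ["instance"], ["node"]) on the same data raises the ValueError, because both kube_node_info series carry the same instance value. Because kube_node_info is always 1, the multiplication leaves the CPU values untouched and only serves to pull the node label across.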
The first argument of label_replace takes an instant vector, so sub in the rate query there. Note the position of [1m]: it goes inside rate(), directly after the selector.
label_replace(rate(node_cpu_seconds_total{mode="idle"}[1m]), "internal_ip", "$1", "instance", "([^:]+)(:[0-9]+)?")
* on (internal_ip) group_left (node) kube_node_info{}
Do the rest of the query
1 - avg by (node) (
label_replace(rate(node_cpu_seconds_total{mode="idle"}[5m]), "internal_ip", "$1", "instance", "([^:]+)(:[0-9]+)?")
* on (internal_ip) group_left (node) kube_node_info{}
)