Tags: prometheus, promql, prometheus-alertmanager

Get node labels in a Prometheus Alertmanager rule


I have this rule:

- alert: InstanceNotReady
  expr: kube_node_status_condition{condition="Ready", status=~"unknown|false"} == 1
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: {{`Kubernetes node {{ $labels.node }} is in NotReady state`}}
    description: Node entered NotReady unresponsive state

But the metric it is based on does not carry the labels of the node:

kube_node_status_condition{
 app_kubernetes_io_instance="prometheus", 
 app_kubernetes_io_managed_by="Helm", 
 app_kubernetes_io_name="kube-state-metrics", 
 argocd_argoproj_io_instance="prometheus", 
 condition="Ready", 
 helm_sh_chart="kube-state-metrics-3.5.2", 
 instance="10.120.1.147:8080", 
 job="kubernetes-service-endpoints", 
 kubernetes_name="prometheus-kube-state-metrics", 
 kubernetes_namespace="prometheus", 
 kubernetes_node="ip-10-120-1-39.us-west-2.compute.internal", 
 node="ip-10-120-3-76.us-west-2.compute.internal", 
 status="unknown"
}

So I need to add the labels assigned to the Kubernetes node to make the alert more informative.

The kube_node_labels metric has what I want:

kube_node_labels{
  app_kubernetes_io_instance="prometheus", 
  app_kubernetes_io_managed_by="Helm",
  app_kubernetes_io_name="kube-state-metrics",
  argocd_argoproj_io_instance="prometheus", 
  helm_sh_chart="kube-state-metrics-3.5.2", 
  instance="10.120.0.226:8080", 
  job="kubernetes-service-endpoints", 
  kubernetes_name="prometheus-kube-state-metrics", 
  kubernetes_namespace="prometheus", 
  kubernetes_node="ip-10-120-1-39.us-west-2.compute.internal", 
  label_grafana="true", 
  label_node_kubernetes_io_instance_type="t3.small",
  label_node_kubernetes_io_lifecycle="on-demand", 
  label_topology_kubernetes_io_region="us-west-2", 
  node="ip-10-120-3-76.us-west-2.compute.internal"
}

So I'd like to add these label_* labels to the alert and display them in Slack.

I tried this:

kube_node_status_condition{condition="Ready", status=~"false|unknown"}==1 group_left kube_node_labels
kube_node_status_condition{condition="Ready", status=~"false|unknown"}==1 group_left(node) kube_node_labels

Both queries failed with this error:

Error executing query: invalid parameter "query": 1:75: parse error: unexpected <group_left>

So my questions are:

  • How do I get these labels with a PromQL query?
  • How do I modify the Go template to display the labels with the label_ prefix in the alert rule?

Solution

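  • group_left and group_right are modifiers of a binary operator and must follow on(...) or ignoring(...), which is why the attempts above fail to parse. Multiply the two metrics, matching on node and keeping the labels from kube_node_labels: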

    (kube_node_status_condition{condition="Ready", status="unknown"} * on (node) group_right() kube_node_labels) == 1
    

    Output:

    {
     app_kubernetes_io_instance="prometheus",
     app_kubernetes_io_managed_by="Helm",
     app_kubernetes_io_name="kube-state-metrics",
     argocd_argoproj_io_instance="prometheus",
     helm_sh_chart="kube-state-metrics-3.5.2",
     instance="10.120.0.226:8080",
     job="kubernetes-service-endpoints",
     kubernetes_name="prometheus-kube-state-metrics",
     kubernetes_namespace="prometheus",
     kubernetes_node="ip-10-120-1-39.us-west-2.compute.internal",
     label_grafana="true",
     label_node_kubernetes_io_instance_type="t3.small",
     label_node_kubernetes_io_lifecycle="on-demand",
     label_topology_kubernetes_io_region="us-west-2",
     node="ip-10-120-3-76.us-west-2.compute.internal"
    }
    

    For more details, see: https://www.robustperception.io/left-joins-in-promql
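
To tie the join back to the alert rule and the second question, here is a minimal sketch of how the rule might look, not a tested configuration. It assumes a plain Prometheus rule file; the group name node-alerts is hypothetical, and inside a Helm chart the templates would need the same {{` ... `}} wrapping as the original rule. It also assumes that at most one status series per node has the value 1, so filtering with == 1 before the join lets the rule keep status=~"unknown|false" without producing duplicate series on the left-hand side.

```yaml
groups:
  - name: node-alerts   # hypothetical group name
    rules:
      - alert: InstanceNotReady
        # Filter to the active status first so each node contributes at most
        # one left-hand series, then take the labels from kube_node_labels.
        expr: |
          (kube_node_status_condition{condition="Ready", status=~"unknown|false"} == 1)
            * on (node) group_right() kube_node_labels
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: 'Kubernetes node {{ $labels.node }} is in NotReady state'
          # Reference one label directly, then list every label whose name
          # starts with label_ using the template function match.
          description: >-
            Instance type: {{ $labels.label_node_kubernetes_io_instance_type }}.
            Node labels:
            {{ range $k, $v := $labels }}{{ if match "^label_" $k }}{{ $k }}={{ $v }} {{ end }}{{ end }}
```

Because the join only returns series that exist on both sides, a node with no kube_node_labels series would not raise this alert, so it may be worth keeping the original rule as a fallback. The joined label_* labels are ordinary alert labels after this, so a Slack receiver template should be able to read them from the alert's labels or from the description annotation.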