
Dynamically Adding 'team' Label to Alerts in Prometheus Using Regex


I'm working with Prometheus alerts, and I would like to dynamically add a 'team' label to all of my alerts based on a regex pattern. I have an example alert:

expr: |
  label_replace(
    label_replace(
      increase(kube_pod_container_status_restarts_total{job="kube-state-metrics",namespace=~".*",pod!~"app-test-.*"}[30m]) > 2,
      "team", "data", "container", ".*test.*"
    ),
    "team", "data", "pod", ".*test.*"
  )

This example alert adds the 'team' label with the value 'data' to any result whose 'container' or 'pod' label matches the regex pattern ".*test.*".
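
For example, a hypothetical series (the label values below are made up purely for illustration) would pick up the label like this:

    # raw series (hypothetical):
    kube_pod_container_status_restarts_total{container="billing-test", pod="billing-test-7d9f"}

    # after increase(...) > 2 and both label_replace calls, the alert fires with:
    {container="billing-test", pod="billing-test-7d9f", team="data"}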

However, I want to apply this logic to all of my alerts, not just this specific one. Is there a way to do this dynamically in Prometheus or Alertmanager? Any guidance would be appreciated.

I tried using the label_replace function in the alert expression, and it worked as expected for that specific alert. I was hoping to find a way to apply this label addition to all of my alerts without having to modify each alert expression individually.


Solution

  • AFAIK, there is no way to add labels to your alerts based on a condition without rewriting all of the rules.

    The best solution for your exact question is to create separate alerts for each environment/team/condition and just add static labels.

    Something along the lines of:

      - alert: many_restarts_data
        expr: increase(kube_pod_container_status_restarts_total{job="kube-state-metrics",namespace=~".*",pod!~"app-test-.*", container=~".*test.*"}[30m]) > 2
        labels:
          team: data
        
      - alert: many_restarts_sre
        expr: increase(kube_pod_container_status_restarts_total{job="kube-state-metrics",namespace=~".*",pod!~"app-test-.*", container=~".*prod.*"}[30m]) > 2
        labels:
          team: sre
    

    But it will require multiplying the number of alerts by the number of teams.

    A far easier solution, I would argue, is to use the routing capabilities of Alertmanager (or PagerDuty, if it offers similar functionality). This way you specify, in the Alertmanager configuration, which alerts with which labels should be routed to which teams, and it works independently of how the alerts are created.

        routes:
        - matchers:
            - container =~ ".*test.*"
            - severity =~ ".*test.*"
            - alertname =~ "my_alert_1|my_alert_2"
          receiver: team-data
    
        - matchers:
            - container =~ ".*prod.*"
            - severity =~ ".*prod.*"
            - alertname =~ "my_alert_1|my_alert_2"
          receiver: team-sre
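
    For completeness, here is a minimal sketch of how these routes sit inside a full Alertmanager configuration (the matchers syntax shown requires Alertmanager 0.22 or newer). The receiver names and webhook URLs below are assumptions for illustration only:

        route:
          receiver: default                # fallback for alerts no child route matches
          routes:
            - matchers:
                - container =~ ".*test.*"
              receiver: team-data
            - matchers:
                - container =~ ".*prod.*"
              receiver: team-sre

        receivers:
          - name: default
            webhook_configs:
              - url: "http://alert-gateway.internal/default"    # hypothetical endpoint
          - name: team-data
            webhook_configs:
              - url: "http://alert-gateway.internal/team-data"  # hypothetical endpoint
          - name: team-sre
            webhook_configs:
              - url: "http://alert-gateway.internal/team-sre"   # hypothetical endpoint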