prometheus, prometheus-alertmanager

How to exclude a job in Alertmanager?


I have a rule in Alertmanager:

  - alert: HostOutOfMemory
    expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: Host out of memory (instance {{ $labels.instance }})
      description: "Node memory is filling up (< 10% left)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Prometheus config:

  - job_name: "VM1"
    scrape_interval: 5s
    static_configs:
    - targets: ["192.168.0.24:9100"]
  - job_name: "VM2"
    scrape_interval: 5s
    static_configs:
    - targets: ["192.168.0.25:9100"]

How can I exclude job "VM1" from this rule?


Solution

  • Every metric has a job label, set from the job that scraped it.

    To exclude metrics from a single job, use the != matcher:

    node_memory_MemAvailable_bytes{job!="VM1"}
    

    To exclude metrics from multiple jobs, repeat the != matcher, or use the negative regex matcher !~:

    node_memory_MemAvailable_bytes{job!="VM1", job!="VM2"}
    

    or

    node_memory_MemAvailable_bytes{job!~"VM[12]"}
    

    Applied to the expression from your question, with only VM1 excluded, the result looks like this:

    (node_memory_MemAvailable_bytes{job!="VM1"} / node_memory_MemTotal_bytes * 100 < 10) * on(instance) group_left (nodename) node_uname_info{nodename!=""}
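
    Put back into the rule from your question, and excluding only VM1 as asked, the alert would look roughly like the sketch below. Everything except the job!="VM1" matcher and the nodename!="" form is copied from the question, so treat it as a starting point rather than a tested rule:

    ```yaml
      - alert: HostOutOfMemory
        # job!="VM1" drops every series scraped by job VM1 before the ratio is computed
        expr: (node_memory_MemAvailable_bytes{job!="VM1"} / node_memory_MemTotal_bytes * 100 < 10) * on(instance) group_left (nodename) node_uname_info{nodename!=""}
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Host out of memory (instance {{ $labels.instance }})
          description: "Node memory is filling up (< 10% left)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    ```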
    

    Notice:

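    1. In your case it is enough to apply the filter to a single metric selector. The others are filtered the same way because of label matching in binary operations: the division matches series on all labels (including job), and the on(instance) join matches on instance, so VM1 series on the other side find no partner and are dropped.
    2. nodename=~".+" is semantically equal to nodename!="": both check that nodename is not empty. But the latter will have better performance.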
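
    If you want to check the first point yourself, a promtool unit test is one way to do it. The sketch below assumes the promtool rule unit-test file format; the file name and the sample values are made up for illustration (both hosts report 1 of 16 bytes available, i.e. 6.25%), and the targets are the ones from your config:

    ```yaml
    # exclude_job_test.yml (hypothetical file name)
    # Run with: promtool test rules exclude_job_test.yml
    evaluation_interval: 1m
    tests:
      - interval: 1m
        input_series:
          # Toy values: both hosts are below the 10% threshold.
          - series: 'node_memory_MemAvailable_bytes{job="VM1", instance="192.168.0.24:9100"}'
            values: '1+0x5'
          - series: 'node_memory_MemTotal_bytes{job="VM1", instance="192.168.0.24:9100"}'
            values: '16+0x5'
          - series: 'node_memory_MemAvailable_bytes{job="VM2", instance="192.168.0.25:9100"}'
            values: '1+0x5'
          - series: 'node_memory_MemTotal_bytes{job="VM2", instance="192.168.0.25:9100"}'
            values: '16+0x5'
        promql_expr_test:
          # The filter sits on the left operand only; VM1's MemTotal series
          # has no matching partner, so only the VM2 result is expected.
          - expr: 'node_memory_MemAvailable_bytes{job!="VM1"} / node_memory_MemTotal_bytes * 100 < 10'
            eval_time: 1m
            exp_samples:
              - labels: '{job="VM2", instance="192.168.0.25:9100"}'
                value: 6.25
    ```

    The test only covers the memory ratio part of the expression. The join with node_uname_info behaves the same way, since it matches on instance and VM1's instance no longer appears on the left side.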

    You can see a demo of related queries here.