I have the following alerting rule:
- alert: HostOutOfMemory
  expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: Host out of memory (instance {{ $labels.instance }})
    description: "Node memory is filling up (< 10% left)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
Prometheus config:
- job_name: "VM1"
  scrape_interval: 5s
  static_configs:
    - targets: ["192.168.0.24:9100"]
- job_name: "VM2"
  scrape_interval: 5s
  static_configs:
    - targets: ["192.168.0.25:9100"]
How can I exclude job "VM1" from this rule?
Every metric carries a job label, set from the scrape job that collected it.
To exclude metrics from a single job, use the != selector:
node_memory_MemAvailable_bytes{job!="VM1"}
To exclude metrics from multiple jobs, either repeat the != selector or use the negative regex selector !~:
node_memory_MemAvailable_bytes{job!="VM1", job!="VM2"}
or
node_memory_MemAvailable_bytes{job!~"VM[12]"}
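One way to check a selector like this before changing the rule is a promtool unit test. The following is a minimal sketch, not a definitive setup: the file name and the sample values are made up for illustration, and the job and instance labels are copied from the scrape config in the question.

```yaml
# selector_test.yml (hypothetical file name)
# Run with: promtool test rules selector_test.yml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # Sample values are invented; only the labels matter here.
      - series: 'node_memory_MemAvailable_bytes{job="VM1", instance="192.168.0.24:9100"}'
        values: '100 100 100'
      - series: 'node_memory_MemAvailable_bytes{job="VM2", instance="192.168.0.25:9100"}'
        values: '200 200 200'
    promql_expr_test:
      # Only the VM2 series should survive the != matcher.
      - expr: node_memory_MemAvailable_bytes{job!="VM1"}
        eval_time: 1m
        exp_samples:
          - labels: 'node_memory_MemAvailable_bytes{job="VM2", instance="192.168.0.25:9100"}'
            value: 200
```

If the matcher behaves as described, the test should report that only the VM2 series is returned.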
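Applied to the expression from your question, which only needs to exclude VM1, the result looks like this. Filtering the left operand is enough, because the division only pairs series whose labels match:
(node_memory_MemAvailable_bytes{job!="VM1"} / node_memory_MemTotal_bytes * 100 < 10) * on(instance) group_left (nodename) node_uname_info{nodename!=""}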
Note: nodename=~".+" is semantically equivalent to nodename!="". Both check that nodename is not empty, but the latter will perform better. You can see a demo of related queries here.
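Putting this together, here is a sketch of the full rule from the question with only job "VM1" excluded. The surrounding groups wrapper and the group name host-alerts are assumptions added so the fragment forms a complete rule file; everything else is taken from the question.

```yaml
groups:
  - name: host-alerts  # hypothetical group name, not from the question
    rules:
      - alert: HostOutOfMemory
        # job!="VM1" drops every series scraped by job VM1 before the division.
        # nodename!="" replaces the regex matcher nodename=~".+".
        expr: (node_memory_MemAvailable_bytes{job!="VM1"} / node_memory_MemTotal_bytes * 100 < 10) * on(instance) group_left (nodename) node_uname_info{nodename!=""}
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Host out of memory (instance {{ $labels.instance }})
          description: "Node memory is filling up (< 10% left)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
```

With this in place, job VM2 is still evaluated by the rule, while series from VM1 never reach the threshold comparison.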