Let's say I have the following metrics:
system_cpu_usage{hostname="host1"} 10
system_cpu_usage{hostname="host2"} 92
system_cpu_usage{hostname="host3"} 95
process_cpu_usage{hostname="host2", cpu_usage="high"} 90
I have an alert condition as follows:
avg_over_time(system_cpu_usage[5m]) > 90
This returns every instance whose average CPU usage is above 90:
system_cpu_usage{hostname="host2"} 92
system_cpu_usage{hostname="host3"} 95
But I would like to exclude instances for which a process_cpu_usage{cpu_usage="high"}
series is present.
So, in that case it would just return:
system_cpu_usage{hostname="host3"} 95
Is this even possible using Prometheus/Grafana?
You can filter out series based on other series with the unless
operator. It drops every series on the left-hand side for which a series with the same label values exists on the right-hand side (the metric name is not compared).
For example, if you have the metrics
metric1{label1="value1"}
metric1{label1="value2"}
metric2{label1="value1"}
the expression
metric1 unless metric2
will return
metric1{label1="value2"}
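In your exact case the label sets differ (process_cpu_usage carries an extra cpu_usage label), so a plain unless would not match anything. You'll additionally need on(hostname)
to restrict label matching to hostname: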
avg_over_time(system_cpu_usage[5m]) > 90
unless on(hostname) process_cpu_usage{cpu_usage="high"}
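
To see why on(hostname) is needed, here is a small Python sketch of the matching rule. It is only an illustrative model, not how Prometheus is implemented: the unless function and the (labels, value) representation are assumptions made for this example, and the sample values are the ones from the question.

```python
# Illustrative model of PromQL's `unless`; not Prometheus's real implementation.
# A series is modelled as (labels, value). The metric name is left out
# because `unless` does not compare it when matching.

def unless(lhs, rhs, on=None):
    """Keep the lhs series that have no matching series in rhs.

    With on=None the full label sets must be equal; otherwise only the
    labels listed in `on` are compared, as with `unless on(...)`.
    """
    def key(labels):
        if on is None:
            return frozenset(labels.items())
        # A label missing from a series is treated as an empty value.
        return tuple(labels.get(name, "") for name in on)

    rhs_keys = {key(labels) for labels, _ in rhs}
    return [(labels, value) for labels, value in lhs if key(labels) not in rhs_keys]


# Result of `avg_over_time(system_cpu_usage[5m]) > 90` from the question.
high_cpu = [
    ({"hostname": "host2"}, 92),
    ({"hostname": "host3"}, 95),
]
# process_cpu_usage{cpu_usage="high"}
flagged = [({"hostname": "host2", "cpu_usage": "high"}, 90)]

plain = unless(high_cpu, flagged)                     # full label sets compared
matched = unless(high_cpu, flagged, on=["hostname"])  # only hostname compared
print(plain)
print(matched)
```

Without on, host2 is kept because {hostname="host2"} and {hostname="host2", cpu_usage="high"} are different label sets; with on=["hostname"] only host3 remains.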