
Filter Prometheus metrics by label of another metric


Let's say I have the following metrics:

system_cpu_usage{hostname="host1"} 10
system_cpu_usage{hostname="host2"} 92
system_cpu_usage{hostname="host3"} 95

process_cpu_usage{hostname="host2", cpu_usage="high"} 90

I have an alert condition as follows:

avg_over_time(system_cpu_usage[5m]) > 90

Which returns all instances where CPU usage is above 90:

system_cpu_usage{hostname="host2"} 92
system_cpu_usage{hostname="host3"} 95

But I would like to exclude instances which have the process_cpu_usage{cpu_usage="high"} metric present.

So, in that case it would just return:

system_cpu_usage{hostname="host3"} 95

Is this even possible using Prometheus/Grafana?


Solution

  • You can filter one metric by another with the unless operator. It drops every series on its left-hand side that has a series with the same label values on its right-hand side.

    For example, given these metrics

    metric1{label1="value1"}
    metric1{label1="value2"}
    
    metric2{label1="value1"}
    

    the expression

    metric1 unless metric2
    

    will return

    metric1{label1="value2"}
    

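    In your case the two sides have different label sets (process_cpu_usage carries an extra cpu_usage label), so a plain unless would match nothing and exclude nothing. You additionally need on(hostname) to match on the hostname label only: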

    avg_over_time(system_cpu_usage[5m]) > 90
     unless on(hostname) process_cpu_usage{cpu_usage="high"}
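
To make the matching rule concrete, here is a minimal Python sketch that models what unless on(hostname) does to the sample data from the question. It is only an illustration of the semantics, not how Prometheus implements it; the helper name unless_on and the (labels, value) tuple representation are assumptions made up for this example, and the 5-minute averaging is skipped by using the sample values directly.

```python
def unless_on(lhs, rhs, on):
    """Keep lhs samples whose `on` label values have no match in rhs.

    Hypothetical helper modelling PromQL's `lhs unless on(<labels>) rhs`.
    """
    # Collect the matching keys present on the right-hand side.
    rhs_keys = {tuple(labels.get(name, "") for name in on) for labels, _ in rhs}
    return [
        (labels, value)
        for labels, value in lhs
        if tuple(labels.get(name, "") for name in on) not in rhs_keys
    ]


# Samples from the question, as (labels, value) pairs.
system_cpu = [
    ({"hostname": "host1"}, 10),
    ({"hostname": "host2"}, 92),
    ({"hostname": "host3"}, 95),
]
process_cpu_high = [({"hostname": "host2", "cpu_usage": "high"}, 90)]

# Equivalent of the `> 90` filter.
above_threshold = [(labels, v) for labels, v in system_cpu if v > 90]

# Equivalent of `unless on(hostname) process_cpu_usage{cpu_usage="high"}`.
result = unless_on(above_threshold, process_cpu_high, on=["hostname"])
print(result)  # [({'hostname': 'host3'}, 95)]
```

Only hostname is compared, so the extra cpu_usage label on the right-hand side does not prevent host2 from being matched and dropped.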