I need to create a Prometheus rule that compares the value of a metric to a threshold. The threshold is the same for most instances but differs for two or three. Is there an easy and reliable way to parameterize a rule?
Maybe something like this:
- alert: HighValueAlert
  expr: my_metric > my_metric_threshold
  for: 5m
Where my_metric_threshold is an "artificial" metric defined somewhere, e.g. using the Node exporter textfile collector (or perhaps another method that I am not aware of):
my_metric_threshold{instance="special1"} 101
my_metric_threshold{instance="special2"} 102
my_metric_threshold 100 # default for most instances
Wishes for reliability:
.prom file), I should receive some kind of alert about the incorrect configuration (perhaps using a separate rule).
I'm new to Prometheus and I couldn't find any examples of solving this problem.
I'm setting up indoor temperature monitoring rules. The upper temperature threshold is the same for most rooms, but for 2-3 rooms it needs to be higher; otherwise we will get alerts too frequently. The same applies to the lower temperature threshold.
Your general approach is correct, but here are a couple of suggestions to make your life easier.
my_metric_threshold_special{instance="server1"} 101
my_metric_threshold_special{instance="server42"} 102
my_metric_threshold_default 100
# If you expose it through the textfile collector too, it will also have an instance label.
# That doesn't matter; just make sure it is exposed only once.
- alert: HighValueAlertDefault
  expr: my_metric > on() group_left() my_metric_threshold_default unless on(instance) my_metric_threshold_special
- alert: HighValueAlertSpecial
  expr: my_metric > on(instance) group_left() my_metric_threshold_special
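Here the default alert rule compares the metric to the default threshold while ignoring all labels (on()), and then drops the series of instances that have a special threshold (unless on(instance)).
The special alert rule simply compares the metric to the threshold defined for the same instance.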
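One way to check that the two rules split the instances as described is a promtool unit test. The following is only a sketch: it assumes the two alert rules above are saved inside a rule group in a file named rules.yml (a hypothetical name), and the sample values and the test file name are made up for illustration.

```yaml
# threshold_test.yml (hypothetical file name); run with: promtool test rules threshold_test.yml
rule_files:
  - rules.yml   # assumed to contain HighValueAlertDefault and HighValueAlertSpecial

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # server1 has a special threshold of 101, so a value of 100.5 must not alert.
      - series: 'my_metric{instance="server1"}'
        values: '100.5+0x5'
      # server2 has no special threshold, so the default of 100 applies.
      - series: 'my_metric{instance="server2"}'
        values: '100.5+0x5'
      - series: 'my_metric_threshold_special{instance="server1"}'
        values: '101+0x5'
      - series: 'my_metric_threshold_default'
        values: '100+0x5'
    alert_rule_test:
      # Only server2 is compared against the default threshold.
      - eval_time: 5m
        alertname: HighValueAlertDefault
        exp_alerts:
          - exp_labels:
              instance: server2
      # server1 stays below its own threshold, so nothing fires here.
      - eval_time: 5m
        alertname: HighValueAlertSpecial
        exp_alerts: []
```

If the rules carry extra labels or a for: clause, the expected labels and eval_time would need to be adjusted accordingly.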
To check my_metric_threshold_default, you can use the expression
absent(my_metric_threshold_default) or count(my_metric_threshold_default) > 1
It returns a result when the default threshold is missing or is exposed more than once.
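The check can be wrapped in its own alert rule, which covers the "separate rule" wish from the question. This is a minimal sketch: the group name, alert name, for: duration, severity label and annotation text are all assumptions chosen for illustration, not part of the original answer.

```yaml
groups:
  - name: threshold-config                      # hypothetical group name
    rules:
      - alert: DefaultThresholdMisconfigured    # hypothetical alert name
        # Fires if the default threshold is missing or exposed more than once.
        expr: absent(my_metric_threshold_default) or count(my_metric_threshold_default) > 1
        for: 5m                                 # assumed; tolerates a short scrape gap
        labels:
          severity: warning                     # assumed label
        annotations:
          summary: "my_metric_threshold_default is missing or defined more than once"
```

Because comparison operators bind more tightly than or in PromQL, the expression is evaluated as absent(...) or (count(...) > 1), so no extra parentheses are needed.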