I'm testing Bosun (open-source monitoring and alerting system by Stack Exchange) and I'm quite confused about how to monitor "boolean" metrics.
I would like to get alerted if some process is not running.
To collect the metric, I've tried two approaches:
1. The scollector documentation says some processes can be configured for monitoring, but I don't receive any related metrics. Do I need any special configuration to enable those process checks?
2. I've created a custom collector that counts those processes.
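For reference, here is a minimal sketch of what such a custom collector could look like. It assumes scollector's external-collector convention (an executable placed under the ColDir directory that prints OpenTSDB-style "metric timestamp value tag=value" lines) and that scollector adds the host tag itself; check the scollector docs for the exact directory layout. It also assumes Linux, where /proc/<pid>/comm holds each process name. WATCHED, count_processes and format_line are hypothetical names, not part of scollector.

```python
#!/usr/bin/env python3
"""Hypothetical scollector external collector: counts running processes by name."""
import os
import sys
import time

# Assumption: the process name(s) to watch; replace with your own.
# Note that /proc/<pid>/comm is truncated to 15 characters.
WATCHED = ["myprocess"]


def count_processes(name, proc_root="/proc"):
    """Count processes whose /proc/<pid>/comm equals `name` (Linux only)."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo or /proc/self
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    count += 1
        except OSError:
            continue  # process exited between listdir() and open()
    return count


def format_line(metric, value, tags, ts=None):
    """Render one datapoint as an OpenTSDB-style line: metric ts value k=v ..."""
    ts = int(time.time()) if ts is None else ts
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric} {ts} {value} {tag_str}".rstrip()


if __name__ == "__main__":
    if not os.path.isdir("/proc"):
        print("this sketch needs a Linux /proc filesystem", file=sys.stderr)
    else:
        for proc_name in WATCHED:
            # No extra tags here: we assume scollector attaches host=<hostname>.
            print(format_line("myprocess.running", count_processes(proc_name), {}))
        sys.stdout.flush()
```

Emitting a count rather than a 0/1 flag keeps the option of alerting on "fewer than N instances" later.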
For getting alerted, I created the following rule:
alert test {
template = test
crit = avg(q("avg:myprocess.running{host=*}", "10m", "")) < 1
}
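To make the crit line concrete: for each host, q(...) returns the last 10 minutes of samples, avg(...) reduces that window to a single number, and the result is compared with 1. The sketch below mimics that reduction step in plain Python only conceptually (it is not Bosun code) and compares avg with the last, min and max reduction functions. It assumes the metric is normally exactly 1 per host.

```python
"""Conceptual sketch of reducing one host's query window, then testing `< 1`."""
from statistics import mean

# Plain-Python stand-ins for a few Bosun reduction functions.
REDUCERS = {
    "avg": mean,
    "last": lambda xs: xs[-1],
    "min": min,
    "max": max,
}


def is_critical(samples, reducer="avg", threshold=1):
    """Reduce the window (oldest sample first) to one number, then apply `< threshold`."""
    return REDUCERS[reducer](samples) < threshold


if __name__ == "__main__":
    # The process was missing for one sample, then came back.
    window = [1, 1, 0, 1, 1]
    for name in REDUCERS:
        print(name, REDUCERS[name](window), is_critical(window, name))
```

With that window, avg (0.8) and min (0) would trigger, while last and max (both 1) would not. So avg < 1 fires if the process was missing at any point in the last 10 minutes, whereas last only looks at the most recent sample.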
Is this the proper way of doing it or is there a better way?
As the reduction function, you could also use last, max, or min instead of avg, depending on what you want to catch. The scollector configuration goes on each host, and the configuration lines should be as described in the documentation you linked. Also keep in mind that your example alert has no warnNotification or critNotification, so it will only show up on the dashboard (no emails or HTTP posts will be sent).
It is important to understand the first part of "avg:myprocess.running{host=*}". The avg aggregator takes all the series whose tags you did not specify and averages them together. So, for instance, if you also had an id tag like the scollector process metrics do, you would want to use sum in the query string instead of avg, and alert if there is less than one process.
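The difference between the avg and sum aggregators can be sketched in a few lines. This assumes a hypothetical metric where every running process reports its own series tagged with host and id, each with value 1, at a single timestamp; since the query only specifies host, the id tag is collapsed by the aggregator.

```python
"""Conceptual sketch of what the aggregator prefix ("avg:" vs "sum:") does."""
from collections import defaultdict
from statistics import mean

AGGREGATORS = {"avg": mean, "sum": sum}


def aggregate(points, group_by, aggregator):
    """Collapse all series sharing the same `group_by` tag value.

    points: list of (tags_dict, value) pairs at a single timestamp.
    Returns {group_value: aggregated_value}.
    """
    groups = defaultdict(list)
    for tags, value in points:
        groups[tags[group_by]].append(value)
    return {key: AGGREGATORS[aggregator](vals) for key, vals in groups.items()}


if __name__ == "__main__":
    points = [
        ({"host": "web1", "id": "1"}, 1),
        ({"host": "web1", "id": "2"}, 1),
        ({"host": "web2", "id": "1"}, 1),
    ]
    print("avg:", aggregate(points, "host", "avg"))
    print("sum:", aggregate(points, "host", "sum"))
```

Here avg reports 1 for both hosts no matter how many processes run, while sum reports web1 as 2 and web2 as 1, which is the process count you actually want to compare against 1. In this model, a host with no processes emits no series at all, so it drops out of the result rather than showing 0.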