
Bosun time-dependent alerts


Good morning.

I have been working with the Bosun monitoring application for the last few days and I like it a lot. But there is one thing I have not been able to work out.

I want one alert that responds differently depending on the time of day. The number of logins to my site per hour should be at least 100 during the day and at least 10 at night. When it falls below that, I want an alert to fire.

If I do that with two separate alerts, the daytime alert will go off at night. So I need a lookup that checks what time it is and then applies the correct threshold.

Does anybody know how to do that?

Marcel Koert


Solution

  • Bosun doesn't have this feature. I've considered adding it, but I have never been shown a use case that requires it. Why?

    There are two general cases I have considered regarding this:

    • Some job or event runs at time X, and you don't want to alert because certain things are expected while that job runs. In this instance, it is better to monitor the job itself and suppress the alert while the job is running. That couples the alert more tightly to the job, so if you change when the job runs, the alert still won't trigger falsely.
    • Things that vary over time. If I am not mistaken, this is the case you are referring to. Here the data shows some seasonality (in the following example, weekly seasonality):

    [Image: example graph showing weekly seasonality]

    To handle this case, we use anomaly-based alerts, which effectively say something like "this is not what it looked like at the same hour of the week over the past few weeks, so send an alert". The key function for this is the band function. Here is an example of doing this from the examples page:

    alert slower.route.performance {
        template = route.performance
        $notes = Response time is based on HAProxy's Tr Value. This is the web server response time (time elapsed between the moment the TCP connection was established to the web server and the moment it sent its complete response header)
        $duration = "1d"
        $route=*
        $metric = "sum:10m-avg:haproxy.logs.route_tr_median{route=$route}"
        $route_hit_metric = "sum:10m-avg:rate{counter,,1}:haproxy.logs.hits_by_route{route=$route}"
        $total_hit_metric = "sum:10m-avg:rate{counter,,1}:haproxy.logs.hits_by_route"
        # Share of all hits that this route received over $duration
        $route_hits = change($route_hit_metric, $duration, "")
        $total_hits = change($total_hit_metric, $duration, "")
        $hit_percent = $route_hits / $total_hits * 100
        $current_hitcount = len(q($metric, $duration, ""))
        # History: $lookback windows of length $duration, each $period apart
        $period = "7d"
        $lookback = 4
        $history = band($metric, $duration, $period, $lookback)
        $past_dev = dev($history)
        $past_median = percentile($history, .5)
        $current_median = percentile(q($metric, $duration, ""), .5)
        $diff = $current_median - $past_median
        # Warn when the current median is more than two deviations above the historical median,
        # differs from it by more than 10, and the route carries more than 1% of all hits
        warn = $current_median > ($past_median + $past_dev*2) && abs($diff) > 10 && $hit_percent > 1
        warnNotification = default
        ignoreUnknown = true
    }
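
    Applied to the login case from the question, the same approach might look like the untested sketch below. It assumes logins are stored in OpenTSDB as a counter named site.logins (a hypothetical metric name) and reuses the query style of the example above. Rather than hard-coding 100 for the day and 10 for the night, it compares the average login rate over the last hour with the same hour in each of the previous four weeks, and warns when the current rate drops well below the historical median.

    ```
    # Hypothetical sketch: site.logins is an assumed counter metric name
    alert fewer.logins.than.usual {
        $metric = "sum:10m-avg:rate{counter,,1}:site.logins"
        $duration = "1h"
        # The same one-hour window in each of the previous 4 weeks
        $history = band($metric, $duration, "7d", 4)
        $past_median = percentile($history, .5)
        $past_dev = dev($history)
        # Average login rate (per second) over the last hour
        $current = avg(q($metric, $duration, ""))
        # Warn when the current rate is more than two deviations below the historical median
        warn = $current < ($past_median - $past_dev*2)
    }
    ```

    Because the baseline comes from the same hour in earlier weeks, a quiet night is compared with previous quiet nights, so one alert covers both day and night. The factor of 2 and the four-week lookback are illustrative choices to tune against your own data.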
    

    Hopefully this approach solves your alerting needs.