Nagios - How to Handle Expected Non-OK Statuses?


Every night at about 1:00 AM a server runs its daily backup. While the backup is running, it is normal for CPU usage to exceed the WARNING/CRITICAL levels. But I receive the Problem and Recovery notifications during this time every day...

Since it should be considered "normal" for high CPU during this time, what would be the best way to handle this situation?

Would something like "notification_period" be the right thing to use for this?

I'm thinking if the CPU is high for this host between 1:00 and 2:00, then ignore/don't send notification during this period. And if service state is non-OK after 2:00, then send a notification...

Any thoughts or suggestions would be greatly appreciated!


Solution

  • Probably the best solution is the check_period directive in your service definition, because Nagios has nothing like machine learning that could learn this pattern on its own.
    I suggest disabling active checks for this service during your daily backups. Config example for the time period:

        define timeperiod{
             timeperiod_name    24X7custom
             alias              24X7custom
             ; every day except the 01:00-02:00 backup window
             sunday             00:00-01:00,02:00-24:00
             monday             00:00-01:00,02:00-24:00
             tuesday            00:00-01:00,02:00-24:00
             wednesday          00:00-01:00,02:00-24:00
             thursday           00:00-01:00,02:00-24:00
             friday             00:00-01:00,02:00-24:00
             saturday           00:00-01:00,02:00-24:00
        }
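
    To take effect, the time period has to be referenced from the service that checks the CPU. Here is a minimal sketch, assuming a host named backup-server, a check command named check_cpu, and the generic-service template from the sample configs (all three are placeholders, so substitute your own names):

        define service{
             use                    generic-service   ; assumed template
             host_name              backup-server     ; hypothetical host name
             service_description    CPU Usage
             check_command          check_cpu         ; hypothetical check command
             check_period           24X7custom        ; no active checks 01:00-02:00
        }

    With this in place, Nagios should not schedule active checks of the service during the backup hour, so no Problem/Recovery pair is generated. If you would rather keep checking and only silence the alerts, as the question suggests, pointing notification_period at 24X7custom instead of check_period should do that.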