
Nagios: How to check a service three times a day


I need a service to be checked three times a day at fixed times. The check should run at 07:00, 15:00 and 23:00 (every 8 hours, at exactly those times).

What I have tried is defining this time period:

define timeperiod{
    timeperiod_name         three_times_a_day
    alias                   Three times a day
    monday                  07:00-07:10,15:00-15:10,23:00-23:10
    tuesday                 07:00-07:10,15:00-15:10,23:00-23:10
    wednesday               07:00-07:10,15:00-15:10,23:00-23:10
    thursday                07:00-07:10,15:00-15:10,23:00-23:10
    friday                  07:00-07:10,15:00-15:10,23:00-23:10
    saturday                07:00-07:10,15:00-15:10,23:00-23:10
    sunday                  07:00-07:10,15:00-15:10,23:00-23:10
}

And the service (defined on several hosts) like this:

define service{
    use                     all_templates
    host_name               some_host
    service_description     some_service
    check_command           some_command
    check_period            three_times_a_day
    max_check_attempts      1
    check_interval          480 ; run every 8 hours
}

The Nagios Core documentation on time periods (https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/timeperiods.html) says: "When Nagios Core attempts to reschedule a host or service check, it will make sure that the next check falls within a valid time range within the defined timeperiod. If it doesn't, Nagios Core will adjust the next check time to coincide with the next "valid" time in the specified timeperiod."
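The adjustment that passage describes can be sketched in Python under a simplified model. This is an assumption for illustration, not Nagios's actual scheduler code: the next check is the actual start time of the last check plus check_interval, moved forward to the start of the next window if it falls outside one. The names next_valid and simulate, and the per-run delay, are hypothetical. Under this model, an 8-hour interval combined with 10-minute windows is fragile, because any lateness accumulates until a run falls past the end of its window.

```python
from datetime import datetime, time, timedelta

# Daily windows of the three_times_a_day timeperiod (identical for every weekday).
# Simplification: each window is treated as [start, end) within a single day.
WINDOWS = [
    (time(7, 0), time(7, 10)),
    (time(15, 0), time(15, 10)),
    (time(23, 0), time(23, 10)),
]


def next_valid(t):
    """Return t if it lies in a window, otherwise the start of the next window.

    Simplified model of the documented adjustment, not Nagios source code.
    """
    for day_offset in (0, 1):  # today's windows, then tomorrow's
        day = t.date() + timedelta(days=day_offset)
        for start, end in WINDOWS:
            start_dt = datetime.combine(day, start)
            end_dt = datetime.combine(day, end)
            if start_dt <= t < end_dt:
                return t
            if t < start_dt:
                return start_dt
    raise ValueError("no window found")  # unreachable: every day has windows


def simulate(first_run, interval=timedelta(minutes=480),
             delay=timedelta(0), runs=6):
    """Return the start times of `runs` consecutive checks.

    Hypothetical model: each check starts `delay` after its scheduled time,
    and the next one is scheduled at (actual start + interval), adjusted
    with next_valid().
    """
    starts = []
    scheduled = first_run
    for _ in range(runs):
        started = scheduled + delay
        starts.append(started)
        scheduled = next_valid(started + interval)
    return starts
```

With simulate(datetime(2019, 8, 12, 7, 0), delay=timedelta(minutes=3)), the first-day runs start at 07:03, 15:06 and 23:09, and the second-day morning run at 07:12. Since 07:12 + 8 h = 15:12 is past the 15:00-15:10 window, the model moves the next run to 23:00 and that afternoon's check is lost. With no delay, the schedule stays at 07:00, 15:00 and 23:00.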

But that is not what happens.

When I check the Scheduling Queue, I see:

+--------------+--------------+-----------------+-----------------+
|    Host      |   Service    |   Last Check    |   Next Check    |
+--------------+--------------+-----------------+-----------------+
| some_host    | some_service | 8/12/2019 9:35  | 8/12/2019 15:01 |
| some_host_1  | some_service | 8/12/2019 7:01  | 8/12/2019 15:01 |
| some_host_2  | some_service | 8/12/2019 8:50  | 8/12/2019 15:02 |
| some_host_3  | some_service | 8/12/2019 9:30  | 8/12/2019 15:02 |
| some_host_4  | some_service | 8/12/2019 9:22  | 8/12/2019 15:02 |
| some_host_5  | some_service | 8/12/2019 7:03  | 8/12/2019 15:03 |
| some_host_6  | some_service | 8/12/2019 8:53  | 8/12/2019 15:04 |
| some_host_7  | some_service | 8/12/2019 9:58  | 8/12/2019 15:04 |
| some_host_8  | some_service | 8/12/2019 9:30  | 8/12/2019 15:04 |
| some_host_9  | some_service | 8/12/2019 7:05  | 8/12/2019 15:05 |
| some_host_10 | some_service | 8/12/2019 9:01  | 8/12/2019 15:05 |
| some_host_11 | some_service | 8/12/2019 10:02 | 8/12/2019 15:05 |
| some_host_12 | some_service | 8/12/2019 9:21  | 8/12/2019 15:05 |
| some_host_13 | some_service | 8/12/2019 7:08  | 8/12/2019 15:08 |
| some_host_14 | some_service | 8/12/2019 7:08  | 8/12/2019 15:08 |
| some_host_15 | some_service | 8/9/2019 14:49  | 8/12/2019 16:24 |
+--------------+--------------+-----------------+-----------------+
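Comparing the two time columns makes the pattern in the queue easier to see. The Python sketch below copies the rows from the table and computes Next Check minus Last Check for each host. It assumes the dates are month/day/year (consistent with the 8/10, 8/11, 8/12 mentioned below) and that check_interval 480 means 480 minutes (the default interval_length of 60 seconds); the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# (host, last check, next check), copied from the Scheduling Queue above.
QUEUE = [
    ("some_host", "8/12/2019 9:35", "8/12/2019 15:01"),
    ("some_host_1", "8/12/2019 7:01", "8/12/2019 15:01"),
    ("some_host_2", "8/12/2019 8:50", "8/12/2019 15:02"),
    ("some_host_3", "8/12/2019 9:30", "8/12/2019 15:02"),
    ("some_host_4", "8/12/2019 9:22", "8/12/2019 15:02"),
    ("some_host_5", "8/12/2019 7:03", "8/12/2019 15:03"),
    ("some_host_6", "8/12/2019 8:53", "8/12/2019 15:04"),
    ("some_host_7", "8/12/2019 9:58", "8/12/2019 15:04"),
    ("some_host_8", "8/12/2019 9:30", "8/12/2019 15:04"),
    ("some_host_9", "8/12/2019 7:05", "8/12/2019 15:05"),
    ("some_host_10", "8/12/2019 9:01", "8/12/2019 15:05"),
    ("some_host_11", "8/12/2019 10:02", "8/12/2019 15:05"),
    ("some_host_12", "8/12/2019 9:21", "8/12/2019 15:05"),
    ("some_host_13", "8/12/2019 7:08", "8/12/2019 15:08"),
    ("some_host_14", "8/12/2019 7:08", "8/12/2019 15:08"),
    ("some_host_15", "8/9/2019 14:49", "8/12/2019 16:24"),
]

FMT = "%m/%d/%Y %H:%M"  # assumed month/day/year, as displayed in the queue
INTERVAL = timedelta(minutes=480)  # check_interval 480, assuming interval_length 60 s


def gaps(queue):
    """Return (host, next check - last check) for every row."""
    return [
        (host, datetime.strptime(nxt, FMT) - datetime.strptime(last, FMT))
        for host, last, nxt in queue
    ]


def split_by_interval(queue, interval=INTERVAL):
    """Split hosts into those rescheduled exactly one interval later and the rest."""
    exact, other = [], []
    for host, gap in gaps(queue):
        (exact if gap == interval else other).append(host)
    return exact, other
```

On this data, split_by_interval puts only some_host_1, some_host_5, some_host_9, some_host_13 and some_host_14 in the first group; these are exactly the hosts last checked inside the 07:00-07:10 window. The hosts last checked between 08:50 and 10:02, outside every window, were nonetheless placed at 15:01-15:05, and some_host_15 shows a gap of 3 days, 1 hour and 35 minutes.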

Why is the service being checked outside the timeperiod? Why was some_host_15 not checked on 8/10, 8/11 and 8/12? How can I get a service checked three times a day at fixed times?

Thanks!


Solution

  • "When Nagios Core attempts to reschedule a host or service check, it will make sure that the next check falls within a valid time range within the defined timeperiod. If it doesn't, Nagios Core will adjust the next check time to coincide with the next "valid" time in the specified timeperiod."

    I was actually pretty sure this wouldn't be the case, but if you're seeing different behavior, maybe it's a bug. I would expect the combination of short time windows and a long check interval to create timing problems that cause many checks to be dropped: a check that starts a few minutes late can drift past the end of its 10-minute window, and the adjusted next check then lands in a later window. Regardless of how things should work and what is or isn't expected behavior, I personally wouldn't configure it like this. Since you say that:

    I need a service to be checked three times a day at fixed times.

    Here's what I would do, if I were you:

    • I would run this check as a cron job and submit its result to Nagios as a passive check result. This way, you know for sure that the check always runs on time.
    • I would then enable freshness checking (check_freshness) and set a freshness_threshold to make sure that this passive service has actually reported in recently.
    • I would also configure a check_command for the case where the service has no fresh result, i.e. something that Nagios executes only if no passive result has been received in time: perhaps a script that re-runs the check and notifies me somehow.
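The first step can be sketched as a small Python script run from cron, for example with a crontab line such as 0 7,15,23 * * * /path/to/submit_passive.py some_host some_service /path/to/plugin (the script name and paths are hypothetical). It runs a plugin and writes the result to the Nagios command file as a PROCESS_SERVICE_CHECK_RESULT external command. Assumptions: the command file path below is the common default for source installs and may differ on your system (see command_file in nagios.cfg), and external commands are enabled (check_external_commands=1). Treat it as a sketch, not a drop-in tool.

```python
import os
import stat
import subprocess
import sys
import time

# Assumption: default command_file for a source install; check nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def run_check(cmd, timeout=60):
    """Run a Nagios plugin and return (exit code, output)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return 3, f"UNKNOWN - could not run {cmd[0]}: {exc}"
    return proc.returncode, proc.stdout


def format_result(host, service, return_code, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    ts = int(time.time() if timestamp is None else timestamp)
    code = return_code if return_code in (0, 1, 2, 3) else 3  # anything else -> UNKNOWN
    # Each external command is a single line, so newlines in the output are flattened.
    text = " ".join(output.split("\n")).strip()
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{text}\n"


def submit(line, command_file=COMMAND_FILE):
    """Write a command line to the Nagios command file (a named pipe)."""
    # Only write to an existing FIFO: opening a missing path with "w" would
    # create a regular file that Nagios never reads (os.stat raises instead).
    if not stat.S_ISFIFO(os.stat(command_file).st_mode):
        raise RuntimeError(f"{command_file} is not a named pipe")
    with open(command_file, "w") as pipe:
        pipe.write(line)


def main(argv):
    """Usage: submit_passive.py HOST SERVICE PLUGIN [PLUGIN_ARGS...]"""
    if len(argv) < 3:
        print(main.__doc__, file=sys.stderr)
        return
    host, service, plugin = argv[0], argv[1], argv[2:]
    code, output = run_check(plugin)
    submit(format_result(host, service, code, output))


if __name__ == "__main__":
    main(sys.argv[1:])
```

On the Nagios side, the service would then typically have active_checks_enabled 0, passive_checks_enabled 1, check_freshness 1 and a freshness_threshold (in seconds) somewhat above 8 hours, so that a missed cron run makes Nagios execute the service's check_command.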