Detecting disconnected host with Nagios passive check


I registered a list of hosts and their services with Nagios, and the hosts report to Nagios using passive checks. Everything works fine, but there is a problem when communication between the hosts and Nagios is lost: all the services stay in their last state (e.g. OK), and nothing indicates that the hosts have stopped notifying Nagios.

Any ideas?

Thanks in advance.


Solution

  • You're looking for "Freshness checking".

    A freshness check is performed when the last check result received has become 'stale', that is, older than freshness_threshold (in seconds; 600 in the example below).

    define service{
        use                     generic-service
        host_name               My_Server
        service_description     CPU Load
        active_checks_enabled   1
        passive_checks_enabled  1
        check_command           check_active
        check_interval          99999999
        check_period            24x7
        check_freshness         1
        freshness_threshold     600
        }
    

    It's worth mentioning that when a service breaches the freshness threshold, an active check is performed against the service using the command defined within the check_command parameter.

    I created a custom command that writes out a critical alert to Nagios immediately without actually performing any checks. (It doesn't need to as the command will only be triggered when the last check has become 'stale').

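    #!/usr/bin/perl
    # Always report CRITICAL; Nagios only runs this once the last passive result is stale.
    print "CRITICAL: Server has not checked in\n";
    exit(2);  # plugin exit code 2 = CRITICAL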
    

    The above should be saved under your "libexec" folder as "check_active".

    Define a command under your commands.cfg file as below:

    define command{
         command_name      check_active
         command_line      $USER1$/check_active 
         }
    

    As long as Nagios has the authority to run your new command, the service will become critical if the freshness threshold is breached.
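
For completeness, here is a rough sketch of the host side, i.e. what has to keep arriving to stop the service from going stale. It assumes the host can write straight to the Nagios external command file. If you submit results through NSCA or NRDP, the transport differs but the fields are the same. The `COMMAND_FILE` path and the helper names `passive_result_line` and `submit` are my own placeholders, not something from the setup above; the real path is whatever `command_file` is set to in your nagios.cfg.

```python
import time

# Assumed location; check the command_file setting in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def passive_result_line(host, service, return_code, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command.

    return_code follows the plugin convention:
    0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    """
    timestamp = int(time.time() if now is None else now)
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )


def submit(line, command_file=COMMAND_FILE):
    """Write a single command line to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    # Print the line instead of submitting it, so this runs without Nagios.
    print(passive_result_line("My_Server", "CPU Load", 0, "OK - load is fine"), end="")
```

Each result submitted this way for "CPU Load" on "My_Server" counts as a fresh check. Once the results stop for longer than freshness_threshold, Nagios runs check_active and the service goes critical.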