ssh vps nagios hosts

Hosts in Nagios are disappearing


This may belong on Server Fault, but I wanted to approach this community first. If this is the wrong place, please move this thread, or close it and I will repost on the correct site.

PROBLEM:
Hosts, along with their associated services, disappear and reappear upon refresh (F5 / Ctrl+F5 / etc).

STEPS TO REPRODUCE:
1. Log into Nagios
2. Click Service Detail
3. See a breakdown of services, but the last one added is missing.
4. Refresh the screen using F5 / Ctrl+F5 / etc. It still doesn't show up.
5. Refresh the screen again using F5 / Ctrl+F5 / etc. It still doesn't show up.
6. Refresh the screen again and it shows up.

(!) - The number of refreshes needed in steps 4-6 varies.

WHAT I'VE TRIED:

  • Restarting Nagios service (service nagios restart)
  • Restarting HTTPD service (service httpd restart)
  • Restarting VPS
  • Refresh browser including "Clear Cache and Hard Reload"
  • Tried different browsers
  • Tried different computers
  • Tried different networks

SCREENSHOTS:
GOOD: https://i.sstatic.net/brtY4.png

BAD: https://i.sstatic.net/PjWCi.png

POSSIBLE CAUSE:
We are in this situation because an intern added the latest host and its associated service. He added it correctly, and I checked his work. He did the normal preflight, but instead of issuing the restart command via SSH he restarted from the web interface, using "Process Info > Restart the Nagios process". That seems like it should work, but we have never restarted this way before, which is the only reason I suspect it as the cause of what we are seeing. Does this restart do something different from the normal SSH restart?

EDIT: To add to all of this, we updated a different file today, unrelated to this host or its services, and Nagios is not updating.

Thanks for helping! Rich

EXTRA:
Here is a screenshot of the config file: https://i.sstatic.net/ladvp.png


Solution

  • This can happen if more than one Nagios instance is running. A second instance that was never restarted may still be running with the old configuration files, which would explain why the new host comes and goes between refreshes. I've had this happen once or twice.

    First, shut down Nagios

    service nagios stop
    

    Next, kill all remaining instances.

    killall -9 nagios
    

    Finally, start Nagios back up

    service nagios start
    

    That should fix your problem.
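
To confirm the diagnosis before (or after) killing anything, a rough check like the one below may help. It is only a sketch: `count_top_level` is a hypothetical helper name, and it assumes the daemon's process name is exactly `nagios` and that each independent daemon has been reparented to PID 1 (worker processes, which have the main daemon as their parent, are deliberately not counted). Neither assumption is confirmed by the question, so adjust to your system.

```shell
#!/bin/sh
# Hypothetical helper: reads "PPID COMMAND" pairs on stdin and prints how
# many processes named $1 have PID 1 as their parent.
count_top_level() {
    awk -v name="$1" '$1 == 1 && $2 == name { n++ } END { print n + 0 }'
}

# Example use on a live system (POSIX ps output options):
#   ps -e -o ppid= -o comm= | count_top_level nagios
# A result greater than 1 suggests a stale second daemon is still running.
```

If the count is still above 1 after `service nagios stop` and `service nagios start`, one of the instances was probably not started by the init script, which matches the situation described above.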