bash shell scripting nagios nrpe

Custom written Nagios plugin always returns incorrect value but works on command line


I wrote a plugin that checks whether two hosts are online at the same time and, if so, returns a critical. When I run it locally at the command line, the logic works correctly and the echo statements match the state of the hosts checked (e.g. "CRITICAL - Both testbed controllers online" or "OK - $VM1 is the only testbed controller online."). The problem is that when I run it via ./check_nrpe -H <NRPEHost> -c "controller_check" (the same values also show up on the Nagios webpage), it always returns the same value no matter what the status of the hosts: "CRITICAL - Both testbed controllers currently offline". Echoing the actual values of $VM1 and $VM2 shows that the initial if checks always set both to 0.

The script first turns the result of ping -c 1 -W 1 $HOSTNAME into a binary value for each host and then uses those values to create the actual alert/exit value. Here are the if statements that create the binary values for the host online states:

if ping -c 1 -W 1 $VM1HOSTNAME; then
  VM1=1
else
  VM1=0
fi

if ping -c 1 -W 1 $VM2HOSTNAME; then
  VM2=1
else
  VM2=0
fi

And the actual logic that creates the NRPE return:

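if [ "$VM1" -ne "$VM2" ]; then
  # Exactly one controller is online: report it by hostname, not by its 0/1 flag.
  if [ "$VM1" -gt "$VM2" ]; then
    echo "OK - $VM1HOSTNAME is currently the only testbed controller online."
    exit 0
  else
    echo "OK - $VM2HOSTNAME is currently the only testbed controller online."
    exit 0
  fi
elif [ "$VM1" -eq "$VM2" ]; then
  # Both flags agree: either both controllers are down or both are up.
  if [ "$VM1" -eq 0 ]; then
    echo "CRITICAL - Both testbed controllers currently offline"
    exit 2
  else
    echo "CRITICAL - Both testbed controllers currently online."
    exit 2
  fi
else
  # Reached only if the comparisons themselves fail (e.g. a non-numeric value).
  echo "UNKNOWN - Unable to read output."
  exit 3
fi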

I've never written my own NRPE plugin before, so I assume I'm doing something simple wrong here, but the NRPE plugin-writing tutorials I've seen online seem to match what I've written. As a side note, if I use check_ping instead of ping -c 1 -W 1, the values returned are correct, but the only value that shows up on the Nagios webpage is the output of the first check_ping command.

For instance (this is correct):

./check_nrpe -H ikor -c "check_testbed_controller_status"
PING OK - Packet loss = 0%, RTA = 0.81 ms|rta=0.811000ms;10.000000;20.000000;0.000000 pl=0%;2;5;0
PING OK - Packet loss = 0%, RTA = 0.79 ms|rta=0.787000ms;10.000000;20.000000;0.000000 pl=0%;2;5;0
CRITICAL - Both testbed controllers currently online.

But the Nagios status information only shows PING OK - Packet loss = 0%, RTA = 0.79 ms instead of the echo statement that I want.

So I guess I need to either A) fix whatever is wrong with using /usr/bin/ping for the if check that causes NRPE to always read these hosts as offline (the if check always sets the value to 0), or B) use check_ping but return to Nagios only the third stdout line, which has the actual status information. Does anyone have any ideas or reading recommendations for me here? Thanks so much.
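One way to narrow down option A is to make the failing command report why it failed, since the plain if check hides both the exit status and any error text. The following is a minimal sketch, not part of the original plugin: the helper name probe is hypothetical, and it is meant as temporary debugging output only. It runs the command it is given, discards stdout, and on failure prints the exit status together with whatever the command wrote to stderr, so the difference between a run from the shell and a run through check_nrpe becomes visible.

```shell
#!/bin/bash
# Hypothetical debugging helper (not from the original plugin).
# Runs the command passed as arguments, discards its stdout, and on
# failure prints the exit status plus anything written to stderr.
probe() {
  local err status
  # Redirection order matters: stderr is sent to the capture first,
  # then stdout is sent to /dev/null.
  err=$("$@" 2>&1 >/dev/null)
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "probe failed (exit $status): ${err:-no error text}"
  fi
  return "$status"
}
```

It could temporarily replace the bare command in the check, for example if probe ping -c 1 -W 1 "$VM1HOSTNAME"; then ... fi. A shell normally reports exit status 126 when a command exists but cannot be executed and 127 when it cannot be found, so seeing one of those through check_nrpe would suggest that the daemon cannot run ping at all, rather than that the host is down.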


Solution

  • Turns out SELinux was preventing /usr/bin/ping from being executed by the NRPE daemon. Rather than attempt to write an SELinux policy to allow this, I used the Nagios plugin check_ping and redirected its output to /dev/null. The final logic of the NRPE plugin looks like this:

    # check_ping output is discarded; only its exit status is used.
    if $NRPEPING -H "$VM1HOSTNAME" -w 10,2% -c 20,5% > /dev/null 2>&1; then
      VM1=1
    else
      VM1=0
    fi

    if $NRPEPING -H "$VM2HOSTNAME" -w 10,2% -c 20,5% > /dev/null 2>&1; then
      VM2=1
    else
      VM2=0
    fi
    

    This means that A) I don't have to set SELinux to permissive or allow the NRPE daemon to execute ping, and B) the Nagios status information column correctly shows my echo statements and no other information.
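Putting the pieces together, the whole check might be sketched as below. This is an illustration under stated assumptions, not the author's final script: the function names host_up and report are hypothetical, and the default path for NRPEPING is a guess at a common install location that should be adjusted to the local system. The silent check_ping call decides the state of each host, and a single function prints exactly one status line and returns the matching Nagios exit code (0 for OK, 2 for CRITICAL).

```shell
#!/bin/bash
# Sketch only. The default path below is an assumption; set NRPEPING
# to wherever check_ping is installed on the monitored host.
NRPEPING="${NRPEPING:-/usr/lib64/nagios/plugins/check_ping}"

# Succeed if check_ping reports OK for the host; print nothing.
# Thresholds are the ones used in the solution above.
host_up() {
  "$NRPEPING" -H "$1" -w 10,2% -c 20,5% > /dev/null 2>&1
}

# Print one status line and return the Nagios exit code.
# Arguments: name1 state1 name2 state2, each state being 1 (up) or 0 (down).
report() {
  local name1="$1" up1="$2" name2="$3" up2="$4"
  if [ "$up1" -eq 1 ] && [ "$up2" -eq 1 ]; then
    echo "CRITICAL - Both testbed controllers currently online."
    return 2
  elif [ "$up1" -eq 0 ] && [ "$up2" -eq 0 ]; then
    echo "CRITICAL - Both testbed controllers currently offline"
    return 2
  elif [ "$up1" -eq 1 ]; then
    echo "OK - $name1 is currently the only testbed controller online."
  else
    echo "OK - $name2 is currently the only testbed controller online."
  fi
  return 0
}
```

A plugin built on this would set VM1 and VM2 with host_up as in the solution, then finish with report "$VM1HOSTNAME" "$VM1" "$VM2HOSTNAME" "$VM2"; exit $?. Note that check_ping exits non-zero for WARNING as well as CRITICAL, so with these thresholds a slow but reachable controller is counted as offline.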