Hi Stack Overflow community,
I need help with a bash script, since I am new to it. Here is what I am trying to accomplish: we have a Windows server that sometimes hits 90% memory usage, and whenever Nagios catches that, we want to restart these services via NRPE. The services must not all be restarted at once, though: the first service has to come back up, and only once it is up should the script continue with restarting the next service.
Another option is to stop all 4 services and then start them sequentially.
Here is the script that I wrote:
case "$1" in
OK)
;;
WARNING)
;;
UNKNOWN)
;;
CRITICAL) ## DECISION ENGINE RESTART
echo -n "Restarting Decision Engine_1"
cat /usr/local/nagios/libexec/mail/DeServiceRestart.txt | mail -s "Restarting DE services" [email protected] -r Nagios@ATL-NM-01
/usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c restart_service -a DecisionEngine_1;
if /usr/local/nagios/libexec/check_nrpe -H "$2" -t 30 -c check_service -a DecisionEngine_1 'crit=not state_is_ok()' > OK:
then
echo -n "Restarting Decision Engine_2"
/usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c restart_service -a DecisionEngine_2
if /usr/local/nagios/libexec/check_nrpe -H "$2" -t 30 -c check_service -a DecisionEngine_2 'crit=not state_is_ok()' > OK:
then
echo -n "Restarting Decision Engine_3"
/usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c restart_service -a DecisionEngine_3
if /usr/local/nagios/libexec/check_nrpe -H "$2" -t 30 -c check_service -a DecisionEngine_3 'crit=not state_is_ok()' > OK:
then
echo -n "Restarting Decision Engine_4"
/usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c restart_service -a DecisionEngine_4
else
echo " Restart is complete"
fi
;;
esac
exit 0
I am not sure where I made a mistake and would appreciate any feedback.
Thanks!
All comments are in the code. Double-check the StopService function: you did not mention how you stop a service, so I wrote it by analogy with the restart command.
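As for the mistake in the original script: `> OK:` after the check command is a redirection, so the shell writes the check output into a file named `OK:` instead of comparing it with anything. The `if` already branches on the command's exit status alone. Each nested `if` also needs its own `fi`, and the original has only one. Below is a minimal sketch of branching on the exit status; `fake_check` and `service_is_ok` are hypothetical names, and `fake_check` only stands in for the real check_nrpe call so the snippet runs without a Nagios setup.

```shell
#!/bin/bash
# Hypothetical stand-in for a check_nrpe service check: prints a status
# line and exits with the usual Nagios plugin code (0 = OK, 2 = CRITICAL).
fake_check() {
    if [ "$1" = "up" ]; then
        echo "OK: service is running"
        return 0
    fi
    echo "CRITICAL: service is stopped"
    return 2
}

# Branch on the exit status and discard the text output, instead of
# redirecting it into a file (which is what `> OK:` would do).
service_is_ok() {
    fake_check "$1" >/dev/null 2>&1
}

if service_is_ok up; then
    echo "service is up, continue with the next one"
else
    echo "service is down, stop here"
fi
```

The complete handler script follows.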
#!/bin/bash
SERVICESTATE=$1; #Common Check State (OK,WARNING,CRITICAL or UNKNOWN)
Host=$2; #HostName or IP
SERVICESTATETYPE=$3; #Hard or Soft service type
TimeOut=3; #Time (seconds) to wait for a service to start/stop
#before processing the next service.
#The TimeOut cannot be infinite, because the
#nagios process will kill this handler if it
#runs for too long
#Services is an array of service names
Services=(DecisionEngine_1 DecisionEngine_2 DecisionEngine_3 DecisionEngine_4)
#add path to nagios plugins dir
PATH=$PATH:/usr/local/nagios/libexec
RestartService() {
#function restarts services via NRPE.
#Usage: RestartService ServiceName
echo -n " Restarting $1;"
check_nrpe -H "${Host}" -p 5666 -c restart_service -a "$1" >/dev/null 2>&1
return $?
}
StopService() {
#function stops services via NRPE.
#Usage: StopService ServiceName
echo -n " Stopping $1;"
check_nrpe -H "${Host}" -p 5666 -c stop_service -a "$1" >/dev/null 2>&1
return $?
}
ServiceWait() {
#function repeatedly checks the service via NRPE until a successful
#check (start), a failed check (stop), or TimeOut
#Usage: ServiceWait ServiceName {start|stop}
#the start option waits for a successful check
#the stop option waits for a failed check
Logic="";
[ "$2" == "start" ] && Logic="-eq"; #RC for start check should be 0
[ "$2" == "stop" ] && Logic="-ne" ; #RC for stop check should NOT be 0
[ -z "$Logic" ] && { echo "ServiceWait function usage error"; exit 19; }
t=${TimeOut}
while [ "$t" -ge 0 ]; do
check_nrpe -H "${Host}" -p 5666 -t 30 \
-c check_service -a "$1" 'crit=not state_is_ok()' >/dev/null 2>&1
RC=$?
#expected state reached, no need to wait anymore
[ "$RC" $Logic 0 ] && { echo -n "CheckRC=$RC;"; return $RC; }
let t--
sleep 1
done
echo -n "TimeOut; "
return 3
}
#check that $1, $2 and $3 were all supplied
if [ -z "${SERVICESTATE}" ] || [ -z "${Host}" ] || [ -z "${SERVICESTATETYPE}" ]; then
echo "Usage: $0 {OK|WARNING|UNKNOWN|CRITICAL} Hostname {SOFT|HARD}";
exit 1;
fi
case "${SERVICESTATE}" in
OK)
;;
WARNING)
;;
UNKNOWN)
;;
CRITICAL) ## DECISION ENGINE RESTART
#uncomment if you need the e-mail notification
#cat /usr/local/nagios/libexec/mail/DeServiceRestart.txt | \
# mail -s "Restarting DE services" [email protected] -r Nagios@ATL-NM-01
RC=0
SuccessRestart=() #names of the services that restarted successfully
if [ "$SERVICESTATETYPE" == "SOFT" ] ; then
for (( i=0; i<${#Services[*]}; i++ )); do
RestartService ${Services[$i]}
ServiceWait ${Services[$i]} start
RC=$?
#if the check failed, do not attempt any further restarts
[ "$RC" -ne 0 ] && break;
SuccessRestart+=(${Services[$i]})
done
echo "Restart is complete. ${SuccessRestart[*]} Return Code is ${RC}"
elif [ "$SERVICESTATETYPE" == "HARD" ] ; then
#Stop all services sequentially.
for (( i=0; i<${#Services[*]}; i++ )); do
StopService ${Services[$i]}
#You will need to experiment with what to wait for here.
#It may be better to simply sleep for N seconds while
#the service is stopping,
#rather than polling the service state
ServiceWait ${Services[$i]} stop
#sleep $TimeOut
done
#Start all services sequentially.
for (( i=0; i<${#Services[*]}; i++ )); do
RestartService ${Services[$i]}
ServiceWait ${Services[$i]} start
RC=$?
#if the check failed, do not attempt any further restarts
[ "$RC" -ne 0 ] && break;
SuccessRestart+=(${Services[$i]})
done
echo "Restart is complete. ${SuccessRestart[*]} Return Code is ${RC}"
else
echo "Unknown SERVICESTATETYPE $SERVICESTATETYPE option"
exit 20
fi
;;
esac
exit 0
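The polling done in ServiceWait can also be written as a generic retry helper that takes the command to run as its arguments. This is only a sketch: `wait_for` and `fake_check` are hypothetical names that do not appear in the script above, and `fake_check` stands in for the real check_nrpe call so the example runs on its own.

```shell
#!/bin/bash
# wait_for: run a command repeatedly until it succeeds or attempts run out.
# Usage: wait_for MAX_ATTEMPTS DELAY_SECONDS command [args...]
wait_for() {
    local attempts=$1 delay=$2
    shift 2
    local i
    for (( i = 1; i <= attempts; i++ )); do
        # The command's exit status decides success; its output is discarded.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        (( i < attempts )) && sleep "$delay"
    done
    return 1
}

# Hypothetical stand-in for the service check: fails twice, then succeeds.
calls=0
fake_check() {
    calls=$(( calls + 1 ))
    (( calls >= 3 ))
}

if wait_for 5 0 fake_check; then
    echo "check succeeded after $calls attempts"
else
    echo "gave up after $calls attempts"
fi
```

In the handler, the same helper could wrap the existing check, for example `wait_for 4 1 check_nrpe -H "$Host" -p 5666 -t 30 -c check_service -a "$1" 'crit=not state_is_ok()'`. Keep the total of attempts times delay well below the point at which Nagios kills a long-running event handler.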