We have a mail server that is dying and in the process of having accounts migrated to a new server before decommissioning. With 800+ email accounts across 25+ domains, it is important for this machine to stay up until migration is finished.
Lately it has started to fill up with error logs, which freeze mysql because of no space, stop mail flow, and generally give me a headache. Until the root problem of the errors can be found and fixed, I have come up with a script to check if Dovecot and Amavis-new are running, and if not restarts them.
After reading: https://stackoverflow.com/a/7096003/4820993
As well as a few other common examples, I came up with this.
netstat -an|grep -ce ':993.*LISTEN' >/dev/null 2>&1
if [ $? = 0 ]
then
echo 'Dovecot is up';
else
echo 'Dovecot is down, restarting...';
/etc/init.d/dovecot restart
logger -p mail.info dovecot_keepalive: Dovecot is down, restarting...
fi
/etc/init.d/amavis status |grep -ce 'running' >/dev/null 2>&1
if [ $? = 0 ]
then
echo 'AmavisD is up';
else
echo 'AmavisD is down, restarting...';
/etc/init.d/amavis restart
sleep 2
/etc/init.d/amavis status |grep -ce 'running' >/dev/null 2>&1
if [ $? = 1 ]
then
echo 'AmavisD had a problem restarting, trying to fix it now...';
logger -p mail.info amavis_keepalive: AmavisD had a problem restarting...
output=$(ps aux|grep a\[m\]avisd)
set -- $output
pid=$2
kill $pid
rm /var/run/amavis/amavisd.pid
/etc/init.d/amavis start
else
echo 'AmavisD restarted successfully';
logger -p mail.info amavis_keepalive: AmavisD is down, restarting...
fi
fi
Who knows, I'm probably making it harder that it is, and if so PLEASE LET ME KNOW!!!
I checked it against http://www.shellcheck.net and updated/corrected according to it's debug reports. I am piecing this together from examples elsewhere and would love someone to proofread this before I implement it.
The first part checking dovecot is already working just fine as a cronjob every 6 hours (yes the server is that messed up that we need to check it), it's the section about amavis I'm not sure about.
You can use Monit which will monitor your services and restart itself.
Amavisd:
# File: /etc/monit.d/amavisd
# amavis
check process amavisd with pidfile /var/amavis/amavisd.pid
group services
start program = "/etc/init.d/amavisd start"
stop program = "/etc/init.d/amavisd stop"
if failed port 10024 then restart
if 5 restarts within 5 cycles then timeout
Dovecot:
# File: /etc/monit.d/dovecot
check process dovecot with pidfile /var/run/dovecot/master.pid
start program = "/etc/init.d/dovecot start"
stop program = "/etc/init.d/dovecot stop"
group mail
if failed host localhost port 993 type tcpssl sslauto protocol imap then restart
if failed host localhost port 143 protocol imap then restart
if 5 restarts within 5 cycles then timeout
depends dovecot_init
depends dovecot_bin
check file dovecot_init with path /etc/init.d/dovecot
group mail
check file dovecot_bin with path /usr/sbin/dovecot
group mail