docker monit zombie-process

Monit does not clear the pid file or restart a process when the process becomes a zombie


I'm running monit inside a Docker container, where it monitors several processes such as vault, nginx, mongodb and a few more. I have created a wrapper script for each process with start/stop functionality, which is fed into monit. For example, the vault script:

#!/bin/sh
# vault service script

VAULT_DIR="/tmp/vault"
VAULT_USER="myuser"
USER=$(whoami)
if [ "$USER" != "root" ]
then
     echo "Only root can run vault-server service"
     exit 1
fi


usage() {
     echo "Usage: $(basename "$0") <start|stop|status|restart>"
     exit 1
}

start() {
     status
     if [ $PID -gt 0 ]
     then
        echo "vault server daemon was already started. PID: $PID"
        # Already running; exit statuses are capped at 255, so do not return the PID.
        return 0
     fi
     echo "Starting vault server daemon..."
     rm -f /var/run/vault.pid
     VAULT_OPTIONS=""
     VAULT_OPTIONS="-dev"
     su $VAULT_USER -c "/usr/bin/nohup vault server $VAULT_OPTIONS 1>/var/log/vault/vault.log 2>/var/log/vault/vault.err &"
     status
     if [ $PID -gt 0 ]
     then
        echo "$PID" > /var/run/vault.pid
     fi
     sleep 5
     su $VAULT_USER /opt/vault/setup-vault.sh
}

stop() {

     status
     if [ $PID -eq 0 ]
     then
        echo "vault server daemon is already not running"
        return 0
     fi
     echo "Stopping vault server daemon..."
     rm -f /var/run/vault.pid
     kill $PID
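}

status() {
     # This assumes BusyBox ps, where column 1 is the PID; procps "ps -ef" prints the PID in column 2.
     PID=`ps -ef | grep "vault server" | grep -v grep | grep -v "\[" | awk '{print $1}'`
     if [ "x$PID" = "x" ]
     then
        PID=0
     fi

     # if PID is greater than 0 then vault server is running, else it is not.
     # Callers read $PID directly; exit statuses are capped at 255, so the PID is not returned.
     return 0
}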

if [ "x$1" = "xstart" ]
then
  start
  exit 0
fi

if [ "x$1" = "xstop" ]
then
  stop
  exit 0
fi

if [ "x$1" = "xrestart" ]
then
  stop
  start
  exit 0
fi

if [ "x$1" = "xstatus" ]
then
   status
   if [ $PID -gt 0 ]
   then
      echo "vault server daemon is running with PID: $PID"
   else
      echo "vault server daemon is NOT running"
   fi
   exit $PID
fi

usage

For some reason, when the process crashes and becomes a zombie, monit doesn't clear the pid file or restart the process. To avoid matching a zombie process in my status function, I've added a grep -v "\[" clause to the ps -ef pipeline. Is there anything else I need to do? Has anybody faced this issue before?
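
For reference, the zombie check described above can also be done without parsing ps output. The following is only a sketch, assuming a Linux /proc filesystem is mounted in the container; the function names proc_state, is_zombie and is_alive are hypothetical and not part of the original script. It reads the State: line of /proc/<pid>/status, where a zombie is reported as Z.

```shell
#!/bin/sh
# Sketch (assumes Linux /proc): classify a PID without parsing ps output.

# Print the one-letter process state (R, S, D, Z, ...) for the given PID.
# Prints nothing if the process does not exist.
proc_state() {
    [ -r "/proc/$1/status" ] || return 0
    awk '/^State:/ { print $2; exit }' "/proc/$1/status" 2>/dev/null
}

# Succeed only if the PID exists and is a zombie.
is_zombie() {
    [ "$(proc_state "$1")" = "Z" ]
}

# Succeed only if the PID exists and is not a zombie.
is_alive() {
    state=$(proc_state "$1")
    [ -n "$state" ] && [ "$state" != "Z" ]
}
```

A wrapper's status function could then call is_alive "$PID" on the PID read from the pid file and treat a zombie as "not running".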


Solution

  • If your application is spawning zombies, add tini to your stack. Your entrypoint/cmd becomes tini, which calls your existing entrypoint and handles zombie reaping.

    This happens because processes inside the container's PID namespace are never handed over to the host's init process, so the host cannot reap them. You need a PID 1 inside the namespace that reaps your zombies.
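
To see whether a container already has such a reaper, one option is to look at what is running as PID 1. The snippet below is a minimal sketch, assuming a Linux /proc filesystem; the function names and the short list of init shims (tini, docker-init, dumb-init) are illustrative assumptions, not an exhaustive list.

```shell
#!/bin/sh
# Sketch: report whether PID 1 in this container is a known zombie-reaping init.

# Print the command name of PID 1 (Linux /proc layout assumed).
pid1_name() {
    cat /proc/1/comm 2>/dev/null
}

# Succeed if the given name is one of a few init shims known to reap zombies.
# The list is illustrative only.
is_reaping_init() {
    case "$1" in
        tini|docker-init|dumb-init) return 0 ;;
        *) return 1 ;;
    esac
}

# Print a one-line verdict; return 0 if a known reaper runs as PID 1.
check_pid1() {
    name=$(pid1_name)
    if is_reaping_init "$name"; then
        echo "PID 1 is $name: zombies should be reaped"
    else
        echo "PID 1 is ${name:-unknown}: consider running under tini"
        return 1
    fi
}
```

Running check_pid1 inside the container gives a quick verdict. If it reports monit or a shell as PID 1, starting the container with docker run --init, or putting tini -- in front of the existing entrypoint command, should give the container a reaping PID 1.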