
Upstart service killed using -9 or -15 but child processes are still alive


The Upstart service is responsible for creating Gearman workers, which run in parallel, one per CPU core, with the help of gnu-parallel. For background, see my Stack Overflow post describing how to run the workers in parallel:

Fork processes indefinetly using gnu-parallel which catch individual exit errors and respawn

Upstart service: workon.conf

# workon

description "worker load"

start on runlevel [2345]
stop on runlevel [!2345]

respawn

script
  exec seq 1000000 | parallel -N0 --joblog out.log ./worker
end script

Alright, so the above service is started:

$ sudo service workon start
workon start/running, process 4620

4620 is the process ID of the workon service.

Four workers are spawned, one per CPU core. For example:

___________________
Name   |  PID
worker    1011
worker    1012
worker    1013
worker    1014
perl      1000

perl is the process running gnu-parallel, and gnu-parallel is responsible for running the worker processes in parallel.

Now, the problem. Suppose I kill the workon service:

$ sudo kill 4620

The service is configured to respawn if killed, so it restarts. But the processes created by the old instance are not killed, so the restart creates a new set of processes alongside them. Now we have 2 perl processes and 8 workers.


Name   |  PID
worker    1011
worker    1012
worker    1013
worker    1014
worker    2011
worker    2012
worker    2013
worker    2014
perl      1000
perl      2000

You might ask whether the old processes abandoned by the service are zombies. They are not: I tested them and they are alive. Every time the service dies, it creates a new set.
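One way to make the zombie-versus-alive check concrete is to look at the STAT and PPID columns of ps. The sketch below is only an illustration: the function names classify_proc and list_workers are made up here, the process names "worker" and "perl" are taken from the listings above, and it assumes a procps-style ps on a system where orphans are adopted by init (PID 1), as on a classic Upstart machine.

```shell
#!/bin/sh
# classify_proc STAT PPID: label a process from its ps STAT and PPID columns.
# (Hypothetical helper, not part of Upstart or gnu-parallel.)
classify_proc() {
    case "$1" in
        Z*) echo zombie ;;                 # defunct: exited but not yet reaped
        *)  if [ "$2" -eq 1 ]; then
                echo orphan                # parent is init (PID 1), likely re-parented
            else
                echo child                 # has a living parent other than init
            fi ;;
    esac
}

# list_workers: print PID, PPID, STAT, name, and label for "worker" and "perl".
list_workers() {
    ps -eo pid=,ppid=,stat=,comm= | while read -r pid ppid stat comm; do
        case "$comm" in
            worker|perl) echo "$pid $ppid $stat $comm $(classify_proc "$stat" "$ppid")" ;;
        esac
    done
}
```

Running list_workers after killing the service should show the abandoned workers labelled as orphans rather than zombies if they really are still alive. Note that a PPID of 1 is only a heuristic: a process started directly by init looks the same.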

That is one problem. Another problem is with gnu-parallel. Let's say I start the service fresh and it is running fine. I then run this command to kill gnu-parallel, i.e. perl:

$ sudo kill 1000

This doesn't kill the workers, and they are again left without a parent. But the workon service detects the death of perl and respawns a new set of workers. This time we have 1 perl and 8 workers. All 8 workers are alive: 4 of them have a parent and 4 are orphans.

Now, how do I solve this problem? I want to kill all processes created by the service whenever it crashes.
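The underlying behaviour can be reproduced without Upstart or gnu-parallel: a signal sent to a single PID is delivered to that process only, not to its children. The following is a minimal sketch, with a hypothetical function name demo_orphan and sleep standing in for a worker.

```shell
#!/bin/sh
# demo_orphan: kill a parent shell and report whether its child is still running.
# (Hypothetical helper; "sleep 30" stands in for a worker process.)
demo_orphan() {
    pidfile=$(mktemp)
    # Parent: start a background child, record the child's PID, then wait on it.
    sh -c 'sleep 30 & echo $! > "$1"; wait' sh "$pidfile" >/dev/null 2>&1 &
    parent=$!
    # Wait until the parent has written the child's PID.
    while [ ! -s "$pidfile" ]; do sleep 1; done
    child=$(cat "$pidfile")
    kill -TERM "$parent"                      # signal the parent only
    wait "$parent" 2>/dev/null || true        # reap the parent
    if kill -0 "$child" 2>/dev/null; then
        echo survived                         # child outlived its parent
    else
        echo gone
    fi
    kill -TERM "$child" 2>/dev/null || true   # clean up the leftover child
    rm -f "$pidfile"
}
```

On a typical Linux system, calling demo_orphan should print "survived", which is the same situation as the four workers left behind after perl is killed.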


Solution

  • I was able to solve this issue with post-stop. It is an Upstart stanza that runs after the job's main process has ended. In my case, if I run kill -9 -pid- (the PID of the service), the post-stop block is executed after the service process is killed, so I can put the code that removes all the processes spawned by the service there.

    Here is my code using post-stop:

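    post-stop script
        # Kill the leftover workers (php) and gnu-parallel (perl).
        # "|| true" keeps the script going when no matching process exists.
        killall php || true
        killall perl || true
    end script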
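The cleanup in the post-stop script could also be done in two stages, sending SIGTERM (-15) first and SIGKILL (-9) only to processes that ignore it. This is a sketch under assumptions, not part of the original answer: term_then_kill is a hypothetical helper, and the PIDs are expected to come from something like pgrep, which is assumed to be installed.

```shell
#!/bin/sh
# term_then_kill GRACE PID...: send SIGTERM to each PID, wait GRACE seconds,
# then send SIGKILL to any PID that still exists.
# (Hypothetical helper, not an Upstart feature.)
term_then_kill() {
    grace=$1
    shift
    [ "$#" -gt 0 ] || return 0                # no PIDs given: nothing to do
    kill -TERM "$@" 2>/dev/null || true       # some PIDs may already be gone
    sleep "$grace"
    for pid in "$@"; do
        if kill -0 "$pid" 2>/dev/null; then   # still present after the grace period
            kill -KILL "$pid" 2>/dev/null || true
        fi
    done
}

# Possible use inside the post-stop script, with the process names used there:
# term_then_kill 2 $(pgrep -x php) $(pgrep -x perl)
```

Like killall, matching by name hits every php and perl process on the machine, not only those started by this service, so this is only suitable on a host dedicated to the workers.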