
Why does this program hang on exit? (interaction between signals and sudo)


I am debugging a legacy program on Linux. To synchronise it with another process I naively added a raise(SIGSTOP). However, when it is run under sudo I get a defunct (zombie) process and a hung terminal. Can someone explain what is happening here and how it can be avoided?

I've reduced the problem to the following simple C program (selfstop.c):

#include <signal.h>
#include <stdio.h>

int main(void)
{
  printf("about to stop\n");
  (void)raise(SIGSTOP);
  printf("resumed\n");
  return 0;
}

If run normally, it displays "about to stop" and halts itself with SIGSTOP. kill -18 <pid> (signal 18 is SIGCONT on x86 Linux) then causes it to display "resumed" and exit as desired.

However, if I run it under sudo, i.e.

sudo ./selfstop

and then, in another terminal:

sudo kill -18 <pid>

It displays "resumed" and returns control to the terminal but I am left with a defunct process:

>ps aux | grep [s]elf
root      7619  0.0  0.0 215476  4136 pts/4    T    18:16   0:00 sudo ./selfstop
root      7623  0.0  0.0      0     0 pts/4    Z    18:16   0:00 [selfstop] <defunct>

Things get worse if the program is run in a script (runselfstop):

#!/bin/sh
sudo ./selfstop

Now when the process exits it hangs the terminal. In both cases normal service is resumed by killing the sudo process (in this case 7619, "sudo ./selfstop"):

sudo kill -9 7619

My question is: why do we get the zombie, and how do we avoid it?

Note: The reason for using sudo is irrelevant here. It relates to the legacy application.


Solution

  • sudo will suspend itself if the command it's running suspends itself. This allows you to, for example, run sudo -s to start a shell, then type suspend in that shell to get back to your top-level shell. If you have the source code for sudo, you can look at the suspend_parent function to see how this is done.

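    When sudo (or any process) has been suspended, the only way to resume it is to send it a SIGCONT signal. Sending SIGCONT only to the selfstop process resumes selfstop but leaves sudo stopped, which is what your ps output shows (state T for sudo, Z for selfstop):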

    >ps aux | grep [s]elf
    root      7619  0.0  0.0 215476  4136 pts/4    T    18:16   0:00 sudo ./selfstop
    root      7623  0.0  0.0      0     0 pts/4    Z    18:16   0:00 [selfstop] <defunct>
    

    That indicates that selfstop has exited but hasn't yet been waited for by its parent. It will remain a zombie until sudo is either resumed or killed.

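    How can you work around this? sudo and selfstop will be in the same process group (unless selfstop does something to change that). So you could send SIGCONT to sudo's process group, which will resume both processes, by doing kill -CONT -<pid-of-sudo>. Note the minus sign before the pid: it tells kill to signal the process group with that ID rather than a single process. This assumes sudo is the leader of its process group, as it is when started directly from an interactive shell with job control.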
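The workaround above can be sketched as a small shell helper. This is a minimal, Linux-only sketch, not anything shipped with sudo: the function names pgid_of and resume_group are hypothetical, and it assumes, as stated above, that sudo and selfstop share a process group. It looks the group ID up in /proc/<pid>/stat instead of assuming it equals sudo's pid, because when sudo is started from a script (such as runselfstop) it may not be the group leader.

```shell
#!/bin/sh
# Sketch: send SIGCONT to every process in the group that contains a PID.
# Linux-only (reads /proc). pgid_of and resume_group are hypothetical
# helper names chosen for this example.

# Print the process group ID of the process whose PID is $1.
pgid_of() {
    stat=$(cat "/proc/$1/stat") || return 1
    # Strip the leading "pid (comm) " so a command name containing spaces
    # cannot confuse the split; what remains starts: state ppid pgrp ...
    rest=${stat##*) }
    set -- $rest
    printf '%s\n' "$3"
}

# Resume the whole process group of the process whose PID is $1.
resume_group() {
    pgid=$(pgid_of "$1") || return 1
    # A negative target makes kill signal the process group with that ID.
    kill -CONT "-$pgid"
}
```

With the PIDs from the question, resume_group 7619 (or resume_group 7623) would signal the group that holds both sudo and selfstop. It probably needs to run with the same privileges as the sudo kill used in the question.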