python linux subprocess signals posix

Why won't Python subprocesses properly capture signals?


Consider a tiny program that is supposed to capture (and ignore) the SIGTERM signal:

# nosigterm.py:

import signal
import time

def ignore(signum, frame):
    print("Ignoring signal {}".format(signum))


if __name__ == '__main__':
    signal.signal(signal.SIGINT, ignore)
    signal.signal(signal.SIGTERM, ignore)

    while True:
        time.sleep(2)
        print("... in loop ...")

When I run it as a subprocess of another Python script, sending SIGTERM terminates it, which I find strange:

# parent_script.py:

import signal
import subprocess
import sys

args = [sys.executable, "nosigterm.py"]
prog = subprocess.Popen(args)
assert prog.poll() is None

prog.send_signal(signal.SIGTERM)
print("prog.poll(): {}".format(prog.poll()))
assert prog.poll() is None, "Program unexpectedly terminated after SIGTERM"

The output is:

$ python3 parent_script.py 
prog.poll(): None
Traceback (most recent call last):
  File "parent_script.py", line 13, in <module>
    assert prog.poll() is None, "Program unexpectedly terminated after SIGTERM"
AssertionError: Program unexpectedly terminated after SIGTERM

Do you have any idea why this happens?

Note that if nosigterm.py is executed as a standalone Python script (python3 nosigterm.py) and SIGTERM is sent with the system kill command (from another terminal), it behaves as it should:

$ python3 nosigterm.py 
... in loop ...
... in loop ...
Ignoring signal 15
... in loop ...
... in loop ...
... in loop ...

I have tried three Python versions (2.7, 3.6 and 3.7) and two Linux distributions (CentOS 7 and Debian 9), all with the same results. If I replace nosigterm.py with a binary application written in C that captures SIGTERM (via sigaction()), the behavior is unchanged, so it must somehow be related to the parent Python process.

Also note that the Popen parameters restore_signals=True/False and preexec_fn=os.setsid/os.setpgrp made no difference either.

I'd appreciate it if anyone could help me understand this. Thank you.


Solution

  • This is a race condition.

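    You start the child and send the signal immediately, so the signal can arrive before the child has reached its signal.signal() calls. Until the handler is installed, SIGTERM keeps its default action, which terminates the process. The same applies to your C program before it calls sigaction().

    Furthermore, your parent script has a second race when it checks whether the child has died. It sends the signal and polls right away, so the first prog.poll() can still return None while the child is in the middle of dying. That is why your output shows "prog.poll(): None" and the assert on the next line then fails.

    If you add a time.sleep(1) before sending the signal, the child will almost certainly have installed its handlers by then, and you get the behavior you expect.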