Consider a tiny program that is supposed to catch (and ignore) the SIGTERM signal:
# nosigterm.py:
import signal
import time

def ignore(signum, frame):
    print("Ignoring signal {}".format(signum))

if __name__ == '__main__':
    signal.signal(signal.SIGINT, ignore)
    signal.signal(signal.SIGTERM, ignore)
    while True:
        time.sleep(2)
        print("... in loop ...")
When executed from another python script as a subprocess, sending SIGTERM terminates this subprocess, which I find strange:
# parent_script.py:
import signal
import subprocess
import sys
args = [sys.executable, "nosigterm.py"]
prog = subprocess.Popen(args)
assert prog.poll() is None
prog.send_signal(signal.SIGTERM)
print("prog.poll(): {}".format(prog.poll()))
assert prog.poll() is None, "Program unexpectedly terminated after SIGTERM"
The output is:
$ python3 parent_script.py
prog.poll(): None
Traceback (most recent call last):
  File "parent_script.py", line 13, in <module>
    assert prog.poll() is None, "Program unexpectedly terminated after SIGTERM"
AssertionError: Program unexpectedly terminated after SIGTERM
Would you have any idea why it is so?
Note that if nosigterm.py is executed as a standalone Python script (python3 nosigterm.py) and SIGTERM is sent by the system kill command (in another terminal), it behaves as it should:
$ python3 nosigterm.py
... in loop ...
... in loop ...
Ignoring signal 15
... in loop ...
... in loop ...
... in loop ...
I have tried three Python versions (2.7, 3.6 and 3.7) and two Linux operating systems (CentOS 7 and Debian 9), all with the same results. If I replace nosigterm.py by a binary application written in C that captures SIGTERM (via sigaction()), the behavior is still unchanged, so it must be somehow related to the parent Python process.
Also note that the Popen parameters restore_signals=True/False and preexec_fn=os.setsid/os.setpgrp did not make any difference, either.
I'd appreciate it if anyone could help me understand this. Thank you.
This is a race condition.
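You spawn the child and send the signal immediately, so the child has to install its handlers (the signal.signal() calls) before the signal arrives. If SIGTERM gets there first, the default action still applies and the child is killed.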
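Furthermore, your parent script has a second race condition in checking whether the child has died. You signal the child and immediately check if it's dead, so it's a race for the child to die before the check happens. That is why the first prog.poll() in your output still prints None while the one in the assert already sees the child terminated.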
If you add a time.sleep(1) before sending the signal, the child will almost certainly win the race, and you get the behavior you expect.