python, macos, unix, bsd

Why would `killpg` return “not permitted” when ownership is correct?


I've got some code that calls fork(), calls setsid() in each child, and starts some processing. As soon as any child exits (detected with waitpid(-1, 0)), I kill all of the child process groups:

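import errno
from os import fork, killpg, setsid, waitpid
from signal import SIGTERM
from sys import exit

# child_functions: a list of callables, defined elsewhere
child_pids = []
for child_func in child_functions:
    pid = fork()
    if pid == 0:
        # Child: become the leader of a new session and process group
        setsid()
        child_func()
        exit()
    else:
        child_pids.append(pid)

# Block until any one child exits, then signal every child's process group
waitpid(-1, 0)
for child_pid in child_pids:
    try:
        killpg(child_pid, SIGTERM)
    except OSError as e:
        if e.errno != errno.ESRCH:  # ESRCH (3) == no such process
            print("Error killing %s: %s" % (child_pid, e))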

However, occasionally the call to killpg will fail with “operation not permitted”:

Error killing 22841: [Errno 1] Operation not permitted

Why might this be happening?

A complete, working example:

from signal import SIGTERM
from sys import exit
from time import sleep
from os import fork, killpg, setsid, waitpid

def slow():
    fork()
    sleep(10)

def fast():
    sleep(1)

child_pids = []
for child_func in [fast, slow, slow, fast]:
    pid = fork()
    if pid == 0:
        setsid()
        child_func()
        exit(0)
    else:
        child_pids.append(pid)

waitpid(-1, 0)
for child_pid in child_pids:
    try:
        killpg(child_pid, SIGTERM)
    except OSError as e:
        print("Error killing %s: %s" % (child_pid, e))

Which yields:

$ python killpg.py
Error killing 23293: [Errno 3] No such process
Error killing 23296: [Errno 1] Operation not permitted

Solution

  • I added some debugging (slightly modified source). The error occurs when you try to kill a process group whose leader has already exited and is still in zombie status. It's also easily reproducible with just [fast, fast].

    $ python so.py 
    spawned pgrp 6035
    spawned pgrp 6036
    Reaped pid: 6036, status: 0
     6035  6034  6035 Z    (Python)
     6034   521  6034 S+   python so.py
     6037  6034  6034 S+   sh -c ps -e -o pid,ppid,pgid,state,command | grep -i python
     6039  6037  6034 R+   grep -i python
    
    killing pg 6035
    Error killing 6035: [Errno 1] Operation not permitted
     6035  6034  6035 Z    (Python)
     6034   521  6034 S+   python so.py
     6040  6034  6034 S+   sh -c ps -e -o pid,ppid,pgid,state,command | grep -i python
     6042  6040  6034 S+   grep -i python
    
    killing pg 6036
    Error killing 6036: [Errno 3] No such process
    

    I'm not sure how best to deal with that. Maybe you can put the waitpid in a while loop to reap all terminated child processes, and then proceed with killpg()ing the rest.

    But the answer to your question is that you're getting EPERM because you're not allowed to killpg a zombie process group leader (at least on Mac OS).

    This is also verifiable outside Python. If you put a sleep in there, find the pgrp of one of those zombies, and attempt to kill its process group, you get EPERM as well:

    $ kill -TERM -6115
    -bash: kill: (-6115) - Operation not permitted
    

    I also confirmed that this doesn't happen on Linux.
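
The suggestion above, to reap terminated children first and only then killpg the rest, might look roughly like the following. This is a minimal sketch rather than a tested fix: `reap_exited` and `kill_groups` are hypothetical helper names, and it assumes every pid in the list is a direct child that made itself a process group leader with setsid(). A child can still exit between the two steps, so an EPERM may still appear on Mac OS; the sketch returns such errors to the caller instead of hiding them.

```python
import errno
import os
import signal
import time


def reap_exited(child_pids):
    """Reap every child that has already exited; return the pids still running."""
    remaining = []
    for pid in child_pids:
        try:
            # WNOHANG: return (0, 0) immediately if the child has not exited yet
            reaped, _status = os.waitpid(pid, os.WNOHANG)
        except OSError as e:
            if e.errno != errno.ECHILD:
                raise
            reaped = pid  # no such child: it was already reaped elsewhere
        if reaped == 0:
            remaining.append(pid)
    return remaining


def kill_groups(pgids, sig=signal.SIGTERM):
    """Signal each process group; return {pgid: errno} for any failures."""
    failures = {}
    for pgid in pgids:
        try:
            os.killpg(pgid, sig)
        except OSError as e:
            # Assumption: the caller decides which errnos (ESRCH, EPERM) to ignore
            failures[pgid] = e.errno
    return failures
```

In the question's script, the loop after waitpid(-1, 0) could then become something like `failures = kill_groups(reap_exited(child_pids))`, followed by reporting any entries whose errno is not ESRCH. Note that this only signals groups whose leader is still alive; a group whose leader has exited but which still has other members would need separate handling.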