c, linux, process, pid

In Linux, how do you learn about newly acquired child processes before they exit?


I'm writing a program that calls prctl(PR_SET_CHILD_SUBREAPER, 1). So, it might acquire processes that it knows nothing about when their parents exit. I would like this process to exit when all of its children have exited, including the ones it became the parent of because they would otherwise have been orphans.

How do I do this in a reasonable way from a C or C++ program? I don't want to have to be opening up everything in /proc all the time or calling out to a process that does that for me. That's a polling solution anyway, and I would prefer a solution that didn't involve polling.

Also, ptracing all of the children to find out when they call fork isn't really an option either. Just in case someone had that bright idea.


Solution

  • You don't need to do anything fancy with /proc. Just use wait(). From the man page:

    PR_SET_CHILD_SUBREAPER (since Linux 3.4)

    If arg2 is nonzero, set the "child subreaper" attribute of the calling process; if arg2 is zero, unset the attribute. When a process is marked as a child subreaper, all of the children that it creates, and their descendants, will be marked as having a subreaper. In effect, a subreaper fulfills the role of init(1) for its descendant processes. Upon termination of a process that is orphaned (i.e., its immediate parent has already terminated) and marked as having a subreaper, the nearest still living ancestor subreaper will receive a SIGCHLD signal and be able to wait(2) on the process to discover its termination status.

    So your process just needs to sit in a loop calling wait until it returns -1 with errno set to ECHILD. This will happen once all processes in the process tree have exited. As long as you do this, you don't need to know when you acquire a new child process to wait for.
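The loop described above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the function names reap_until_no_children and run_subreaper_demo are made up for this example, and the demo assumes a Linux kernel of 3.4 or later so that PR_SET_CHILD_SUBREAPER is available. The demo forks a child that exits right away and leaves behind a grandchild, which the subreaper then adopts.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Reap children until none are left. Returns the number of processes
 * reaped, or -1 if wait() failed for a reason other than ECHILD. */
static int reap_until_no_children(void)
{
    int reaped = 0;

    for (;;) {
        int status;
        pid_t pid = wait(&status);   /* blocks; no polling involved */

        if (pid > 0) {
            reaped++;                /* a direct or adopted child was reaped */
            continue;
        }
        if (errno == EINTR)
            continue;                /* interrupted by a signal: retry */
        if (errno == ECHILD)
            return reaped;           /* nothing left to wait for */
        return -1;
    }
}

/* Hypothetical demo: become a subreaper, then create a child that exits
 * immediately and orphans a grandchild that lives a little longer. */
static int run_subreaper_demo(void)
{
    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        pid_t grandchild = fork();
        if (grandchild == 0) {
            struct timespec delay = { 0, 200 * 1000 * 1000 }; /* 200 ms */
            nanosleep(&delay, NULL);
            _exit(0);
        }
        /* Exit without waiting, so the grandchild is orphaned. */
        _exit(grandchild == -1 ? 1 : 0);
    }

    /* Only the subreaper reaches this point. */
    return reap_until_no_children();
}
```

Called from main, run_subreaper_demo() should return 2 on a kernel that supports subreapers: one for the child and one for the adopted grandchild. Retrying on EINTR is an addition beyond what the answer describes; it only matters if the program installs signal handlers without SA_RESTART.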

    Suppose you have the following process tree with process 1000 being the reaper:

    1000
      |---1001
      |     |---1002
      |     |---1003
      |
      |---1004
            |---1005
            |---1006
    

    When the reaper first calls wait, its direct children 1001 and 1004 both need to exit before wait can return -1. Now suppose 1004 exits:

    1000
      |---1001
      |     |---1002
      |     |---1003
      |
      |---1004 (dead)
      |-----|---1005
      |-----|---1006
    

    In the reaper, wait returns 1004. Now 1001, 1005, and 1006 need to exit. Next, 1002 exits:

    1000
      |---1001
      |     |---1002 (zombie)
      |     |---1003
      |
      |---1004 (dead)
      |-----|---1005
      |-----|---1006
    

    The reaper doesn't return from wait yet because 1001 is still running and can still wait for 1002. At this point, 1002 is a zombie. Next, 1001 calls wait:

    1000
      |---1001
      |     |---1002 (dead)
      |     |---1003
      |
      |---1004 (dead)
      |-----|---1005
      |-----|---1006
    

    No change to what the reaper is expecting since 1002 was waited for by 1001. Then 1005 exits:

    1000
      |---1001
      |     |---1002 (dead)
      |     |---1003
      |
      |---1004 (dead)
      |-----|---1005 (dead)
      |-----|---1006
    

    wait in the reaper returns 1005 and now needs 1001 and 1006 to exit. Then 1003 exits:

    1000
      |---1001
      |     |---1002 (dead)
      |     |---1003 (zombie)
      |
      |---1004 (dead)
      |-----|---1005 (dead)
      |-----|---1006
    

    Again, no change to what the reaper is expecting. Now 1001 exits:

    1000
      |---1001 (dead)
      |     |---1002 (dead)
      |-----|---1003 (dead)
      |
      |---1004 (dead)
      |-----|---1005 (dead)
      |-----|---1006
    

    Now wait in the reaper returns twice in a row, once returning 1001 and once returning 1003. Now the reaper is waiting for only 1006. Once that exits:

    1000
      |---1001 (dead)
      |     |---1002 (dead)
      |-----|---1003 (dead)
      |
      |---1004 (dead)
      |-----|---1005 (dead)
      |-----|---1006 (dead)
    

    wait in the reaper returns 1006, and the next call returns -1 ending the loop.