
How to avoid the parent waiting for the child in Python multiprocessing?


I'm fairly new to Python multiprocessing, and I'm trying to write a class that can execute functions asynchronously and run callbacks attached to them.

First of all, let's settle a common nomenclature for this specific problem:

┬─ process1 (parent)
└─┬─ process2 (child of process1)
  └─── process3 (child of process2)

After reading up a bit on the subject and following this SO question, I've come up with the following code for the run method:

import multiprocessing


class AsyncProcess:
    def __init__(self, target, callback=None, args=(), kwargs=None):
        self._target = target
        self._callback = callback
        self._args = args
        self._kwargs = kwargs if kwargs is not None else {}  # avoid a mutable default
        self._process = None

    def run(self):
        def wrapper():
            # Runs inside process2: execute the target, then launch the
            # callback with the return value in a new process (process3).
            return_value = self._target(*self._args, **self._kwargs)

            if self._callback is not None:
                process = multiprocessing.Process(target=self._callback,
                                                  args=(return_value,))
                process.start()
                # Remove process3 from process2's set of children so that
                # process2 does not join it on shutdown; this is the
                # tinkering with internals mentioned below.
                multiprocessing.process._children.discard(process)

        self._process = multiprocessing.Process(target=wrapper)
        self._process.start()

The actual AsyncProcess class is bigger than this: it is meant to work as an adapter between multiprocessing.Process and subprocess.Popen, executing both external programs and Python functions in new processes. That's why it merely uses multiprocessing.Process rather than subclassing it, in case anyone wondered.

What I'm trying to achieve here is to launch a child process (process3) from within another process (process2) without process2 having to wait for process3, since process3 may take far longer to finish than process2. The daemon attribute of multiprocessing.Process is of no use here: when the parent process (process2) exits, a daemonic child (process3) is killed too, and I want to leave it running until it finishes.
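
For reference, a minimal illustration of that daemon behaviour (the function name is made up for the example):

import multiprocessing
import time


def long_task():
    time.sleep(60)  # stand-in for the long-running process3


if __name__ == "__main__":
    p = multiprocessing.Process(target=long_task, daemon=True)
    p.start()
    # The parent does not wait for a daemonic child, but as soon as the
    # parent exits, the child is terminated and long_task() never finishes.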

There are, however, two things I don't like at all about the solution I've come up with:

  1. I'm tinkering with the internals of multiprocessing, which I don't like one bit.
  2. Since I'm erasing the child process (process3) from the children pool of the parent (process2), I'm guessing that leaves the poor child orphaned (I don't know exactly what that implies, but it's most certainly not good practice).

The question, then, is: how can I keep the parent from waiting for its children without killing them and without creating orphan processes? Is there another, more correct or more elegant way of achieving what I'm trying to do?

I was thinking about reassigning the child process (process3) to its grandparent (process1), which I know for sure will still be alive, but I haven't found a way to actually do that.


Some clarifications:

  1. Popen does what I want to achieve, but it only works with external processes, i.e. I cannot execute a Python function with all its context through Popen (as far as I know, of course).
  2. Using os.fork had crossed my mind, but I find the way of distinguishing parent code from child code a bit cumbersome (handling the PID == 0 and PID != 0 cases, etc.); see the sketch after this list.
  3. I haven't thought about any solutions using the threading package since I wanted to manage processes, not threads, and leave thread management to the OS.
  4. Starting process3 from process1 directly solves the problem of orphan processes, but then I have to do an active polling on process1 in order to know when process2 finishes, which is actually not an option (process1 manages a server that cannot be blocked).
  5. I need process2 to finish as soon as possible in order to get some data from it; that's why I'm not executing the content of process3 directly inside process2.
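
As a reference for point 2 above, that PID juggling would be the classic Unix double fork; a minimal Unix-only sketch (run_detached is a made-up helper):

import os


def run_detached(target, *args):
    # Double fork: run target(*args) in a process that is no longer our
    # child, so we never have to wait for it.
    pid = os.fork()
    if pid > 0:
        # Original process: reap the short-lived intermediate child;
        # this returns almost immediately.
        os.waitpid(pid, 0)
        return

    # Intermediate child: fork again and exit at once, so the grandchild
    # is reparented to init and leaves no zombie behind.
    if os.fork() > 0:
        os._exit(0)

    # Grandchild: fully detached from the original process.
    try:
        target(*args)
    finally:
        os._exit(0)  # skip interpreter cleanup inherited via fork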

Something I came up with while writing the question:

Since my problem is having to launch process3 from within process2, and launching it from process1 solves the problem but active polling from process1 is not an option, I could launch process2 and process3 from within process1 at the same time, pass the process2 object to process3, and have process3 do the active polling with a small interval to ensure a quick response once process2 finishes.
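
A sketch of that idea, with one substitution: a multiprocessing.Process object can only be checked from the process that started it (is_alive() asserts that it runs in the parent), so instead of passing the process2 object and polling it, this sketch passes a multiprocessing.Queue that process3 blocks on; both worker functions are made-up stand-ins:

import multiprocessing
import time


def worker(queue):
    # process2: finish as fast as possible and hand off the result.
    result = 42  # stand-in for the real, quick computation
    queue.put(result)


def callback_runner(queue):
    # process3: wakes up the moment process2 puts its result; no polling.
    result = queue.get()
    time.sleep(10)  # stand-in for the long-running callback
    print("callback got", result)


if __name__ == "__main__":
    # process1: start both children at once and carry on immediately.
    queue = multiprocessing.Queue()
    multiprocessing.Process(target=worker, args=(queue,)).start()
    multiprocessing.Process(target=callback_runner, args=(queue,)).start()

Both processes remain children of process1, so no orphans are created, and multiprocessing reaps finished children automatically the next time process1 starts a new process or calls multiprocessing.active_children().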

Does this make any sense, or is it an overcomplicated solution to something that is already solved (and that I just don't know about)?


Solution

  • What you want to do (ignoring the question of whether it is a good idea) is not possible with the multiprocessing library without the kind of tinkering with its internals that you want to avoid, precisely because multiprocessing is designed for child processes that do not outlive their parent.

    I think the answer is indeed to use subprocess.Popen(), although that means forgoing the nice high-level API of the multiprocessing library. No, you can't execute a Python function directly, but you can create a separate script.py that calls the function you want.
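
    As a rough sketch of that workaround (the file name and the callback import are assumptions, not part of the original code):

    # run_callback.py: hypothetical wrapper script around the callback
    import sys

    from mymodule import my_callback  # assumption: wherever the callback lives

    if __name__ == "__main__":
        my_callback(sys.argv[1])  # any needed context must arrive as arguments

    Then process2 launches it and moves on; Popen returns immediately, and the new process keeps running even after process2 exits:

    import subprocess
    import sys

    # return_value comes from the surrounding code in process2
    subprocess.Popen([sys.executable, "run_callback.py", str(return_value)])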