ProcessPoolExecutor does not mutate instance variable when submitting instance method


Given an instance method that mutates an instance variable, submitting this method to a ProcessPoolExecutor does run the method, but the instance variable in the parent process is not changed.

from concurrent.futures import ProcessPoolExecutor


class A:
    def __init__(self):
        self.started = False

    def method(self):
        print("Started...")
        self.started = True


if __name__ == "__main__":
    a = A()

    with ProcessPoolExecutor() as executor:
        executor.submit(a.method)

    assert a.started

Output

Started...
Traceback (most recent call last):
  File "/path/to/file", line 19, in <module>
    assert a.started
AssertionError

Are only pure functions allowed in ProcessPoolExecutor?


Solution

  • For Windows

    Multiprocessing does not share its state with child processes on Windows systems. This is because the default way to start child processes on Windows is spawn. From the documentation for the spawn start method:

    The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object’s run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.

    Therefore, when you pass any objects to child processes, they are actually copied (pickled in the parent and rebuilt in the child), so the child works on a different object from the one in the parent process. A simple way to demonstrate this with your example is to print the object in both the child process and the parent process:

    from concurrent.futures import ProcessPoolExecutor
    
    
    class A:
        def __init__(self):
            self.started = False
    
        def method(self):
            print("Started...")
            print(f'Child proc: {self}')
            self.started = True
    
    
    if __name__ == "__main__":
        a = A()
        print(f'Parent proc: {a}')
        with ProcessPoolExecutor() as executor:
            executor.submit(a.method)
    

    Output

    Parent proc: <__main__.A object at 0x0000028F44B40FD0>
    Started...
    Child proc: <__mp_main__.A object at 0x0000019D2B8E64C0>
    

    As you can see, the two objects are distinct: they live in different processes, at different addresses. Altering one does not affect the other at all. This is why you don't see any change to a.started in the parent process.
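
    The copy can also be reproduced without starting any process. The following is a minimal sketch that assumes the object crosses the process boundary via pickle; round_trip is a hypothetical helper name introduced only for this illustration:

```python
import pickle


class A:
    def __init__(self):
        self.started = False

    def method(self):
        self.started = True


def round_trip(obj):
    # Hypothetical helper: serialize the object and rebuild it,
    # which is roughly what happens when it is sent to a worker process.
    return pickle.loads(pickle.dumps(obj))


if __name__ == "__main__":
    a = A()
    copy_of_a = round_trip(a)
    copy_of_a.method()        # mutates only the rebuilt copy
    print(copy_of_a is a)     # False
    print(copy_of_a.started)  # True
    print(a.started)          # False
```

    The rebuilt object starts with the same attribute values, but from then on it is independent of the original.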

    Once you understand this, your question becomes how to share the same object, rather than copies, with the child processes. There are numerous ways to go about this, and questions on how to share complex objects like a have already been asked and answered on Stack Overflow.
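
    One of those ways, sketched here under assumptions rather than as the definitive approach, is to keep the state in a multiprocessing Manager and hand the worker a proxy to it. The names mark_started and shared are illustrative and not part of the original code:

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager


def mark_started(shared):
    # In the worker, `shared` is a proxy; attribute writes are
    # forwarded to the manager process that owns the real object.
    print("Started...")
    shared.started = True


if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.Namespace()
        shared.started = False

        with ProcessPoolExecutor() as executor:
            # .result() waits for the task and re-raises any worker exception.
            executor.submit(mark_started, shared).result()

        assert shared.started
```

    The trade-off is that every attribute access on the proxy is a round trip to the manager process, so this suits small, infrequently updated state better than large or hot data.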

  • For UNIX

    The same could be said for the other start methods that UNIX-based systems have the option of using (on macOS, the default start method has been spawn since Python 3.8). For example, the documentation for multiprocessing describes fork as follows:

    The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.

    So fork creates child processes that share the entire memory space of the parent process on start. However, it uses copy-on-write to do so. This means that if the child process attempts to modify any shared object, the memory holding that object is duplicated and the change is made to the child's private copy, so as not to disturb the parent process (much like what spawn does on start).
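
    The copy-on-write behaviour can be observed with a small experiment. This is a sketch that assumes the fork start method is available (Linux or macOS; on Windows get_context("fork") raises ValueError). The names state, mutate and parent_view_after_child are made up for the illustration:

```python
import multiprocessing as mp

# Module-level state that the forked child inherits from the parent.
state = {"started": False}


def mutate():
    # Runs in the child: the write lands in the child's private copy.
    state["started"] = True
    print(f"Child sees: {state}")


def parent_view_after_child():
    ctx = mp.get_context("fork")  # assumption: fork is available on this platform
    child = ctx.Process(target=mutate)
    child.start()
    child.join()
    # Back in the parent: the parent's dictionary was never touched.
    return state["started"]


if __name__ == "__main__":
    print(f"Parent sees: {parent_view_after_child()}")  # Parent sees: False
```

    The child starts with the parent's value and changes it, yet the parent still reads False afterwards.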

    Hence the answer still stands: if you plan to modify the objects passed to the child process, or if you are not on a UNIX system, you will need to share the objects yourself so that both processes work on the same data.
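
    When true sharing is not required, a simpler alternative is to have the method return the new value and apply it in the parent. This is a hedged sketch based on the question's class; returning self.started from method is a change made for this illustration, not something in the original code:

```python
from concurrent.futures import ProcessPoolExecutor


class A:
    def __init__(self):
        self.started = False

    def method(self):
        print("Started...")
        self.started = True   # changes the worker's copy only
        return self.started   # so send the new value back explicitly


if __name__ == "__main__":
    a = A()

    with ProcessPoolExecutor() as executor:
        future = executor.submit(a.method)
        # Apply the worker's result to the parent's own instance.
        a.started = future.result()

    assert a.started
```

    Here the executor is only used to compute a value, and all mutation of a happens in the parent process.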

    Further reading on start methods.