Tags: python, multiprocessing, jupyter, spyder, pool

Why does multiprocessing.Pool run but never terminate?


I'm trying to use multiprocessing.Pool to speed up the execution of a function across a range of inputs. The worker processes appear to start, since my task manager shows a substantial increase in CPU utilization, but the task never terminates. No exceptions are ever raised, runtime or otherwise.

from multiprocessing import Pool

def f(x):
    print(x)
    return x**2

class Klass:
    def __init__(self):
        pass

    def foo(self):
        X = list(range(1, 1000))
        with Pool(15) as p:
            result = p.map(f, X)

if __name__ == "__main__":
    obj = Klass()
    obj.foo()
    print("All Done!")

Interestingly, despite the uptick in CPU utilization, print(x) never prints anything to the console.

I have moved the function f outside of the class as was suggested here, to no avail. I have also tried adding p.close() and p.join(), with no success. Using other Pool methods such as imap leads to TypeError: can't pickle _thread.lock objects errors, and seems to take a step away from the example usage in the introduction of the Python multiprocessing documentation.

Adding to the confusion, if I run the code above enough times (killing the hung kernel after each attempt), it begins to work consistently as expected. It usually takes about twenty attempts before this "clicks" into place. Restarting my IDE reverts the now functional code to its former broken state. For reference, I am using the Anaconda Python distribution (Python 3.7) with the Spyder IDE on Windows 10. My CPU has 16 cores, so Pool(15) is not requesting more processes than I have cores. However, running the code in a different environment, such as Jupyter Lab, yields the same broken results.

Others have suggested that this may be a flaw in Spyder itself, but the suggestion to use multiprocessing.Pool instead of multiprocessing.Process doesn't seem to work either.


Solution

  • This could be related to the following note in the Python documentation:

    Note Functionality within this package requires that the main module be importable by the children. This is covered in Programming guidelines however it is worth pointing out here. This means that some examples, such as the multiprocessing.pool.Pool examples will not work in the interactive interpreter.

    and then this comment on their example:

    If you try this it will actually output three full tracebacks interleaved in a semi-random fashion, and then you may have to stop the master process somehow.

    UPDATE: The info found here seems to confirm that using the pool from an interactive interpreter will have varying success. This guidance is also shared...

    ...guidance [is] to always use functions/classes whose definitions are importable.

    This is the solution outlined here, and it works for me (every time) using your code.
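
As a minimal sketch of what "importable" can look like in practice: the worker function lives in its own module file, and the pool is handed that imported function rather than one defined in the interactive session. The module name pool_workers and the helper make_importable_worker are hypothetical names chosen for illustration. The script writes the module file itself only so that the example is self-contained; normally you would save that file by hand next to your script or notebook and simply import it.

```python
import importlib
import sys
import tempfile
from multiprocessing import Pool
from pathlib import Path

# Source of the worker module. "pool_workers" is a hypothetical name; in
# practice, save this as pool_workers.py next to your script or notebook.
WORKER_SOURCE = "def f(x):\n    return x ** 2\n"


def make_importable_worker(directory):
    """Write pool_workers.py into `directory` and import it as a real module."""
    Path(directory, "pool_workers.py").write_text(WORKER_SOURCE)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    importlib.invalidate_caches()
    return importlib.import_module("pool_workers")


if __name__ == "__main__":
    workers = make_importable_worker(tempfile.mkdtemp())
    with Pool(4) as p:
        # workers.f is pickled by reference (module name plus function name),
        # so each child process can import it on its own. A function defined
        # directly in an interactive session has no module file to import.
        result = p.map(workers.f, range(1, 11))
    print(result)
```

With the function in a separate file, the original code should need only one change: replace the in-session definition of f with an import from that file, and leave the Pool call as it was.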