Search code examples
pythonpython-3.xmultithreading

Python: Can multiprocessing.Pool() be used in modules other than '__main__'?


I have the following simple case of using a thread pool:

import multiprocessing as mp

def double(i):
    return i * 2

def main():
    pool = mp.Pool()
    for result in pool.map(double, [1, 2, 3]):
        print(result)

main()

For over a day I've been trying to figure out why running that code would cause freezing and errors. Then in my previous question about the crash, someone pointed out the code needs to be filtered behind a __name__ == '__main__' check.

if __name__ == '__main__':
    main()

In fact if I put the original code in init.py which I execute from the console, it works even without this if statement. But there's a problem: I want to use multiprocessing.Pool() inside a class in another script which init.py imports with import myotherscript then creates an instance of a class from: In this case __name__ is no longer '__main__' but the name of this other script, even if I call a function in it from init.py anything defined in this other script will has a different name.

Is there a way to use the multithreading pool inside a module? Or does Python only allow pools to run in the init file being executed? If it's a hard limitation I'll need to define my library directly under init.py but that would be wrong and ugly code structuring so hopefully there exists a workaround.


Solution

  • Yes, it's fine to use multiprocessing from modules. Just don't run code in the body of the module (outside of any function) unless you want it to run on the startup of each worker in the pool. In particular, don't run the code that creates a new pool there.

    Normally, in a module, you don't do that in the first place. It's only in the main file that you usually have that problem.

    But it does push a similar requirement up to the user of your module. Say your module defines a class Cluster and does self.pool = mp.Pool() in __init__, and then someone writes a program that looks like

    #!python
    import yourmodule
    
    c = yourmodule.Cluster()
    c.compute("something")
    

    then that program will explode. Instead they have to write

    #!python
    import yourmodule
    
    if __name__ == '__main__':
        c = yourmodule.Cluster()
        c.compute("something")
    

    which will avoid calling the Cluster constructor on worker startup, and therefore avoid trying to create pools within pools.