Tags: python, python-multiprocessing

Workaround for using __name__=='__main__' in Python multiprocessing


As we all know, we need to protect the entry point with if __name__ == '__main__' when running code that uses multiprocessing in Python.

I understand that this is necessary in some cases to give the subprocesses access to functions defined in the main module, but I do not understand why it is necessary in this case:

file2.py

import numpy as np
from multiprocessing import Pool
class Something(object):
    def get_image(self):
        return np.random.rand(64,64)

    def mp(self):
        image = self.get_image()
        p = Pool(2)
        res1 = p.apply_async(np.sum, (image,))
        res2 = p.apply_async(np.mean, (image,))
        print(res1.get())
        print(res2.get())
        p.close()
        p.join()

main.py

from file2 import Something
s = Something()
s.mp()

All of the functions and imports necessary for Something to work are part of file2.py. Why does the subprocess need to re-run main.py?

I think the __name__ solution is not very nice, as it prevents me from distributing the code of file2.py: I can't make sure that its users protect their main module. Isn't there a workaround for Windows? How do packages solve this? I have never run into a problem with any package despite not protecting my main - are they just not using multiprocessing?

edit: I know that this is because fork() is not implemented on Windows. I was just asking if there is a hack to let the interpreter start at file2.py instead of main.py, as I can be sure that file2.py is self-sufficient.


Solution

  • The main module is imported (but with __name__ != '__main__', because Windows is trying to simulate fork-like behavior on a system that doesn't have forking). multiprocessing has no way to know that you didn't do anything important in your main module, so the import is done "just in case", to create an environment similar to the one in your main process. If it didn't do this, all sorts of things that happen by side effect in main (e.g. imports, configuration calls with persistent side effects, etc.) might not be properly performed in the child processes.
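    A minimal sketch that makes this re-import visible (the module and function names here are illustrative, not from the question): under the "spawn" start method - the only one available on Windows - the module-level print fires once in the parent and once more in each worker process, because every worker re-imports the main module.

    ```python
    import multiprocessing as mp

    # Under the "spawn" start method, each worker re-imports this module,
    # so this line prints once per process (parent and workers alike).
    print("importing, __name__ =", __name__)

    def double(x):
        return x * 2

    def run_pool():
        with mp.Pool(2) as p:
            return p.map(double, [1, 2, 3])

    if __name__ == '__main__':
        # Force "spawn" so the behavior matches Windows even on Unix.
        mp.set_start_method('spawn', force=True)
        print(run_pool())  # [2, 4, 6]
    ```

    Without the guard, the re-import would reach run_pool() itself and each worker would try to spawn its own pool, failing with a RuntimeError about the missing __main__ protection.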

    As such, if users are not protecting their __main__, their code is not multiprocessing-safe (nor is it unittest-safe, import-safe, etc.). The if __name__ == '__main__': protective wrapper should be part of all correct main modules. Go ahead and distribute file2.py, with a note about requiring a multiprocessing-safe main module.
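    Concretely, the fix in the question's setup is two extra lines in main.py - file2.py needs no changes. Shown here as a single self-contained file (the class body is folded in where file2.py would be imported, so the sketch runs on its own and returns its results instead of printing):

    ```python
    import numpy as np
    from multiprocessing import Pool

    # In the real project this class lives in file2.py; it is safe to
    # define at module level because defining it has no side effects.
    class Something(object):
        def get_image(self):
            return np.random.rand(64, 64)

        def mp(self):
            image = self.get_image()
            p = Pool(2)
            res1 = p.apply_async(np.sum, (image,))
            res2 = p.apply_async(np.mean, (image,))
            total, mean = res1.get(), res2.get()
            p.close()
            p.join()
            return total, mean

    # Only the entry-point code needs the guard: the re-import performed
    # by the "spawn" start method then has no side effects.
    if __name__ == '__main__':
        total, mean = Something().mp()
        print(total, mean)
    ```

    In the original two-file layout, only main.py changes: wrap the s = Something(); s.mp() lines in the same if __name__ == '__main__': guard.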