python, python-multiprocessing

python multiprocessing map function


I ran into a problem while writing Python code that uses the multiprocessing map function. A minimal example that reproduces the problem is:

import multiprocessing as mp

if __name__ == '__main__':

    def f(x):
        return x*x

    num_workers = 2
    with mp.Pool(num_workers) as p:
        print(p.map(f, [1,2,3]))

When I run this code, I get the error message

AttributeError: Can't get attribute 'f' on <module '__mp_main__' from 'main.py'>

However, if I move the function f outside the if __name__ == '__main__' block, i.e.

import multiprocessing as mp

def f(x):
    return x*x

if __name__ == '__main__':

    num_workers = 2
    with mp.Pool(num_workers) as p:
        print(p.map(f, [1,2,3]))

It works this time. I am wondering what the difference between the two versions is and why I get an error in the first one. Thanks in advance.


Solution

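  • Depending on your operating system and Python version, sub-processes are either forked or spawned by default. Windows always spawns, and macOS has spawned by default since Python 3.8, whereas Linux has traditionally forked. A spawned child starts a fresh interpreter and re-imports your script under the name __mp_main__, so the block guarded by if __name__ == '__main__' never runs there and f is never defined. When the worker then tries to look f up by name, it fails with the AttributeError you saw. A forked child inherits a copy of the parent's memory, including f. Defining f at module level, as in your second version, works with every start method.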

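    You can force forking on macOS or Linux (it is not available on Windows), but you need to fully understand the implications of doing so; for example, forking a process that has multiple threads can be unsafe.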

    For this specific question a workaround could be implemented thus:

    import multiprocessing as mp

    if __name__ == '__main__':
        def f(x):
            return x*x
        # 'fork' is not available on Windows. Set the start method once,
        # before the pool is created.
        mp.set_start_method('fork')
        num_workers = 2
        with mp.Pool(num_workers) as p:
            print(p.map(f, [1,2,3]))
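
To see why the lookup by name matters, here is a minimal sketch that imitates the situation without starting any processes. It assumes the pool hands functions to workers with pickle, which stores only the module and function name. The module name "demo_main" is made up for illustration and stands in for your script; deleting f from it imitates a spawned worker that never ran the guarded block.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the script's main module (the name "demo_main" is made up).
demo = types.ModuleType("demo_main")
sys.modules["demo_main"] = demo

# Define f inside that module, as the parent process would have done.
exec("def f(x):\n    return x * x\n", demo.__dict__)

# Pickling a function stores its module and name, not its code.
payload = pickle.dumps(demo.f)
stores_names_only = b"demo_main" in payload and b"x * x" not in payload

# While f is still defined, unpickling simply looks it up again.
restored = pickle.loads(payload)
same_function = restored is demo.f

# Imitate a spawned worker whose copy of the module never defined f.
del demo.f
lookup_error = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    lookup_error = exc

print(stores_names_only, same_function)
print(type(lookup_error).__name__, lookup_error)
```

The last line should report the same kind of AttributeError as in the question, which is why keeping f at module level is the portable fix.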