python-3.x parallel-processing python-multiprocessing program-entry-point pool

How to run multiprocessing Pool code in Python without the if __name__ == "__main__" guard


When I run Python's multiprocessing Pool inside an if __name__ == "__main__" block, I get the expected output, i.e. the run time is reduced by parallel processing.

But when I run the same code without the __main__ check, it just throws errors:

from multiprocessing import Pool
import os
import time

def get_acord_detected_json(page_no):
    time.sleep(5)
    return page_no*page_no

def main():
    n_processes = 2
    page_num_list = [1,2,3,4,5]
    print("n_processes : ", n_processes)
    print("page_num_list : ", page_num_list)
    print("CPU count : ", os.cpu_count())
    t1 = time.time()
    with Pool(processes=n_processes) as pool:
        acord_js_list = pool.map(get_acord_detected_json, page_num_list)

    print("acord_js_list : ", acord_js_list)
    t2 = time.time()
    print("t2-t1 : ", t2-t1)

if __name__=="__main__":
    main()

Output :

n_processes :  2
page_num_list :  [1, 2, 3, 4, 5]
CPU count :  8
acord_js_list :  [1, 4, 9, 16, 25]
t2-t1 :  15.423236846923828

But when I do

main()

instead of

if __name__=="__main__":
    main()

I get a never-ending stream of error logs (crash logs).


Solution

  • When Python launches a secondary multiprocessing.Process with the spawn start method, the child imports the script again in its own process space. (I use Windows, where spawn is the only start method available, so this is always true; according to the documentation it is not necessarily the case on other OSes, where fork may be used instead.) This applies to process Pools too, which is what you are using.

    During these extra imports, the module-level __name__ variable is something other than "__main__" (under spawn it is "__mp_main__"). So you can separate code that should run in every child Process from code that should run only once. That is the purpose of the if __name__ == "__main__" statement.

    When you run your script with this statement omitted, the whole module is executed again in every one of the child processes. Every child tries to run main(), which tries to launch more child processes, which try to launch more, and so on. The whole thing crashes, as you observe. With the if __name__ == "__main__" line present, main() runs only once, in the main Process, where it works properly and launches all the other Processes.

    I'm afraid you are stuck writing that line of code. Life is full of unpleasant necessities. But it's probably less disruptive than switching everything to another operating system.

    See also: 'python multiprocessing on windows, if __name__ == "__main__"'

    Unfortunately, the Python standard library docs present this issue as a commandment that must be obeyed rather than an explanation that can be understood (in my opinion).
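
To see the re-import described above for yourself, here is a minimal sketch. It is not from the original answer: the names DEMO_SCRIPT, report_name, run_demo and guard_demo.py are made up for illustration. The sketch writes a small guarded script to a temporary file, runs it with the current interpreter, and forces the spawn start method so that it should behave the same way on Linux and macOS as on Windows. Each worker reports the name under which it loaded the script.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical demo script (illustration only). It keeps the guard, so only
# the parent process calls main(); workers just re-import the module.
DEMO_SCRIPT = textwrap.dedent('''
    import multiprocessing as mp

    def report_name(_):
        # Runs inside a worker: returns the name this module was loaded under.
        return __name__

    def main():
        # Force "spawn", the start method Windows always uses.
        ctx = mp.get_context("spawn")
        with ctx.Pool(processes=2) as pool:
            worker_names = pool.map(report_name, range(4))
        print("parent:", __name__)
        print("workers:", sorted(set(worker_names)))

    if __name__ == "__main__":
        main()
''')


def run_demo():
    """Write the demo script to a temp file, run it, and return its output lines."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        script_path = os.path.join(tmp_dir, "guard_demo.py")
        with open(script_path, "w", encoding="utf-8") as handle:
            handle.write(DEMO_SCRIPT)
        completed = subprocess.run(
            [sys.executable, script_path],
            capture_output=True,
            text=True,
            check=True,
            timeout=120,
        )
    return completed.stdout.splitlines()


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

On CPython this should print "parent: __main__" followed by "workers: ['__mp_main__']": the workers did load the script again, but under a different name, so the guarded call to main() was skipped in them. Removing the guard from the demo script would make every worker call main() during its own import, which is the failure described in the question.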