When I run Python's multiprocessing Pool under an if __name__=="__main__" guard, I get the expected output, i.e. the run time is reduced thanks to parallel processing.
But when I run the same code without the guard, it just throws errors.
from multiprocessing import Pool
import os
import time

def get_acord_detected_json(page_no):
    time.sleep(5)
    return page_no*page_no

def main():
    n_processes = 2
    page_num_list = [1,2,3,4,5]
    print("n_processes : ", n_processes)
    print("page_num_list : ", page_num_list)
    print("CPU count : ", os.cpu_count())
    t1 = time.time()
    with Pool(processes=n_processes) as pool:
        acord_js_list = pool.map(get_acord_detected_json, page_num_list)
    print("acord_js_list : ", acord_js_list)
    t2 = time.time()
    print("t2-t1 : ", t2-t1)

if __name__=="__main__":
    main()
Output :
n_processes : 2
page_num_list : [1, 2, 3, 4, 5]
CPU count : 8
acord_js_list : [1, 4, 9, 16, 25]
t2-t1 : 15.423236846923828
But when I do
main()
instead of
if __name__=="__main__":
    main()
I get a non-stop stream of error logs (crash logs).
When Python launches a secondary multiprocessing.Process using the spawn start method, it imports the script again in each child process. (I use Windows, where spawn is the only start method, so this is always true; according to the documentation it is not necessarily the case on other OSes, for example when the fork start method is used on Linux.) This applies to process Pools as well, which is what you are using.
During these extra imports, the module-level variable __name__ is something other than "__main__" (under spawn, the child sees "__mp_main__"). So you can separate code that you want to run in every child process from code that you want to run only once. That is the purpose of the if __name__ == "__main__" statement.
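To make the effect of the guard concrete, here is a minimal sketch that imitates the re-import without starting any processes. It uses the standard library's runpy.run_path to execute the same small script twice, once under the name "__main__" and once under "__mp_main__", the name a spawned child uses. The script text, the file name demo_script.py and the helper run_script_as are made up for this illustration; they are not part of your code or of multiprocessing.

```python
import os
import runpy
import tempfile
import textwrap

# Hypothetical stand-in for a user script that guards its entry point.
SCRIPT = textwrap.dedent('''
    calls = []

    def main():
        calls.append("main ran")

    if __name__ == "__main__":
        main()
''')


def run_script_as(run_name):
    """Execute SCRIPT with __name__ set to run_name and return its `calls` list."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_script.py")  # hypothetical file name
        with open(path, "w") as f:
            f.write(SCRIPT)
        # run_path returns the globals of the executed script as a dict.
        namespace = runpy.run_path(path, run_name=run_name)
    return namespace["calls"]


# Run as the top-level program: the guard is true, so main() executes.
print(run_script_as("__main__"))     # ['main ran']

# Run the way a spawned child re-imports it: the guard is false, main() is skipped.
print(run_script_as("__mp_main__"))  # []
```

The second call is the interesting one: the module body still executes (functions get defined, so the child can find them), but nothing behind the guard runs.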
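When you run your script with this statement omitted, the module is loaded again and again, in every one of the child processes. Every process tries to run the function main(), which then tries to launch more child processes, which try to launch more, and so on. Current Python versions detect this in the child and raise a RuntimeError about starting a new process before the current one has finished its bootstrapping phase, rather than letting processes multiply without bound. Either way, the whole thing crashes, as you observe. With the line of code present, the main() function runs only once, in the main process where it works properly, launching all the other processes.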
I'm afraid you are stuck writing that line of code. Life is full of unpleasant necessities. But it's probably less disruptive than switching everything to another operating system.
See also python multiprocessing on windows, if __name__ == "__main__"
Unfortunately, in my opinion, the standard Python library docs present this issue as a commandment that must be obeyed rather than an explanation that can be understood.