import multiprocessing
import threading

counter = 1
print("Code outside __main__",counter)
lock = threading.Lock()
counter += 1

def foo(i):
    #print("Inside foo ",i)
    pass

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=10)
    pool.map(foo, range(100))
If you run this code from the terminal with python run.py, it prints:
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
If you uncomment the print in foo(), the "Code outside __main__ 1" line sometimes appears in between the output of the foo() calls.
Why is it doing that?
import multiprocessing
import threading

counter = 1
print("Code outside __main__",counter)
counter += 1

def foo(i):
    global lock
    with lock:
        print("Inside foo ",i)

if __name__ == '__main__':
    lock = threading.Lock()
    pool = multiprocessing.Pool(processes=10)
    pool.map(foo, range(100))
If I declare the lock inside the __main__ block, it is undefined inside foo(), even if I use global lock.
Here's the output:
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
Code outside __main__ 1
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\David\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Users\David\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 48, in mapstar
return list(map(*args))
File "C:\Users\David\Documents\test.py", line 10, in foo
with lock:
NameError: name 'lock' is not defined
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "test.py", line 16, in <module>
pool.map(foo, range(100))
File "C:\Users\David\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\David\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 771, in get
raise self._value
NameError: name 'lock' is not defined
Code outside __main__ 1
Code outside __main__ 1
This code is just a simplification. I'm trying to scrape a website and write to a file, but I want to understand what's happening here.
"sometimes the Code outside __main__ 1 is in between the foo() calls."
Each of the 10 child processes you create with multiprocessing.Pool imports the "main" file, which basically means executing it from top to bottom. (This is how the spawn start method works, and spawn is the default on Windows.) You get this print once for the main process and then once for each of the 10 children. Particularly with more and more children, some of the early birds may start processing inputs from the pool.map call before others have finished initializing, which is why the lines can be interleaved. During this import each process also gets its own copy of the counter variable, so the print always shows 1.
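To make the "own copy of counter" point concrete, here is a minimal sketch. The bump function and the pool size of 2 are made up for illustration and are not part of the original code. Each worker increments its own copy of counter, and the parent's copy is never touched.

```python
import multiprocessing as mp
import os

counter = 1  # module-level state: every process has its own copy


def bump(_):
    """Increment this process's copy of counter and report (pid, new value)."""
    global counter
    counter += 1
    return os.getpid(), counter


if __name__ == "__main__":
    with mp.Pool(processes=2) as pool:
        results = pool.map(bump, range(6))
    # The increments happened in the workers, not in the parent.
    print("parent still sees counter =", counter)
    print("(pid, counter) pairs seen inside the workers:", results)
```

The pairs returned by the workers carry different process IDs from the parent, and the exact values depend on how the six tasks happen to be split between the two workers.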
"If I declare the lock inside the __main__ block, it's undefined inside foo() even if I use global lock"
foo is executed in a totally separate memory space. global can't automatically send objects to the memory of another process, and lock won't exist in the children because they don't execute anything inside the if block (and they shouldn't). The children need to receive the lock as an argument and assign it in their own memory space. When using a Pool, certain things like locks and queues can only be passed as arguments to the initializer function (normal Processes don't have as many restrictions). You can then use the initializer to receive the lock and save it to the global namespace of the child process.
import multiprocessing as mp
from time import sleep

# mp.Lock has the same interface as threading.Lock but works across processes,
# so threading does not need to be imported here
print("Code outside __main__")

def foo(i):
    global my_lock
    with my_lock:
        sleep(1)  # output arriving one line per second means the lock is limiting access to a resource
        print("code inside foo ", i)

def init_worker(l):
    global my_lock
    my_lock = l  # store the lock in this worker process's global namespace

if __name__ == '__main__':
    l = mp.Lock()
    with mp.Pool(processes=10, initializer=init_worker, initargs=(l,)) as pool:
        pool.map(foo, range(10))
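For comparison, here is a rough sketch of the "normal Process" case mentioned above, where the lock can be handed over directly in args. It also touches on the write-to-a-file use case from the question. The names append_line and run_demo, the temp file, and the "result N" line format are all invented for this illustration.

```python
import multiprocessing as mp
import os
import tempfile


def append_line(lock, path, i):
    """Append one line to a shared file while holding the lock."""
    with lock:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(f"result {i}\n")


def run_demo(n_workers=4):
    """Start n_workers processes that all append to the same temp file."""
    lock = mp.Lock()
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    # A plain Process accepts the lock directly as an argument.
    procs = [mp.Process(target=append_line, args=(lock, path, i))
             for i in range(n_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    with open(path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()
    os.remove(path)
    return lines


if __name__ == "__main__":
    # The order of the lines depends on scheduling, so sort before printing.
    print(sorted(run_demo()))
```

With a Pool, the same append_line logic would need the lock delivered through the initializer, as in the answer's code, rather than through the arguments of pool.map.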