I observe strange behaviour when using pool.map to call an instance method.
Even with only one process, the behaviour differs from a plain for loop: the if not self.seeded: block is entered several times, whereas it should be entered only once.
The code and its output are below:
import os
from multiprocessing import Pool

class MyClass(object):
    def __init__(self):
        self.seeded = False
        print("Constructor of MyClass called")

    def f(self, i):
        print("f called with", i)
        if not self.seeded:
            print("PID : {}, id(self.seeded) : {}, self.seeded : {}".format(os.getpid(), id(self.seeded), self.seeded))
            self.seeded = True

    def multi_call_pool_map(self):
        with Pool(processes=1) as pool:
            print("multi_call_pool_map with {} processes...".format(pool._processes))
            pool.map(self.f, range(10))

    def multi_call_for_loop(self):
        print("multi_call_for_loop ...")
        list_res = []
        for i in range(10):
            list_res.append(self.f(i))

if __name__ == "__main__":
    MyClass().multi_call_pool_map()
Output:
Constructor of MyClass called
multi_call_pool_map with 1 processes...
f called with 0
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False
f called with 1
f called with 2
f called with 3
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False
f called with 4
f called with 5
f called with 6
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False
f called with 7
f called with 8
f called with 9
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False
And with the for loop:
if __name__ == "__main__":
    MyClass().multi_call_for_loop()
Output:
Constructor of MyClass called
multi_call_for_loop ...
f called with 0
PID : 15840, id(self.seeded) : 1864747472, self.seeded : False
f called with 1
f called with 2
f called with 3
f called with 4
f called with 5
f called with 6
f called with 7
f called with 8
f called with 9
How can the behaviour with pool.map (the first case) be explained? I don't understand why the if block is entered multiple times, because self.seeded is set to False only in the constructor, and the constructor is called only once.
(I am using Python 3.6.8.)
When running the code and also printing self inside f, we can see that each time the if clause is entered, the instance is a different one:
def f(self, i):
    print("f called with", i, "self is", self)
    if not self.seeded:
        print("PID : {}, id(self.seeded) : {}, self.seeded : {}".format(os.getpid(), id(self.seeded), self.seeded))
        self.seeded = True
This outputs:
Constructor of MyClass called
multi_call_pool_map with 1 processes...
f called with 0 self is <__main__.MyClass object at 0x7f30cd592b38>
PID : 22879, id(self.seeded) : 10744096, self.seeded : False
f called with 1 self is <__main__.MyClass object at 0x7f30cd592b38>
f called with 2 self is <__main__.MyClass object at 0x7f30cd592b38>
f called with 3 self is <__main__.MyClass object at 0x7f30cd592b00>
PID : 22879, id(self.seeded) : 10744096, self.seeded : False
f called with 4 self is <__main__.MyClass object at 0x7f30cd592b00>
f called with 5 self is <__main__.MyClass object at 0x7f30cd592b00>
f called with 6 self is <__main__.MyClass object at 0x7f30cd592ac8>
PID : 22879, id(self.seeded) : 10744096, self.seeded : False
f called with 7 self is <__main__.MyClass object at 0x7f30cd592ac8>
f called with 8 self is <__main__.MyClass object at 0x7f30cd592ac8>
f called with 9 self is <__main__.MyClass object at 0x7f30cd592a90>
PID : 22879, id(self.seeded) : 10744096, self.seeded : False
If you add chunksize=10 to .map(), it behaves just like the for loop:
def multi_call_pool_map(self):
    with Pool(processes=1) as pool:
        print("multi_call_pool_map with {} processes...".format(pool._processes))
        pool.map(self.f, range(10), chunksize=10)
This outputs:
Constructor of MyClass called
multi_call_pool_map with 1 processes...
f called with 0 self is <__main__.MyClass object at 0x7fd175093b00>
PID : 22972, id(self.seeded) : 10744096, self.seeded : False
f called with 1 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 2 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 3 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 4 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 5 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 6 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 7 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 8 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 9 self is <__main__.MyClass object at 0x7fd175093b00>
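The groups of three calls per instance in the earlier output are consistent with the default chunk size that Pool.map picks when chunksize is not given. Here is a minimal sketch of that arithmetic, assuming the divmod rule used in CPython's multiprocessing/pool.py (an implementation detail that may differ between versions). The helper names default_chunksize and split_into_chunks are hypothetical and are not part of multiprocessing.

```python
def default_chunksize(n_items, n_workers):
    """Mimic the default chunksize rule assumed from CPython's Pool.map."""
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def split_into_chunks(items, chunksize):
    """Cut items into consecutive chunks of at most chunksize elements."""
    items = list(items)
    if not items:
        return []
    return [items[i:i + chunksize] for i in range(0, len(items), chunksize)]


size = default_chunksize(10, 1)            # 10 items, 1 worker process
print(size)                                # 3
print(split_into_chunks(range(10), size))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(split_into_chunks(range(10), 10))    # a single chunk holding all 10 items
```

With 10 items and one process this gives chunks starting at 0, 3, 6 and 9, which are exactly the calls where the if block is entered. With chunksize=10 there is only one chunk, hence only one entry.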
Exactly why this happens is an implementation detail and has to do with how multiprocessing passes data between the processes in a pool.
I'm afraid I'm not qualified to explain exactly how and why this works internally.
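For what it's worth, the changing instance can be reproduced without a pool. The sketch below assumes that the bound method is pickled together with its instance and unpickled once per chunk on the worker side, which is how tasks appear to travel to a worker. Seeder and run_in_chunks are hypothetical names used only for illustration.

```python
import pickle


class Seeder:
    """Hypothetical stand-in for MyClass, with the same seeded flag."""

    def __init__(self):
        self.seeded = False

    def f(self, i):
        # Report whether this call was the one that did the seeding.
        did_seed = not self.seeded
        self.seeded = True
        return did_seed


def run_in_chunks(method, items, chunksize):
    """Call method on each item, unpickling a fresh copy once per chunk."""
    items = list(items)
    results = []
    for start in range(0, len(items), chunksize):
        # Pickling a bound method also pickles the instance it is bound to,
        # so every round trip yields a new instance with seeded == False.
        fresh_method = pickle.loads(pickle.dumps(method))
        results.extend(fresh_method(i) for i in items[start:start + chunksize])
    return results


obj = Seeder()
print(run_in_chunks(obj.f, range(10), 3))   # True at positions 0, 3, 6 and 9
print(run_in_chunks(obj.f, range(10), 10))  # True only at position 0
print(obj.seeded)                           # False: only the copies were modified
```

The last line also shows that the original object is never touched: any state set inside f lives on the unpickled copies, so it does not come back to the caller either.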