
Weird behaviour with multiprocessing Pool.map


I see strange behaviour when using pool.map to call an instance method. Even with a single process, the result differs from a simple for loop: the if not self.seeded: block is entered several times, although it should be entered only once. Here is the code and its output:

import os
from multiprocessing import Pool


class MyClass(object):
    def __init__(self):
        self.seeded = False
        print("Constructor of MyClass called")

    def f(self, i):
        print("f called with", i)
        if not self.seeded:
            print("PID : {}, id(self.seeded) : {}, self.seeded : {}".format(os.getpid(), id(self.seeded), self.seeded))
            self.seeded = True

    def multi_call_pool_map(self):
        with Pool(processes=1) as pool:
            print("multi_call_pool_map with {} processes...".format(pool._processes))
            pool.map(self.f, range(10))

    def multi_call_for_loop(self):
        print("multi_call_for_loop ...")
        list_res = []
        for i in range(10):
            list_res.append(self.f(i))


if __name__ == "__main__":
    MyClass().multi_call_pool_map()

Output:

Constructor of MyClass called
multi_call_pool_map with 1 processes...
f called with 0
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False
f called with 1
f called with 2
f called with 3
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False
f called with 4
f called with 5
f called with 6
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False
f called with 7
f called with 8
f called with 9
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False

And with the for loop:

if __name__ == "__main__":
    MyClass().multi_call_for_loop()

Output:

Constructor of MyClass called
multi_call_for_loop ...
f called with 0
PID : 15840, id(self.seeded) : 1864747472, self.seeded : False
f called with 1
f called with 2
f called with 3
f called with 4
f called with 5
f called with 6
f called with 7
f called with 8
f called with 9

How can the behaviour with pool.map (first case) be explained? I don't understand why the if block is entered multiple times, since self.seeded is set to False only in the constructor and the constructor is called only once. (I am using Python 3.6.8.)


Solution

  • Running the code with self also printed inside f shows that the instance actually changes each time just before the if block is entered:

        def f(self, i):
            print("f called with", i, "self is", self)
            if not self.seeded:
                print("PID : {}, id(self.seeded) : {}, self.seeded : {}".format(os.getpid(), id(self.seeded), self.seeded))
                self.seeded = True
    

    This outputs:

    Constructor of MyClass called
    multi_call_pool_map with 1 processes...
    f called with 0 self is <__main__.MyClass object at 0x7f30cd592b38>
    PID : 22879, id(self.seeded) : 10744096, self.seeded : False
    f called with 1 self is <__main__.MyClass object at 0x7f30cd592b38>
    f called with 2 self is <__main__.MyClass object at 0x7f30cd592b38>
    f called with 3 self is <__main__.MyClass object at 0x7f30cd592b00>
    PID : 22879, id(self.seeded) : 10744096, self.seeded : False
    f called with 4 self is <__main__.MyClass object at 0x7f30cd592b00>
    f called with 5 self is <__main__.MyClass object at 0x7f30cd592b00>
    f called with 6 self is <__main__.MyClass object at 0x7f30cd592ac8>
    PID : 22879, id(self.seeded) : 10744096, self.seeded : False
    f called with 7 self is <__main__.MyClass object at 0x7f30cd592ac8>
    f called with 8 self is <__main__.MyClass object at 0x7f30cd592ac8>
    f called with 9 self is <__main__.MyClass object at 0x7f30cd592a90>
    PID : 22879, id(self.seeded) : 10744096, self.seeded : False
    

    If you add chunksize=10 to .map(), it behaves just like the for loop:

        def multi_call_pool_map(self):
            with Pool(processes=1) as pool:
                print("multi_call_pool_map with {} processes...".format(pool._processes))
                pool.map(self.f, range(10), chunksize=10)
    

    This outputs:

    Constructor of MyClass called
    multi_call_pool_map with 1 processes...
    f called with 0 self is <__main__.MyClass object at 0x7fd175093b00>
    PID : 22972, id(self.seeded) : 10744096, self.seeded : False
    f called with 1 self is <__main__.MyClass object at 0x7fd175093b00>
    f called with 2 self is <__main__.MyClass object at 0x7fd175093b00>
    f called with 3 self is <__main__.MyClass object at 0x7fd175093b00>
    f called with 4 self is <__main__.MyClass object at 0x7fd175093b00>
    f called with 5 self is <__main__.MyClass object at 0x7fd175093b00>
    f called with 6 self is <__main__.MyClass object at 0x7fd175093b00>
    f called with 7 self is <__main__.MyClass object at 0x7fd175093b00>
    f called with 8 self is <__main__.MyClass object at 0x7fd175093b00>
    f called with 9 self is <__main__.MyClass object at 0x7fd175093b00>
    

    Exactly why this happens is an implementation detail of multiprocessing: it has to do with how the pool passes the function and its data to its worker processes.

    I'm afraid I'm not qualified to explain exactly how this works internally.
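
A plausible explanation for the changing instance, sketched without a real Pool: as far as I can tell, pool.map splits the input into chunks, and each chunk is sent to the worker as a separate pickled task that includes the bound method self.f, and therefore a fresh copy of the instance. The sketch below only simulates that. simulate_map and default_chunksize are hypothetical helpers written for illustration, not multiprocessing APIs, and the chunk-size rule (about four chunks per worker, rounded up) is my understanding of CPython's default rather than a documented guarantee.

```python
import math
import pickle


class MyClass:
    def __init__(self):
        self.seeded = False

    def f(self, i):
        # Return True when this call had to "seed" the instance.
        if not self.seeded:
            self.seeded = True
            return True
        return False


def default_chunksize(n_items, n_workers):
    # Assumption: mirrors the heuristic Pool.map applies when chunksize
    # is not given (roughly four chunks per worker, rounded up).
    return math.ceil(n_items / (n_workers * 4))


def simulate_map(func, items, chunksize):
    # Hypothetical stand-in for Pool.map: every chunk becomes one task,
    # and every task is pickled and unpickled on its own, as if it had
    # been sent to a worker process.
    items = list(items)
    seeded_at = []
    for start in range(0, len(items), chunksize):
        chunk = items[start:start + chunksize]
        task = pickle.dumps((func, chunk))               # "parent" side
        worker_func, worker_chunk = pickle.loads(task)   # "worker" side
        # worker_func is bound to a new copy of the instance, so its
        # seeded flag starts out False again for every chunk.
        for i in worker_chunk:
            if worker_func(i):
                seeded_at.append(i)
    return seeded_at


obj = MyClass()
size = default_chunksize(10, 1)
print(size)                                  # 3
print(simulate_map(obj.f, range(10), size))  # [0, 3, 6, 9]
print(simulate_map(obj.f, range(10), 10))    # [0]
print(obj.seeded)                            # False
```

With 10 items and one worker the simulated chunk size is 3, so the copies are created at items 0, 3, 6 and 9, which matches the output in the question. With chunksize=10 there is a single task and therefore a single copy. The original object is never modified, so state set inside f does not reach the parent either.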