Search code examples
pythonpython-3.xqueuemultiprocessingpool

the position between Pool object and Manager object in multiprocessing module


 1 from multiprocessing import Pool, Manager
 2 
 3 
 4 def test(num):
 5     queue.put(num)
 6 
 7 
 8 queue = Manager().Queue()
 9 pool = Pool(5)
 10 
 11 for i in range(30):
 12     pool.apply_async(test, (i, ))
 13     
 14 pool.close()
 15 pool.join()
 16 
 17 print(queue.qsize())

The output of the code above is 30. However, if line 8 is exchanged with line 9 (see the code below), the output will be 0. So is there anybody who knows why? Thank you!

1 from multiprocessing import Pool, Manager
2 
3 
4 def test(num):
5     queue.put(num)
6 
7 
8 pool = Pool(5)
9 queue = Manager().Queue()
10 
11 for i in range(30):
12     pool.apply_async(test, (i, ))
13     
14 pool.close()
15 pool.join()
16 
17 print(queue.qsize())

from multiprocessing import Process, Queue 


def test():
    queue.put(1)


p = Process(target=test) 
queue = Queue()
p.start()
p.join()

print(queue.qsize())

The output is 1, which means the child process put the number into the queue created by the parent. Is that correct?


Solution

  • I assume you are using a Unix based operating system as on NT ones your logic would most likely break.

    To understand what happens we need to dig into the multiprocessing internals. On Unix, when creating a new process, the fork primitive is used. When a process forks, the parent continues its execution and the child starts as an exact copy of the parent.

    Python tends to hide a lot of things in the multiprocessing module (I particularly dislike that) and leads to lots of misunderstandings. In your logic, the fork happens when you create the Pool (line 9 in the first example, 8 in the second).

    In the first example, the children inherits the same queue object the parent created. Therefore, they successfully manage to communicate as they share the same channel.

    In the second one instead, parent and children create their own separated queue objects which are totally independent. When a child puts an element in the queue it puts it in its own one which is not shared by anybody.

    In the third and last example you create a Process object, then a Queue and then you call start on the process. Guess when the fork happens? When you call start and not when you create the Process object. That's why the queue is successfully shared. This is what I mean when I say that multiprocessing APIs are a bit misleading.