Consider the following Python multiprocessing program:
import multiprocessing as mp
from math import sqrt

def funcA(mylist):
    worker_poolA = mp.Pool()
    jobs = mylist
    results = worker_poolA.map(sqrt, jobs)
    worker_poolA.close()
    worker_poolA.join()
    return results

def funcB():
    worker_poolB = mp.Pool()
    jobs = [[0, 1, 4, 9, 16, 25], [25, 4, 16, 0, 1]]
    finalresults = worker_poolB.map(funcA, jobs)
    worker_poolB.close()
    worker_poolB.join()
    print finalresults

def funcC():
    jobs = [[0, 1, 4, 9, 16, 25], [25, 4, 16, 0, 1]]
    for job in jobs:
        print funcA(job)

if __name__ == "__main__":
    funcC()  # works fine
    funcB()  # ERROR: AssertionError: daemonic processes are not allowed to have children
In order to overcome this issue, I then subclassed the multiprocessing.pool.Pool class and set the daemon flag to False, as suggested in this post. But now it results in the creation of zombie processes. What is the right way to call a pool function from another pool function? Is this design flawed? I am using Python 2.6.
Simple answer: multiprocessing does not allow you to do this, as it would orphan the inner processes as daemons, just as the error says. You can work around it (I'm referring to older versions of processing, which allowed this, and were forked in my package pathos). However, if you work around it, you then have to manually go kill the daemon processes, so it's really not worth it.
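For the record, the daemon-flag workaround looks roughly like the sketch below. The NoDaemonProcess/NoDaemonPool names are made up for illustration, and the override point shown targets modern Python 3 (where Pool.Process is a static factory); on 2.6 you'd have to hook the daemon property differently. Either way, nothing reaps the non-daemonic grandchildren for you, which is exactly where the zombies come from.

```python
import multiprocessing
import multiprocessing.pool

class NoDaemonProcess(multiprocessing.Process):
    # Report daemon as False and silently ignore attempts to set it,
    # so workers of this type are allowed to spawn children of their own.
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass

class NoDaemonPool(multiprocessing.pool.Pool):
    # In recent Python, Pool.Process is a static factory that builds
    # workers from the context; substitute the non-daemonic class instead.
    @staticmethod
    def Process(ctx, *args, **kwargs):
        return NoDaemonProcess(*args, **kwargs)
```

With this, an outer NoDaemonPool can map a function that itself opens a plain Pool, but you are now responsible for making sure those inner processes actually exit.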
Typically, you will have the case where you want a nested map, and the lower-level job is "heavy" while the higher-level job is "light"… or something like that. One job is the "meat" of the work, and the other just distributes the underlying jobs. For this case, you can use two different types of pool. For example, threads and processes. I'm using my fork (called multiprocess), just because it's easier to use in the interpreter -- but it's otherwise the same as multiprocessing in this particular case.
>>> import multiprocess as mp
>>> import multiprocess.dummy
>>> from math import sqrt
>>>
>>> def funcA(mylist):
...     worker_poolA = mp.Pool()
...     jobs = mylist
...     results = worker_poolA.map(sqrt, jobs)
...     worker_poolA.close()
...     worker_poolA.join()
...     return results
...
>>> def funcB():
...     worker_poolB = mp.dummy.Pool()
...     jobs = [[0, 1, 4, 9, 16, 25], [25, 4, 16, 0, 1]]
...     finalresults = worker_poolB.map(funcA, jobs)
...     worker_poolB.close()
...     worker_poolB.join()
...     print finalresults
...
>>> def funcC():
...     jobs = [[0, 1, 4, 9, 16, 25], [25, 4, 16, 0, 1]]
...     for job in jobs:
...         print funcA(job)
...
>>> funcC()
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
[5.0, 2.0, 4.0, 0.0, 1.0]
>>> funcB()
[[0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [5.0, 2.0, 4.0, 0.0, 1.0]]
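If you'd rather stay on the stock library, the same threads-outside/processes-inside layout works with multiprocessing.dummy, the thread-backed clone of the Pool API that ships with the standard library. Here is a minimal sketch (written with Python 3 print syntax, unlike the 2.6 code above):

```python
import multiprocessing
import multiprocessing.dummy
from math import sqrt

def funcA(mylist):
    # inner pool: real worker processes do the "heavy" math
    worker_poolA = multiprocessing.Pool()
    try:
        return worker_poolA.map(sqrt, mylist)
    finally:
        worker_poolA.close()
        worker_poolA.join()

def funcB(jobs):
    # outer pool: threads (multiprocessing.dummy) just fan the jobs out,
    # so the inner process pools are never created inside daemonic processes
    worker_poolB = multiprocessing.dummy.Pool()
    try:
        return worker_poolB.map(funcA, jobs)
    finally:
        worker_poolB.close()
        worker_poolB.join()

if __name__ == "__main__":
    print(funcB([[0, 1, 4, 9, 16, 25], [25, 4, 16, 0, 1]]))
```

The outer "light" level costs only threads, and the "heavy" inner level still gets true process parallelism.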