Tags: python, python-3.x, multiprocessing, pool

Understanding a simple multiprocessing script


I am trying to understand how Python's multiprocessing module works. To do so, I made a very simple version of the code I'm working on and tried to make it run in parallel. From what I have read, a pool is better suited to my program than creating mp.Process objects directly.

Below is what I came up with:

import time, os
import multiprocessing as mp

class Foo:
    def __init__(self, ID):
        self.ID = ID

    def showID(self):
        for k in range(0,4):
            print('Foo #', self.ID, '\tID:', os.getpid(), '\tParent ID:', os.getppid())
            time.sleep(0.2)

# MAIN
if __name__ == '__main__':

    print('parent process:', os.getppid())
    print('process id:', os.getpid())
    print(' ')

    foos = [Foo(2), Foo(3)]

    pool = mp.Pool(processes=2)

    # Code below doesn't work
    pool.apply_async(foos[0].showID, ())
    pool.apply_async(foos[1].showID, ())

The list foos will eventually contain between 10 and 20 objects, and the method Foo.showID will eventually return something. My goal is to submit each task (a member of foos) when it is time for it to run, so that it can be dispatched to one of the pool's processes.

If I run the code above, nothing happens, i.e. only the "parent process" and "process id" lines printed at the beginning are displayed. If I replace the last two lines with:

pool.apply_async(foos[0].showID())
pool.apply_async(foos[1].showID())

both of them are executed in the main process one after the other:

parent process: 3380
process id: 6556

Foo # 2         ID: 6556        Parent ID: 3380
Foo # 2         ID: 6556        Parent ID: 3380
Foo # 2         ID: 6556        Parent ID: 3380
Foo # 2         ID: 6556        Parent ID: 3380
Foo # 3         ID: 6556        Parent ID: 3380
Foo # 3         ID: 6556        Parent ID: 3380
Foo # 3         ID: 6556        Parent ID: 3380
Foo # 3         ID: 6556        Parent ID: 3380

Finally, if I replace them with something like this:

pool.apply_async(foos[0].showID, ())
pool.apply_async(foos[1].showID())

I get the expected behavior (I think):

parent process: 3380
process id: 4772

Foo # 3         ID: 4772        Parent ID: 3380
Foo # 2         ID: 6364        Parent ID: 4772
Foo # 3         ID: 4772        Parent ID: 3380
Foo # 2         ID: 6364        Parent ID: 4772
Foo # 3         ID: 4772        Parent ID: 3380
Foo # 2         ID: 6364        Parent ID: 4772
Foo # 3         ID: 4772        Parent ID: 3380
Foo # 2         ID: 6364        Parent ID: 4772

What is happening here? I noticed the same behavior if I try to use a function that is not defined inside the Foo class.


Solution

  • apply_async expects a function, not the result of calling one

    When you write foos[0].showID without parentheses, you pass the method itself without calling it. But when you write

    pool.apply_async(foos[0].showID())
    

    you first evaluate foos[0].showID(), then pass its return value (None, because showID has no return statement) as the argument to apply_async. The evaluation is done by the code that calls apply_async, so it happens synchronously in the main process, before apply_async even runs.

    It's equivalent to doing:

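    foos[0].showID()        # runs right now, in the main process; returns None
    pool.apply_async(None)  # submits None, which a worker cannot call
    foos[1].showID()
    pool.apply_async(None)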
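The difference between passing a function and passing the result of calling it can be checked directly. The following is a minimal sketch; whoami and run_demo are hypothetical helpers, not part of the original code. Passing whoami makes a worker call it, so it reports a different PID from the main process. Passing whoami() evaluates it in the main process first, so apply_async only receives an int. The worker's attempt to call that int fails, and the error only surfaces when .get() is called on the result.

```python
import os
import multiprocessing as mp


def whoami():
    # Hypothetical helper: report the PID of the process that runs this call.
    return os.getpid()


def run_demo():
    with mp.Pool(processes=1) as pool:
        # Correct: pass the function object; a worker process calls it.
        worker_pid = pool.apply_async(whoami).get(timeout=10)

        # Incorrect: whoami() is evaluated right here, in the main process,
        # so apply_async only receives its return value (an int).
        bad = pool.apply_async(whoami())
        try:
            bad.get(timeout=10)
            error = None
        except TypeError as exc:
            # The worker fails with "'int' object is not callable"; the error
            # is only re-raised in the parent when .get() is called.
            error = exc
    return worker_pid, error


if __name__ == "__main__":
    worker_pid, error = run_demo()
    print("main process:", os.getpid(), "| worker process:", worker_pid)
    print("calling first gives:", repr(error))
```

If you never call .get(), as in the question, that TypeError is silently swallowed, which is why the second version shows no error at all.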
    

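    Your first try prints nothing from the workers because you never wait for the asynchronous calls to complete. After calling

    pool.apply_async(foos[0].showID, ())
    pool.apply_async(foos[1].showID, ())
    

    your program reaches the end of the script and exits. The pool's worker processes are daemonic, so they are terminated when the main process exits, typically before they have run (or finished) showID. To see the output, keep the main process alive until the tasks are done, for example with pool.close() followed by pool.join(), or by calling .get() on the objects returned by apply_async.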
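A minimal sketch of the first version fixed by waiting for the pool. It assumes, as the question says it eventually will, that showID returns something; here it returns (self.ID, os.getpid()) purely as a placeholder. The helper name run is hypothetical. The AsyncResult handles are kept instead of discarded, pool.close() tells the pool no more tasks are coming, and pool.join() blocks until the workers have finished.

```python
import os
import time
import multiprocessing as mp


class Foo:
    def __init__(self, ID):
        self.ID = ID

    def showID(self):
        for k in range(4):
            print('Foo #', self.ID, '\tID:', os.getpid(), '\tParent ID:', os.getppid())
            time.sleep(0.2)
        # Placeholder return value (assumption) so the parent can collect it.
        return self.ID, os.getpid()


def run(foos):
    pool = mp.Pool(processes=2)
    # Pass the bound method itself (no parentheses) and keep the handles.
    handles = [pool.apply_async(foo.showID) for foo in foos]
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait until every worker has finished its tasks
    return [h.get() for h in handles]


if __name__ == "__main__":
    print(run([Foo(2), Foo(3)]))
```

With two workers, both showID calls run at the same time in separate child processes, so their lines interleave as in the question's last output, but neither runs in the main process.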

    Finally,

    pool.apply_async(foos[0].showID, ())
    pool.apply_async(foos[1].showID())
    

    is equivalent to:

    pool.apply_async(foos[0].showID, ())
    foos[1].showID()
    

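    It makes one asynchronous call and one synchronous call. The synchronous foos[1].showID() keeps the main process busy for about 0.8 seconds (four 0.2-second sleeps), which happens to give the worker enough time to run foos[0].showID before the program exits. That is why the two outputs are interleaved, and why it only appears to work: nothing actually guarantees that the worker finishes before the main process exits.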
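For the goal stated in the question (10 to 20 Foo objects whose showID returns a value), one possible pattern is sketched below. It is a sketch, not the only way to do it: run_all is a hypothetical helper, the returned (ID, pid) tuple is a placeholder, and the printing and sleeping are dropped for brevity. It uses the pool as a context manager. Note that Pool.__exit__ calls terminate(), so the results must be collected with .get() inside the with block, or still-running tasks would be killed.

```python
import os
import multiprocessing as mp


class Foo:
    def __init__(self, ID):
        self.ID = ID

    def showID(self):
        # Stand-in for the real work; returns a value instead of only printing.
        return self.ID, os.getpid()


def run_all(foos, processes=4):
    with mp.Pool(processes=processes) as pool:
        # Submit every task; the pool queues them and hands each one
        # to the next idle worker process.
        handles = [pool.apply_async(foo.showID) for foo in foos]
        # Collect results inside the with-block: leaving it calls terminate().
        return [h.get() for h in handles]


if __name__ == "__main__":
    for foo_id, pid in run_all([Foo(i) for i in range(10, 22)]):
        print('Foo #', foo_id, 'ran in process', pid)
```

The results come back in submission order because each handle is read in turn. If tasks should instead be processed as soon as each finishes, apply_async also accepts a callback argument that is called in the main process with each result.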