I am trying to understand how Python's multiprocessing module works. To do so, I made a very simple version of the code I'm working on and tried to make it run in parallel. According to what I read, using a pool would be better suited to my program than using mp.Process.
Below is what I came up with:
import time, os
import multiprocessing as mp

class Foo:
    def __init__(self, ID):
        self.ID = ID

    def showID(self):
        for k in range(0,4):
            print('Foo #', self.ID, '\tID:', os.getpid(), '\tParent ID:', os.getppid())
            time.sleep(0.2)

# MAIN
if __name__ == '__main__':
    print('parent process:', os.getppid())
    print('process id:', os.getpid())
    print(' ')
    foos = [Foo(2), Foo(3)]
    pool = mp.Pool(processes=2)
    # Code below doesn't work
    pool.apply_async(foos[0].showID, ())
    pool.apply_async(foos[1].showID, ())
The list foos will eventually contain between 10 and 20 objects. The method Foo.showID will also eventually return something. My goal is to submit tasks (members of foos) as they become ready to run, so that each one is dispatched to one of the pool's processes.
If I run the code above, nothing happens, i.e. only the parent process and process id lines at the beginning are displayed. If I replace the last two lines with:
pool.apply_async(foos[0].showID())
pool.apply_async(foos[1].showID())
both of them are executed in the main process one after the other:
parent process: 3380
process id: 6556
Foo # 2 ID: 6556 Parent ID: 3380
Foo # 2 ID: 6556 Parent ID: 3380
Foo # 2 ID: 6556 Parent ID: 3380
Foo # 2 ID: 6556 Parent ID: 3380
Foo # 3 ID: 6556 Parent ID: 3380
Foo # 3 ID: 6556 Parent ID: 3380
Foo # 3 ID: 6556 Parent ID: 3380
Foo # 3 ID: 6556 Parent ID: 3380
Finally, if I replace them with something like this:
pool.apply_async(foos[0].showID, ())
pool.apply_async(foos[1].showID())
I get the expected behavior (I think):
parent process: 3380
process id: 4772
Foo # 3 ID: 4772 Parent ID: 3380
Foo # 2 ID: 6364 Parent ID: 4772
Foo # 3 ID: 4772 Parent ID: 3380
Foo # 2 ID: 6364 Parent ID: 4772
Foo # 3 ID: 4772 Parent ID: 3380
Foo # 2 ID: 6364 Parent ID: 4772
Foo # 3 ID: 4772 Parent ID: 3380
Foo # 2 ID: 6364 Parent ID: 4772
What is happening here? I noticed the same behavior if I try to use a function that is not defined inside the Foo class.
apply_async expects a function as its first argument. When you write foos[0].showID without the parentheses, you're passing the function itself rather than calling it. But when you write
pool.apply_async(foos[0].showID())
you're first evaluating foos[0].showID(), then passing its return value as the argument to apply_async. The evaluation is done by the caller of apply_async, in the main process, and that's synchronous processing.
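It's equivalent to doing:
foos[0].showID()          # runs in the main process and returns None
pool.apply_async(None)    # the pool is handed None instead of a function
foos[1].showID()
pool.apply_async(None)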
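The difference between passing the method and calling it can be checked without a pool at all. The following is only a minimal sketch, using a stripped-down Foo whose showID prints a single line; the names method and result are introduced here purely for illustration.

```python
class Foo:
    def __init__(self, ID):
        self.ID = ID

    def showID(self):
        # No explicit return, so a call evaluates to None.
        print('Foo #', self.ID)


foo = Foo(2)

method = foo.showID      # no parentheses: a bound method object, nothing runs yet
result = foo.showID()    # parentheses: the method runs right here, in this process

print(callable(method))  # a pool can call this later in a worker
print(result)            # this is the value apply_async(foo.showID()) would receive
```

Only the first form gives the pool something it can run in a worker process.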
Your first attempt fails because you're not waiting for the asynchronous calls to finish. After calling
pool.apply_async(foos[0].showID, ())
pool.apply_async(foos[1].showID, ())
your program exits right away, and the pool's worker processes are shut down with it before they can print anything.
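One way to make the main process wait, sketched here under the assumption that the return values are not needed yet, is to call pool.close() followed by pool.join() after submitting the tasks. The class is the one from the question, unchanged apart from formatting.

```python
import os
import time
import multiprocessing as mp


class Foo:
    def __init__(self, ID):
        self.ID = ID

    def showID(self):
        for k in range(4):
            print('Foo #', self.ID, '\tID:', os.getpid(), '\tParent ID:', os.getppid())
            time.sleep(0.2)


if __name__ == '__main__':
    foos = [Foo(2), Foo(3)]
    pool = mp.Pool(processes=2)

    # Pass the bound methods themselves; the workers do the calling.
    pool.apply_async(foos[0].showID, ())
    pool.apply_async(foos[1].showID, ())

    pool.close()  # no more tasks will be submitted
    pool.join()   # block until the workers have finished the queued tasks
```

With the join in place, the main process stays alive until both tasks have run in the workers.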
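Finally,
pool.apply_async(foos[0].showID, ())
pool.apply_async(foos[1].showID())
is effectively equivalent to:
pool.apply_async(foos[0].showID, ())
foos[1].showID()
This makes one asynchronous call and one synchronous call, so it sort of works: the synchronous call keeps the main process busy long enough for the worker to print its output.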
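For the larger list mentioned in the question, where showID eventually returns something, a possible approach is to keep the AsyncResult objects that apply_async returns and call get() on each one. This is a sketch rather than a definitive implementation: the return self.ID line and the helper name run_all are assumptions made for illustration and do not appear in the original code.

```python
import os
import time
import multiprocessing as mp


class Foo:
    def __init__(self, ID):
        self.ID = ID

    def showID(self):
        for k in range(4):
            print('Foo #', self.ID, '\tID:', os.getpid(), '\tParent ID:', os.getppid())
            time.sleep(0.2)
        return self.ID  # hypothetical return value, for illustration only


def run_all(foos, processes=2):
    """Submit every showID call to a pool and wait for all the results."""
    with mp.Pool(processes=processes) as pool:
        # Pass the bound method itself; the pool calls it in a worker.
        pending = [pool.apply_async(foo.showID) for foo in foos]
        # get() blocks until the task is done and re-raises any worker error.
        return [r.get() for r in pending]


if __name__ == '__main__':
    print(run_all([Foo(2), Foo(3)]))
```

Calling get() also surfaces exceptions raised in a worker, which would otherwise go unnoticed with a bare apply_async call.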