Tags: python, python-2.7, python-multiprocessing, pool

Unexpected behavior using Multiprocessing.Pool inside for loop


Here's my code:

import multiprocessing as mp
import numpy as np

def foo(p):
    global i
    return p*i

global lower, upper
lower = 1
upper = 4

for i in range(lower, upper):
    if __name__ == '__main__':
        dataset = np.linspace(1, 100, 100)
        agents = mp.cpu_count() - 1
        chunksize = 5
        pool = mp.Pool(processes=agents)
        result = pool.map(foo, dataset, chunksize)
        print result
        print i
        pool.close()
        pool.join()

The console prints out the array [3, 6, 9,...,300] three times with the integers 1,2,3 in-between each array printout. So i is correctly iterating between lower & upper (not inclusive), but I expected it to print out the array [1, 2, 3,...,100] first followed by [2, 4, 6,...,200] and finally [3, 6, 9,...,300]. I don't understand why it's only passing the final value of i to foo and then mapping that thrice.


Solution

  • When a worker process is started by spawning rather than forking (as on Windows), it re-imports your script, and this is what it sees:

    import multiprocessing as mp
    import numpy as np
    
    def foo(p):
        global i
        return p*i
    
    global lower, upper
    lower = 1
    upper = 4
    
    for i in range(lower, upper):
        if __name__ == '__main__':
            # This part is not run: in a child process
            # __name__ is not '__main__' (it is '__mp_main__' on
            # Python 3.4+, '__parents_main__' on Python 2.7)
            pass
    # i is now `upper - 1`; call `foo(p)` with the provided `p`
    

    After executing that, it is told to run foo. (It has to run the whole script again to find out what foo is, because pickle sends a function by name rather than by value.)

    So by the time the worker calls foo, the loop has already finished, i is upper - 1, and foo always returns p * 3.

    You want to make i a parameter passed to foo, or use a multiprocessing-specific shared-memory object, as described here
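
A minimal sketch of the first suggestion, passing i to foo explicitly. It assumes functools.partial is acceptable for binding i (partial objects can be pickled on Python 2.7 and later), and it uses a plain list of floats in place of np.linspace(1, 100, 100) so the example has no third-party dependency. The helper name map_with_multiplier is made up for illustration, and the max(1, ...) guard on the worker count is an addition not present in the question.

```python
from __future__ import print_function  # keeps print() working on Python 2.7 too

import multiprocessing as mp
from functools import partial


def foo(p, i):
    # i arrives as an explicit argument, so the worker no longer depends on
    # a module-level global that is rebuilt when the script is re-imported.
    return p * i


def map_with_multiplier(pool, dataset, i, chunksize=5):
    # Hypothetical helper: partial(foo, i=i) is pickled together with the
    # current value of i and sent to the workers along with each chunk.
    return pool.map(partial(foo, i=i), dataset, chunksize)


if __name__ == '__main__':
    lower, upper = 1, 4
    # Stand-in for np.linspace(1, 100, 100): the floats 1.0 through 100.0.
    dataset = [float(x) for x in range(1, 101)]
    # Assumption: fall back to one worker on a single-core machine.
    agents = max(1, mp.cpu_count() - 1)
    pool = mp.Pool(processes=agents)
    try:
        for i in range(lower, upper):
            result = map_with_multiplier(pool, dataset, i)
            print(i, result[:3])
    finally:
        pool.close()
        pool.join()
```

The guard now wraps the whole loop instead of sitting inside it, so a re-imported child only defines the functions and does nothing else. The pool is also created once and reused for every value of i rather than being rebuilt on each iteration.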