Tags: python-3.x, multiprocessing, pool

Python 3 Multiprocessing Pool


I'm learning to use Pool with multiprocessing, and I wrote this script as an exercise.

Can anyone tell me why using a normal for loop took less time than using a pool?

P.S.: my CPU has 2 cores.

Thank you very much.

from multiprocessing import Pool
from functools import reduce
import time

def one(n):
    a = n*n
    return a 

if __name__ == '__main__':
    l = list(range(1000))

    p = Pool()
    t = time.time()
    pol = p.map(one, l)
    result = reduce(lambda x,y: x+y, pol)
    print("Using Pool the result is: ", result, "Time: ", time.time() - t )
    p.close()
    p.join()

    def two(n):
        t = time.time()
        p_result = [] 

        for i in n:
            a = i*i 
            p_result.append(a)

        result = reduce(lambda x,y: x+y, p_result)
        print("Not using Pool the result is: ", result, "Time: ", time.time() - t)

    two(l)

Using Pool the result is: 332833500 Time: 0.14810872077941895

Not using Pool the result is: 332833500 Time: 0.0005018711090087891


Solution

  • I think several factors are at play here, but I would guess it largely comes down to the overhead of running multiple processes (mostly synchronization and inter-process communication), along with the fact that your non-parallelized code is written a bit more efficiently.

    As a basis, here is how your unmodified code runs on my computer:

    ('Using Pool the result is: ', 332833500, 'Time: ', 0.0009129047393798828)
    ('Not using Pool the result is: ', 332833500, 'Time: ', 0.000598907470703125)
    

    First of all, I would like to try to level the playing field by making the code of the two() function nearly identical to the parallelized code. Here is the modified two() function:

    def two(l):
        t = time.time()
    
        p_result = map(one, l)
    
        result = reduce(lambda x,y: x+y, p_result)
        print("Not using Pool the result is: ", result, "Time: ", time.time() - t)
    

    Now, this does not actually make a whole lot of difference in this case, but it will be important in a second to see that both cases are doing the exact same thing. Here is a sample output with this change:

    ('Using Pool the result is: ', 332833500, 'Time: ', 0.0009338855743408203)
    ('Not using Pool the result is: ', 332833500, 'Time: ', 0.0006031990051269531)
    

    What I would like to illustrate now is that because the one() function is so computationally cheap, the overhead of inter-process communication outweighs the benefit of running it in parallel. I will modify the one() function as follows to force it to do a bunch of extra computation. Note that because of the changes to the two() function, this change affects both the parallel and the serial code.

    def one(n):
        for i in range(100000):
            a = n*n
        return a
    

    The for loop is there to give each process a reason for existence. As your original code stands, each process performs just a few multiplications, then has to send the list of results back to the parent process and wait to be handed a new chunk. Sending and waiting takes much longer than completing a single chunk. By adding these extra cycles, each chunk takes longer to compute while the inter-process communication time stays the same, and so the parallelism begins to pay off. Here are my results when I run the code with this change to the one() function:

    ('Using Pool the result is: ', 332833500, 'Time: ', 1.861448049545288)
    ('Not using Pool the result is: ', 332833500, 'Time: ', 3.444211959838867)
    

    So there you have it. All you need to do is give your child processes a bit more work per chunk, and the parallelism becomes worth your while.