Tags: python, python-3.x, multithreading, multiprocessing, python-multiprocessing

Python multiprocessing Pool API doesn't work efficiently when process count and worker count increased


I'm trying to apply multiprocessing to parallelize my code. I have around 2000 tasks to get done. Since it is not practical to create 2000 simultaneous processes, I'm using the Python multiprocessing.Pool API to parallelize the work while managing the task queue. I tried creating 100 workers, but it took hours to finish, which is not a big gain over the sequential implementation. My laptop has 12 logical cores. I then experimented with increasing the number of workers and tasks together. Technically, each task should then take the same time to complete, because each worker is assigned exactly one task at a time. But I observed that the total time increased even though the workload per worker did not change. Is something wrong with the API? Or am I doing it wrong? Can someone suggest a way to process my 2000 tasks in parallel in minimum time using Python?

P.S.: I cannot use multithreading due to a code implementation issue.

My code

inputListLen = 13
workerCount = 13
regressList = regressList[:inputListLen] # regressList has 2000 items
with Pool(processes=workerCount) as pool:
    print(pool.map(runRegressWriteStatus, regressList))

Results

Input List Len  | workers   | Time(seconds) 
1               | 1         | 4.5  
2               | 2         | 4.9  
3               | 3         | 5.4  
4               | 4         | 5.6  
5               | 5         | 6.3  
6               | 6         | 7.2  
7               | 7         | 8.3  
8               | 8         | 9.6  
9               | 9         | 10.0 
10              | 10        | 10.7 
11              | 11        | 11.6 
12              | 12        | 11.8 
13              | 13        | 13.3 

Solution

  • I think you are misunderstanding a few things, and some of your assumptions are not really accurate. As I mentioned in Python multiprocessing: dealing with 2000 processes, the number of processes you can actually run in parallel with multiprocessing is dependent on and limited by the number of CPU cores on the system. And it is the number of actual physical cores, not the logical cores you see with Hyper-Threading enabled.

    So 12 logical cores means 6 physical cores with 2 threads per core. At any point in time your kernel sees 12 logical cores and tries to schedule 12 processes, but the system has only 6 physical cores, so a lot of context switching happens to make it look like 12 cores are running. At any given moment, no more than 6 processes can truly execute in parallel, because you have only 6 cores.
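    A quick way to check this on your own machine (a minimal sketch; note that the standard library only reports logical cores, and the commented-out `psutil` call for physical cores is an assumption that requires a third-party install):

    ```python
    import os
    import multiprocessing as mp

    # os.cpu_count() reports *logical* cores (Hyper-Threading included),
    # and multiprocessing.Pool defaults to this number of workers.
    print('Logical cores:', os.cpu_count())
    print('Pool default size:', mp.cpu_count())

    # Counting *physical* cores needs a third-party library such as psutil:
    # import psutil
    # print('Physical cores:', psutil.cpu_count(logical=False))
    ```

    On a Hyper-Threaded machine the physical count will typically be half the logical count, which is the ratio the answer above assumes.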

    Secondly, Pool works differently from Process. With Process you fire up independent parallel processes, each doing its own task.

    Pool has a different purpose: with a Pool object you create a pool of worker processes and pass one big task/input to it; the pool divides this big input into smaller chunks and distributes them among the worker processes, which operate on the chunks simultaneously.

    Here is a very simple example of how you can use the pool.

    import multiprocessing as mp
    import time
    
    
    def f(x):
        res = 0
        for i in range(x):
            res += i ** 6
        return res
    
    
    if __name__ == '__main__':
        t1 = time.time()
        # Pool creates 4 worker processes and runs the same function with different args to achieve parallel computation
        po = mp.Pool(processes=4)
        res = po.map(f, range(5000))
        po.close()
        po.join()
        print('Parallel execution time taken = {}'.format(time.time() - t1))
    
        t2 = time.time()
        seq_res = list(map(f, range(5000)))
        print('Sequential execution time taken = {}'.format(time.time() - t2))
    
    (py37) rbhanot@rbhanotlinux ~/home » python 7-1.mppool.py
    Parallel execution time taken = 0.91422438621521
    Sequential execution time taken = 2.9315543174743652
    (py37) rbhanot@rbhanotlinux ~/home » 
    

    As you can see, the parallel execution with the pool took about a third of the time of the sequential execution.
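    The way map splits the input among workers can be made explicit with its chunksize argument; here is a minimal sketch (the function and values are illustrative, not from the question):

    ```python
    import multiprocessing as mp


    def square(x):
        return x * x


    if __name__ == '__main__':
        with mp.Pool(processes=4) as pool:
            # map() cuts the input iterable into chunks and hands each chunk
            # to a worker; larger chunks mean fewer inter-process handoffs
            # per item, which reduces overhead for short tasks.
            results = pool.map(square, range(10), chunksize=3)
        print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    ```

    If chunksize is omitted, Pool picks one automatically based on the input length and the number of workers.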

    Now, I have 8 logical cores but only 4 physical cores on my machine, so at most my kernel can schedule 4 processes at a time; creating a pool of more than 4 processes won't make any difference. Here is proof of that.

    When run with pool of 7 processes

    Parallel execution time taken = 0.9177846908569336
    

    When run with pool of 12 processes

    Parallel execution time taken =  0.9213907718658447
    

    When run with pool of 2 processes

    Parallel execution time taken = 1.712911605834961
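    Putting this together for the original question's ~2000 tasks: size the pool to the machine rather than the task count, and let map's chunksize amortize the per-task overhead. A sketch, where run_one is a hypothetical CPU-bound stand-in for the question's runRegressWriteStatus:

    ```python
    import multiprocessing as mp
    import os


    def run_one(task):
        # Placeholder for the question's runRegressWriteStatus.
        return sum(i * i for i in range(task % 100))


    if __name__ == '__main__':
        tasks = list(range(2000))
        # A pool of os.cpu_count() workers; more workers than cores
        # only adds context-switching overhead, as shown above.
        with mp.Pool(processes=os.cpu_count()) as pool:
            results = pool.map(run_one, tasks, chunksize=50)
        print(len(results))  # 2000
    ```

    The 2000 tasks queue up behind the fixed pool of workers, so total wall time is roughly (sequential time / physical cores) plus scheduling overhead, not the per-task time the question expected.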