Tags: python, multiprocessing, python-multiprocessing

Python multiprocessing is decreasing performance


In order to understand parallelization, I have written a toy script using the multiprocessing module. I plotted the graph below, and the conclusion is that my implementation makes performance worse as I add processes. For example, my code creates a list of 10,000 elements, then computes the sum of its elements in parallel. Each process has to sum the elements of its own sublist.

With 1 process, it needs 0.78 seconds to compute the sum; with 2 processes, 1.29 s; and with 3 processes, 1.93 s. See my graph:

[Figure: number of processes vs. computation time]

Code:

import multiprocessing
import time
import numpy as np

def sum_list(mylist, result, index_min, index_max):
    # each worker adds its slice, element by element, into the shared value
    for i in range(index_min, index_max):
        result.value += mylist[i]

if __name__ == "__main__":
    f = lambda x: np.sin(x) * np.cos(x)
    X = np.linspace(0, 1, 10000)
    mylist = f(X)  # creating the list to sum
    size = len(mylist)

    result = multiprocessing.Value('d')
    result.value = 0
    processes = []
    n = 10

    # creating processes
    start = time.perf_counter()
    for p in range(n):
        index_min = int(p * size / n)
        index_max = int((p + 1) * size / n)
        print(index_min, index_max)
        processes.append(multiprocessing.Process(target=sum_list, args=(mylist, result, index_min, index_max)))

    # starting processes
    for process in processes:
        process.start()

    # waiting for processes to finish
    for process in processes:
        process.join()
    end = time.perf_counter()
    print('time elapsed:', end - start, 'seconds.')
    

Obviously, since the size of the list is fixed, I would expect each process to have only a small sublist to sum, and since they do it at the same time, the total computation time should decrease.

I tried many different numbers of processes, and two different hardware architectures, but the problem is still there. Am I missing something? Is it because each time a process has to write to result, it has to communicate with the other processes, degrading the performance?

Thanks for your answers.


Solution

  • "Python multiprocessing is decreasing performance"

    yes.

    Any more questions?

    (sorry for the joke)

    Starting multiple processes is in itself expensive in terms of system resources, compared to multithreading or in-thread concurrency. On top of that, all data transferred to and from your external processes, even when the return values are encapsulated in your multiprocessing.Value() instance, involves further overhead (typically serializing and de-serializing each data item passed around, plus synchronization code).

    So it is only worth it with "real workloads", or with well-designed demonstration workloads that can actually take advantage of the extra system resources.

    More specifically, your "workload" of result.value += mylist[i] is actually doing this under the hood: sending a signal to the manager process that multiprocessing spawns to coordinate shared data, requesting its current result.value, and performing the sum itself in that manager process (no matter how many worker processes you create: in order to preserve the atomic nature of the += operator, the sum is performed in a single process, under a lock that prevents the other processes from actually running in parallel). And you do that for every single value.

    Change your worker code to sum its whole share at once, and to increase result.value just once per process, and you should see a great improvement. The sum operation is still simple enough that the multiprocessing overhead may remain larger than your gains, though.
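
    As a minimal sketch of that change (keeping the rest of your script as it is), each worker would accumulate its slice in a local variable and touch the shared value only once, under its lock:

        def sum_list(mylist, result, index_min, index_max):
            # do the whole partial sum locally, without touching shared state
            partial = 0.0
            for i in range(index_min, index_max):
                partial += mylist[i]
            # update the shared value exactly once, under its lock
            with result.get_lock():
                result.value += partial

    With only 10,000 elements the process startup cost will probably still dominate, but the per-element round trips to the shared value are gone.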

    Worth mentioning, though: the obvious gain is that multiple processes can run Python code on more than one CPU core simultaneously in a straightforward way - because of the GIL (Global Interpreter Lock), that is not possible with threads in Python versions prior to 3.12 (and is hard to achieve, using separate interpreter instances, even in Python 3.12).

    My recommendation, since you are playing around, is to keep playing around until you get a solid feeling for the differences between using threads and using multiple processes, between spawning 2 or 3 of them and spawning 20-30, and between small workloads and workloads with millions of items.
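
    If you want a starting point for those experiments, here is one possible sketch - the 4 chunks and the 2,000,000-element list are arbitrary choices, not anything your code requires - timing the same pure-Python summation done sequentially, with a thread pool, and with a process pool:

        import time
        from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

        import numpy as np

        def partial_sum(chunk):
            # pure-Python loop on purpose, so the work is CPU-bound bytecode
            total = 0.0
            for x in chunk:
                total += x
            return total

        def timed(label, fn):
            start = time.perf_counter()
            value = fn()
            print(f"{label}: {value:.3f} in {time.perf_counter() - start:.2f} s")

        if __name__ == "__main__":
            data = list(np.sin(np.linspace(0, 1, 2_000_000)))
            chunks = [data[i::4] for i in range(4)]  # 4 roughly equal slices

            timed("sequential", lambda: sum(map(partial_sum, chunks)))
            with ThreadPoolExecutor(4) as pool:
                timed("threads   ", lambda: sum(pool.map(partial_sum, chunks)))
            with ProcessPoolExecutor(4) as pool:
                timed("processes ", lambda: sum(pool.map(partial_sum, chunks)))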

    Another parameter for you to adjust and play around with is the multiprocessing start method, which you change with multiprocessing.set_start_method - https://docs.python.org/3/library/multiprocessing.html#multiprocessing.set_start_method - the default for macOS and Windows is "spawn"; you should get better performance in your specific example by using "fork" (not available on Windows).
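
    A minimal sketch of that knob (the call must happen before any Process is created, and "fork" only exists on Unix-like systems):

        import multiprocessing

        if __name__ == "__main__":
            # "fork" is not available on Windows; there the default "spawn" stays
            multiprocessing.set_start_method("fork")
            # ... then create and start the Process objects as in the question ...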

    Also, with Python 3.12 you can try using sub-interpreters - on Mac or Linux you can install the "extrainterpreters" package (warning: alpha software at this stage), or use import _xxsubinterpreters as interpreters (on any O.S.) and follow the docs at https://peps.python.org/pep-0554/

    (disclaimer: I am the author of the extrainterpreters package)