Tags: python, multiprocessing, pool

Multiprocessing Pool() method has no effect on performance


I am using Python 3.9.2 on Linux/Debian testing, on a multiprocessor machine. I am trying to understand how multiprocessing works.

I wrote two simple scripts that each evaluate two large exponentiations, one script without multiprocessing and one with it.

This is the one without multiprocessing:

from timeit import default_timer as timer


def sqr(n):
    a = n ** n
    return a


def sqr_2(m):
    b = m ** m
    return b


def main():
    start = timer()

    print(f'sqr = {sqr(100000)}\nsqr_2= {sqr_2(200000)}')

    end = timer()

    print(f'time frame in which the operation is resolved: {end - start} seconds')


if __name__ == '__main__':
    main()

and this is the script using multiprocessing:

from multiprocessing import Pool, cpu_count
from timeit import default_timer as timer


def sqr_1(n):
    return n ** n


def sqr_2(m):
    return m ** m


def main():
    cpu_cnt = cpu_count()
    pool = Pool(processes=cpu_cnt)     # In this case there are 12 processors

    start = timer()

    val_1 = (100000,)
    val_2 = (200000,)

    process_1 = pool.map_async(sqr_1, val_1)
    process_2 = pool.map_async(sqr_2, val_2)

    print(f'Results: {process_1.get(), process_2.get()}')

    end = timer()

    print(f'time frame in which the operation is resolved: {end - start} seconds')


if __name__ == '__main__':
    main()

The problem is that the second script, which finishes without any error, performs the same task as the first one in about the same amount of time (around 14 seconds). So multiprocessing in the second script has no visible effect. Thanks in advance to anyone who can point out where the error is!


Solution

  • Consider the following script. It lets you choose at runtime how many times to call the function, and whether to do so serially or in parallel. It also just computes the value; it does not try to write a string representation to standard output, since converting the result of n**n to a string is far more time-consuming for large n than actually calculating it (see the short timing sketch at the end of this answer).

    from multiprocessing import Pool, cpu_count
    from timeit import default_timer as timer
    import sys


    def f(n):
        return n ** n


    def main():
        cpu_cnt = cpu_count()
        n = int(sys.argv[2])            # number of times to call f()
        start = timer()
        if sys.argv[1] == "s":          # "s" = serial: call f() repeatedly in this process
            s = [f(100000) for _ in range(n)]
        else:                           # anything else = parallel: submit the calls to the pool
            pool = Pool(processes=cpu_cnt)
            s = [pool.map_async(f, (100000,)) for _ in range(n)]
            results = [x.get() for x in s]
        end = timer()
        print(f'time frame in which the operation is resolved: {end - start} seconds')


    if __name__ == '__main__':
        main()

    Here are the results for 2, 6, 12, 24, 48, 96, and 192 function calls on my 4-core machine:

    % for n in 2 6 12 24 48 96 192; do print $n; for x in s p; do python3 tmp.py $x $n; done; done
    2
    time frame in which the operation is resolved: 0.146144435 seconds
    time frame in which the operation is resolved: 0.178840965 seconds
    6
    time frame in which the operation is resolved: 0.423103791 seconds
    time frame in which the operation is resolved: 0.24940852500000002 seconds
    12
    time frame in which the operation is resolved: 0.848754817 seconds
    time frame in which the operation is resolved: 0.340022419 seconds
    24
    time frame in which the operation is resolved: 1.691312521 seconds
    time frame in which the operation is resolved: 0.571664972 seconds
    48
    time frame in which the operation is resolved: 3.415401498 seconds
    time frame in which the operation is resolved: 1.029526396 seconds
    96
    time frame in which the operation is resolved: 6.76773454 seconds
    time frame in which the operation is resolved: 2.016387216 seconds
    192
    time frame in which the operation is resolved: 13.529949021999998 seconds
    time frame in which the operation is resolved: 3.770171452 seconds
    

    With only 2 function calls there is no speed-up, because the overhead of setting up the pool and shipping the work to the worker processes outweighs the gain (in fact, there is a slow-down). As the number of calls grows, the speed-up increases: at 192 calls the parallel run takes about 3.8 seconds against 13.5 seconds serially, a speed-up of roughly 3.6 on 4 cores, though for n cores you will never quite see a speed-up of n. This is essentially what happens in the question's second script: it submits only two calls to the pool, and most of the run time goes into converting the huge results to strings for printing, so there is very little left for the extra processes to speed up.
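
    The string-conversion point made at the top of this answer is easy to check directly. Here is a minimal, hedged sketch (not part of the original answer; the variable names are purely illustrative) that times the exponentiation and the int-to-str conversion separately:

    from timeit import default_timer as timer
    import sys

    # Newer CPython releases limit int-to-str conversion length by default;
    # lift the limit so the conversion below does not raise ValueError.
    if hasattr(sys, 'set_int_max_str_digits'):
        sys.set_int_max_str_digits(0)

    n = 100000

    start = timer()
    value = n ** n       # the exponentiation itself
    mid = timer()
    text = str(value)    # the decimal conversion that print() would have to do
    end = timer()

    print(f'computation: {mid - start:.3f} s, '
          f'string conversion: {end - mid:.3f} s, '
          f'digits: {len(text)}')

    Along the same lines, a common way to keep the per-call overhead down is to hand the whole batch of inputs to a single pool.map() call and let the pool split the work across its workers. The following is again only an illustrative sketch, not the script benchmarked above; it reuses one pool through a context manager:

    from multiprocessing import Pool, cpu_count


    def f(n):
        return n ** n


    def main():
        values = [100000] * 24                      # 24 independent calls, as in one of the runs above
        with Pool(processes=cpu_count()) as pool:   # one pool, reused for the whole batch, then closed
            results = pool.map(f, values)
        print(f'computed {len(results)} results')


    if __name__ == '__main__':
        main()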