Tags: python, multiprocessing, cpu-speed

Why does per-process overhead constantly increase for multiprocessing?


On a 6-core CPU with 12 logical CPUs, I was running a for-loop up to a very high number, several times over.

To speed things up I was using multiprocessing. I was expecting something like:

  • number of processes <= number of CPUs: time stays identical
  • number of processes = number of CPUs + 1: time doubles

What I found instead was a continuous increase in time. I'm confused.

The code was:

#!/usr/bin/python

from multiprocessing import Process, Queue
import random
from timeit import default_timer as timer

def rand_val():
    num = []
    for i in range(200000000):
        num = random.random()
    print('done')

def main():

    for iii in range(15):
        processes = [Process(target=rand_val) for _ in range(iii)]
        start = timer()
        for p in processes:
            p.start()

        for p in processes:
            p.join()

        end = timer()
        print(f'elapsed time: {end - start}')
        print('for ' + str(iii))
        print('')

if __name__ == "__main__":
    main()
    print('done')

Result:

  • elapsed time: 14.9477102 for 1
  • elapsed time: 15.4961154 for 2
  • elapsed time: 16.9633134 for 3
  • elapsed time: 18.723183399999996 for 4
  • elapsed time: 21.568377299999995 for 5
  • elapsed time: 24.126758499999994 for 6
  • elapsed time: 29.142095499999996 for 7
  • elapsed time: 33.175509300000016 for 8

. . .

  • elapsed time: 44.629786800000005 for 11
  • elapsed time: 46.22480710000002 for 12
  • elapsed time: 50.44349420000003 for 13
  • elapsed time: 54.61919949999998 for 14

Solution

  • You are making two wrong assumptions. In reality:

    1. Processes are not free. Merely adding processes adds overhead to the program.
    2. Processes do not own CPUs. A CPU interleaves execution of several processes.

    The first point is why you see some overhead even though there are fewer processes than CPUs. Note that your system usually has several background processes running, so "fewer processes than CPUs" is not clear-cut for a single application.

    The second point is why you see the execution time increase gradually when there are more processes than CPUs. Any OS running mainline Python does preemptive multitasking of processes; roughly, this means a process does not occupy a CPU until it is done, but is paused regularly so that other processes can run.
    In effect, several processes can share one CPU by taking turns in short time slices. Since the CPU can still only do a fixed amount of work per unit of time, all processes take longer to complete.
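To make the second point concrete, here is a minimal sketch of an idealized fair-sharing model. It is an illustration only: the function name `ideal_elapsed` and its parameters are made up for this example, and the model assumes identical CPU-bound jobs, perfectly fair time slicing, no process startup cost, and no background load. It also treats all 12 logical CPUs as independent, which is optimistic because logical CPUs share physical cores.

```python
def ideal_elapsed(n_procs: int, n_cpus: int, t_single: float) -> float:
    """Estimate wall time for n_procs identical CPU-bound jobs.

    Hypothetical model: with at most n_cpus jobs, each job gets its own CPU
    and the time equals t_single. With more jobs, the CPUs are shared
    fairly, so the time grows in proportion to n_procs / n_cpus.
    """
    if n_procs <= 0:
        return 0.0
    return t_single * max(1.0, n_procs / n_cpus)


if __name__ == "__main__":
    T_SINGLE = 14.95  # seconds for one process, rounded from the question
    N_CPUS = 12       # logical CPUs reported in the question

    for n in (1, 12, 13, 14, 24):
        estimate = ideal_elapsed(n, N_CPUS, T_SINGLE)
        print(f"{n:2d} processes: about {estimate:.2f} s in the ideal model")
```

Even in this best case, going from 12 to 13 processes raises the estimate by a factor of 13/12 rather than doubling it, because the extra process is interleaved with the others instead of waiting for a free CPU. The measured times in the question grow earlier and faster than this model predicts, which is consistent with the first point: the overhead and sharing it ignores are real.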