Search code examples
pythonparallel-processingmultiprocessingmulticore

Poor division of labor using multiprocessing in python?


Consider the following program:

import Queue
from multiprocessing import Queue as Q
from multiprocessing import Process, Value, Array
from time import time
from numpy import sum, zeros
from numpy.random import normal

def enche(queue, cnts, tims, t0, proc):
    while True:
        try:
            queue.get(False)
            x = sum(normal(size = 1000))
            cnts[proc] = cnts[proc] + 1
        except Queue.Empty:
            tims[proc] = time() - t0.value
            break

def main(m, nth):
    queue = Q()
    tmp = [queue.put(0) for i in xrange(m)]
    cnts = Array('i', zeros(nth, dtype = "int"))
    tims = Array('d', zeros(nth))
    t0  = Value('d', time())
    procs = [ Process(target = enche , args = (queue,cnts, tims, t0,i)) for i in xrange(nth) ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    for k in xrange(nth):
        print "%2.0f/100, %4.1f ms,"%(cnts[k]/float(m) * 100, tims[k]*1000), 
    print "" 


if __name__ == "__main__":
    for k in xrange(100):
        main(100, 4)

In essence, the function main(m,nth) should do a task - sum a thousand normal random numbers m times - and divide this task in nth processes.

I then count and print what percentage of the task was done by each process and how long each process lasted. This the typical results I get:

 4/100,  1.6 ms,  3/100,  2.2 ms, 88/100, 10.9 ms,  5/100,  2.7 ms, 
23/100,  3.6 ms,  1/100,  1.5 ms,  3/100,  2.0 ms, 73/100, 14.9 ms, 
 8/100,  3.9 ms,  1/100,  3.4 ms,  8/100,  5.3 ms, 83/100, 21.0 ms, 
 6/100,  4.1 ms,  0/100,  3.2 ms, 76/100, 19.0 ms, 18/100,  8.8 ms, 
 5/100,  1.7 ms, 90/100,  9.2 ms,  5/100,  2.4 ms,  0/100,  2.6 ms, 
 2/100,  1.5 ms,  8/100,  2.4 ms, 89/100,  8.9 ms,  1/100,  2.6 ms, 
 8/100,  2.3 ms, 11/100,  2.7 ms,  3/100,  2.4 ms, 78/100,  8.5 ms, 
52/100,  5.8 ms, 11/100,  3.2 ms,  4/100,  2.6 ms, 33/100,  5.9 ms, 
 6/100,  2.3 ms,  8/100,  2.9 ms, 24/100,  4.3 ms, 62/100,  7.7 ms, 
13/100,  3.0 ms, 11/100,  2.6 ms,  1/100,  2.5 ms, 75/100,  8.5 ms, 
 6/100,  2.1 ms,  3/100,  2.1 ms,  6/100,  2.7 ms, 85/100,  9.0 ms, 
 8/100,  2.3 ms,  3/100,  2.0 ms, 27/100,  4.4 ms, 62/100,  7.6 ms, 
18/100,  3.2 ms,  0/100,  1.9 ms,  6/100,  3.1 ms, 76/100,  9.2 ms, 
 3/100,  1.9 ms, 19/100,  3.4 ms, 30/100,  6.0 ms, 48/100,  7.9 ms, 
10/100,  2.4 ms,  0/100,  1.5 ms,  8/100,  2.8 ms, 82/100,  8.8 ms, 
 6/100,  1.8 ms, 36/100,  4.7 ms, 32/100,  4.7 ms, 26/100,  4.7 ms, 
 8/100,  2.0 ms, 46/100,  5.9 ms, 46/100,  5.9 ms,  0/100,  2.6 ms, 
 3/100,  1.6 ms, 54/100,  6.2 ms,  1/100,  2.4 ms, 42/100,  6.1 ms, 
12/100,  2.6 ms,  8/100,  2.4 ms, 34/100,  5.3 ms, 46/100,  6.3 ms, 
 1/100,  1.4 ms,  9/100,  2.3 ms,  1/100,  2.3 ms, 89/100,  9.2 ms, 
13/100,  2.6 ms,  5/100,  2.2 ms,  3/100,  2.4 ms, 79/100,  8.7 ms, 
11/100,  2.3 ms,  7/100,  2.6 ms,  8/100,  2.9 ms, 74/100,  8.8 ms, 
 8/100,  1.8 ms,  9/100,  2.7 ms,  0/100,  1.8 ms, 83/100,  9.3 ms, 
13/100,  2.4 ms,  4/100,  2.0 ms,  3/100,  2.5 ms, 80/100,  9.9 ms, 
10/100,  2.3 ms, 72/100,  8.0 ms,  6/100,  2.6 ms, 12/100,  4.1 ms, 
27/100,  4.3 ms,  0/100,  1.5 ms,  5/100,  2.5 ms, 68/100,  8.4 ms, 
 3/100,  1.5 ms,  8/100,  2.3 ms, 86/100,  8.3 ms,  3/100,  2.7 ms, 
 7/100,  1.9 ms, 10/100,  2.5 ms,  5/100,  2.5 ms, 78/100,  8.2 ms, 
14/100,  2.5 ms,  3/100,  2.0 ms,  5/100,  2.5 ms, 78/100,  8.2 ms, 
53/100,  7.4 ms,  3/100,  1.9 ms,  1/100,  2.4 ms, 43/100,  7.5 ms, 
 3/100,  1.7 ms,  9/100,  2.5 ms, 47/100,  6.0 ms, 41/100,  6.0 ms, 
 3/100,  1.5 ms, 37/100,  4.7 ms, 33/100,  4.7 ms, 27/100,  4.7 ms, 
 7/100,  1.8 ms, 36/100,  4.5 ms, 32/100,  4.5 ms, 25/100,  4.5 ms, 
 3/100,  1.4 ms, 51/100,  5.6 ms, 46/100,  5.6 ms,  0/100,  2.3 ms, 
 5/100,  1.8 ms,  9/100,  2.5 ms, 46/100,  5.8 ms, 40/100,  5.9 ms, 
 9/100,  2.0 ms,  3/100,  1.8 ms,  5/100,  2.3 ms, 83/100,  8.8 ms, 
 7/100,  2.1 ms,  3/100,  2.1 ms,  8/100,  2.9 ms, 82/100,  8.6 ms, 
11/100,  2.3 ms,  4/100,  2.0 ms,  3/100,  2.5 ms, 82/100,  9.3 ms, 
19/100,  2.9 ms,  9/100,  2.6 ms, 10/100,  3.0 ms, 62/100,  7.3 ms, 
18/100,  2.8 ms, 16/100,  2.9 ms,  2/100,  2.5 ms, 64/100,  7.9 ms, 
 2/100,  1.5 ms, 51/100,  5.8 ms, 47/100,  5.9 ms,  0/100,  2.4 ms, 
26/100,  4.0 ms,  8/100,  2.3 ms, 11/100,  2.9 ms, 55/100,  7.6 ms, 
 8/100,  2.1 ms,  3/100,  1.9 ms, 79/100,  9.2 ms, 10/100,  3.8 ms, 
14/100,  2.5 ms,  4/100,  2.1 ms, 45/100,  5.6 ms, 37/100,  5.6 ms, 
11/100,  2.2 ms,  9/100,  2.8 ms,  4/100,  2.4 ms, 76/100,  8.9 ms, 
 3/100,  1.4 ms,  4/100,  2.1 ms,  5/100,  2.4 ms, 88/100,  9.5 ms, 
 3/100,  1.4 ms,  7/100,  2.4 ms, 49/100,  5.9 ms, 41/100,  5.9 ms, 
 3/100,  1.4 ms,  3/100,  1.8 ms, 92/100,  9.3 ms,  2/100,  2.6 ms, 
 9/100,  2.3 ms, 16/100,  3.1 ms, 41/100,  5.5 ms, 34/100,  5.5 ms, 
 5/100,  1.7 ms,  8/100,  2.3 ms, 47/100,  5.8 ms, 40/100,  5.7 ms, 
 6/100,  1.7 ms,  5/100,  2.2 ms,  5/100,  2.4 ms, 84/100,  8.6 ms, 
 7/100,  2.0 ms,  6/100,  2.2 ms, 47/100,  5.9 ms, 40/100,  5.8 ms, 
 5/100,  1.9 ms, 32/100,  5.1 ms,  6/100,  2.7 ms, 57/100,  7.7 ms, 
11/100,  2.4 ms, 14/100,  2.7 ms, 42/100,  5.3 ms, 33/100,  5.3 ms, 
50/100,  6.6 ms, 12/100,  2.8 ms,  0/100,  1.9 ms, 38/100,  6.6 ms, 
55/100,  7.0 ms,  1/100,  1.7 ms,  2/100,  2.2 ms, 42/100,  6.9 ms, 
 8/100,  1.9 ms, 48/100,  6.7 ms,  5/100,  2.4 ms, 39/100,  6.7 ms, 
12/100,  2.3 ms, 18/100,  3.0 ms, 30/100,  5.6 ms, 40/100,  6.8 ms, 
 5/100,  1.7 ms,  3/100,  1.9 ms,  5/100,  2.5 ms, 87/100,  8.8 ms, 
50/100,  6.7 ms, 13/100,  2.7 ms,  0/100,  2.0 ms, 37/100,  6.7 ms, 

Why do I get such a poor division of labor? Why sometimes a process last for short times and do nothing and others do almost 90% of the job?

What should I do to improve this?

edit: extra info - I run this in a i7 CPU.


Solution

  • The task it does in parallel is rather small compared to process start overhead, so that one process can do many iterations before other processes get a chance to start running.

    If the line:

    x = sum(normal(size = 1000))
    

    is changed to:

    x = sum(normal(size = 100000))
    

    This is what I get:

    25/100, 366.2 ms, 25/100, 364.4 ms, 25/100, 364.1 ms, 25/100, 361.7 ms, 
    34/100, 362.1 ms,  0/100,  2.6 ms, 25/100, 357.6 ms, 41/100, 357.8 ms, 
    25/100, 365.2 ms, 26/100, 362.7 ms, 25/100, 366.0 ms, 24/100, 355.7 ms, 
    21/100, 360.3 ms, 21/100, 355.9 ms, 22/100, 362.3 ms, 36/100, 358.0 ms,