Weird behaviour with numpy and multiprocessing.process

Sorry for the long code, I have tried to make it as simple as possible and yet reproducible.

In short, this python script starts four processes that randomly distribute numbers into lists. Then, the result is added to a multiprocessing.Queue().

import random
import multiprocessing
import numpy
import sys

def work(subarray, queue):
    result = [numpy.array([], dtype=numpy.uint64) for i in range (0, 4)]

    for element in numpy.nditer(subarray):
        index = random.randint(0, 3)
        result[index] = numpy.append(result[index], element)

    queue.put(result)
    print "after the queue.put"

jobs = []
queue = multiprocessing.Queue()

subarray = numpy.array_split(numpy.arange(1, 10001, dtype=numpy.uint64), 4)

for i in range(0, 4):
    process = multiprocessing.Process(target=work, args=(subarray[i], queue))
    jobs.append(process)
    process.start()

for j in jobs:
    j.join()

print "the end"

All processes ran the print "after the queue.put" line. However, it doesn't get to the print "the end" line. Weird enough, if I change the arange from 10001 to 1001, it gets to the end. What is happening?

Solution

most of the child processes are blocking on put call. multiprocessing queue put

block if necessary until a free slot is available.

this can be avoided by adding a call to queue.get() before join.

Also, in multiprocessing code please isolate the parent process by having:

if __name__ == '__main__':
    # main code here

Compulsory usage of if name==“main” in windows while using multiprocessing