
Python Multiprocessing: executing a function with randomness multiple times and getting identical results


In the example code below, I am trying to adapt the accepted answer in this thread. The goal is to use multiprocessing to generate independent random normal numbers (in the example below I just want 3 random numbers). This is a baby version of a more complicated program where a random number generator is used in defining the trial function.

Example Code

import multiprocessing
import numpy as np

def trial(procnum, return_dict):
    p = np.random.randn(1)
    num = procnum
    return_dict[procnum] = p, num

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    return_dict = manager.dict()
    jobs = []
    for i in range(3):
        p = multiprocessing.Process(target=trial, args=(i,return_dict))
        jobs.append(p)
        p.start()

    for proc in jobs:
        proc.join()
    print(return_dict.values())

However, the output gives me the same random number every time, rather than an independent random number for each entry in return_dict.

Output

[(array([-1.08817286]), 0), (array([-1.08817286]), 1), (array([-1.08817286]), 2)]

I feel like this is a really silly mistake. Can someone expose my silliness please :)


Solution

  • It's not a silly mistake. When processes are started by forking (the default on Linux), each child inherits a copy of the parent's global NumPy RNG state, so every child produces the same sequence of "random" numbers. Read more here: https://discuss.pytorch.org/t/why-does-numpy-random-rand-produce-the-same-values-in-different-cores/12005

    The fix is to reseed NumPy's generator inside each child process, for example with a seed drawn from a large range (the stdlib random module, unlike NumPy, gets a fresh state per child):

    import multiprocessing
    import numpy as np
    import random
    
    def trial(procnum, return_dict):
        np.random.seed(random.randint(0,100000))
        p = np.random.randn()
        return_dict[procnum] = p
    
    if __name__ == '__main__':
        manager = multiprocessing.Manager()
        return_dict = manager.dict()
        jobs = []
        for i in range(3):
            p = multiprocessing.Process(target=trial, args=(i,return_dict))
            jobs.append(p)
            p.start()
    
        for proc in jobs:
            proc.join()
        print(return_dict.values())
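
    On newer NumPy (1.17+), a cleaner alternative — a sketch, not part of the original answer — is to skip the global RNG entirely and hand each process its own Generator built from a SeedSequence. The spawn() method produces statistically independent child seeds, so there is no reliance on reseeding tricks:

    ```python
    import multiprocessing
    import numpy as np

    def trial(procnum, seed_seq, return_dict):
        # Each process constructs its own Generator from an independent child seed
        rng = np.random.default_rng(seed_seq)
        return_dict[procnum] = rng.standard_normal()

    if __name__ == '__main__':
        manager = multiprocessing.Manager()
        return_dict = manager.dict()
        # spawn() derives independent seed sequences from one parent seed
        child_seeds = np.random.SeedSequence(12345).spawn(3)
        jobs = []
        for i in range(3):
            p = multiprocessing.Process(target=trial, args=(i, child_seeds[i], return_dict))
            jobs.append(p)
            p.start()

        for proc in jobs:
            proc.join()
        print(return_dict.values())
    ```

    As a bonus, passing an explicit parent seed (12345 here, an arbitrary choice) makes the whole run reproducible while the per-process streams stay independent.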