Generating random numbers with NumPy using multiprocessing


I want to generate random numbers in NumPy using multiprocessing. Following this answer, I wrote the code at the end of this message, and it seems to work. Because the answer I linked to is quite old, I want to make sure that it is still correct. NumPy's official documentation seems to say that np.random.seed is deprecated, but I can't seem to find good documentation about the Generator instance it recommends, and it seems more complicated.

import numpy as np
from multiprocessing import Pool

def generate_random(iproc):
    # if the next line is commented out, each process produces the same random numbers
    np.random.seed()
    nums = np.random.uniform(size=8)
    print(f'{iproc=},{nums=}')
    

if __name__ == '__main__':
    nproc = 8
    arglist = []
    for iproc in range(nproc):
        arglist.append((iproc,))
        
    with Pool(nproc) as p:
        p.starmap(generate_random, arglist)

Solution

  • To avoid sharing a global random state object, you can seed each subprocess with a different seed. Here is a minimal example of how to achieve this with NumPy's RandomState class (the legacy interface, which predates the Generator API):

    import numpy as np
    from multiprocessing import Pool
    
    
    def generate_random(idx_proc, seed):
        random_state = np.random.RandomState(seed)
        nums = random_state.uniform(size=8)
        print(f"{idx_proc=}, {nums=}")
    
    
    if __name__ == "__main__":
        n_proc = 8
    
        random_state = np.random.RandomState(480273)
    
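        # explicit dtype so the upper bound fits even where the default integer is 32-bit
        seeds = random_state.randint(0, 2**32 - 1, size=n_proc, dtype=np.uint32)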
    
        with Pool(n_proc) as p:
            p.starmap(generate_random, enumerate(seeds))
    

    Because the per-process seeds are drawn from a seeded random state object in the main process, each process receives the same seed on every run, so the numbers it generates are reproducible. Keep in mind, however, that the order in which the processes run and print their numbers may differ from run to run.

    I hope this helps!
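
Since the question asks about the Generator API, here is a minimal sketch of the same idea using np.random.default_rng and np.random.SeedSequence. The helper name spawn_child_seeds is only illustrative, and the root seed 480273 is simply reused from the example above. Each worker returns its numbers instead of printing them, so the main process can report them in a fixed order.

```python
import numpy as np
from multiprocessing import Pool


def generate_random(child_seed):
    # Each worker builds its own Generator from the SeedSequence it was handed.
    rng = np.random.default_rng(child_seed)
    return rng.uniform(size=8)


def spawn_child_seeds(root_seed, n_proc):
    # Illustrative helper: derive one child SeedSequence per process from a single root seed.
    return np.random.SeedSequence(root_seed).spawn(n_proc)


if __name__ == "__main__":
    n_proc = 8
    child_seeds = spawn_child_seeds(480273, n_proc)

    with Pool(n_proc) as p:
        # map() returns results in input order, regardless of which worker finishes first.
        results = p.map(generate_random, child_seeds)

    for idx_proc, nums in enumerate(results):
        print(f"{idx_proc=}, {nums=}")
```

Spawning child seed sequences from one root avoids drawing integer seeds by hand, and collecting the results with map() keeps the output order independent of process scheduling.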