numpy python-3.10

Python numpy set random seed, thread-safe


I am trying to seed each call of a function that does some NumPy work and that I will be running in parallel. The NumPy documentation (https://numpy.org/doc/stable/reference/random/parallel.html) says to use SeedSequence or default_rng, and not random.seed or random.RandomState as some older answers suggest, because those are not thread-safe. However, this same exact code from the documentation does not work for me, even when I run it sequentially.

from numpy.random import default_rng, normal

def worker(root_seed, worker_id):
    rng = default_rng([worker_id, root_seed])
    print(normal())

root_seed = 0x8c3c010cb4754c905776bdac5ee7501
results = [worker(root_seed, worker_id) for worker_id in range(5)]

Running it twice I get different results. Why?


Solution

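  • The normal call inside worker (that is, np.random.normal) never touches rng. It draws from NumPy's global legacy RandomState, which is seeded from fresh OS entropy at startup, so its output changes from run to run.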

    For reproducibility, call the methods of the Generator object returned by default_rng instead. Simply constructing a generator does not set the global random state.

    from numpy.random import default_rng
    
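    def worker(root_seed, worker_id):
        # Seed a per-worker Generator from the worker ID and the root seed.
        rng = default_rng([worker_id, root_seed])
        # Draw from this Generator, not from the module-level normal().
        print(rng.normal())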
    
    root_seed = 0x8c3c010cb4754c905776bdac5ee7501
    results = [worker(root_seed, worker_id) for worker_id in range(5)]
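Since the question is about running in parallel, here is a minimal sketch of the same idea with threads, using the SeedSequence.spawn approach that the linked documentation page also describes. The run_all helper, the use of ThreadPoolExecutor, and the choice to return values instead of printing them are assumptions made for illustration, not part of the original answer.

```python
from concurrent.futures import ThreadPoolExecutor

from numpy.random import SeedSequence, default_rng


def worker(child_seed):
    # Each worker builds its own Generator from its own child seed,
    # so no random state is shared between threads.
    rng = default_rng(child_seed)
    return float(rng.normal())


def run_all(root_seed, n_workers=5):
    # Hypothetical helper: spawn() derives one child SeedSequence per worker
    # from the root seed.
    child_seeds = SeedSequence(root_seed).spawn(n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in input order, regardless of thread timing.
        return list(pool.map(worker, child_seeds))


root_seed = 0x8c3c010cb4754c905776bdac5ee7501
first = run_all(root_seed)
second = run_all(root_seed)
print(first == second)  # both runs use the same child seeds
```

Because each worker only ever touches its own Generator, the results depend on the root seed and the worker's position, not on how the threads happen to be scheduled.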