I am trying to seed each call of a function that does some NumPy work and that I will be running in parallel. The NumPy documentation (https://numpy.org/doc/stable/reference/random/parallel.html) says to use SeedSequence or default_rng (and not random.seed or random.RandomState, as some older answers suggest, because these are not thread-safe). However, this same code from the documentation does not work for me, even when I run it sequentially.
from numpy.random import default_rng, normal

def worker(root_seed, worker_id):
    rng = default_rng([worker_id, root_seed])
    print(normal())

root_seed = 0x8c3c010cb4754c905776bdac5ee7501
results = [worker(root_seed, worker_id) for worker_id in range(5)]
Running it twice I get different results. Why?
The np.random.normal call inside worker uses NumPy's global default generator, which is initialized at startup with an unpredictable seed, so the rng you create is never used. For reproducibility, draw from the Generator object returned by default_rng instead: simply constructing a generator does not set the random state globally.
from numpy.random import default_rng

def worker(root_seed, worker_id):
    rng = default_rng([worker_id, root_seed])
    # Draw from the seeded Generator, not from the module-level normal()
    print(rng.normal())

root_seed = 0x8c3c010cb4754c905776bdac5ee7501
results = [worker(root_seed, worker_id) for worker_id in range(5)]
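
To check the reproducibility yourself, here is a minimal sketch that returns the drawn values instead of printing them, so two runs can be compared directly. The names run_all and run_all_spawned are hypothetical helpers introduced only for this illustration; the second one shows the SeedSequence.spawn approach described on the same documentation page as an alternative way to derive per-worker seeds.

```python
from numpy.random import SeedSequence, default_rng

ROOT_SEED = 0x8c3c010cb4754c905776bdac5ee7501


def worker(root_seed, worker_id):
    # Each worker builds its own Generator seeded from (worker_id, root_seed).
    rng = default_rng([worker_id, root_seed])
    # Draw from this Generator, not from the module-level numpy.random.normal.
    return rng.normal()


def run_all(n_workers=5):
    # Hypothetical helper: one draw per worker, collected in a list.
    return [worker(ROOT_SEED, worker_id) for worker_id in range(n_workers)]


def run_all_spawned(n_workers=5):
    # Hypothetical helper: derive independent child seeds with SeedSequence.spawn.
    children = SeedSequence(ROOT_SEED).spawn(n_workers)
    return [default_rng(child).normal() for child in children]


first, second = run_all(), run_all()
print(first == second)                          # True
print(run_all_spawned() == run_all_spawned())   # True
```

Both approaches give each worker its own stream without touching global state, so they should behave the same way when the calls are later dispatched to threads or processes.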