# Correctly seeding numpy random generator

For my scientific experiments, I usually seed using:

`rng = np.random.Generator(np.random.PCG64(seed))`

which for the current numpy version is equivalent to

`rng = np.random.default_rng(seed)`

since `np.random.default_rng(seed)` already returns a `Generator` backed by `PCG64`.
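A quick check that `np.random.Generator(np.random.PCG64(seed))` and `np.random.default_rng(seed)` draw identical streams (the seed value below is arbitrary):

```python
import numpy as np

seed = 12345  # arbitrary example seed
rng_a = np.random.Generator(np.random.PCG64(seed))
rng_b = np.random.default_rng(seed)

# Both wrap the same PCG64 bit generator, so the streams coincide.
print(rng_a.random(3))
print(rng_b.random(3))
```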

As I repeat my experiments `n` times and average their results, I usually set the `seed` to all the numbers between `0` and `n`.

However, the documentation here and here states:

> Seeds should be large positive integers.

or

> We default to using a 128-bit integer using entropy gathered from the OS. This is a good amount of entropy to initialize all of the generators that we have in numpy. We do not recommend using small seeds below 32 bits for general use.

However, the second reference also states:

> There will not be anything wrong with the results, per se; even a seed of 0 is perfectly fine thanks to the processing that SeedSequence does.
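The "processing" mentioned there is done by `np.random.SeedSequence`, which hashes any input seed into a well-mixed entropy pool. A small sketch of what that looks like:

```python
import numpy as np

# SeedSequence hashes any seed -- even 0 -- into a well-mixed pool,
# so nearby small seeds still yield unrelated initial generator states.
ss0 = np.random.SeedSequence(0)
ss1 = np.random.SeedSequence(1)

# Four 32-bit words of state derived from each seed; deterministic per
# seed, but the two outputs look unrelated despite adjacent inputs.
print(ss0.generate_state(4))
print(ss1.generate_state(4))
```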

This feels contradictory, and I wonder whether small seeds are now totally fine to use or whether one should move towards larger seeds. In particular: (i) at which point (if any) would a large seed make a difference compared to a small seed, and (ii) for scientific experiments (e.g. machine learning / algorithmic research), should one prefer larger seeds, or does it not make a difference?

PS: This question is highly related to Random number seed in numpy but concerns the now recommended Generator. Furthermore, the answer there does not seem in-depth enough, as it does not discuss large versus small seeds.

## Solution

The justification is in the quick start page which you linked:

> We recommend using very large, unique numbers to ensure that your seed is different from anyone else’s. This is good practice to ensure that your results are statistically independent from theirs unless you are intentionally trying to reproduce their result.

In short, this is to avoid reproducing someone else's bias (if any) by generating the exact same dataset, since humans are more likely to pick small numbers by default (`0`, `11`, `42`) rather than very large ones.

In your use case this is probably not important.
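If you do want to follow the "large, unique seed" advice while keeping `n` reproducible repetitions, one possible pattern (a sketch, not the only option) is to draw a single large root seed from OS entropy once, record it, and spawn independent child streams from it:

```python
import numpy as np

# Draw a root seed from OS entropy once; record root.entropy (a large
# integer) alongside your results so the experiment can be reproduced.
root = np.random.SeedSequence()
print("root seed:", root.entropy)

# Spawn n statistically independent child streams, one per repetition.
n = 5
rngs = [np.random.default_rng(child) for child in root.spawn(n)]
results = [rng.normal(size=100).mean() for rng in rngs]
print(results)

# To reproduce later: np.random.SeedSequence(recorded_entropy).spawn(n)
```

The `spawn` mechanism is what the numpy docs recommend for parallel or repeated runs: the child sequences are constructed so their streams are independent, which sidesteps the large-versus-small seed question entirely.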
