Does setting a specific random seed (`random_state`) when splitting train/test datasets using scikit-learn produce the same initialization of the random number generator (i.e., the same pseudo-random numbers) across different platforms, for instance, across different cloud computing instances?
Thanks!
As long as `random_state` is equal on all platforms and they are all running the same version of NumPy, you should get exactly the same splits.
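As a quick check (a minimal sketch with made-up toy data), two calls to `train_test_split` with the same `random_state` return identical splits:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data just for illustration
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Two independent calls with the same seed...
X_tr1, X_te1, y_tr1, y_te1 = train_test_split(X, y, test_size=0.3, random_state=42)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.3, random_state=42)

# ...produce element-for-element identical splits
print(np.array_equal(X_tr1, X_tr2) and np.array_equal(y_te1, y_te2))  # True
```

Running the same snippet with the same NumPy version on another machine should produce the same splits as well.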
Since `random_state` is backed by a NumPy `RandomState` instance, I believe all of scikit-learn's pseudo-random number generators are effectively frozen as well, because NumPy froze the `RandomState` bit stream.
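You can observe the underlying guarantee with NumPy alone (a sketch, independent of scikit-learn): two `RandomState` instances given the same seed emit the same stream of draws, and NumPy's compatibility policy promises this stream stays fixed across versions:

```python
import numpy as np

# Two generators seeded identically
rng_a = np.random.RandomState(0)
rng_b = np.random.RandomState(0)

# Identical seeds give identical streams of draws
print(np.array_equal(rng_a.permutation(10), rng_b.permutation(10)))  # True
print((rng_a.randint(0, 100, size=5) == rng_b.randint(0, 100, size=5)).all())  # True
```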
You can check the documentation for `random_state` here; as you can see, it is a `numpy.random.RandomState`. You can also check NumPy's compatibility guarantee here.