python arrays numpy random scientific-computing

Difference between numpy randint and floor of rand

import numpy as np

num_draws = int(1e6)
arr1 = np.random.randint(0, 10, num_draws)
arr2 = np.floor(10 * np.random.rand(num_draws))

Could someone with expertise in the internals of numpy.random comment on whether arr2 obeys formally equivalent statistics to arr1? In experiments I have done, the distributions appear to have the same first few moments, but that's all I have checked thus far.
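One way to extend the moment check is to compare the full empirical distributions, i.e. the relative frequency of each value 0..9 under both methods. The sketch below does this with `np.bincount`. The fixed seed and the `.astype(int)` cast on `arr2` are my additions (the cast is needed because `bincount` only accepts integers). An empirical check like this can only fail to find a difference; it cannot prove the two are equivalent.

```python
import numpy as np

num_draws = int(1e6)
np.random.seed(0)  # fixed seed only so the comparison is repeatable
arr1 = np.random.randint(0, 10, num_draws)
arr2 = np.floor(10 * np.random.rand(num_draws)).astype(int)  # cast so bincount accepts it

# Relative frequency of each value 0..9 for both methods
freq1 = np.bincount(arr1, minlength=10) / num_draws
freq2 = np.bincount(arr2, minlength=10) / num_draws

# Largest per-value gap; sampling noise per frequency is about sqrt(0.1 * 0.9 / 1e6) ~ 3e-4
max_gap = np.max(np.abs(freq1 - freq2))
print(freq1.round(4))
print(freq2.round(4))
print(max_gap)
```

Gaps of a few times 1e-4 are what sampling noise alone would produce at this sample size.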

Solution

Yes, they're equivalent.[1]

Looking at the source code, both are defined in auxiliary functions (1, 2) that dispatch to underlying C calls depending on the data size (1, 2), and these both end up calling the same underlying function.
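At the Python level, you can see that the module-level `np.random.rand` and `np.random.randint` share one generator: both are bound methods of the same hidden global `RandomState` instance, and the legacy state tuple names the bit generator behind it. This is a small sketch that relies on `__self__` (standard for bound methods) and on `np.random.get_state()` returning the legacy tuple whose first element is the generator name.

```python
import numpy as np

# Both module-level functions are bound methods of one global RandomState,
# so every call to either one advances the same generator state.
same_instance = np.random.rand.__self__ is np.random.randint.__self__

# The first element of the legacy state tuple names the underlying bit generator.
bitgen_name = np.random.get_state()[0]
print(same_instance, bitgen_name)
```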

That underlying function is the 32-bit Mersenne Twister (MT19937). Everything layered on top of that call is shifting and masking to turn its 32-bit outputs into the requested data type, which doesn't change the statistical behaviour of the underlying random stream.
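To make "shifts and masking" concrete, here is a pure-Python sketch of both paths built on raw 32-bit Mersenne Twister words. The helpers `next32`, `uniform53` and `bounded_masked` are hypothetical names for illustration. `uniform53` follows what I understand MT19937's double routine to do: 27 bits from one word, 26 from the next, scaled by 2**53. `bounded_masked` is a simplified version of masked rejection sampling for bounded integers. Treat both as illustrations of the technique under those assumptions. They are not expected to reproduce numpy's exact output stream, since buffering details in the C code may differ.

```python
import numpy as np

# Assumption: MT19937.random_raw exposes the raw Mersenne Twister words;
# masking to 32 bits below is only a safeguard.
bitgen = np.random.MT19937(12345)

def next32():
    """Hypothetical helper: one raw 32-bit word from the Mersenne Twister."""
    return int(bitgen.random_raw()) & 0xFFFFFFFF

def uniform53():
    """Hypothetical helper: a double in [0, 1) from two 32-bit words (27 + 26 = 53 bits)."""
    a = next32() >> 5  # keep the top 27 bits
    b = next32() >> 6  # keep the top 26 bits
    return (a * 67108864.0 + b) / 9007199254740992.0  # (a * 2**26 + b) / 2**53

def bounded_masked(high):
    """Hypothetical helper: uniform integer in [0, high) via masked rejection sampling."""
    rng = high - 1
    mask = rng
    # Smear the highest set bit downwards to get the smallest all-ones mask >= rng.
    for shift in (1, 2, 4, 8, 16):
        mask |= mask >> shift
    while True:
        value = next32() & mask  # mask == 15 when high == 10
        if value <= rng:         # reject 10..15 so each of 0..9 stays equally likely
            return value

floor_draws = [int(np.floor(10 * uniform53())) for _ in range(20000)]
masked_draws = [bounded_masked(10) for _ in range(20000)]
print([floor_draws.count(k) for k in range(10)])
print([masked_draws.count(k) for k in range(10)])
```

Note the design difference: the masking path throws away out-of-range words rather than rescaling them, while the floor path rescales a 53-bit fraction.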

Footnotes

[1] I assume you're not asking whether your method of flooring the number has unexpected statistical side effects. That doesn't depend on numpy, but since both methods use the same uniform source, they should have the same bias. I would not, however, expect them to have the same performance.
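If performance matters, a quick `timeit` comparison of the two expressions from the question is easy to run. The repeat counts below are arbitrary choices, and the numbers depend on your machine and numpy version, so no particular result is implied here. Note that the floor version returns floats; casting back to integers would add further cost.

```python
import timeit

import numpy as np

num_draws = 1_000_000
namespace = {"np": np, "n": num_draws}

# Best of 3 repeats, each timing 5 calls of the expression
t_randint = min(timeit.repeat("np.random.randint(0, 10, n)",
                              globals=namespace, number=5, repeat=3))
t_floor = min(timeit.repeat("np.floor(10 * np.random.rand(n))",
                            globals=namespace, number=5, repeat=3))
print(f"randint: {t_randint:.3f}s  floor(10*rand): {t_floor:.3f}s")
```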