
R random numbers almost the same but not identical


I'm running the same version of R on two machines, one a Linux R server and one on AWS, and the random numbers they generate are almost the same but not always identical. Out of 1 million samples each from the uniform, gamma, and normal distributions:

  • runif() produces identical results.
  • rgamma() produces 7 small differences; otherwise identical results.
  • rnorm() also produces 7 small differences; otherwise identical results.

By small differences, I mean something like 1.4510448921274106 vs 1.4510448921274115.
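To put "small" in perspective, the two values above can be compared in units in the last place (ULPs), i.e. how many representable doubles lie between them. A minimal Python sketch (the helper `ulp_distance` is a hypothetical name, not part of R) that reinterprets each double's IEEE-754 bit pattern as a 64-bit integer:

```python
import struct


def ulp_distance(a: float, b: float) -> int:
    """Count the representable doubles between a and b.

    Assumes both inputs are finite and have the same sign; for such values,
    consecutive doubles map to consecutive 64-bit integer bit patterns.
    """
    ia = struct.unpack("<q", struct.pack("<d", a))[0]
    ib = struct.unpack("<q", struct.pack("<d", b))[0]
    return abs(ia - ib)


# The two values quoted in the question.
a = 1.4510448921274106
b = 1.4510448921274115

# Hex-float form shows the exact bits, which is also a lossless way to
# dump results from two machines for comparison.
print(a.hex(), b.hex())
print(ulp_distance(a, b))
```

For the two quoted values this gives 4: the machines disagree only by 4 representable doubles, i.e. in the last few bits of the 52-bit fraction.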

What could be causing these differences? If it's a floating-point issue, why does it affect only some distributions? If it's an OS, library, or software issue, why do the results differ only on rare occasions?


Solution

  • runif() is not implemented in floating point; it does integer arithmetic internally (at least for the default Mersenne-Twister algorithm, and probably for all the available algorithms). You can see this in the Mersenne Twister code in R's source (src/main/RNG.c): the result is only converted to double-precision floating point at the very end (line 725). That final step scales a 32-bit integer by 2^-32, which is exact in IEEE-754 double precision, so runif() is not subject to cross-platform or cross-compiler floating-point artifacts.

    As for "why [are the others] only different on rare occasions?": I assume that the rgamma() and rnorm() implementations are numerically stable enough that platform-dependent roundoff only rarely changes the final result. This is especially plausible if the differences come only from using 80-bit registers for intermediate computations (vs the default 64-bit precision): most of the time the extra intermediate bits are rounded away when the result is stored as a 64-bit double, and only occasionally do they tip the last bits one way or the other.
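The contrast drawn above can be sketched in Python. This is an illustration under stated assumptions, not R's code. Part 1 mimics the final step we understand R's Mersenne Twister to take (scaling a 32-bit integer by 2^-32, which R's source writes as the constant 2.3283064365386963e-10) and checks with exact rational arithmetic that no rounding occurs; R's extra step that keeps values strictly inside (0, 1) is omitted. Part 2 uses a made-up expression `a * b + c` to show how rounding an intermediate product to a 64-bit double, versus keeping it at higher precision, can change the final bits. The helper name `mt_style_uniform` is hypothetical.

```python
from fractions import Fraction


# Part 1: integer-to-double conversion as the last step (assumed to match
# what R's Mersenne-Twister code does before returning a uniform).
def mt_style_uniform(y: int) -> float:
    """Scale a 32-bit unsigned integer into [0, 1) by multiplying by 2**-32."""
    if not 0 <= y < 2**32:
        raise ValueError("y must be a 32-bit unsigned integer")
    # Every 32-bit integer fits in a double's 53-bit significand, and
    # multiplying by a power of two only changes the exponent: no rounding.
    return float(y) * 2.0**-32


for y in (0, 1, 123456789, 2**32 - 1):
    u = mt_style_uniform(y)
    # Fraction(u) is the exact value of the double; it equals y / 2**32.
    assert Fraction(u) == Fraction(y, 2**32)

# Part 2: an expression whose result depends on intermediate precision.
a = b = 1.0 + 2.0**-30  # exact product is 1 + 2**-29 + 2**-60
c = -1.0

# Product rounded to a 64-bit double first (the 2**-60 term is lost).
rounded_each_step = a * b + c
# Exact intermediate, single rounding at the end.
exact_then_rounded = float(Fraction(a) * Fraction(b) + Fraction(c))

print(rounded_each_step == exact_then_rounded)  # False
```

In this made-up case the exact product needs only 61 significant bits, so a build that kept it in an 80-bit x87 register (64-bit significand) would also get the second result. For many other inputs the extra bits are rounded away when the value is stored as a double, and both paths agree, which is consistent with mismatches being rare.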