Search code examples
rrandomfloating-pointcross-platform

In R, how can I get PRNG to give identical floating point numbers between platforms?


Running the following code in R 4.1.1 gives different results between platforms.

set.seed(1)
x <- rnorm(3)[3]
print(x, 22)

# -0.83562861241004716      # intel windows
# -0.8356286124100471557341 # m1 mac

print(round(x, 15), 22)
# -0.83562861241004704      # intel windows
# -0.8356286124100470447118 # m1 mac

I know the size of difference is below .Machine$double.eps and the extra digits do not carry meaningful information.

I am not happy with the fact that extra digits exist. How can I ensure exactly consistent results? Is there an RNG library that achieves this?

EDIT:

The bit representation is different.

set.seed(1)
x <- rnorm(100)
x <- sum(x)
SoDA::binaryRep(x)

.10101110001110000100001111110111000010011001011111011 # intel windows
.10101110001110000100001111110111000010011001011111110 # m1 mac

Bits are also different in runif(). This suggests that the uniform-to-gaussian conversion is not the only breaking point.

set.seed(1)
x <- runif(10000000)
x <- sum(x)
SoDA::binaryRep(x)

# kind = "Mersenne-Twister"
.10011000100101000110100110111100101000100000101100000 # intel windows
.10011000100101000110100110111100101000011111001100000 # m1 mac
# kind = "Wichmann-Hill"
.10011000100111111110101000100001001001010100000011011 # intel windows
.10011000100111111110101000100001001001010100001001010 # m1 mac
# kind = "Marsaglia-Multicarry"
.10011000100011100110000010000001011100011110100001110 # intel windows
.10011000100011100110000010000001011100011110001010000 # m1 mac
# kind = "Super-Duper"
.10011000100010011010010110100001000101100011101011110 # intel windows
.10011000100010011010010110100001000101100100001111101 # m1 mac
# kind = "Knuth-TAOCP-2002"
.10011000101000110101010111000111010011101001000101100 # intel windows
.10011000101000110101010111000111010011101001000101101 # m1 mac
# kind = "Knuth-TAOCP"
.10011000100110001011010011000001011001001110011111000 # intel windows
.10011000100110001011010011000001011001001110011111001 # m1 mac
# kind = "L'Ecuyer-CMRG"
.10011000100100010110100101101001011000000111010110101 # intel windows
.10011000100100010110100101101001011000001000010100001 # m1 mac

Solution

  • (Comments from Oct. 29 and Nov. 2 moved here and edited.)

    I should note that such subtle reproducibility issues with pseudorandom number generators (PRNGs) can occur when floating-point arithmetic is involved. For instance, Intel's instruction set architecture might make use of 80-bit extended precision for internal arithmetic. Extended precision, though, is only one way (among a host of others) that floating-point arithmetic might lead to non-reproducible pseudorandom numbers. Consider that Intel's and Arm's instruction set architectures are different enough to cause reproducibility issues. (If I understand, an Arm instruction set is what is used in Apple's M1 chip.)

    By contrast, integer arithmetic has fewer reproducibility problems.

    Thus, if bit-for-bit reproducibility matters to you, you should try to find an R language PRNG that uses only integer operations. (Indeed, computers generate pseudorandom floating-point numbers via integers, not the other way around, and most PRNGs produce integers, not floating-point numbers.)

    For instance, for uniform variates, take the integer output of the Mersenne Twister algorithm without manipulating it. For Gaussian (and exponential) random variates, there is fortunately an algorithm by Karney that generates arbitrary-precision variates with only integer operations. Also, consider rational arithmetic built on underlying integer operations.

    REFERENCES:

    Karney, C.F.F., 2016. Sampling exactly from the normal distribution. ACM Transactions on Mathematical Software (TOMS), 42(1), pp.1-14.