I'm maintaining an R package named iRF, and a big problem is that it isn't reproducible. In other words, I cannot get the same result by setting set.seed
. For the purpose of this question, let's focus on the function RIT
. You don't need to figure out what it does; just look at the RNG handling part instead.
It is defined in R/RIT.R
, which calls either RIT_1class
or RIT_2class
depending on the input type. Both RIT_[1|2]class
functions are defined in src/ExportedFunctionsRIT.cpp
, which in turn calls helper functions defined in src/RITmain.h
and src/RITaux.h
.
I'm using Rcpp attributes, so randomness in RIT_[1|2]class
should be correctly handled by an implicit RNGScope
, as mentioned in this answer. However, this codebase is tricky to tackle in two ways,
RIT_basic
and RIT_minhash
use // [[Rcpp::plugins(openmp)]]
. Fortunately, the original author gives each thread a separate seed, so hopefully, I can make it deterministic with seeds[i] = rand() * (i+1)
, yet you can tell this along isn't enough since I'm asking here.// Set up vector of seeds for RNG
vector<unsigned int> seeds(n_cores);
for (int i=0; i<n_cores; i++) {
seeds[i] = chrono::high_resolution_clock::now().time_since_epoch().count()*(i+1);
}
CreateHT
uses random_device rd;
. I'm not familiar with C++ but a quick search reveals it generates "non-deterministic random numbers".void CreateHt(...) {
// Ht is p by L
random_device rd; //seed for Random Number Generator(RNG)
mt19937_64 mt(rd()); //Use Mersenne Twister as RNG
...
shuffle(perm.begin(), perm.end(), mt);
...
}
From my understanding, both rand()
and random_device
are C++'s builtin random artifacts. How can I make them respect .Random.seed
?
You should not use rand()
, c.f. https://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful. In particular rand()
is not thread safe, so combining it with OpenMP will not work. However, going for C++11's random
header is not a good idea either, since its usage is discouraged by WRE. No reason is given, but the distribution functions being implementation defined is a likely one.
Possible alternatives:
Use R's RNG. Rcpp provides many wrapper functions in the R
and Rcpp
namespace. In addition R_unif_index
is helpful for getting an unbiased integer within a range.
Use the RNGs from boost.random
provided by the BH package. Seed them with a call to R's RNG to make everything reproducible.
Use alternative packages like rTRNG
, sitmo
or my own dqrng
. This is particularly helpful in the context of parallel RNGs. Seeding via R's RNG can be used here as well.