I am writing a stochastic, process driven model of transmission of infection and diagnostic testing to detect infection. The model requires repeat random samples across multiple time steps and iterations. The faster my model can run, the better. For the random sampling in the model, parameters for the random samples can change at each time step in the model. I first wrote my model in R, and then in CPP (via the great Rcpp package). In Rcpp, using the R based random number generator, the model takes about 7% of the time to run as it took in R. I was advised that using GSL within CPP for random number generation is faster again. In the CPP model, with GSL based random sampling instead of R based random sampling, I get a marginal increase in speed. However, I am not sure that I am using the GSL based random sampler correctly.
My questions are:
I obtained the code for setting the GSL seed according to the time of day from this Stack Overflow post:
GSL Uniform Random Number Generator
I am using a Windows machine and the RStudio interface. I am sourcing the CPP functions from R using the Rcpp package. All of the packages and programmes were recently reinstalled. Here is the session info: R version 4.2.2 (2022-10-31 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 22000)
For context, I am a veterinary epidemiologist with R experience, but only two months into learning CPP. This is my first stack exchange query. Thanks in advance for your time!
Here is an example of what I am trying to achieve written in CPP (using Rcpp in RStudio) and using the GSL based RNG. Please can somebody tell me if this is the correct way to set the GSL RNG seed? Is it OK to do the seed setting process just once at the top of the function?
// CPP code - function GSL RNG written using Rcpp on a CPP file in RStudio
// [[Rcpp::plugins(cpp11)]]
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_blas.h>
#include <iostream>
#include <gsl/gsl_math.h>
#include <sys/time.h>
#include <RcppGSL.h>
// [[Rcpp::depends(RcppGSL)]]
// [[Rcpp::export]]
Rcpp:: NumericMatrix check_cpp_gsl_rng(int n_iters,
int min_unif,
int max_unif,
double exp_rate,
double bernoulli_prob)
{
const gsl_rng_type * T;
gsl_rng * r;
gsl_rng_env_setup();
struct timeval tv; // Seed generation based on time
gettimeofday(&tv,0);
unsigned long mySeed = tv.tv_sec + tv.tv_usec;
T = gsl_rng_default; // Generator setup
r = gsl_rng_alloc (T);
gsl_rng_set(r, mySeed);
// matrix to collect outputs
Rcpp:: NumericMatrix Output_Mat(n_iters, 7);
for (int i = 0; i < n_iters; i++) // in real model, parameters may change for each iteration
{
// random exponential draws
Output_Mat(i, 0) = gsl_ran_exponential(r , (1 / exp_rate)); // exp 1
Output_Mat(i, 1) = gsl_ran_exponential(r , (1 / exp_rate)); // exp 2
// random uniform draws
Output_Mat(i, 2) = gsl_ran_flat(r, min_unif, max_unif); // unif 1
Output_Mat(i, 3) = gsl_ran_flat(r, min_unif, max_unif); // unif 2
// random Bernoulli draws
Output_Mat(i, 4) = gsl_ran_bernoulli(r, bernoulli_prob); // Bernoulli 1
Output_Mat(i, 5) = gsl_ran_bernoulli(r, bernoulli_prob); // Bernoulli 2
Output_Mat(i, 6) = i; // record iteration number
}
return Output_Mat;
gsl_rng_free(r);
// end of function
}
The plot below shows a comparison of run speeds of the random sampling function implemented in R only, CPP using the R RNG and CPP using the GSL RNG (as in code above) based on 100 comparisons of 1000 iterations using the "microbenchmark" package.
A package you may find useful is my RcppZiggurat (github). It revives the old but fast Ziggurat RNG for normal covariates and times it. It use several other Ziggurat implementations as benchmarks -- including one from the GSL.
First, we can use its code and infrastructure to set up a simple structure (see below). I first show that 'yes indeed' we can seed a GSL RNG:
> setseedGSL(42)
> rnormGSLZig(5)
[1] -0.811264 1.092556 -1.873074 -0.146400 -1.653703
> rnormGSLZig(5) # different
[1] -1.281593 0.893496 -0.545510 -0.337940 -1.258800
> setseedGSL(42)
> rnormGSLZig(5) # as before
[1] -0.811264 1.092556 -1.873074 -0.146400 -1.653703
>
Note that we need a global variable for an instance of a GSL RNG 'state'.
Second, we can show that Rcpp is actually faster that either the standard normal GSL generator or its Ziggurat implementation. Using Rcpp vectorised is even faster:
> library(microbenchmark)
> n <- 1e5
> res <- microbenchmark(rnormGSLZig(n), rnormGSLPlain(n), rcppLoop(n), rcppDirect(n))
> res
Unit: microseconds
expr min lq mean median uq max neval cld
rnormGSLZig(n) 996.580 1151.7065 1768.500 1355.053 1424.220 18597.82 100 b
rnormGSLPlain(n) 996.316 1085.6820 1392.323 1358.696 1431.715 2929.05 100 b
rcppLoop(n) 223.221 259.2395 641.715 518.706 573.899 13779.20 100 a
rcppDirect(n) 46.224 67.2075 384.004 293.499 320.919 14883.86 100 a
>
The code is below; it is a pretty quick adaptation from my RcppZiggurat package. You can sourceCpp()
it (if you have RcppGSL installed which I used to 'easily' get the compile and link instructions to the GSL) and it will run the demo code shown above.
#include <Rcpp/Lighter>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
// [[Rcpp::depends(RcppGSL)]]
class ZigguratGSL {
public:
ZigguratGSL(uint32_t seed=12345678) {
gsl_rng_env_setup() ;
r = gsl_rng_alloc (gsl_rng_default);
gsl_rng_set(r, seed);
}
~ZigguratGSL() {
gsl_rng_free(r);
}
double normZig() {
const double sigma=1.0;
return gsl_ran_gaussian_ziggurat(r, sigma);
}
double normPlain() {
const double sigma=1.0;
return gsl_ran_gaussian_ziggurat(r, sigma);
}
void setSeed(const uint32_t seed) {
gsl_rng_set(r, seed);
}
private:
gsl_rng *r;
};
static ZigguratGSL gsl;
// [[Rcpp::export]]
void setseedGSL(const uint32_t s) {
gsl.setSeed(s);
return;
}
// [[Rcpp::export]]
Rcpp::NumericVector rnormGSLZig(int n) {
Rcpp::NumericVector x(n);
for (int i=0; i<n; i++) {
x[i] = gsl.normZig();
}
return x;
}
// [[Rcpp::export]]
Rcpp::NumericVector rnormGSLPlain(int n) {
Rcpp::NumericVector x(n);
for (int i=0; i<n; i++) {
x[i] = gsl.normPlain();
}
return x;
}
// [[Rcpp::export]]
Rcpp::NumericVector rcppLoop(int n) {
Rcpp::NumericVector x(n);
for (int i=0; i<n; i++) {
x[i] = R::rnorm(1.0,0.0);
}
return x;
}
// [[Rcpp::export]]
Rcpp::NumericVector rcppDirect(int n) {
return Rcpp::rnorm(n, 1.0, 0.0);
}
/*** R
setseedGSL(42)
rnormGSLZig(5)
rnormGSLZig(5) # different
setseedGSL(42)
rnormGSLZig(5) # as before
library(microbenchmark)
n <- 1e5
res <- microbenchmark(rnormGSLZig(n), rnormGSLPlain(n), rcppLoop(n), rcppDirect(n))
res
*/
PS We write it as Rcpp. Capital R, lowercase cpp.