Search code examples
c++rperformancercppgsl

Setting GSL RNG seed correctly in Rcpp for model with repeat iterations


I am writing a stochastic, process driven model of transmission of infection and diagnostic testing to detect infection. The model requires repeat random samples across multiple time steps and iterations. The faster my model can run, the better. For the random sampling in the model, parameters for the random samples can change at each time step in the model. I first wrote my model in R, and then in CPP (via the great Rcpp package). In Rcpp, using the R based random number generator, the model takes about 7% of the time to run as it took in R. I was advised that using GSL within CPP for random number generation is faster again. In the CPP model, with GSL based random sampling instead of R based random sampling, I get a marginal increase in speed. However, I am not sure that I am using the GSL based random sampler correctly.

My questions are:

  1. Is it correct to only do the seed setting procedure once for the GSL RNG based on the time of day and use this same construct for all of my random draws (as I have done in code below)? I confess I do not fully understand the seed setting procedure within CPP for GSL as I am new to both. I have compared the distributions produced using both R-based and GSL-based RNG and they are very similar, so hopefully this bit is OK.

I obtained the code for setting the GSL seed according to the time of day from this Stack Overflow post:

GSL Uniform Random Number Generator

  1. I was expecting a greater increase in speed using the GSL RNG. Is there anything I can do to maximize the speed of the GSL RNG?

I am using a Windows machine and the RStudio interface. I am sourcing the CPP functions from R using the Rcpp package. All of the packages and programmes were recently reinstalled. Here is the session info: R version 4.2.2 (2022-10-31 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 22000)

For context, I am a veterinary epidemiologist with R experience, but only two months into learning CPP. This is my first stack exchange query. Thanks in advance for your time!

Here is an example of what I am trying to achieve written in CPP (using Rcpp in RStudio) and using the GSL based RNG. Please can somebody tell me if this is the correct way to set the GSL RNG seed? Is it OK to do the seed setting process just once at the top of the function?

// CPP code - function GSL RNG written using Rcpp on a CPP file in RStudio

// [[Rcpp::plugins(cpp11)]]

#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_blas.h>
#include <iostream>
#include <gsl/gsl_math.h>
#include <sys/time.h>
#include <RcppGSL.h>

// [[Rcpp::depends(RcppGSL)]]

// [[Rcpp::export]]


Rcpp:: NumericMatrix check_cpp_gsl_rng(int n_iters,
                                 int min_unif,
                                 int max_unif,
                                 double exp_rate,
                                 double bernoulli_prob) 
{
  const gsl_rng_type * T;
  gsl_rng * r;
  gsl_rng_env_setup();
  struct timeval tv; // Seed generation based on time
  gettimeofday(&tv,0);
  unsigned long mySeed = tv.tv_sec + tv.tv_usec;
  T = gsl_rng_default; // Generator setup
  r = gsl_rng_alloc (T);
  gsl_rng_set(r, mySeed);

  // matrix to collect outputs
  
  Rcpp:: NumericMatrix Output_Mat(n_iters, 7); 
  
  for (int i = 0; i < n_iters; i++) // in real model, parameters may change for each iteration
    
  {
    // random exponential draws
    
    Output_Mat(i, 0) = gsl_ran_exponential(r , (1 / exp_rate)); // exp 1
    Output_Mat(i, 1) = gsl_ran_exponential(r , (1 / exp_rate)); // exp 2
    
    // random uniform draws
    
    Output_Mat(i, 2) = gsl_ran_flat(r, min_unif, max_unif); // unif 1
    Output_Mat(i, 3) = gsl_ran_flat(r, min_unif, max_unif); // unif 2
    
    // random Bernoulli draws

    Output_Mat(i, 4) = gsl_ran_bernoulli(r, bernoulli_prob); // Bernoulli 1
    Output_Mat(i, 5) = gsl_ran_bernoulli(r, bernoulli_prob); // Bernoulli 2
    
    
    Output_Mat(i, 6) = i; // record iteration number
  }
  
  return Output_Mat;
  
  gsl_rng_free(r);
  
  // end of function
  
}

The plot below shows a comparison of run speeds of the random sampling function implemented in R only, CPP using the R RNG and CPP using the GSL RNG (as in code above) based on 100 comparisons of 1000 iterations using the "microbenchmark" package.

Comparison of run speeds of the random sampling function in R only, CPP using the R RNG and CPP using the GSL RNG based on 100 comparisons of 1000 iterations using the "microbenchmark" package.


Solution

  • A package you may find useful is my RcppZiggurat (github). It revives the old but fast Ziggurat RNG for normal covariates and times it. It use several other Ziggurat implementations as benchmarks -- including one from the GSL.

    First, we can use its code and infrastructure to set up a simple structure (see below). I first show that 'yes indeed' we can seed a GSL RNG:

    > setseedGSL(42)
    > rnormGSLZig(5)
    [1] -0.811264  1.092556 -1.873074 -0.146400 -1.653703
    > rnormGSLZig(5)    # different
    [1] -1.281593  0.893496 -0.545510 -0.337940 -1.258800
    > setseedGSL(42)
    > rnormGSLZig(5)    # as before
    [1] -0.811264  1.092556 -1.873074 -0.146400 -1.653703
    >
    

    Note that we need a global variable for an instance of a GSL RNG 'state'.

    Second, we can show that Rcpp is actually faster that either the standard normal GSL generator or its Ziggurat implementation. Using Rcpp vectorised is even faster:

    > library(microbenchmark)
    > n <- 1e5
    > res <- microbenchmark(rnormGSLZig(n), rnormGSLPlain(n), rcppLoop(n), rcppDirect(n))
    > res
    Unit: microseconds
                 expr     min        lq     mean   median       uq      max neval cld
       rnormGSLZig(n) 996.580 1151.7065 1768.500 1355.053 1424.220 18597.82   100   b
     rnormGSLPlain(n) 996.316 1085.6820 1392.323 1358.696 1431.715  2929.05   100   b
          rcppLoop(n) 223.221  259.2395  641.715  518.706  573.899 13779.20   100  a 
        rcppDirect(n)  46.224   67.2075  384.004  293.499  320.919 14883.86   100  a 
    > 
    

    The code is below; it is a pretty quick adaptation from my RcppZiggurat package. You can sourceCpp() it (if you have RcppGSL installed which I used to 'easily' get the compile and link instructions to the GSL) and it will run the demo code shown above.

    #include <Rcpp/Lighter>
    #include <gsl/gsl_rng.h>
    #include <gsl/gsl_randist.h>
    
    // [[Rcpp::depends(RcppGSL)]]
    
    class ZigguratGSL {
    public:
        ZigguratGSL(uint32_t seed=12345678) {
            gsl_rng_env_setup() ;
            r = gsl_rng_alloc (gsl_rng_default);
            gsl_rng_set(r, seed);
        }
        ~ZigguratGSL() {
            gsl_rng_free(r);
        }
        double normZig() {
            const double sigma=1.0;
            return gsl_ran_gaussian_ziggurat(r, sigma);
        }
        double normPlain() {
            const double sigma=1.0;
            return gsl_ran_gaussian_ziggurat(r, sigma);
        }
        void setSeed(const uint32_t seed) {
            gsl_rng_set(r, seed);
        }
    private:
        gsl_rng *r;
    };
    
    static ZigguratGSL gsl;
    
    // [[Rcpp::export]]
    void setseedGSL(const uint32_t s) {
        gsl.setSeed(s);
        return;
    }
    
    // [[Rcpp::export]]
    Rcpp::NumericVector rnormGSLZig(int n) {
        Rcpp::NumericVector x(n);
        for (int i=0; i<n; i++) {
            x[i] = gsl.normZig();
        }
        return x;
    }
    
    // [[Rcpp::export]]
    Rcpp::NumericVector rnormGSLPlain(int n) {
        Rcpp::NumericVector x(n);
        for (int i=0; i<n; i++) {
            x[i] = gsl.normPlain();
        }
        return x;
    }
    
    // [[Rcpp::export]]
    Rcpp::NumericVector rcppLoop(int n) {
        Rcpp::NumericVector x(n);
        for (int i=0; i<n; i++) {
            x[i] = R::rnorm(1.0,0.0);
        }
        return x;
    }
    
    // [[Rcpp::export]]
    Rcpp::NumericVector rcppDirect(int n) {
        return Rcpp::rnorm(n, 1.0, 0.0);
    }
    
    
    /*** R
    setseedGSL(42)
    rnormGSLZig(5)
    rnormGSLZig(5)    # different
    setseedGSL(42)
    rnormGSLZig(5)    # as before
    
    
    library(microbenchmark)
    n <- 1e5
    res <- microbenchmark(rnormGSLZig(n), rnormGSLPlain(n), rcppLoop(n), rcppDirect(n))
    res
    */
    

    PS We write it as Rcpp. Capital R, lowercase cpp.