Search code examples
c++randomthread-safetyopenmpgsl

GSL+OMP: Thread safe random number generators in C++


I have a code in which I am trying to execute in parallel.

#include<iostream>
#include<omp.h>
#include<math.h>
#include<cstdlib>
#include<iterator>
#include<string.h>
#include<vector>
#include<map>
#include<time.h>
#include<gsl/gsl_rng.h>
#include<gsl/gsl_randist.h>

gsl_rng ** threadvec = new gsl_rng*[omp_get_num_threads()];
using namespace std;

int main(){
   clock_t begin = omp_get_wtime();
   vector<double> PopVals;
   map<int, vector<double> > BigMap;
   int Num1 = 100; 
   double randval;
   int Num2 = 10; 
   #pragma omp parallel
   {
       gsl_rng_env_setup();     
       for (int b = 0; b < omp_get_num_threads(); b++)
           threadvec[b] = gsl_rng_alloc(gsl_rng_taus);  
   }
   for( int i = 0; i < Num1; i++){
       PopVals.resize(Num2);
       #pragma omp parallel for
          for( int j = 0; j < Num2; j++){   
              randval = gsl_rng_uniform(threadvec[omp_get_thread_num()]);   
              PopVals[j] = randval; 
          }
       BigMap.insert(make_pair(i,PopVals));
       PopVals.clear();
   }

map<int,vector<double> >::iterator it = BigMap.find(Num1-1);
vector<double> OutVals = it->second; 

for (int i = 0; i < Num2; i++)
    cout << endl << OutVals[i] << endl; 

for (int b = 0; b < omp_get_num_threads(); b++)
        gsl_rng_free(threadvec[b]);

clock_t end = omp_get_wtime(); 
double elapsed_time = double(end - begin);
cout << endl << "Time taken to run: " << elapsed_time <<  " secs" << endl;
}

When I run this, there are 8 threads executing the nested loop in parallel, but I keep seeing the same random number for each thread. I attributed this behavior to lack of setting the seed, for each iteration. It would be great if somebody can point out, how can i generate unique random numbers in each iteration of the loop in a thread safe way.

The output of the above code is 0.793816, 10 times. Whereas, I want unique numbers for each of the values in the inner loop.

Thanks.


Solution

  • There are multiple issues here.

    Using omp_get_num_threads out of parallel regions

    Outside of a parallel region, omp_get_num_threads() always returns 1. Use omp_get_max_threads() instead, it will return the number of threads for any upcoming parallel region unless manually overridden. Especially threadvec has only one entry.

    Don't initialize the environment in a parallel region

    Calling gsl_rng_env_setup in a parallel region won't work properly. Also you are trying to allocate the whole vector of rngs by all threads... Simply remove the parallel region and use omp_get_max_threads() correctly. Or you could also do:

    gsl_rng_env_setup(); // serial
    #pragma omp parallel
    threadvec[omp_get_thread_num()] = gsl_rng_alloc(gsl_rng_taus);
    

    Allthough from the documentation it is not 100% clear if that is threadsafe, so just use the serial loop version.

    Properly seed your rngs differently

    By default, all rngs are seeded with the same number, so obviously they will return the exact same sequence. Seed them properly with the thread number, e.g. gsl_rng_set(threadvec[b], b * 101);. Note that Tausworthe generators are weird. Those particular generate the same sequence of numbers when seeded with 0 or 1.

    Implicitly shared variables

    Your variable randval is defined outside of the parallel region, hence it is implicitly shared. You could force it to be private, but it is better to declare variables as locally as possible. That makes reasoning about OpenMP code much easier.

    At the end, it looks something like this:

    #include <cstdlib>
    #include <gsl/gsl_randist.h>
    #include <gsl/gsl_rng.h>
    #include <iostream>
    #include <iterator>
    #include <map>
    #include <math.h>
    #include <omp.h>
    #include <string.h>
    #include <time.h>
    #include <vector>
    
    // DO NOT using namespace std;
    
    int main() {
      clock_t begin = omp_get_wtime();
      std::vector<double> PopVals;
      std::map<int, std::vector<double>> BigMap;
      constexpr int Num1 = 100;
      constexpr int Num2 = 10;
      gsl_rng_env_setup();
      gsl_rng **threadvec = new gsl_rng *[omp_get_max_threads()];
      for (int b = 0; b < omp_get_max_threads(); b++) {
        threadvec[b] = gsl_rng_alloc(gsl_rng_taus);
        gsl_rng_set(threadvec[b], b * 101);
      }
      for (int i = 0; i < Num1; i++) {
        PopVals.resize(Num2);
        #pragma omp parallel for
        for (int j = 0; j < Num2; j++) {
          double randval = gsl_rng_uniform(threadvec[omp_get_thread_num()]);
          PopVals[j] = randval;
        }
        BigMap.insert(std::make_pair(i, PopVals));
        PopVals.clear();
      }
    
      std::map<int, std::vector<double>>::iterator it = BigMap.find(Num1 - 1);
      std::vector<double> OutVals = it->second;
    
      for (int i = 0; i < Num2; i++)
        std::cout << std::endl << OutVals[i] << std::endl;
    
      for (int b = 0; b < omp_get_max_threads(); b++)
        gsl_rng_free(threadvec[b]);
    
      clock_t end = omp_get_wtime();
      double elapsed_time = double(end - begin);
      std::cout << std::endl << "Time taken to run: " << elapsed_time << " secs" << std::endl;
      delete[] threadvec;
    }