c++multithreading parallel-processing openmp trng

Random generation with TRNG

For the following code which generates random numbers for Monte Carlo simulation, I need to receive the exact sum for each run, but this will not happen, although I have fixed the seed. I would appreciate it if anyone could point out the problem with this code

#include <cmath>
#include <random>
#include <iostream>
#include <chrono>
#include <cfloat>
#include <iomanip>
#include <cstdlib>
#include <omp.h>
#include <trng/yarn2.hpp>
#include <trng/mt19937_64.hpp>
#include <trng/uniform01_dist.hpp>

using namespace std;
using namespace chrono;
const double landa = 1; 
const double exact_solution = landa / (pow(landa, 2) + 1); 
double function(double x) {
    return cos(x) / landa;
}

int main() {
    int rank; 
    const int N = 1000000;
    double sum = 0.0;
    trng::yarn2 r[6];
    for (int i = 0; i <6; i++)
    { 
        r[i].seed(0); 
    }
    
    for (int i = 0; i < 6; i++)
    {
          r[i].split(6,i);
    }
    
     trng::uniform01_dist<double> u;
     auto start = high_resolution_clock::now();
     #pragma omp parallel  num_threads(6)  
     {
         rank=omp_get_thread_num();
         #pragma omp for reduction (+: sum)
         for (int i = 0; i<N; ++i) {
            //double x = distribution(g);
      
            double x= u(r[rank]); 
            x = (-1.0 / landa) * log(1.0 - x); 
            sum = sum+function(x);
         }
    }
     double app   = sum / static_cast<double> (N);
     auto end = high_resolution_clock::now();
        
    auto diff=duration_cast<milliseconds>(end-start);
    
    cout << "Approximation is: " <<setprecision(17) << app << "\t"<<"Time: "<< setprecision(17) << diff.count()<<" Error: "<<(app-exact_solution)<< endl; 

    return 0;
}

Solution

TL;DR The problem is two-fold:

Floating point addition is not associative;
You are generating different random number for each thread.

I need to receive the exact sum for each run, but this will not happen, although I have fixed the seed. I would appreciate it if anyone could point out the problem with this code

First, you have a race-condition on rank=omp_get_thread_num();, the variable rank is shared among all threads, to fix that you can declared the variable rank inside the parallel region, hence, making it private to each thread.

 #pragma omp parallel  num_threads(6)  
 {
     int rank=omp_get_thread_num();
     ...
  }

In your code, you should not expect that the value of the sum will be the same for different number of threads. Why ?

because you are adding doubles in parallel
```
    double sum = 0.0;
    ...
    #pragma omp for reduction (+: sum)
    for (int i = 0; i<N; ++i) {
        //double x = distribution(g);

        double x= u(r[rank]); 
        x = (-1.0 / landa) * log(1.0 - x); 
        sum = sum+function(x);
    }
```
and from What Every Computer Scientist Should Know about Floating Point Arithmetic one can read:

Another grey area concerns the interpretation of parentheses. Due to roundoff errors, the associative laws of algebra do not necessarily hold for floating-point numbers. For example, the expression (x+y)+z has a totally different answer than x+(y+z) when x = 1e30, y = -1e30 and z = 1 (it is 1 in the former case, 0 in the latter).

Hence, from that you conclude that floating point addition is not associative, and the reason why for a different number of threads you might have different sum values.
You are generating different random values per thread:
```
  for (int i = 0; i < 6; i++)
  {
      r[i].split(6,i);
  }
```
Consequently, for different number of threads, the variable sum gets different results as well.

As kindly point out by jérôme-richard in the comments:

Note that more precise algorithm like the Kahan summation can significantly reduces the rounding issue while being still relatively fast.