Search code examples
copenmpmontecarlo

OpenMP Monte_Carlo simulation achieve target closeness to PI


I am trying to write a parallel program which takes an error rate(i.e 0.01) and returns a PI value which is closer to PI than the error with montecarlo simulation. I wrote a simple function however it does not terminate as error rate is always around 11. I appreciate your comments.

#include "stdio.h"
#include "omp.h"
#include <stdlib.h>
#include <unistd.h>
#include <math.h>

double drand48(void);

double monte_carlo(double epsilon){
    double x,y, pi_estimate = 0.0;
    double drand48(void);
    double error = 10000.0;
    int n = 0; // total number of points
    int i = 0; // total numbers of points inside circle
    int p = omp_get_num_threads();
    while(error>=epsilon){
        #pragma omp parallel private(x, y) reduction(+:i)//OMP parallel directive
        {
            x = drand48();
            y = drand48();
            if((x*x+y*y)<=1.0){i+=1;}
        }
        n+=p;
        printf("%lf\n", error);
        pi_estimate=4.0*(double)i/(double)n;
        error = fabs(M_PI-pi_estimate)/M_PI;
    }
    return pi_estimate;
}

int main(int argc, char* argv[]) {
    double epsilon = 0.01;
    printf("PI estimate: %lf",monte_carlo(epsilon));
    return 0;
}

Solution

  • Calling omp_get_num_threads() outside a parallel section will always return 1, as there is only one active thread at the moment the function is called. The following code should give a correct result, but will be much slower than the serial version due to the large parallelization & synchronization overhead spend for doing a very simple operation.

    #pragma omp parallel private(x, y) reduction(+:i)//OMP parallel directive
    {
        x = drand48();
        y = drand48();
        if((x*x+y*y)<=1.0){i+=1;}
        #pragma omp master
        n+=omp_get_num_threads();
    }
    

    The following avoids repeatedly spawning threads and may be more efficient, but still probably slower.

    #pragma omp parallel private(x, y)
    while(error>=epsilon){
            x = drand48();
            y = drand48();
            if((x*x+y*y)<=1.0){
                #pragma omp atomic
                i++;
            }
        #pragma omp barrier
        #pragma omp single
        {
            n+=omp_get_num_threads();
            pi_estimate=4.0*(double)i/(double)n;
            error = fabs(M_PI-pi_estimate)/M_PI;
            printf("%lf\n", error);
        } // implicit barrier here
    }
    

    In order to really go faster, a minimum number of iterations should be given such as:

    #define ITER 1000
    #pragma omp parallel private(x, y)
    while(error>=epsilon){
        #pragma omp for reduction(+:i)
        for (int j=1;j<ITER;j++){
            x = drand48();
            y = drand48();
            if((x*x+y*y)<=1.0) i+=1;
        }
        /* implicit barrier + implicit atomic addition
         * of thread-private accumulator to shared variable i
         */
    
        #pragma omp single
        {
            n+=ITER;
            pi_estimate=4.0*(double)i/(double)n;
            error = fabs(M_PI-pi_estimate)/M_PI;
            printf("%lf\n", error);
        } // implicit barrier
    }