
OpenMP - Random running time - why is the run-time variance so high?


I am following Tim Mattson's lectures on OpenMP to learn how to implement some parallel programming concepts.

I was trying to observe the running time behavior of a parallel program that computes the value of PI using 3x10^8 steps.

Here is the code:

#include <omp.h>
#include <stdio.h>

static long num_steps = 300000000;
double step;
#define PAD 8 // tried 50 too
#define NUM_THREADS 4
int main()
{
    int i, nthreads;
    double pi, sum[NUM_THREADS][PAD];
    double ts, te;

    ts = omp_get_wtime();

    step = 1.0/(double) num_steps;
    omp_set_num_threads(NUM_THREADS);
    #pragma omp parallel
    {
        int i, id,nthrds;
        double x;

        id = omp_get_thread_num();
        nthrds = omp_get_num_threads();
        if (id == 0)  nthreads = nthrds;
        for (i=id, sum[id][0]=0.0;i< num_steps; i=i+nthrds) {
            x = (i+0.5)*step;
            sum[id][0] += 4.0/(1.0+x*x);
        }
    }

    for(i=0, pi=0.0;i<nthreads;i++)
        pi += sum[i][0] * step;

    te = omp_get_wtime();

    printf("%.10f\n", pi);
    printf("%f\n", te-ts);

}

I was on Ubuntu 14.04 LTS running on a dual-core machine; a call to omp_get_num_procs() returned 2. The running time looked essentially random, ranging from 1.31 seconds to 4.46 seconds, whereas the serial program took about 2.31 seconds almost every time.

I tried creating 1, 2, 3, 4, and up to 10 threads. The running time varied widely in every case, though the average was smaller with more threads. I wasn't running any other applications.


Can anyone explain why the running time varied so much?

How can I measure the run time accurately? The lecturer showed running times from his computer that seem consistent, and he was also using a dual-core processor.


Solution

  • Dual-CPU comparison, using OpenMP:

    Result          : 3.1415926536
    Number of CPU-s : 2  
    Duration        : 2.4025482161
    

    The resulting code-execution times are fairly consistent from run to run:

    /*           Duration        : 2.3984972970
                 Duration        : 2.4004815188
                 Duration        : 2.3814983589
                 Duration        : 2.4070654172
                 Duration        : 2.3964317020
                 Duration        : 2.3858104548
                 Duration        : 2.3765923560
                 Duration        : 2.3734730321
        -O3:
                 Duration        : 0.4159400249
                 Duration        : 0.3089567909
                 Duration        : 0.3106977220
                 Duration        : 0.3312316008
                 Duration        : 0.2856188160
                 Duration        : 0.2984415500
                 Duration        : 0.3282426349
                 Duration        : 0.2836121118
      + FYI: #pragma overheads:
                 Duration        : 0.0001377461
                 Duration        : 0.0001228561
                 Duration        : 0.0001215260
        REF:
        Criticism of Amdahl's Law on (not) including the real-world
        { setup | termination } overhead costs of a #pragma omp parallel section
        (simplified test, without the add-on costs of global OpenMP setup and configuration):
        https://stackoverflow.com/revisions/18374629/3
    
                 */
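
To put the overhead figures above in proportion, here is a minimal sketch in C. It is not the formulation from the linked revision. It assumes the simplest possible model, in which the setup and termination cost of the parallel section is one additive term, expressed as a fraction of the serial run time. The function and parameter names are hypothetical.

```c
#include <stdio.h>

/* Classic Amdahl bound.
 * p: fraction of the work that can run in parallel (0..1)
 * n: number of workers */
double amdahl_speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / (double)n);
}

/* Assumed model, for illustration only: the same bound with one extra
 * additive term for parallel-region setup and termination, given as a
 * fraction of the serial run time. */
double amdahl_speedup_with_overhead(double p, int n, double overhead_fraction)
{
    return 1.0 / ((1.0 - p) + p / (double)n + overhead_fraction);
}
```

With the numbers reported above, an overhead of roughly 0.00012 s against a run of roughly 2.4 s gives an overhead_fraction of about 0.00005, which barely moves the bound. Under this simple model, the parallel-region overhead is far too small to account for run times that swing between 1.31 s and 4.46 s.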
    

    This points to background workload noise on your system under test.

    It is best to re-test your code on a headless platform, so that no GUI-related workload interferes with the computing part of the test.
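
One way to make such noise visible is to time the same computation several times and look at the spread instead of a single run. The following is a sketch under stated assumptions, not the code from the question. It swaps the padded per-thread array for a reduction clause for brevity, and it falls back to the C11 timespec_get timer when built without OpenMP. The names timing_stats, compute_pi and time_compute_pi are hypothetical.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
#include <time.h>
#endif

/* Wall-clock seconds; uses C11 timespec_get when built without OpenMP. */
static double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

typedef struct { double min, median, max; } timing_stats;

static int compare_doubles(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Midpoint-rule estimate of pi with the same integrand as the question,
 * written with a reduction clause (an assumption of this sketch). */
double compute_pi(long num_steps)
{
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;
    long i;
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < num_steps; i++) {
        double x = ((double)i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * step;
}

/* Runs compute_pi `repeats` times and returns the fastest, middle and
 * slowest wall-clock time (upper median when `repeats` is even).
 * Returns all zeros if repeats < 1 or the allocation fails. */
timing_stats time_compute_pi(long num_steps, int repeats, double *pi_out)
{
    timing_stats stats = {0.0, 0.0, 0.0};
    double *samples;
    int r;

    if (repeats < 1) return stats;
    samples = malloc((size_t)repeats * sizeof *samples);
    if (samples == NULL) return stats;

    for (r = 0; r < repeats; r++) {
        double t0 = wall_seconds();
        double pi = compute_pi(num_steps);
        samples[r] = wall_seconds() - t0;
        if (pi_out != NULL) *pi_out = pi;
    }

    qsort(samples, (size_t)repeats, sizeof *samples, compare_doubles);
    stats.min = samples[0];
    stats.median = samples[repeats / 2];
    stats.max = samples[repeats - 1];
    free(samples);
    return stats;
}
```

Calling time_compute_pi(300000000, 10, &pi) and printing the three fields gives a rough picture. A minimum close to the median suggests a quiet machine, while a maximum far above the median suggests that something else was competing for the cores during some of the runs.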

    You may also like the sandboxed online TIO platform for re-running the experiments.