Search code examples
cparallel-processingopenmp

Why is the array sum coming less than the actual sum when OpenMP is used?


I have written the following program in C using OpenMP library for parallel programming to find the sum of an array of size 10000000. The expected output should be sum of elements = 10000000 but the output I am getting is less than the sum.

#include <stdio.h>
#define ARR_SIZE 10000000
int a[ARR_SIZE];
int main(int argc, char* argv[])
{
 int i,tid,numt;
int sum=0;
double t1,t2;
for(i=0;i<ARR_SIZE;i++)
a[i]=1;

t1=omp_get_wtime();

#pragma omp parallel default(shared) private(i,tid)
{
int from,to;
tid=omp_get_thread_num();
numt=omp_get_num_threads();

from = (ARR_SIZE/numt)*tid;
to= (ARR_SIZE/numt)*(tid+1)-1;

if(tid == numt-1)
to= ARR_SIZE-1;

printf("Hello from %d of %d , my range is from = %d to %d \n",tid,numt,from,to);

for(i=from;i<=to;i++)
sum+=a[i];
}

t2=omp_get_wtime();

printf("Sum of the array elements = %d time = %g \n",sum,t2-t1);
return 0;
}

Some of the sample outputs are :

Output 1

Hello from 0 of 4 , my range is from = 0 to 2499999
Hello from 3 of 4 , my range is from = 7500000 to 9999999
Hello from 1 of 4 , my range is from = 2500000 to 4999999
Hello from 2 of 4 , my range is from = 5000000 to 7499999
Sum of the array elements = 3235618 time = 0.118754

Output 2

Hello from 3 of 4 , my range is from = 7500000 to 9999999
Hello from 0 of 4 , my range is from = 0 to 2499999
Hello from 2 of 4 , my range is from = 5000000 to 7499999
Hello from 1 of 4 , my range is from = 2500000 to 4999999
Sum of the array elements = 2964874 time = 0.129216

What is the reason that the given sum is less than the actual sum.


Solution

  • The update of sum variable isn't an atomic operation and is prone to races. Races of this type are likely to yield a lesser than expected sum.

    The summing boils down to something of this kind:

    1. Load to register from memory location
    2. Add new value to the register
    3. Store the register value back to the memory

    Now when 4 threads perform the above without consideration of one to another, some additions will be lost, resulting in a sum that is below of what was expected.

    For example, with 2 threads (for simplicity):

    Thread 1: Load to a register from memory location
    Thread 2: Load to a register from memory location
    Thread 1: Add new value to the register
    Thread 2: Add new value to the register
    Thread 1: Store the register value back to the memory
    Thread 2: Store the register value back to the memory
    

    In this example, at the end the addition of thread 1 will be overridden.

    You should make sure the summation is done atomically to avoid races.