I have written the following program in C using OpenMP library for parallel programming to find the sum of an array of size 10000000. The expected output should be sum of elements = 10000000 but the output I am getting is less than the sum.
#include <stdio.h>
#define ARR_SIZE 10000000
int a[ARR_SIZE];
int main(int argc, char* argv[])
{
int i,tid,numt;
int sum=0;
double t1,t2;
for(i=0;i<ARR_SIZE;i++)
a[i]=1;
t1=omp_get_wtime();
#pragma omp parallel default(shared) private(i,tid)
{
int from,to;
tid=omp_get_thread_num();
numt=omp_get_num_threads();
from = (ARR_SIZE/numt)*tid;
to= (ARR_SIZE/numt)*(tid+1)-1;
if(tid == numt-1)
to= ARR_SIZE-1;
printf("Hello from %d of %d , my range is from = %d to %d \n",tid,numt,from,to);
for(i=from;i<=to;i++)
sum+=a[i];
}
t2=omp_get_wtime();
printf("Sum of the array elements = %d time = %g \n",sum,t2-t1);
return 0;
}
Some of the sample outputs are :
Hello from 0 of 4 , my range is from = 0 to 2499999
Hello from 3 of 4 , my range is from = 7500000 to 9999999
Hello from 1 of 4 , my range is from = 2500000 to 4999999
Hello from 2 of 4 , my range is from = 5000000 to 7499999
Sum of the array elements = 3235618 time = 0.118754
Hello from 3 of 4 , my range is from = 7500000 to 9999999
Hello from 0 of 4 , my range is from = 0 to 2499999
Hello from 2 of 4 , my range is from = 5000000 to 7499999
Hello from 1 of 4 , my range is from = 2500000 to 4999999
Sum of the array elements = 2964874 time = 0.129216
What is the reason that the given sum is less than the actual sum.
The update of sum
variable isn't an atomic operation and is prone to races. Races of this type are likely to yield a lesser than expected sum.
The summing boils down to something of this kind:
Now when 4 threads perform the above without consideration of one to another, some additions will be lost, resulting in a sum that is below of what was expected.
For example, with 2 threads (for simplicity):
Thread 1: Load to a register from memory location
Thread 2: Load to a register from memory location
Thread 1: Add new value to the register
Thread 2: Add new value to the register
Thread 1: Store the register value back to the memory
Thread 2: Store the register value back to the memory
In this example, at the end the addition of thread 1 will be overridden.
You should make sure the summation is done atomically to avoid races.