I am trying to use Intel TBB parallel_reduce to obtain the sum of the elements of an array of doubles. However, the result differs from the one produced by the equivalent OpenMP reduction.
Here is the OpenMP one:
double dAverageTemp = 0.0;
#pragma omp parallel for reduction(+:dAverageTemp)
for (int i = 0; i < sCartesianSize; i++)
    dAverageTemp += pdTempCurr[i];
This code returns the correct value, which is "317.277493"; however, this TBB code:
double dAverageTemp = tbb::parallel_reduce(
    tbb::blocked_range<double*>(pdTempCurr, pdTempCurr + sCartesianSize - 1),
    0.0,
    [](const tbb::blocked_range<double*> &r, double value) -> double {
        return std::accumulate(r.begin(), r.end(), value);
    },
    std::plus<double>()
);
insists that the result is "317.277193".
What am I missing here?
Although all the comments about the order of summation are perfectly correct, the simple truth here is that you have a bug in your code. All std::, thrust:: and tbb:: algorithms and constructors follow the same convention when defining ranges: a range goes from the first element to take up to the first element not to take, as in for (auto it = v.begin(); it < v.end(); it++).
Therefore, your tbb::blocked_range should go up to pdTempCurr + sCartesianSize, not to pdTempCurr + sCartesianSize - 1.
It should become:
double dAverageTemp = tbb::parallel_reduce(
    tbb::blocked_range<double*>(pdTempCurr, pdTempCurr + sCartesianSize),
    0.0,
    [](const tbb::blocked_range<double*> &r, double value) -> double {
        return std::accumulate(r.begin(), r.end(), value);
    },
    std::plus<double>()
);
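For completeness, here is a small, self-contained sketch with synthetic data (not your pdTempCurr array) that shows the half-open convention in isolation: std::accumulate and tbb::parallel_reduce agree only when the end pointer is data + n, while data + n - 1 silently drops the last element.

#include <tbb/parallel_reduce.h>
#include <tbb/blocked_range.h>
#include <numeric>
#include <functional>
#include <cstdio>

int main() {
    double data[4] = {1.0, 2.0, 3.0, 0.0003};
    const int n = 4;

    // Half-open range [data, data + n): all four elements are summed.
    double full = std::accumulate(data, data + n, 0.0);

    // [data, data + n - 1) drops the last element (0.0003).
    double truncated = std::accumulate(data, data + n - 1, 0.0);

    // TBB reduction over the correct half-open range matches 'full'.
    double reduced = tbb::parallel_reduce(
        tbb::blocked_range<double*>(data, data + n),
        0.0,
        [](const tbb::blocked_range<double*> &r, double value) -> double {
            return std::accumulate(r.begin(), r.end(), value);
        },
        std::plus<double>()
    );

    std::printf("full = %f, truncated = %f, tbb = %f\n", full, truncated, reduced);
    return 0;
}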
My (wild) guess is that pdTempCurr[sCartesianSize - 1] is around 0.0003, which would account for the numerical difference experienced.
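To verify that guess, a quick check (a fragment only, assuming pdTempCurr and sCartesianSize from the question are still in scope and <cstdio> is included) is to print the element that the truncated range drops; if the guess is right, it should be close to 317.277493 - 317.277193 = 0.0003:

// The element excluded by the off-by-one range end.
std::printf("dropped element = %f\n", pdTempCurr[sCartesianSize - 1]);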