I'm trying to figure out how to express the equivalent of OpenMP's for reduction() in CUDA. I've done some research online, but nothing I've tried has worked. The code:
#pragma omp parallel for reduction(+:sum)
for (i = 0; i < N; i++)
{
    float f = ... // store the function's return value in f
    out[i] = f;   // store f in out[i]
    sum += f;     // accumulate f into sum
}
I know what for reduction() does in OpenMP: it makes the last line of the for loop safe to run in parallel. But how can I express the same thing in CUDA?
Thanks!
Use Thrust, an STL-inspired library that ships with the CUDA Toolkit. See its Quick Start Guide for examples of how to perform reductions.
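As a rough sketch of how the loop from the question might map onto Thrust: thrust::transform fills out, and thrust::reduce then sums it on the device. The functor compute_f and the helper fill_and_sum are hypothetical names introduced only for illustration; compute_f stands in for the function the question elides, so its body here is an arbitrary assumption.

```cuda
#include <thrust/device_vector.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>

// Hypothetical stand-in for the function elided in the question's loop body.
// The formula is an assumption chosen only so the sketch compiles.
struct compute_f
{
    __host__ __device__
    float operator()(int i) const
    {
        return 0.5f * static_cast<float>(i);
    }
};

// Hypothetical helper: writes out[i] = compute_f(i) on the device and
// returns the sum of all elements, mirroring the OpenMP loop.
float fill_and_sum(thrust::device_vector<float>& out)
{
    const int n = static_cast<int>(out.size());

    // Equivalent of "out[i] = f" for every index i in [0, n).
    thrust::transform(thrust::counting_iterator<int>(0),
                      thrust::counting_iterator<int>(n),
                      out.begin(),
                      compute_f());

    // Equivalent of "sum += f": a parallel reduction over out.
    return thrust::reduce(out.begin(), out.end(), 0.0f, thrust::plus<float>());
}
```

Compile with nvcc, construct a thrust::device_vector<float> of size N, and call fill_and_sum on it. If out itself is not needed, thrust::transform_reduce can fuse both steps into a single pass.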