I need to perform a parallel reduction to find the min or max of an array on a CUDA device. I found a good library for this, called Thrust. It seems that you can only perform a parallel reduction on arrays in host memory. My data is in device memory. Is it possible to perform a reduction on data in device memory? I can't figure how to do this. Here is documentation for Thrust: http://code.google.com/p/thrust/wiki/QuickStartGuide#Reductions. Thank all of you.
You can do reductions in thrust on arrays which are already in device memory. All that you need to do is wrap your device pointers inside thrust::device_pointer
containers, and call one of the reduction procedures, just as shown in the wiki you have linked to:
// assume this is a valid device allocation holding N words of data
int * dmem;
// Wrap raw device pointer
thrust::device_ptr<int> dptr(dmem);
// use max_element for reduction
thrust::device_ptr<int> dresptr = thrust::max_element(dptr, dptr+N);
// retrieve result from device (if required)
int max_value = dresptr[0];
Note that the return value is also a device_ptr
, so you can use it directly in other kernels using thrust::raw_pointer_cast
:
int * dres = thrust::raw_pointer_cast(dresptr);