Parallel reduction on CUDA with array in device


I need to perform a parallel reduction to find the minimum or maximum of an array on a CUDA device. I found a good library for this called Thrust, but it seems that you can only perform a parallel reduction on arrays in host memory. My data is in device memory. Is it possible to perform a reduction on data that is already in device memory? I can't figure out how to do this. Here is the Thrust documentation: http://code.google.com/p/thrust/wiki/QuickStartGuide#Reductions. Thanks, all.


Solution

  • You can do reductions in Thrust on arrays that are already in device memory. All you need to do is wrap your raw device pointers in thrust::device_ptr objects and call one of the reduction algorithms, just as shown in the wiki page you linked to:

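    #include <thrust/device_ptr.h>
    #include <thrust/extrema.h>

    // assume this is a valid device allocation (e.g. from cudaMalloc) holding N ints
    int * dmem;
    
    // wrap the raw device pointer so Thrust runs the algorithm on the device
    thrust::device_ptr<int> dptr(dmem);
    
    // use max_element for the reduction (min_element works the same way)
    thrust::device_ptr<int> dresptr = thrust::max_element(dptr, dptr + N);
    
    // retrieve the result from the device (if required); this is a device-to-host copy
    int max_value = dresptr[0];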
    

    Note that the return value is also a device_ptr, so the result stays in device memory. To use it in your own kernels, convert it to a raw pointer with thrust::raw_pointer_cast:

    int * dres = thrust::raw_pointer_cast(dresptr);
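To tie the pieces together, here is a self-contained sketch that can be compiled with nvcc, assuming a CUDA toolkit with Thrust available (the file name minmax.cu below is just an example). It allocates and fills device memory with plain CUDA calls, then finds the minimum and maximum two ways: thrust::reduce with thrust::minimum/thrust::maximum, which returns the value straight to the host, and thrust::minmax_element, which returns device_ptrs to both extrema in a single call. Finally it hands the location of the maximum to an ordinary kernel via thrust::raw_pointer_cast. The sample data, the CUDA_CHECK macro and the subtract_max kernel are illustrative assumptions, not part of the original answer.

```cuda
#include <climits>
#include <cstdio>
#include <cstdlib>
#include <vector>

#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/extrema.h>
#include <thrust/functional.h>
#include <thrust/reduce.h>

// Illustrative helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel (not from the answer): out[i] = data[i] - max.
// The maximum is read through a raw device pointer, so it never has to be
// copied to the host. Writing to a separate output array avoids a race on
// the element that holds the maximum.
__global__ void subtract_max(const int *data, int *out, const int *max_ptr, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = data[i] - *max_ptr;
    }
}

int main()
{
    const int N = 8;
    std::vector<int> host = {3, -7, 12, 5, 0, 9, -2, 4};  // sample data

    // Plain CUDA allocations, standing in for data that already lives on the device.
    int *dmem = nullptr;
    int *dout = nullptr;
    CUDA_CHECK(cudaMalloc(&dmem, N * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&dout, N * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(dmem, host.data(), N * sizeof(int), cudaMemcpyHostToDevice));

    // Wrapping the raw pointer makes Thrust dispatch the algorithms to the device.
    thrust::device_ptr<int> dptr(dmem);

    // Option 1: reduce with a min/max functor; the result comes back as a host int.
    int max_value = thrust::reduce(dptr, dptr + N, INT_MIN, thrust::maximum<int>());
    int min_value = thrust::reduce(dptr, dptr + N, INT_MAX, thrust::minimum<int>());
    std::printf("reduce: min = %d, max = %d\n", min_value, max_value);

    // Option 2: minmax_element returns a pair of device_ptrs, which also
    // gives the positions of the extrema.
    auto mm = thrust::minmax_element(dptr, dptr + N);
    int min_elem = *mm.first;   // device-to-host copy
    int max_elem = *mm.second;  // device-to-host copy
    std::printf("minmax_element: min = %d at index %ld, max = %d at index %ld\n",
                min_elem, static_cast<long>(mm.first - dptr),
                max_elem, static_cast<long>(mm.second - dptr));

    // Pass the location of the maximum to an ordinary kernel as a raw pointer.
    const int *dmax = thrust::raw_pointer_cast(mm.second);
    subtract_max<<<(N + 255) / 256, 256>>>(dmem, dout, dmax, N);
    CUDA_CHECK(cudaGetLastError());

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<int> result(N);
    CUDA_CHECK(cudaMemcpy(result.data(), dout, N * sizeof(int), cudaMemcpyDeviceToHost));
    std::printf("data - max:");
    for (int v : result) std::printf(" %d", v);
    std::printf("\n");

    CUDA_CHECK(cudaFree(dmem));
    CUDA_CHECK(cudaFree(dout));
    return 0;
}
```

Compiled with nvcc -o minmax minmax.cu and run on a CUDA-capable GPU, it should print:

    reduce: min = -7, max = 12
    minmax_element: min = -7 at index 1, max = 12 at index 2
    data - max: -9 -19 0 -7 -12 -3 -14 -8

The reduce form is the simplest when you only need the value on the host; minmax_element (or max_element, as in the answer) is the better fit when you also want the index, or when the result should stay on the device for later kernels.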