cuda, thrust

Using thrust::sort inside a thread


I would like to know whether thrust::sort() can be used inside a thread, as in this kernel:

    #include <thrust/sort.h>

    __global__
    void mykernel(float* array, int arrayLength)
    {
        int threadID = blockIdx.x * blockDim.x + threadIdx.x;
        // array is a vector in device global memory
        // is it possible to use thrust::sort inside the thread?
        thrust::sort(array, array+arrayLength);
        // do something else with the array
    }

If yes, does the sort launch other kernels to parallelize the sort?


Solution

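  • Yes. thrust::sort can be combined with the thrust::seq execution policy to sort data sequentially within a single CUDA thread (or within a single CPU thread). With thrust::seq, no additional kernels are launched; the calling thread performs the entire sort itself. For example: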

    #include <thrust/sort.h>
    #include <thrust/execution_policy.h>
    
    __global__
    void mykernel(float* array, int arrayLength)
    {
      int threadID = blockIdx.x * blockDim.x + threadIdx.x;
    
      // each thread sorts array
      // XXX note this causes a data race
      thrust::sort(thrust::seq, array, array + arrayLength);
    }
    

    Note that your example causes a data race because every CUDA thread attempts to sort the same data at the same time. A correct, race-free program would partition array so that each thread sorts its own disjoint segment, selected by its thread index.

    The thrust::seq execution policy, which this approach requires, is only available in Thrust v1.8 and later.
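A minimal sketch of the race-free partitioning described above, assuming the array is split into equal segments of a hypothetical segmentLength elements (any tail shorter than that is left untouched). The names sortSegmentsKernel and sortSegments are illustrative and not part of Thrust. Each thread derives its segment from its thread index and sorts only that segment with thrust::seq, so no two threads touch the same elements.

```cuda
#include <thrust/sort.h>
#include <thrust/execution_policy.h>
#include <cuda_runtime.h>
#include <vector>

// Each thread sorts its own contiguous segment of `array` sequentially.
// Assumption: the array is split into numSegments equal segments of
// segmentLength elements each (a hypothetical layout for illustration).
__global__
void sortSegmentsKernel(float* array, int numSegments, int segmentLength)
{
  int threadID = blockIdx.x * blockDim.x + threadIdx.x;
  if (threadID >= numSegments) return;  // surplus threads in the last block do nothing

  // Segments are disjoint, so no two threads read or write the same elements.
  float* begin = array + threadID * segmentLength;
  thrust::sort(thrust::seq, begin, begin + segmentLength);
}

// Hypothetical host helper: copies `data` to the device, sorts each full
// segment in place, and copies the result back. Any tail shorter than
// segmentLength is left unsorted. Returns false on a CUDA error.
bool sortSegments(std::vector<float>& data, int segmentLength)
{
  if (segmentLength <= 0) return false;
  int numSegments = static_cast<int>(data.size()) / segmentLength;
  if (numSegments == 0) return true;  // nothing to sort (avoids a 0-block launch)

  size_t bytes = data.size() * sizeof(float);
  float* d_array = nullptr;
  if (cudaMalloc(&d_array, bytes) != cudaSuccess) return false;

  cudaError_t err = cudaMemcpy(d_array, data.data(), bytes, cudaMemcpyHostToDevice);
  if (err == cudaSuccess) {
    int threadsPerBlock = 128;
    int blocks = (numSegments + threadsPerBlock - 1) / threadsPerBlock;
    sortSegmentsKernel<<<blocks, threadsPerBlock>>>(d_array, numSegments, segmentLength);
    err = cudaGetLastError();  // reports launch-configuration errors
  }
  if (err == cudaSuccess) {
    // cudaMemcpy on the default stream waits for the kernel to finish.
    err = cudaMemcpy(data.data(), d_array, bytes, cudaMemcpyDeviceToHost);
  }
  cudaFree(d_array);
  return err == cudaSuccess;
}
```

Because every thread runs a sequential sort, this pattern suits many small, independent segments. To sort one large array in parallel instead, the usual approach is to call thrust::sort from host code (for example with the thrust::device policy on device memory), in which case Thrust launches its own kernels.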