cuda, parallel-processing, thrust

Segmentation error when using thrust::sort in CUDA


I am trying to sort an array of class objects by their type_cut member, passing a comparison functor as the last argument to thrust::sort.

The class definition:

class TetraCutInfo
{
public:
    int tetraid;
    unsigned int ncutEdges;
    unsigned int ncutNodes;
    unsigned int type_cut;
    __host__ __device__ TetraCutInfo();
};

Sort:

   thrust::sort(cutInfoptr,cutInfoptr+n,cmp());

cutInfoptr is a pointer of type TetraCutInfo having the address of the device memory allocated using cudaMalloc.

Comparison function

struct cmp
{
  __host__ __device__
  bool operator()(const TetraCutInfo x, TetraCutInfo y)
  {
        return (x.type_cut < y.type_cut);
  }
};

Running this produces a segmentation fault, even though I am able to iterate through cutInfoptr in another kernel.

PS: I referred to the example in the link https://code.google.com/p/thrust/source/browse/examples/sort.cu


Solution

  • cutInfoptr is a pointer of type TetraCutInfo having the address of the device memory allocated using cudaMalloc.

    Although you haven't shown complete code, based on the statement above this probably won't work, and I would expect a seg fault when that pointer gets dereferenced.

    Note the information given in the thrust quick start guide:

    You may wonder what happens when a "raw" pointer is used as an argument to a Thrust function. Like the STL, Thrust permits this usage and it will dispatch the host path of the algorithm. If the pointer in question is in fact a pointer to device memory then you'll need to wrap it with thrust::device_ptr before calling the function.

    The cutInfoptr you referenced, if it was created by cudaMalloc, is a "raw pointer" (which also happens to be a device pointer). When you pass it to thrust, thrust sees only a raw pointer and dispatches the "host path". When that (device) pointer is then dereferenced in host code on the host path, you get a seg fault.

    One solution is to wrap it in a thrust::device_ptr, as in this example excerpted from the quick start guide:

    size_t N = 10;
    
    // raw pointer to device memory
    int * raw_ptr;
    cudaMalloc((void **) &raw_ptr, N * sizeof(int));
    
    // wrap raw pointer with a device_ptr 
    thrust::device_ptr<int> dev_ptr(raw_ptr);
    
    // use device_ptr in thrust algorithms
    thrust::fill(dev_ptr, dev_ptr + N, (int) 0);
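
    Applied to the sort in the question, the same wrapping might look like the sketch below. Since the full code was not shown, several things here are assumptions: the constructor body, the helper names sort_by_type_cut and run_device_ptr_example, and the sample values. The comparator also takes its arguments by const reference, which is a small change from the original.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/sort.h>

// Mirrors the class from the question. The constructor body is an
// assumption, since the original definition was not shown.
class TetraCutInfo
{
public:
    int tetraid;
    unsigned int ncutEdges;
    unsigned int ncutNodes;
    unsigned int type_cut;
    __host__ __device__ TetraCutInfo()
        : tetraid(0), ncutEdges(0), ncutNodes(0), type_cut(0) {}
};

// Orders elements by ascending type_cut.
struct cmp
{
    __host__ __device__
    bool operator()(const TetraCutInfo& x, const TetraCutInfo& y) const
    {
        return x.type_cut < y.type_cut;
    }
};

// Hypothetical helper: cutInfoptr must point to n elements in device
// memory (for example, allocated with cudaMalloc).
void sort_by_type_cut(TetraCutInfo* cutInfoptr, std::size_t n)
{
    assert(cutInfoptr != nullptr);
    // Wrapping the raw pointer tells thrust the memory lives on the device,
    // so the device path of thrust::sort is dispatched.
    thrust::device_ptr<TetraCutInfo> dev_ptr(cutInfoptr);
    thrust::sort(dev_ptr, dev_ptr + n, cmp());
}

// Hypothetical end-to-end check: sorts four elements on the device and
// reports whether they come back in ascending type_cut order.
bool run_device_ptr_example()
{
    const std::size_t n = 4;
    const unsigned int types[n] = {3, 1, 2, 0};
    TetraCutInfo h[n];
    for (std::size_t i = 0; i < n; ++i) {
        h[i].tetraid = static_cast<int>(i);
        h[i].type_cut = types[i];
    }

    TetraCutInfo* d = nullptr;
    if (cudaMalloc((void**)&d, n * sizeof(TetraCutInfo)) != cudaSuccess)
        return false;
    cudaMemcpy(d, h, n * sizeof(TetraCutInfo), cudaMemcpyHostToDevice);

    sort_by_type_cut(d, n);

    cudaMemcpy(h, d, n * sizeof(TetraCutInfo), cudaMemcpyDeviceToHost);
    cudaFree(d);

    for (std::size_t i = 1; i < n; ++i)
        if (h[i - 1].type_cut > h[i].type_cut)
            return false;
    return true;
}
```

    The raw pointer itself is unchanged, so it can still be passed to ordinary kernels after the sort.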
    

    Another possible solution is to pass an execution policy, such as thrust::device, as the first argument to the algorithm. Thrust then dispatches the device path even when given a raw pointer.
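
    A minimal sketch of the execution-policy approach follows, again under assumptions: the constructor body, the helper names sort_with_policy and run_policy_example, and the sample values are illustrative, not taken from the question.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>
#include <thrust/execution_policy.h>
#include <thrust/sort.h>

// Mirrors the class from the question. The constructor body is an
// assumption, since the original definition was not shown.
class TetraCutInfo
{
public:
    int tetraid;
    unsigned int ncutEdges;
    unsigned int ncutNodes;
    unsigned int type_cut;
    __host__ __device__ TetraCutInfo()
        : tetraid(0), ncutEdges(0), ncutNodes(0), type_cut(0) {}
};

// Orders elements by ascending type_cut.
struct cmp
{
    __host__ __device__
    bool operator()(const TetraCutInfo& x, const TetraCutInfo& y) const
    {
        return x.type_cut < y.type_cut;
    }
};

// Hypothetical helper: cutInfoptr must point to n elements in device memory.
void sort_with_policy(TetraCutInfo* cutInfoptr, std::size_t n)
{
    assert(cutInfoptr != nullptr);
    // thrust::device as the first argument selects the device path
    // explicitly, so the raw pointer is not dereferenced on the host.
    thrust::sort(thrust::device, cutInfoptr, cutInfoptr + n, cmp());
}

// Hypothetical end-to-end check: sorts four elements on the device and
// reports whether they come back in ascending type_cut order.
bool run_policy_example()
{
    const std::size_t n = 4;
    const unsigned int types[n] = {2, 0, 3, 1};
    TetraCutInfo h[n];
    for (std::size_t i = 0; i < n; ++i) {
        h[i].tetraid = static_cast<int>(i);
        h[i].type_cut = types[i];
    }

    TetraCutInfo* d = nullptr;
    if (cudaMalloc((void**)&d, n * sizeof(TetraCutInfo)) != cudaSuccess)
        return false;
    cudaMemcpy(d, h, n * sizeof(TetraCutInfo), cudaMemcpyHostToDevice);

    sort_with_policy(d, n);

    cudaMemcpy(h, d, n * sizeof(TetraCutInfo), cudaMemcpyDeviceToHost);
    cudaFree(d);

    for (std::size_t i = 1; i < n; ++i)
        if (h[i - 1].type_cut > h[i].type_cut)
            return false;
    return true;
}
```

    With an explicit policy the call site keeps using the raw pointer, which can be convenient when the same pointer is also passed to hand-written kernels.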