
Use thrust::reduce after invoking other CUDA functions


I am trying to call thrust::reduce after running my own CUDA kernel. Here is the problem:

If I call thrust::reduce before my CUDA kernel (just as a test), everything works: no error and no exception is thrown.

However, if thrust::reduce is called after my CUDA kernel has run, the following message appears:

cudaErrorLaunchOutOfResources(7): 
[.../dispatch_reduce.cuh, 454]: too many resources requested for launch
[.../dispatch_reduce.cuh, 646]: too many resources requested for launch

I suspect that the block size or thread count I set for my own CUDA kernel has changed some kind of environment setting for thrust. Am I right?

If so, how can I restore those settings to the values thrust expects?

Here is an outline of the code:

//...
// some CUDA functions calculating the coordinate transformation matrix using different combinations of points (called RANSAC in surveying) 
// the CUDA functions use shared memory to speed up the computation
// all qualified answers leave a "1" in the resulting array (unsigned int res[])
//...

#include <thrust/device_ptr.h>
#include <thrust/execution_policy.h>
#include <thrust/reduce.h>

// call thrust::reduce to get the number of qualified results (simply adds up the values in res[])
void GetTotalQualNum(unsigned int* uQuali, unsigned int uTotalCandiNum, unsigned int* uTotalQualiNum)
{
    unsigned int* d_vec;
    // copy the host result array to the device
    cudaMalloc((void**)&d_vec, uTotalCandiNum * sizeof(unsigned int));
    cudaMemcpy(d_vec, uQuali, uTotalCandiNum * sizeof(unsigned int), cudaMemcpyHostToDevice);
    thrust::device_ptr<unsigned int> pd_vec = thrust::device_pointer_cast(d_vec);

    // 0u keeps the sum unsigned: thrust::reduce returns the type of its init value
    *uTotalQualiNum = thrust::reduce(thrust::device, pd_vec, pd_vec + uTotalCandiNum, 0u);

    cudaFree(d_vec);
}

The GPU is a GTX 1070, with CUDA 9.1.85 and Thrust v1.9.


Solution

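  • After reading answers to related questions, especially thrust functor: “too many resources requested for launch”, I realized the cause was most likely the block or grid dimensions I had set for my own CUDA kernels, not any setting inside thrust. A block that requests more resources (for example, registers) than the GPU can provide fails to launch with cudaErrorLaunchOutOfResources. If that error is not checked right after the launch, it is still pending when thrust checks for launch errors after its own kernel, so it is reported from dispatch_reduce.cuh.

    After reducing dim3 dimBlock from 1024 to 512 threads, the error has not appeared again.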

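    For choosing suitable block and grid sizes, see the CUDA occupancy API (cudaOccupancyMaxPotentialBlockSize) and the per-kernel limits reported by cudaFuncGetAttributes (maxThreadsPerBlock, numRegs).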