I am trying to find the sum of an array (already present in CUDA memory) using the Thrust library. A few replies here said that this is possible by wrapping the raw pointer in a thrust::device_ptr, but it throws an error for me.
Initial code
cudaMemcpy((void *)(data + stride), (void *)d_output, sizeof(unsigned int) * rows * cols, cudaMemcpyDeviceToHost);
thrust::device_vector<unsigned int> vec((data + stride), (data + stride + (rows * cols)));
sum = thrust::reduce(vec.begin(), vec.end());
The above code works perfectly fine. But if I change it to
thrust::device_ptr<unsigned int> outputPtrBegin(d_output);
thrust::device_ptr<unsigned int> outputPtrEnd((d_output + stride + (rows * cols)));
sum = thrust::reduce(outputPtrBegin, outputPtrEnd);
it throws the following error:
terminate called after throwing an instance of 'thrust::system::system_error'
what(): an illegal memory access was encountered
Aborted (core dumped)
What could be the problem? Thanks a lot for your time.
Edit, following input from Robert Crovella: the mistake was using stride. I have a follow-up question related to the declarations above.
Depending on the value of toggle, I need to call Thrust on a different array:
if(toggle) {
thrust::device_ptr<unsigned int> outputPtrBegin(d_output);
thrust::device_ptr<unsigned int> outputPtrEnd((d_output + (rows * cols)));
}
else {
thrust::device_ptr<unsigned int> outputPtrBegin(d_X);
thrust::device_ptr<unsigned int> outputPtrEnd((d_X + (rows * cols)));
}
But the compiler says outputPtrBegin and outputPtrEnd are not declared, because they are declared inside the if/else blocks and go out of scope when those blocks end. How do I declare these device pointers beforehand and then use them?
This is wrong:
thrust::device_ptr<unsigned int> outputPtrEnd((d_output + stride + (rows * cols)));
It should be:
thrust::device_ptr<unsigned int> outputPtrEnd((d_output + (rows * cols)));
In your first (working) example, you are copying a region from the device to the host. On the device, that region starts at d_output and has a length of rows*cols elements. This is the data that you are ultimately passing through the reduce operation. Yes, on the host, it happens to be copied to a region that begins at data + stride, but that is irrelevant. Ultimately you are performing a reduce over rows*cols elements in your first implementation.
It's quite clear that in the second implementation, you are attempting to perform a reduce operation starting at d_output and going to d_output+stride+(rows*cols). This is not the same size operation: it covers stride more elements than the first one. If the device allocation holds only rows*cols elements, those extra elements lie past its end, which would explain the illegal memory access.
In addition, you may want to do something like this instead:
thrust::device_ptr<unsigned int> outputPtrBegin(d_output);
thrust::device_ptr<unsigned int> outputPtrEnd = outputPtrBegin + (rows * cols);
sum = thrust::reduce(outputPtrBegin, outputPtrEnd);
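For reference, here is a minimal, self-contained sketch of the corrected reduction. The helper names sum_device_array and demo_sum are hypothetical (they are not from the question), and the buffer is filled with ones purely so the result is easy to check. It assumes the device allocation holds exactly rows*cols elements.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/reduce.h>

// Hypothetical helper: sums n unsigned ints that already live in device memory.
unsigned int sum_device_array(unsigned int* d_data, std::size_t n)
{
    // Wrap the raw device pointer so Thrust dispatches to the device backend.
    thrust::device_ptr<unsigned int> first = thrust::device_pointer_cast(d_data);
    // Derive the end from the beginning: the range is exactly n elements.
    thrust::device_ptr<unsigned int> last = first + n;
    return thrust::reduce(first, last);
}

// Hypothetical demo: allocates rows*cols elements set to 1 and sums them.
unsigned int demo_sum(int rows, int cols)
{
    const std::size_t n = static_cast<std::size_t>(rows) * cols;
    std::vector<unsigned int> h_data(n, 1u);

    unsigned int* d_output = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&d_output), n * sizeof(unsigned int));
    cudaMemcpy(d_output, h_data.data(), n * sizeof(unsigned int),
               cudaMemcpyHostToDevice);

    const unsigned int sum = sum_device_array(d_output, n);

    cudaFree(d_output);
    return sum;
}
```

Computing the end iterator as first + n, rather than from a second raw pointer expression, makes it harder to let an unrelated offset such as stride slip into the range.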
Regarding your second question (please post new questions as new questions), instead of this:
if(toggle) {
thrust::device_ptr<unsigned int> outputPtrBegin(d_output);
thrust::device_ptr<unsigned int> outputPtrEnd((d_output + (rows * cols)));
}
else {
thrust::device_ptr<unsigned int> outputPtrBegin(d_X);
thrust::device_ptr<unsigned int> outputPtrEnd((d_X + (rows * cols)));
}
Do something like this:
thrust::device_ptr<unsigned int> outputPtrBegin;
thrust::device_ptr<unsigned int> outputPtrEnd;
if(toggle) outputPtrBegin=thrust::device_pointer_cast<unsigned int>(d_output);
else outputPtrBegin=thrust::device_pointer_cast<unsigned int>(d_X);
outputPtrEnd = outputPtrBegin + (rows * cols);
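The same selection can also be written by choosing the raw pointer first and wrapping it once. The sketch below is only an illustration: sum_selected and demo_toggle are hypothetical names, and the two thrust::device_vector buffers stand in for the d_output and d_X allocations from the question.

```cuda
#include <cstddef>
#include <thrust/device_ptr.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

// Hypothetical helper: reduces n elements of d_output or d_X depending on toggle.
unsigned int sum_selected(bool toggle, unsigned int* d_output,
                          unsigned int* d_X, std::size_t n)
{
    // Pick the raw pointer first, then wrap it; the wrapper is declared in the
    // enclosing scope, so it is still visible at the reduce call.
    thrust::device_ptr<unsigned int> first =
        thrust::device_pointer_cast(toggle ? d_output : d_X);
    return thrust::reduce(first, first + n);
}

// Hypothetical demo: one buffer holds n ones, the other holds n twos.
unsigned int demo_toggle(bool toggle, std::size_t n)
{
    thrust::device_vector<unsigned int> a(n, 1u);  // stands in for d_output
    thrust::device_vector<unsigned int> b(n, 2u);  // stands in for d_X
    return sum_selected(toggle,
                        thrust::raw_pointer_cast(a.data()),
                        thrust::raw_pointer_cast(b.data()),
                        n);
}
```

With n = 16, demo_toggle(true, 16) sums the buffer of ones and demo_toggle(false, 16) sums the buffer of twos.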