I am a beginner in CUDA, and I have a simple question that I could not find a clear answer to.
I know that we can define our array in Device memory using a raw pointer:
int *raw_ptr;
cudaMalloc((void **) &raw_ptr, N * sizeof(int));
And, we can also use Thrust to define a vector and push_back our items:
thrust::device_vector<int> D;
I need a large amount of memory (around 500M int values) and want to run many kernels on it in parallel. In terms of memory access from kernels, is using raw pointers faster than thrust::device_vector, and if so, when?
The data in thrust::device_vector is ordinary global memory, so there is no difference in access speed.
Note, however, that the two alternatives you present are not equivalent. cudaMalloc returns uninitialized memory. Memory in thrust::device_vector will be initialized. After allocation, it launches a kernel to initialize its elements, followed by cudaDeviceSynchronize. This could slow down the code. You need to benchmark your code.