Ensure that Thrust doesn't memcpy from host to device


I have used the following method, expecting to avoid any memcpy from host to device. Does the Thrust library guarantee that there won't be a host-to-device memcpy in the process?

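#include <thrust/device_ptr.h>
#include <thrust/scan.h>

// size = number of floats in d_in and d_out (both allocated with cudaMalloc)
void EScanThrust(float * d_in, float * d_out, int size)
{
     // Wrap the raw device pointers so Thrust dispatches to the device backend
     thrust::device_ptr<float> dev_ptr(d_in);
     thrust::device_ptr<float> dev_out_ptr(d_out);

     thrust::exclusive_scan(dev_ptr, dev_ptr + size, dev_out_ptr);
}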

Here, d_in and d_out are allocated with cudaMalloc, and d_in is filled with data using cudaMemcpy before this function is called.
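For reference, a minimal self-contained sketch of the setup described above, assuming a single .cu file compiled with nvcc. The helper name RunExclusiveScan and the use of std::vector for the host data are illustrative assumptions, not part of the question, and error checking is omitted. The only host-to-device transfer in it is the explicit cudaMemcpy issued by the caller; Thrust itself only ever sees device pointers.

```cuda
// Build with e.g.: nvcc -o escan escan.cu
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/scan.h>
#include <vector>
#include <cassert>

// Hypothetical helper reproducing the question's setup (no error checking).
std::vector<float> RunExclusiveScan(const std::vector<float>& h_in)
{
    const int size = static_cast<int>(h_in.size());
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, size * sizeof(float));
    cudaMalloc(&d_out, size * sizeof(float));

    // The only host->device copy, done explicitly by the caller, not by Thrust.
    cudaMemcpy(d_in, h_in.data(), size * sizeof(float), cudaMemcpyHostToDevice);

    // device_ptr marks the memory as device memory, so the scan runs on the GPU.
    thrust::device_ptr<float> dev_ptr(d_in);
    thrust::device_ptr<float> dev_out_ptr(d_out);
    thrust::exclusive_scan(dev_ptr, dev_ptr + size, dev_out_ptr);

    // Device->host copy only so the host can inspect the result.
    std::vector<float> h_out(size);
    cudaMemcpy(h_out.data(), d_out, size * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling RunExclusiveScan({1, 2, 3, 4}) should yield {0, 1, 3, 6}, since an exclusive scan with the default plus operator starts from 0.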


Solution

  • Does the Thrust library guarantee that there won't be a host-to-device memcpy in the process?

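    The code you've shown shouldn't involve any host->device copying. (How could it? There are no references anywhere to any host data in the code you have shown: both ranges passed to thrust::exclusive_scan are thrust::device_ptr iterators, so Thrust runs the scan on the device against memory that is already there.)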
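    To make the contrast concrete, here is a hedged sketch (the helper ScanViaVectors is hypothetical) of code where Thrust itself does issue transfers: it copies whenever data moves between a thrust::host_vector and a thrust::device_vector. The scan call in the middle is in the same situation as the code in the question, with only device iterators involved.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/scan.h>
#include <cassert>

// Hypothetical helper showing where Thrust performs host<->device copies.
thrust::host_vector<float> ScanViaVectors(const thrust::host_vector<float>& h_in)
{
    // Host -> device copy: issued by Thrust because the source is host memory.
    thrust::device_vector<float> d_in = h_in;
    thrust::device_vector<float> d_out(d_in.size());

    // No host->device copy expected here: both ranges are device iterators,
    // just like the device_ptr ranges in the question.
    thrust::exclusive_scan(d_in.begin(), d_in.end(), d_out.begin());

    // Device -> host copy: assigning device data to a host_vector.
    thrust::host_vector<float> h_out = d_out;
    return h_out;
}
```

    Running this under the profiler command below should show one HtoD and one DtoH memcpy around the scan kernels, which is exactly what the device_ptr version avoids.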

    For actual codes, it's easy enough to verify the underlying CUDA activity using a profiler, for example:

    nvprof --print-gpu-trace ./my_exe
    

    If you keep your profiled code sequences short, it's pretty easy to line up the underlying CUDA activity with the Thrust code that generated it. If you want to profile just a short segment of a longer sequence, you can turn profiling on and off (for example with cudaProfilerStart()/cudaProfilerStop() together with nvprof --profile-from-start off) or use NVTX markers to identify the desired range in the profiler output.
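    A minimal sketch of both techniques wrapped around the scan from the question, assuming CUDA 10.0 or later for the header-only NVTX v3 header (on older Linux toolchains you may also need to link with -ldl). ProfiledScan is a hypothetical wrapper name. Note that nvprof does not support GPUs of compute capability 8.0 and newer; on those, Nsight Systems (nsys) fills the same role.

```cuda
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>   // cudaProfilerStart / cudaProfilerStop
#include <nvtx3/nvToolsExt.h>    // header-only NVTX v3
#include <thrust/device_ptr.h>
#include <thrust/scan.h>
#include <cassert>

// Hypothetical wrapper that profiles only the scan. Run with e.g.
//   nvprof --profile-from-start off --print-gpu-trace ./my_exe
// so that collection starts at cudaProfilerStart() instead of program start.
void ProfiledScan(float* d_in, float* d_out, int size)
{
    thrust::device_ptr<float> dev_ptr(d_in);
    thrust::device_ptr<float> dev_out_ptr(d_out);

    cudaProfilerStart();                 // begin collecting GPU activity
    nvtxRangePushA("EScanThrust");       // label this region in the profiler output
    thrust::exclusive_scan(dev_ptr, dev_ptr + size, dev_out_ptr);
    cudaDeviceSynchronize();             // let the scan's kernels finish inside the range
    nvtxRangePop();                      // close the labelled region
    cudaProfilerStop();                  // stop collecting
}
```

    Any memcpy rows that appear in the GPU trace between start and stop would then be attributable to the scan alone.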