
CUDA thrust: copy from device to device


I have an array allocated on the device with the standard cudaMalloc, and it is passed to a function as follows:

void MyClass::run(uchar4 * input_data)

I also have a class member which is a thrust device_ptr declared as:

thrust::device_ptr<uchar4> data = thrust::device_malloc<uchar4>(num_pts);

Here num_pts is the number of elements in the array, and the array that input_data points to is guaranteed to hold num_pts elements.

Now, I would like to copy the input array into the thrust::device_ptr. I have looked at the Thrust documentation, and most of it covers copying from device to host memory and vice versa. What is the most efficient way to do this device-to-device copy with Thrust, or should I just use cudaMemcpy?


Solution

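  • The canonical way to do this is just to use thrust::copy. A thrust::device_ptr has standard pointer semantics, and thrust::copy works out from the iterator types whether the source and destination are on the host or the device. A raw pointer returned by cudaMalloc, such as input_data, should first be wrapped with thrust::device_pointer_cast, because Thrust treats a bare pointer as host memory. A complete example using two device_ptr allocations: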

    #include <thrust/device_malloc.h>
    #include <thrust/device_ptr.h>
    #include <thrust/copy.h>
    #include <iostream>
    
    int main()
    {
        // Initial host data
        int ivals[4] = { 1, 3, 6, 10 };
    
        // Allocate and copy to first device allocation
        thrust::device_ptr<int> dp1 = thrust::device_malloc<int>(4);
        thrust::copy(&ivals[0], &ivals[0]+4, dp1);
    
        // Allocate and copy to second device allocation
        thrust::device_ptr<int> dp2 = thrust::device_malloc<int>(4);
        thrust::copy(dp1, dp1+4, dp2);
    
        // Copy back to host
        int ovals[4] = {-1, -1, -1, -1};
        thrust::copy(dp2, dp2+4, &ovals[0]);
    
        for(int i=0; i<4; i++)
            std::cout << ovals[i] << std::endl;
    
    
        return 0;
    }
    

    which compiles and prints the copied values:

    talonmies@box:~$ nvcc -arch=sm_30 thrust_dtod.cu 
    talonmies@box:~$ ./a.out 
    1
    3
    6
    10