Is it possible to use thrust::device_vector and thrust::fill for 2D arrays with the Thrust library in CUDA?


I am new to the Thrust library. I have CUDA C code that uses global 2D arrays, which I currently initialize with a kernel function.

I would like to know whether it is possible to use thrust::device_vector or thrust::fill to initialize and fill 2D arrays.

For example:

// initialize 1D array with ten numbers in a device_vector 
    thrust::device_vector<int> D(10);

Is it possible to write:

thrust::device_vector<int> D[5][10];

And if it is possible, how would I use the thrust::fill function with it?

My aim is to optimize the code using the Thrust library.


Solution

  • In both the STL and Thrust, a vector is a container that stores its elements in a strict linear sequence, so it is inherently 1-D. In Thrust, these elements can be ordinary types, and even structs and objects, but they cannot be other vectors (unlike in the STL).

    You can create an array of vectors, but Thrust operations on them will generally need to be applied one at a time, to each vector within the array.

    Regarding syntax, you cannot do this, because the element type (the template argument) is missing:

    thrust::device_vector D[5][10];
    

    You can do something like this:

    thrust::device_vector<int> D[5][10];
    

    However, this creates a 5x10 array of 50 separate (and initially empty) vectors, which is probably not what you want.
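To illustrate the "one-by-one" point, here is a minimal sketch that uses a 1-D array of row vectors, so every row needs its own allocation and its own thrust::fill call. The names H, W, rows and fill_rows are made up for this illustration, and the dimensions are assumed from the question.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/fill.h>

// Assumed dimensions, taken from the D[5][10] in the question.
constexpr int H = 5;
constexpr int W = 10;

// Hypothetical helper: size each row vector to W elements and fill row r with the value r.
void fill_rows(thrust::device_vector<int> (&rows)[H])
{
    for (int r = 0; r < H; ++r) {
        rows[r].resize(W);                                // separate device allocation per row
        thrust::fill(rows[r].begin(), rows[r].end(), r);  // one Thrust call per row
    }
}

int main()
{
    // An array of H independent vectors; each one starts out empty.
    thrust::device_vector<int> rows[H];
    fill_rows(rows);

    // Reading a single element from the host copies that one value back from the device.
    int v = rows[3][4];
    printf("rows[3][4] = %d\n", v);  // prints "rows[3][4] = 3"
    return 0;
}
```

This works, but the rows are not contiguous in device memory, so a single kernel launch or a single Thrust algorithm call cannot cover the whole array.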

    In many cases, 2-D arrays can be "flattened" and handled as if they were 1-dimensional, and without knowing more about your situation, this is what I would recommend investigating. If you can treat your 2-D array as 1-D, for example by computing the index as row*width + column, then you can fill the whole thing with a single thrust::fill call.
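As a sketch of that flattening idea, assuming row-major storage and the 5x10 dimensions from the question, a single device_vector can hold the whole array. The helper flat_index is hypothetical and not part of Thrust.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/fill.h>

// Assumed dimensions, taken from the D[5][10] in the question.
constexpr int H = 5;
constexpr int W = 10;

// Hypothetical helper: map (row, col) to its position in row-major flat storage.
__host__ __device__ inline int flat_index(int row, int col)
{
    return row * W + col;
}

int main()
{
    // One contiguous device allocation holding all H*W elements.
    thrust::device_vector<int> d(H * W);

    // Fill the entire "2-D array" with a single call.
    thrust::fill(d.begin(), d.end(), 7);

    // A row is a contiguous sub-range, so it can also be filled on its own.
    thrust::fill(d.begin() + flat_index(2, 0), d.begin() + flat_index(3, 0), 42);

    // Copy back to the host to inspect a couple of elements.
    thrust::host_vector<int> h = d;
    int a = h[flat_index(1, 3)];
    int b = h[flat_index(2, 3)];
    printf("(1,3) = %d, (2,3) = %d\n", a, b);  // prints "(1,3) = 7, (2,3) = 42"
    return 0;
}
```

Because the storage is contiguous, thrust::raw_pointer_cast(d.data()) can be passed to an existing kernel that already indexes with row*W + col.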

    I would also recommend becoming familiar with the thrust quick start guide.

    Here is a worked example showing a 2D array on the host with rudimentary flattening:

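    #include <cstdio>
    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/copy.h>
    #include <thrust/sequence.h>
    
    #define H 5
    #define W 10
    
    // Print one element of the flattened array, using row-major indexing.
    __global__ void kernel(int *data, int row, int col) {
    
      printf("Element (%d, %d) = %d\n", row, col, data[(row*W)+col]);
    
    }
    
    int main(void)
    {
        int h[H][W] = {};   // zero-initialized 2-D host array
        thrust::device_vector<int> d(H*W);
    
        // Copy all H*W elements; the end pointer must be one past the last element.
        thrust::copy(&(h[0][0]), &(h[0][0]) + H*W, d.begin());
        // Overwrite the device data with 0, 1, 2, ... so the printed value is recognizable.
        thrust::sequence(d.begin(), d.end());
        kernel<<<1,1>>>(thrust::raw_pointer_cast(d.data()), 2, 3);
        cudaDeviceSynchronize();
    
        return 0;
    }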