Search code examples
cudathrust

Using thrust to handle vectors in CUDA classes?


I have a question about the applicability of the thrust into c++ classes. I am trying to implement a class object that receives (x,y,z) coordinates of vertices as ver1, ver2, and ver3. Then, assigns to a triangle and calculates area and normal vector.

However, I didn’t quite understand how to create a class of thrust vectors.

Here are the coordinates of the vertices I read it from a file and I would like to send them to a class that will assign them into triangles. Here we are in main.

        thrust::host_vector<double> dum(start, end); //dum has coordinates and I create
vertices from it.
        thrust::host_vector<double> ver1(dum.begin(),dum.begin()+3); //initilizing elements in CPU.
        thrust::host_vector<double> ver2(dum.begin()+3,dum.begin()+6);
        thrust::host_vector<double> ver3(dum.begin()+6,dum.end());

        thrust::device_vector<double> ver1_gpu = ver1; //copying CPU vectors to GPU vectors.
        thrust::device_vector<double> ver2_gpu = ver2;
        thrust::device_vector<double> ver3_gpu = ver3;
        triangle(ver1_gpu, ver2_gpu, ver3_gpu); 

In the triangle class, I tried to initialize 3 vertices that have all zeros for their first 3 elements. Since each vertices have 3 coordinates.(x, y, and z). and I also initialize area and normal variables.

class triangle
{
    thrust::device_vector<double>v1(3,0);
    thrust::device_vector<double>v2(3,0);
    thrust::device_vector<double>v3(3,0);
    thrust::device_vector<double>E1(3,0);
    thrust::device_vector<double>E2(3,0);
    thrust::device_vector<double>E3(3,0);
    double normal;
    double dummy
    double area;

public:
    __device__ __host__ triangle(device_vector<double>vert1, device_vector<double>vert2, device_vector<double>vert3)
    {
        triangle.v1 = vert1;
        triangle.v2 = vert2;
        triangle.v3 = vert3;
        triangle.E1 = vert2 - vert1;
        triangle.E2 = vert3 - vert1;
        dummy = cross(obj.E2, obj.E1);%% Cross product
        triangle.Area = norm(dummy) / 2;
        triangle.Normal = dummy / norm(dummy);
    }
};

I’d like to do all of my calculations in the device. I am new to cuda and its libraries and I know I am wrong in many places but I seek your help.


Solution

  • The following code is intended to show how to avoid unnecessary copies through move semantics (not Thrust specific) and initialization through clever use of Thrusts "fancy iterators" (thrust::transform_iterator and thrust::zip_iterator) in a C++ class leveraging Thrust vectors to offload computation. As our class is supposed to handle many triangles at once to utilize the resources of a modern GPU and recuperate the associated overheads, it is named Triangles. To achieve good performance in bandwidth-bound applications like this on the GPU, coalescing of global memory accesses is key. A straightforward way of achieving this is to use so-called Struct-of-Array (SoA) semantics, e.g.

    struct SoA_example {
        double x[N];
        double y[N];
        double z[N];
    };
    

    instead of Array-of-Structs semantics, e.g.

    struct Vertex {
        double x;
        double y;
        double z;
    }
    Vertex AoS_example[N];
    

    This vocabulary somewhat clashes with the names of the C++ and Thrust containers used below, as our "array" is the thrust::device_vector while std::array is used as a "struct".

    Depending on the context one could think of many other constructors than the one showed below that e.g. handle data transfer from host to device, read values from file or handle (host) input in AoS format.

    OPs question defines dummy and normal as scalars which doesn't make sense mathematically. The cross-product of two vectors is another vector. I corrected this here.

    The following code is untested but compiles.

    #include <thrust/device_vector.h>
    #include <thrust/iterator/transform_iterator.h>
    #include <thrust/iterator/zip_iterator.h>
    #include <thrust/zip_function.h>
    
    #include <array>
    #include <cstddef>
    #include <utility>
    
    template <typename T>
    struct CrossProduct {
        __host__ __device__ T operator()(T a, T b, T c, T d) const {
            return a * b - c * d;
        }
    };
    
    template <typename T>
    struct TriangleArea {
        __host__ __device__ T operator()(T x, T y, T z) const {
            return sqrt(x * x + y * y + z * z) / 2;
        }
    };
    
    template <typename T>
    struct NormalizeUsingArea {
        __host__ __device__ T operator()(T &val_x, T &val_y, T &val_z,
                                         T area) const {
            T norm = 0.5 / area;
            val_x *= norm;
            val_y *= norm;
            val_z *= norm;
        }
    };
    
    template <typename T>
    class Triangles {
        static constexpr int x_dim = 0;
        static constexpr int y_dim = 1;
        static constexpr int z_dim = 2;
        static constexpr int n_dims = 3;
    
        using Container = thrust::device_vector<T>;
        using Vectors = std::array<Container, n_dims>;
    
        std::ptrdiff_t n_triangles{};
        Vectors v1;
        Vectors v2;
        Vectors v3;
    
        Vectors E1;
        Vectors E2;
        // Vectors E3;
    
        // Vectors dummies;
        Vectors normals;
        Container areas;
    
        // helper functions
        auto make_minus_iterator(Vectors &vs_1, Vectors &vs_2, int component) {
            return thrust::make_transform_iterator(
                thrust::make_zip_iterator(vs_1[component].cbegin(),
                                          vs_2[component].cbegin()),
                thrust::make_zip_function(thrust::minus<T>{}));
        }
    
        template <int component>
        auto make_crossprod_iterator(Vectors &vs_1, Vectors &vs_2) {
            static_assert(component >= x_dim && component < n_dims);
            static constexpr int dim_1 = (component + 1) % n_dims;
            static constexpr int dim_2 = (component + 2) % n_dims;
            return thrust::make_transform_iterator(
                thrust::make_zip_iterator(
                    vs_1[dim_1].cbegin(), vs_2[dim_2].cbegin(),
                    vs_1[dim_2].cbegin(), vs_2[dim_1].cbegin()),
                thrust::make_zip_function(CrossProduct<T>{}));
        }
    
        auto make_area_iterator(Vectors &crossproduct) {
            return thrust::make_transform_iterator(
                thrust::make_zip_iterator(crossproduct[x_dim].cbegin(),
                                          crossproduct[y_dim].cbegin(),
                                          crossproduct[z_dim].cbegin()),
                thrust::make_zip_function(TriangleArea<T>{}));
        }
    
        auto make_zip_iterator(Vectors &vecs, const Container &scalars) {
            return thrust::make_zip_iterator(vecs[x_dim].begin(),
                                             vecs[y_dim].begin(),
                                             vecs[z_dim].begin(), scalars.cbegin());
        }
    
       public:
        // The following constructor is just an example based on OPs constructor.
        // Use fancy iterators to avoid unnecessary initialization to 0 of E1, E2,
        // ... Depending on the use case it might make more sense to just have the
        // iterators as members and compute E1, E2, etc on the fly when needed and
        // get rid of their Vectors members (kernel fusion).
        Triangles(
            thrust::device_vector<T> &&vert1_x, thrust::device_vector<T> &&vert1_y,
            thrust::device_vector<T> &&vert1_z, thrust::device_vector<T> &&vert2_x,
            thrust::device_vector<T> &&vert2_y, thrust::device_vector<T> &&vert2_z,
            thrust::device_vector<T> &&vert3_x, thrust::device_vector<T> &&vert3_y,
            thrust::device_vector<T> &&vert3_z)
            : n_triangles{static_cast<std::ptrdiff_t>(vert1_x.size())},
              // move device_vectors with vertices into class (avoids expensive
              // copies)
              v1{std::move(vert1_x), std::move(vert1_y), std::move(vert1_z)},
              v2{std::move(vert2_x), std::move(vert2_y), std::move(vert2_z)},
              v3{std::move(vert3_x), std::move(vert3_y), std::move(vert3_z)},
              // calculate diffs and initialize E1, E2 with them
              E1{Container(make_minus_iterator(v2, v1, x_dim),
                           make_minus_iterator(v2, v1, x_dim) + n_triangles),
                 Container(make_minus_iterator(v2, v1, y_dim),
                           make_minus_iterator(v2, v1, y_dim) + n_triangles),
                 Container(make_minus_iterator(v2, v1, z_dim),
                           make_minus_iterator(v2, v1, z_dim) + n_triangles)},
              E2{Container(make_minus_iterator(v3, v1, x_dim),
                           make_minus_iterator(v3, v1, x_dim) + n_triangles),
                 Container(make_minus_iterator(v3, v1, y_dim),
                           make_minus_iterator(v3, v1, y_dim) + n_triangles),
                 Container(make_minus_iterator(v3, v1, z_dim),
                           make_minus_iterator(v3, v1, z_dim) + n_triangles)},
              // calculate cross-products and initialize normals with them(normalize
              // later)
              normals{
                  Container(make_crossprod_iterator<x_dim>(E2, E1),
                            make_crossprod_iterator<x_dim>(E2, E1) + n_triangles),
                  Container(make_crossprod_iterator<y_dim>(E2, E1),
                            make_crossprod_iterator<y_dim>(E2, E1) + n_triangles),
                  Container(make_crossprod_iterator<z_dim>(E2, E1),
                            make_crossprod_iterator<z_dim>(E2, E1) + n_triangles)},
              // calculate areas and initialize with them
              areas(make_area_iterator(normals),
                    make_area_iterator(normals) + n_triangles) {
            // normalize normals
            thrust::for_each_n(
                make_zip_iterator(normals, areas), n_triangles,
                thrust::make_zip_function(NormalizeUsingArea<double>{}));
        }
    };
    
    // expicit instantiation to find compilation errors on godbolt.com
    template class Triangles<double>;