
C++11 alias templates in CUDA


The essential question: are alias templates supported by the CUDA compiler?

I am using CUDA 7.5 on Ubuntu with gcc-4.8. All of my template classes are defined in header files and #included into a single translation unit during compilation.

I have a simple cuda_array class that provides a thin wrapper around a std::vector. It's essentially a very simple version of thrust::host_vector combined with a thrust::device_vector. Its declaration is

template <typename T, const size_t N>
class cuda_array {
    std::vector<T> host;
    T *device;
public:
    // lots of type aliases to meet container requirements
    void push() { /* cudaMemcpy(...,H2D); */ }
    void pull() { /* cudaMemcpy(...,D2H); */ }
    // a few others that aren't relevant here
};

To make a matrix, I just made a quick template alias.

template <typename T, const size_t M, const size_t N>
using cuda_matrix = cuda_array<T, M * N>;

I want to map my matrix-vector multiplication CUDA kernel onto the overloaded operator* for type safety and easy use (it is left to the caller to ensure that push and pull are called correctly).

template <typename T, const size_t rows, const size_t cols>
__global__ void matrix_vector_mul(T *A, T *b, T *result) {
    __shared__ T shared_b[cols];
    // rest of it
}

template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_matrix<T, M, N> &m, cuda_array<T, N> &v) {
    cuda_array<T, M> result;
    matrix_vector_mul<T, M, N><<<16, 32>>>(m.device_data(), v.device_data(), result.device_data());
    return result;
}

In my 'main.cpp', I then have

cuda_matrix<int,16,32> A;
cuda_array<int,32> b;
auto result = A * b;

The last line fails to compile with the error

error: no operator "*" matches these operands
        operand types are: cuda_matrix<int, 16UL, 32UL> * cuda_array<int, 32UL>

I chased down all of the usual suspects for template type deduction errors I could think of, but nothing worked. In desperation, I converted my cuda_matrix alias template into a template class.

template <typename T, const size_t M, const size_t N>
class cuda_matrix : public cuda_array<T, M * N> {};

And the compile error disappears! It therefore seems that CUDA does not yet support alias templates. Or did I do something silly that I can't figure out?


Solution

  • The alias template itself is fine; what fails is template argument deduction. Recall:

    § 14.5.7 [temp.alias]/p2:

    When a template-id refers to the specialization of an alias template, it is equivalent to the associated type obtained by substitution of its template-arguments for the template-parameters in the type-id of the alias template. [ Note: An alias template name is never deduced. — end note ]

    This means that deduction is not performed for:

    template <typename T, const size_t M, const size_t N>
    __host__ cuda_array<T, M> operator*(cuda_matrix<T, M, N> &m, cuda_array<T, N> &v)
    

    but for:

    template <typename T, const size_t M, const size_t N>
    __host__ cuda_array<T, M> operator*(cuda_array<T, M * N> &m, cuda_array<T, N> &v)
    //                                  ~~~~~~~~~~~~~~~~~~~^
    

    And so:

    § 14.8.2.5 [temp.deduct.type]/p16:

    If, in the declaration of a function template with a non-type template parameter, the non-type template parameter is used in a subexpression in the function parameter list, the expression is a non-deduced context as specified above.

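    M appears only inside the subexpression M * N, which is a non-deduced context, so M cannot be deduced. Deduction therefore fails and this operator* is dropped from the set of candidate functions.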
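The failure described above can be reproduced in plain C++11, without a CUDA toolchain. What follows is only a sketch under stated assumptions: cuda_array is reduced to an empty stand-in, operator* is declared but never defined, and can_multiply is a hypothetical detection helper that is not part of the original code.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Empty host-only stand-in for the question's cuda_array (assumption).
template <typename T, std::size_t N>
struct cuda_array {};

// The alias from the question, unchanged.
template <typename T, std::size_t M, std::size_t N>
using cuda_matrix = cuda_array<T, M * N>;

// Same parameter list as the question's operator*; declaration only,
// because it is used solely in unevaluated contexts below.
template <typename T, std::size_t M, std::size_t N>
cuda_array<T, M> operator*(cuda_matrix<T, M, N>& m, cuda_array<T, N>& v);

// Hypothetical detection helper: true when `L& * R&` is a valid expression.
template <typename L, typename R, typename = void>
struct can_multiply : std::false_type {};

template <typename L, typename R>
struct can_multiply<L, R,
                    decltype(void(std::declval<L&>() * std::declval<R&>()))>
    : std::true_type {};

using Matrix = cuda_matrix<int, 16, 32>;
using Vector = cuda_array<int, 32>;

// The alias is only another name for cuda_array<int, 512>.
static_assert(std::is_same<Matrix, cuda_array<int, 512>>::value,
              "alias names the underlying type");

// With deduction, no operator* is found: M cannot be recovered from M * N.
static_assert(!can_multiply<Matrix, Vector>::value,
              "M is in a non-deduced context");

// With explicit template arguments nothing has to be deduced, so the
// same declaration is usable.
static_assert(
    std::is_same<decltype(operator* <int, 16, 32>(std::declval<Matrix&>(),
                                                   std::declval<Vector&>())),
                 cuda_array<int, 16>>::value,
    "explicit arguments bypass deduction");
```

All three assertions are expected to hold on any conforming C++11 compiler, which suggests the behaviour comes from the language rules rather than from nvcc.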

    One workaround is to deduce the total size MN of the cuda_array itself and verify that it is a multiple of N:

    template <typename T, std::size_t MN, std::size_t N>
    auto operator*(const cuda_array<T, MN>& m, const cuda_array<T, N>& v)
        -> typename std::enable_if<(MN/N)*N==MN, cuda_array<T, MN/N>>::type;
    

    Alternatively, keep the inheritance trick you already have: there M and N are separate non-type template parameters of cuda_matrix, so each one is deduced directly.
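For illustration, here is a minimal host-only sketch of the enable_if workaround with a function body filled in. Several parts are assumptions rather than the original code: cuda_array is a stand-in that holds only a std::array, a plain loop replaces the matrix_vector_mul kernel launch, row-major storage is assumed, and demo is a hypothetical helper.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Host-only stand-in for the question's cuda_array (assumption: the real
// class also owns a device pointer plus push() and pull()).
template <typename T, std::size_t N>
struct cuda_array {
    std::array<T, N> host{};  // zero-initialised storage
};

// The alias from the question is kept as-is.
template <typename T, std::size_t M, std::size_t N>
using cuda_matrix = cuda_array<T, M * N>;

// MN and N are each deduced directly from an argument; M is computed
// from them. enable_if removes the overload when MN is not a multiple
// of N.
template <typename T, std::size_t MN, std::size_t N>
auto operator*(const cuda_array<T, MN>& m, const cuda_array<T, N>& v)
    -> typename std::enable_if<(MN / N) * N == MN, cuda_array<T, MN / N>>::type
{
    constexpr std::size_t M = MN / N;
    cuda_array<T, M> result;
    // Plain host loop standing in for the kernel launch; row-major
    // storage is an assumption of this sketch.
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j)
            result.host[i] += m.host[i * N + j] * v.host[j];
    return result;
}

// Hypothetical helper: multiplies a 2x3 matrix by a 3-vector of ones.
inline cuda_array<int, 2> demo() {
    cuda_matrix<int, 2, 3> A;
    A.host = {{1, 2, 3, 4, 5, 6}};
    cuda_array<int, 3> b;
    b.host = {{1, 1, 1}};
    return A * b;  // deduces T = int, MN = 6, N = 3
}
```

Here demo() yields a cuda_array<int, 2> holding the row sums 6 and 15. One trade-off of this approach is that the overload accepts any pair of cuda_array operands whose sizes divide evenly, so a vector of suitable length can be passed where a matrix was intended; the inheritance variant avoids that because cuda_matrix becomes a distinct type.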