Search code examples
c++vectorsyclintel-oneapidpc++

Matrix Multiplication on SYCL using 2D std::vector


I'm new to SYCL and C++. This is my kernel for simple matrix multiplication using 2D std::vector.


void MatrixMulParallel(queue& q, 
    const std::vector<std::vector<double>>& a_host,
    const std::vector<std::vector<double>>& b_host,
    std::vector<std::vector<double>>& c_gpu) {
    /*
        To Multiply: C[M][P] = A[M][N] * B[N][P]
    */
    PROFILE_FUNCTION();
    try {
        size_t M = a_host.size();
        size_t N = a_host[0].size();
        size_t P = b_host[0].size();
        // Create device buffers for A, B, C
        buffer a(a_host.data(), range<2>{M, N});
        buffer b(b_host.data(), range<2>{N, P});
        buffer c(c_gpu.data(), range<2>{M, P});

        PROFILE_SCOPE("Starting Multiply on GPU");
        std::cout << "GPU::Multiplying A and B into C.\n";
        auto e = q.submit([&](handler& h) {

            auto A = a.get_access<access::mode::read>(h);
            auto B = b.get_access<access::mode::read>(h);
            auto C = c.get_access<access::mode::write>(h);

            h.parallel_for(range<2>{M, P}, [=](id<2> index) {
                // index[0] allows accessing ROW index, index[1] is column index
                
                int row = index[0];
                int col = index[1];
                auto sum = 0.0;
                for (int i = 0; i < N; i++)
                    sum += A[row][i] * B[i][col]; // Error #1
                C[index] = sum; // Error #2
                });
            });
        e.wait();
    }
    catch (sycl::exception const& e) {
        std::cout << "An exception is caught while multiplying matrices.\n";
        terminate();
    }
}

I get two errors, indicated along the lines:

  1. Error #1: invalid operands to binary expression ('const std::vector<double, std::allocator<double>>' and 'const std::vector<double, std::allocator<double>>')
  2. Error #2: no viable overloaded '='

I tried looking for errors similar to invalid operands for binary expression (...), but none of them seem to help debug my specific case. Maybe because this ain't beginner-friendly.

From what I understood so far, a_host.data() shows a return type std::vector<double> (shouldn't it be std::vector< std::vector<double> >?).

I have tried using std::array with statically known sizes, and it works.

How can I make this work using 2D std::vector?

Any help would be appreciated.


Solution

  • A 2D std::vector<std::vector<T>> does not have elements stored contiguously in the memory.

    A better way around would be to declare std::vector<T> with sizes M*N, i.e. linear arrays, and operate on them as contiguous blocks.

    Since the destination vector C, is supposed to be 2D, create a kernel that indexes in both rows and cols. SYCL index actually fills up in linearly-accessible memory blocks.

    Here's what I did to make it work using std::vector:

    template <typename T>
    void MatrixMulParallelNaive(queue& q, 
        const std::vector<T>& a_host,
        const std::vector<T>& b_host,
        std::vector<T>& c_gpu) {
        /*
            To Multiply: C[M][P] = A[M][N] * B[N][P]
        */
        PROFILE_FUNCTION();
        try {
            
            buffer<double, 1> a(a_host.data(), range<1>{a_host.size()}); // 1D
            buffer<double, 1> b(b_host.data(), range<1>{b_host.size()}); // 1D
            buffer<double, 2> c(c_gpu.data(), range<2>{M, P}); // Create 2D buffer
            PROFILE_SCOPE("Starting Multiply on GPU");
            std::cout << "GPU::Multiplying A and B into C.\n";
            auto e = q.submit([&](handler& h) {
    
                auto A = a.get_access<access::mode::read>(h);
                auto B = b.get_access<access::mode::read>(h);
                auto C = c.get_access<access::mode::write>(h);
                
                h.parallel_for(range<2>{M, P}, [=](id<2> index) {
                    // Threading index that iterates over C.
                    int row = index[0];
                    int col = index[1];
                    auto sum = 0.0;
                    // Compute result of ONE element of C
                    for (int i = 0; i < N; i++)
                        sum += A[row * M + i] * B[i * N + col];
                    C[index] = sum;
                    });
                });
            e.wait();
        }
        catch (sycl::exception const& e) {
            std::cout << "An exception is caught while multiplying matrices.\n";
            terminate();
        }
    }