Tags: c++, lambda, intel-oneapi, sycl, dpc++

Cannot capture a struct member value inside the kernel function


It is so strange, and I have been struggling with this problem for a whole week. I just want to use a member variable that is initialized in the struct constructor inside the kernel, but I can't get it to work. Here is a minimal example:

#include <CL/sycl.hpp>
#include <fstream>
#include <cstdlib>
#include <stdio.h>
#include <stdlib.h>

#define ghost 3
using namespace cl::sycl;

struct test
{
    int ls[3];
    queue Q{};
    test()
    {
        ls[0] = ghost;
        ls[1] = ghost;
        ls[2] = ghost;
    }
    void calculate();
};

void test::calculate()
{
    size_t lx = 10;
    size_t ly = 10;

    auto abc = Q.submit([&](handler &h)
                        {
        sycl::stream out(1024, 256, h);
        h.parallel_for(range{lx, ly}, [=, lsq = this->ls](id<2> idx)
                       { out << "this is id1" << lsq[1] << "\n"; }); });
}

int main()
{
    test t1;
    t1.calculate();
    return 0;
}

Someone from the DPC++ community suggested this way of capturing the member through the this pointer, but I don't understand why it does not work.


Solution

  • According to section 4.12.4, "Rules for parameter passing to kernels", of the SYCL 2020 specification, an array of scalar values can be passed as a kernel parameter. The problem is in how the struct member is captured:

    [lsq = this->ls]
    

    is equivalent to

    auto lsq = this->ls;
    

    In this case, the array decays to a pointer: lsq has type int* and holds the address of test::ls in host memory. Accessing the array elements through that pointer inside the kernel leads to undefined behavior.

    There are two possible solutions here:

    Solution 1

    Create a local reference to test::ls and let the kernel lambda capture it by value:

    void test::calculate() {
        size_t lx = 10;
        size_t ly = 10;
    
        auto abc = Q.submit([&](handler &h) {
            sycl::stream out(1024, 256, h);
            auto& lsq = this->ls;
            h.parallel_for(range{lx, ly}, [=](id<2> idx) {
                out << "this is id1: " << lsq[1] << "\n";
            });
        });
        Q.wait();  // block until all work submitted to Q has finished, so the stream output is flushed
    }
    

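    In this case, [=] captures lsq by copy. Because lsq is a reference to an array, the closure stores a copy of the referenced int[3] (copied element by element), so the kernel receives correctly initialized values.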

    Solution 2

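    Use std::array or sycl::marray instead of a C array. Both are class types, so the init-capture lsq = this->ls copies the whole array instead of decaying it to a pointer: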

    #include <CL/sycl.hpp>
    
    #define ghost 3
    using namespace cl::sycl;
    
    struct test {
        marray<int, 3> ls;
        queue Q;
        test() {
            ls[0] = ls[1] = ls[2] = ghost;
        }
    
        void calculate() {
            size_t lx = 10;
            size_t ly = 10;
    
            auto abc = Q.submit([&](handler& h) {
                sycl::stream out(1024, 256, h);
                h.parallel_for(range{ lx, ly }, [=, lsq = this->ls](id<2> idx) {
                    out << "this is id1: " << lsq[1] << "\n";
                });
            });
            Q.wait();  // block until all work submitted to Q has finished, so the stream output is flushed
        }
    };