Tags: tbb, sycl, intel-oneapi, dpc++

ERROR: implicit capture of 'this' is not allowed for kernel functions, SYCL, DPCPP


I am trying to write a kind of "map" class that wraps oneAPI calls and hides hardware-targeting issues behind a parameter specifying the kind of target (CPU or GPU/accelerator). The map directs the work either to a SYCL kernel or to TBB, which implements the map operation with a parallel for. It takes as parameters the device type (CPU or GPU) and the function, and applies the function to every item in the collection. However, the kernel function fails with an error saying that implicit capture is not allowed, and I cannot understand what my mistake is. This is my code:

    #include <CL/sycl.hpp>
    #include <iostream>
    #include <tbb/tbb.h>
    #include <tbb/parallel_for.h>
    #include <vector>
    #include <string>
    #include <queue>
    #include<tbb/blocked_range.h>
    #include <tbb/global_control.h>
    
    using namespace std;
    using namespace cl::sycl;
    using namespace tbb;
    
    template<typename Tin, typename Tout>
    class Map {
    private:
        function<Tout(Tin)> fun;
        string device_type;
    public:
        Map() {}
        Map(function<Tout(Tin)> f):fun(f) {}
        void f(function<Tout(Tin)> ff) {
            fun = ff;
        }
        void set_device(string dev) {
            device_type = dev;
        }
    
    
        vector<Tout> operator()(vector<Tin>& v) {
            device *my_dev = new device();
            if(device_type == "cpu"){
                if(my_dev->is_cpu()) {
                    vector<Tout> r(0);
                    tbb::parallel_for(tbb::blocked_range<Tin>(0, v.size()),
                            [&](tbb::blocked_range<Tin> t) {
                        for (int index = t.begin(); index < t.end(); ++index){
                            r[index] = fun(v[index]);
                        }
                    });
                    return r;
                }
            }else if(device_type == "gpu"){
                if(my_dev->is_gpu()) {
                    vector<Tout> r(v.size());
                    sycl::queue gpuQueue{gpu_selector()};
                    sycl::range<1> n_item{v.size()};
                    sycl::buffer<Tin, 1> in_buffer(&v[0], n_item);
                    sycl::buffer<Tout, 1> out_buffer(&r[0], n_item);
                    gpuQueue.submit([&](sycl::handler& h){
                        //local copy of fun
                        //auto f = fun;
                        sycl::accessor in_accessor(in_buffer, h, sycl::read_only);
                        sycl::accessor out_accessor(out_buffer, h, sycl::write_only);
                        h.parallel_for(n_item, [=](sycl::id<1> index) {
                            out_accessor[index] = fun(in_accessor[index]);
                        });
                    }).wait();
                    return r;
                }
    
            }
        }
    
    };
    
    int main(int argc, char *argv[]) {
    
    
        vector<int> v = {1,2,3,4,5,6,7,8};
    
        auto f = [](int x){return (++x);};
    
        sycl::device dev = sycl::cpu_selector().select_device();
        string dev_type = argv[1];
        Map <int,int> m(f);
        m.set_device(dev_type);
        auto r = m(v);
        for(auto &e:r) {
            cout << e << "\n";
        }
    
      return 0;
    }

When I check the Problems view in Eclipse, it shows me this error:

1- implicit capture of 'this' is not allowed for kernel functions


Solution

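  • You are trying to access fun, a member variable of Map, inside your kernel. In C++, member variables are accessed through the this pointer, so when the [=] kernel lambda uses fun it implicitly captures this (a pointer to the Map object) rather than a copy of fun. The SYCL compiler does not allow this implicit capture for kernel functions, hence the error message.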

    However, even if you captured this explicitly, it would not work, because this points to host memory, which in general is not accessible on the device.
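    To see why capturing this does not help, here is a plain C++ sketch (host only, no SYCL; the Holder type and demo() function are hypothetical) that contrasts an explicit [this] capture with a local copy. The [this] closure stores only a pointer and reads the member through it, so it observes later changes to the object; inside a GPU kernel that same pointer would refer to host memory.

    ```cpp
    #include <utility>

    // Hypothetical class with a member read inside a lambda, as fun is in Map.
    struct Holder {
        int var = 1;

        // [this] stores only the pointer; "var" in the body means this->var.
        auto by_this() { return [this] { return var; }; }

        // The local copy is captured by value and is independent of the object afterwards.
        auto by_local_copy() {
            int local_var = var;
            return [local_var] { return local_var; };
        }
    };

    // Builds both closures, then mutates the object.
    // Returns {value seen through this, value seen through the copy}.
    std::pair<int, int> demo() {
        Holder h;
        auto through_this = h.by_this();
        auto through_copy = h.by_local_copy();
        h.var = 42;
        return {through_this(), through_copy()};
    }
    ```

    Calling demo() yields {42, 1}: only the [this] closure sees the update, because it dereferences a pointer to h.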

    One very simple fix is usually to copy the member into a local variable and use that copy in your kernel:

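    #include <sycl/sycl.hpp>

    class X {
    public:
      void run(sycl::queue& q, sycl::buffer<int, 1>& buf) {
        q.submit([&](sycl::handler& cgh) {
          int local_var = var; // Note: This can also be expressed using the lambda capture list
          sycl::accessor acc(buf, cgh, sycl::write_only);
          // [=] copies local_var and acc into the kernel; 'this' is never captured
          cgh.parallel_for(buf.get_range(), [=](sycl::id<1> i) { acc[i] = local_var; });
        });
      }

      int var;
    };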
    

    Starting with C++17, you can also capture the whole object by copy: [*this](...){...}.
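    A host-only C++17 sketch of [*this] (the Plain and WithString types are hypothetical): the closure holds its own copy of the object, so later changes to the original are not seen. The copy includes every member, though. SYCL 2020 treats trivially copyable types as device copyable, and a closure that copies an object holding a std::string (as Map does with device_type, alongside its std::function) is not trivially copyable.

    ```cpp
    #include <string>
    #include <type_traits>
    #include <utility>

    // Hypothetical class with only an int member.
    struct Plain {
        int var = 1;
        // [*this] (C++17) copies the whole object into the closure.
        auto by_copy_of_this() { return [*this] { return var; }; }
    };

    // Hypothetical class that, like Map, also holds a std::string.
    struct WithString {
        int var = 1;
        std::string device_type = "gpu";
        auto by_copy_of_this() { return [*this] { return var; }; }
    };

    // Mutating the original after creating the closure does not affect the copy.
    int value_after_mutation() {
        Plain p;
        auto k = p.by_copy_of_this();
        p.var = 42;
        return k();  // reads the closure's own copy of p
    }

    // The closure type inherits the copy semantics of the captured object.
    constexpr bool plain_closure_trivially_copyable =
        std::is_trivially_copyable_v<decltype(std::declval<Plain&>().by_copy_of_this())>;
    constexpr bool string_closure_trivially_copyable =
        std::is_trivially_copyable_v<decltype(std::declval<WithString&>().by_copy_of_this())>;
    ```

    value_after_mutation() returns 1, and only the closure that copies Plain is trivially copyable. Compile with -std=c++17 or later.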

    The more fundamental problem with your code is that the SYCL specification does not allow the use of std::function inside device code. In some cases and for some SYCL implementations it might work (e.g. for host backends), but this is an extension. The reason is that std::function typically implements type erasure with mechanisms that cannot be supported on the device, such as dynamic polymorphism (virtual calls or function pointers).
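    The type erasure is visible in a small host-only sketch (call_both is a hypothetical function): one std::function object holds two different closure types in turn, so the concrete callee is only known at run time and every call goes through an indirection. std::function also manages its target with non-trivial copy and destroy logic, so it is not trivially copyable either.

    ```cpp
    #include <functional>
    #include <type_traits>

    // Assigns two unrelated closure types to the same std::function and calls each.
    int call_both(int x) {
        std::function<int(int)> g = [](int y) { return y + 1; };
        int first = g(x);
        g = [](int y) { return y * 10; };  // different closure type, same std::function type
        return first + g(x);
    }

    constexpr bool std_function_trivially_copyable =
        std::is_trivially_copyable_v<std::function<int(int)>>;
    ```

    For example, call_both(3) returns 4 + 30 = 34.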

    One possible solution is to make the type of the function a template parameter of the class instead of using std::function.
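    A minimal sketch of that idea, written as plain C++17 so it compiles without a SYCL toolchain: the callable's concrete type F becomes a template parameter and is stored by value. The apply member, the make_map and increment_all helpers, and the std::transform loop are hypothetical stand-ins for the question's operator() and its parallel_for; in the SYCL branch the same local copy f would be captured by value in the kernel lambda, with in_accessor and out_accessor in place of the vectors.

    ```cpp
    #include <algorithm>
    #include <type_traits>
    #include <vector>

    // Map parameterized on the callable's concrete type F instead of std::function.
    template <typename Tin, typename Tout, typename F>
    class Map {
    public:
        explicit Map(F f) : fun(f) {}

        std::vector<Tout> apply(const std::vector<Tin>& v) const {
            std::vector<Tout> r(v.size());  // sized up front so every r[i] exists
            auto f = fun;                   // local copy of the member
            // The per-element body captures only f, never this.
            auto body = [f](const Tin& x) { return f(x); };
            std::transform(v.begin(), v.end(), r.begin(), body);
            return r;
        }

    private:
        F fun;
    };

    // Deduces F from the argument, since a lambda's type cannot be spelled out.
    template <typename Tin, typename Tout, typename F>
    Map<Tin, Tout, F> make_map(F f) {
        return Map<Tin, Tout, F>(f);
    }

    // Increments every element, like the lambda f in the question's main.
    std::vector<int> increment_all(const std::vector<int>& v) {
        auto inc = [](int x) { return x + 1; };
        // Unlike std::function, a captureless lambda's closure type is trivially copyable.
        static_assert(std::is_trivially_copyable_v<decltype(inc)>);
        return make_map<int, int>(inc).apply(v);
    }
    ```

    increment_all({1, 2, 3, 4, 5, 6, 7, 8}) returns {2, 3, 4, 5, 6, 7, 8, 9}. Because F is part of the type, each distinct lambda produces a distinct Map instantiation, which is what lets the compiler call it directly instead of through type erasure.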