c++ tbb sycl dpc++

ERROR: kernel parameter has non-trivially copy constructible class/struct type (SYCL + TBB)


I am trying to provide a kind of "map" skeleton that wraps oneAPI calls and hides hardware-targeting issues behind a parameter specifying the kind of target (CPU or GPU/accelerator). My Map skeleton takes a function and its derivative, together with an initial point, and applies Newton's method to them. However, I get the following error:

kernel parameter has non-trivially copy constructible class/struct type 'std::function<double (double)>'

My code is:

#include <CL/sycl.hpp>
#include <iostream>
#include <tbb/tbb.h>
#include <tbb/parallel_for.h>
#include <tbb/parallel_reduce.h>
#include <vector>
#include <string>
#include <queue>
#include<tbb/blocked_range.h>
#include <tbb/global_control.h>
#include <chrono>
#include "uTimer.cpp"
#include <cmath>
#include <random>
#include <ctime>
#include <numeric>
#include <cstdlib>

//#include <dos.h> //for delay
//#include <conio.h> //for getch()
//#include <complex>
#define EPSILON 0.000001 // The step size across the X and Y axis

using namespace tbb;

class Clock {
private:
    typedef std::chrono::high_resolution_clock clock;
    std::chrono::time_point<clock> t;
public:
    Clock() {
        start();
    }
    void start() {
        t = clock::now();
    }
    double stop() const {
        return std::chrono::duration_cast<std::chrono::duration<double>>(
                clock::now() - t).count();
    }
};


//std::complex<double> mycomplex(10.0, 2.0);

template<class Tin, class Tout>
class Map {
        private:
                std::function<Tout(Tin)> fun;
                std::function<Tout(Tin)> dfun;
        public:
                Map() {};
                Map(std::function<Tout(Tin)> f, std::function<Tout(Tin)> df) {
                    fun = f;
                    dfun = df;
                };


    void operator()(bool use_tbb, Tin &x1) {
        int iter=100;
        Tout x;
        if (use_tbb) {
            uTimer *timer = new uTimer("Executing Code On CPU");
            tbb::parallel_for(tbb::blocked_range < int > (0, iter),
                    [&](tbb::blocked_range<int> t) {
                        for (int index = t.begin(); index < t.end(); ++index) {
                            do
                            {
                                x = x1;
                                x1 = x - (fun(x) / dfun(x));
                            }while (std::abs(x1 - x) >= EPSILON);
                        }
                });
            timer->~uTimer();
        }else {
            sycl::buffer<Tin, 1> x1_buffer(&x1, iter);
            sycl::buffer<Tout, 1> x_buffer(&x, iter);
            //Profiling GPU

            // Initialize property list with profiling information
            sycl::property_list propList {
                    sycl::property::queue::enable_profiling() };
            // Build the command queue (constructed to handle event profiling)
            sycl::queue gpuQueue = cl::sycl::queue(sycl::gpu_selector(),
                    propList);
            // print out the device information used for the kernel code
            std::cout << "Device: "
                    << gpuQueue.get_device().get_info<sycl::info::device::name>()
                    << std::endl;

            std::cout << "Compute Units: "
                    << gpuQueue.get_device().get_info<
                            sycl::info::device::max_compute_units>()
                    << std::endl;

            auto start_overall = std::chrono::system_clock::now();
            auto event = gpuQueue.submit([&](sycl::handler &h) {
                //local copy of fun
                auto f = fun;
                auto df = dfun;
                sycl::accessor x1_accessor(x1_buffer, h, sycl::read_write);
                sycl::accessor x_accessor(x_buffer, h, sycl::read_write);
                h.parallel_for(iter, [=](sycl::id<1> index) {
                    do
                    {
                        x_accessor[index] = x1_accessor[index];
                        x1_accessor[index] = x_accessor[index] - (f(x_accessor[index]) / df(x_accessor[index]));
                    }while (sycl::fabs(f(x1_accessor[index]))>= EPSILON);

                });
            });
            event.wait();
            auto end_overall = std::chrono::system_clock::now();
            cl_ulong submit_time = event.template get_profiling_info<
                    cl::sycl::info::event_profiling::command_submit>();
            cl_ulong start_time = event.template get_profiling_info<
                    cl::sycl::info::event_profiling::command_start>();
            cl_ulong end_time = event.template get_profiling_info<
                    cl::sycl::info::event_profiling::command_end>();
            auto submission_time = (start_time - submit_time) / 1000000.0f;
            std::cout << "Submit Time: " << submission_time << " ms"
                    << std::endl;
            auto execution_time = (end_time - start_time) / 1000000.0f;
            std::cout << "Execution Time: " << execution_time << " ms"
                    << std::endl;
            auto execution_overall = std::chrono::duration_cast
                    < std::chrono::milliseconds > (end_overall - start_overall);
            std::cout << "Overall Execution Time: " << execution_overall.count()
                    << " ms" << std::endl;
        };
    };
};





int main(int argc, char *argv[]) {

    //Define a function
    auto f = [](double x) {return pow(x,3);};
    //Define the derivative of function
    auto df = [](double x) {return pow(x, 2) *3;};
    //Define an instance of Map class
    auto m1 = Map<double, double>(f, df);
    double x1 = 3;
    m1(true, x1);
    //print the result
    //for (auto &e : r) {
        //std::cout << e << " ";
    //}
    return 0;
}

In addition, even leaving the error aside, I suspect something else in my code is incorrect, but I cannot work out what it is.


Solution

  • You cannot do what you want. Even if you got rid of std::function and used function pointers instead, it still would not work (even though a function pointer is trivially copyable). In SYCL, as in any other such language (CUDA, HIP, OpenCL, ...), the device compiler needs to be able to compile all the functions executed or called by the kernel. So no, you cannot pass a function "in". It boils down to one of your previous questions answered here

    You could try defining your lambdas as functions somewhere else and then calling them from your kernel. If you want to be able to choose at runtime between various functions, you could write a templated kernel (parameterized by, let's say, an enum) and dispatch the call through an if constexpr inside the kernel, which avoids both runtime cost and code duplication. In the end, that would instantiate n SYCL kernels, each calling one of your functions. Each would be properly compiled by the device compiler.
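    As a rough illustration of the enum plus if constexpr idea, here is a minimal host-only sketch in plain C++17. It deliberately leaves out the SYCL queue and buffers so it compiles with any compiler; in a real kernel, the call to newton<F> would sit inside the lambda passed to h.parallel_for. The names Func, value, derivative, newton and solve are hypothetical and not part of the original code, and x*x - 2 is just a second example function added for contrast.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical tag listing the functions a kernel is allowed to call.
enum class Func { Cube, SquareMinusTwo };

// f(x), selected at compile time; only the chosen branch is instantiated.
template <Func F>
double value(double x) {
    if constexpr (F == Func::Cube) return x * x * x;
    else return x * x - 2.0;
}

// f'(x), selected the same way.
template <Func F>
double derivative(double x) {
    if constexpr (F == Func::Cube) return 3.0 * x * x;
    else return 2.0 * x;
}

// Newton iteration with a bounded number of steps. It contains no
// std::function and no function pointer, so a device compiler could see
// every callee. In SYCL this body would be called from the kernel lambda.
template <Func F>
double newton(double x0, double eps = 1e-6, int max_iter = 100) {
    assert(max_iter > 0);
    double x = x0;
    for (int i = 0; i < max_iter; ++i) {
        const double d = derivative<F>(x);
        if (d == 0.0) break;                       // avoid division by zero
        const double next = x - value<F>(x) / d;
        if (std::fabs(next - x) < eps) return next;
        x = next;
    }
    return x;
}

// The runtime choice is made once, on the host, and maps to one
// template instantiation (one kernel per enumerator in the SYCL case).
double solve(Func which, double x0) {
    switch (which) {
        case Func::Cube:           return newton<Func::Cube>(x0);
        case Func::SquareMinusTwo: return newton<Func::SquareMinusTwo>(x0);
    }
    return x0;
}
```

    Calling solve(Func::SquareMinusTwo, 1.0) should approach the square root of 2, and solve(Func::Cube, 3.0) should approach 0. The switch is the only runtime branch; everything the iteration calls is fixed at compile time.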