Search code examples
c++openclsycl

How is control flow extracted from a SYCL kernel?


Using SYCL to run code on any OpenCL device doesn't require a custom compiler, as everything is done in a library (full of template magic), and a standard GCC/Clang will do just fine. Is this correct? (Especially in the case of triSYCL, which I'm using...)

If so... I know that simple expression trees can be extracted by overloading a bunch of operators on custom "handle" or "wrapper" classes, but this is not the case with control flow. Am I wrong?

Section 3.1 of this paper discusses the pros and cons of a few different approaches to adding EDSLs to C++, but I'm more interested in the actual technical implementation of the method SYCL uses.

I tried to look at the source at some SYCL-related projects (Eigen, TensorFlow, triSYCL, ComputeCpp, etc.) but so far I could not find the answer in them.

So: How can a SYCL library(?) discover the full control flow graph of a kernel, given as an ordinary C++ lambda, without needing a custom/extended compiler?


Solution

  • I think you are right.

    If you compile SYCL for CPU, since SYCL is a pure C++ executable DSEL, you can have an implementation that just uses a normal C++ compiler. This is how triSYCL works for example. https://github.com/triSYCL/triSYCL

    I do not know the detail about ComputeCpp. On https://github.com/triSYCL/triSYCL/blob/master/doc/about-sycl.rst there is a link about a very interesting but old presentation:

    Implementing the OpenCL SYCL Shared Source C++ Programming Model using Clang/LLVM, Gordon Brown. November 17, 2014, Workshop on the LLVM Compiler Infrastructure in HPC, SuperComputing 2014 http://www.codeplay.com/public/uploaded/publications/SC2014_LLVM_HPC.pdf

    In the case triSYCL is targeting a device, there is also a device compiler. I have to push a new version with a design document... In the meantime, you can look at https://github.com/triSYCL/triSYCL/tree/device https://github.com/triSYCL/llvm https://github.com/triSYCL/clang

    sycl-gtx is using some SYCL syntax extensions based on macros to have a meta-representation of the control flow in the kernel, as shown for example on this example: https://github.com/ProGTX/sycl-gtx/blob/master/tests/regression/work_efficient_prefix_sum.cpp