Using SYCL to run code on any OpenCL device doesn't require a custom compiler, as everything is done in a library (full of template magic), and a standard GCC/Clang will do just fine. Is this correct? (Especially in the case of triSYCL, which I'm using...)
If so... I know that simple expression trees can be extracted by overloading a bunch of operators on custom "handle" or "wrapper" classes, but this is not the case with control flow. Am I wrong?
Section 3.1 of this paper discusses the pros and cons of a few different approaches to adding EDSLs to C++, but I'm more interested in the actual technical implementation of the method SYCL uses.
I tried looking at the source of some SYCL-related projects (Eigen, TensorFlow, triSYCL, ComputeCpp, etc.), but so far I could not find the answer there.
So: How can a SYCL library(?) discover the full control flow graph of a kernel, given as an ordinary C++ lambda, without needing a custom/extended compiler?
I think you are right: operator overloading alone does not let a library see the control flow of a kernel.
If you compile SYCL for the CPU, a normal C++ compiler is enough, because SYCL is a pure C++ executable DSEL. The kernel lambda is simply compiled and called as ordinary C++, so the library never has to extract its control flow. This is how triSYCL works, for example: https://github.com/triSYCL/triSYCL
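To illustrate the host-only case, here is a minimal sketch. It is not the triSYCL code: the `toy` namespace, `toy::parallel_for` and `clamp_negatives` are hypothetical names made up for this example. It only shows that when the kernel runs on the CPU, the library can call the lambda directly and the regular compiler deals with the `if` inside it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a host-only SYCL-like library.
// None of these names are part of the real triSYCL API.
namespace toy {

// Calls the kernel once per work-item index, sequentially, on the CPU.
template <typename Kernel>
void parallel_for(std::size_t global_size, Kernel kernel) {
  for (std::size_t i = 0; i < global_size; ++i) {
    // Plain function call: the host compiler has already compiled
    // whatever control flow the lambda contains.
    kernel(i);
  }
}

}  // namespace toy

// A "kernel" with control flow, written as an ordinary C++ lambda.
std::vector<int> clamp_negatives(std::vector<int> data) {
  toy::parallel_for(data.size(), [&](std::size_t i) {
    if (data[i] < 0) {
      data[i] = 0;
    }
  });
  return data;
}
```

Calling `clamp_negatives({-1, 2, -3})` runs the lambda three times on the host. Nothing here inspects the body of the lambda, which is exactly why this approach stops working once the kernel has to be translated for a separate device.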
I do not know the details of ComputeCpp. https://github.com/triSYCL/triSYCL/blob/master/doc/about-sycl.rst links to a very interesting, if old, presentation:
Implementing the OpenCL SYCL Shared Source C++ Programming Model using Clang/LLVM, Gordon Brown. November 17, 2014, Workshop on the LLVM Compiler Infrastructure in HPC, SuperComputing 2014 http://www.codeplay.com/public/uploaded/publications/SC2014_LLVM_HPC.pdf
When triSYCL targets a device, there is also a device compiler. I have to push a new version with a design document... In the meantime, you can look at https://github.com/triSYCL/triSYCL/tree/device, https://github.com/triSYCL/llvm and https://github.com/triSYCL/clang
sycl-gtx takes a different route: it uses macro-based extensions to the SYCL syntax to build a meta-representation of the control flow in the kernel, as shown for example in https://github.com/ProGTX/sycl-gtx/blob/master/tests/regression/work_efficient_prefix_sum.cpp
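To give an idea of how a macro-based approach can work with a standard compiler, here is a rough sketch. It is not the sycl-gtx implementation and does not use its macro names: `TOY_IF`, `TOY_END`, `toy::Expr` and the other helpers are all hypothetical. The assumption is that expressions are captured by overloaded operators on a handle type, while control flow is recorded by macros that expand to ordinary function calls, so running the "kernel" once on the host produces the device source text.

```cpp
#include <string>

// Hypothetical names throughout; this is not the sycl-gtx API.
namespace toy {

// Accumulates the generated kernel source (a stand-in for OpenCL C text).
inline std::string& source() {
  static std::string s;
  return s;
}

// A handle standing for a device-side expression: it holds text, not a value.
struct Expr {
  std::string text;
};

inline Expr var(const std::string& name) { return Expr{name}; }
inline Expr lit(int value) { return Expr{std::to_string(value)}; }

// Overloaded operator: builds expression text instead of comparing values.
inline Expr operator<(const Expr& a, const Expr& b) {
  return Expr{"(" + a.text + " < " + b.text + ")"};
}

// Records an assignment statement.
inline void assign(const Expr& lhs, const Expr& rhs) {
  source() += lhs.text + " = " + rhs.text + ";\n";
}

// Record the opening and closing of a conditional block.
inline void begin_if(const Expr& cond) { source() += "if " + cond.text + " {\n"; }
inline void end_block() { source() += "}\n"; }

}  // namespace toy

// The macros replace the native `if`, which the library could not observe.
#define TOY_IF(cond) toy::begin_if(cond);
#define TOY_END toy::end_block();

// Runs the "kernel" once on the host and returns the recorded source.
std::string build_kernel() {
  toy::source().clear();
  toy::Expr x = toy::var("x");
  TOY_IF(x < toy::lit(0))
    toy::assign(x, toy::lit(0));
  TOY_END
  return toy::source();
}
```

Here `build_kernel()` returns the text `if (x < 0) {`, `x = 0;`, `}` on three lines. The price is that the kernel is no longer plain C++: a native `if` would be evaluated on the host while recording, instead of ending up in the generated code.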