How to call oneMKL's DFT in a SYCL kernel (Intel GPU, Windows)


I can currently use oneMKL's DFT functions on the CPU (host) to perform the calculations. Now I want to run the oneMKL DFT on the GPU device instead. I tried many things but failed, so I came here hoping to get help. Thank you! By the way, I'm using Intel oneAPI's DPC++.

My environment:

CPU: Intel(R) Core(TM) i7-1065G7 CPU;

GPU: Intel(R) Iris(R) Plus Graphics;

OS: Win11;

IDE: VS2022;

oneAPI Base Toolkit: 2024.0.1.45;

oneMKL (oneAPI Math Kernel Library): 2024.1.0.696;

  1. I tried wrapping the DFT setup in an ordinary (non-kernel) function and calling that function from the device kernel, but the compiler reports these errors: "SYCL kernel cannot call an undefined function without SYCL_EXTERNAL attribute" and "SYCL kernel cannot call a variadic function".

  2. I checked the SYCL 2020 specification, which says: "SYCL device code, as defined by this specification, does not support virtual function calls, function pointers in general, exceptions, runtime type information or the full set of C++ libraries that may depend on these features or on features of a particular host compiler. Nevertheless, these basic restrictions can be relieved by some specific Khronos or vendor extensions." From this I concluded that the DFT from Intel's oneMKL library should be callable inside a kernel (maybe?).

  3. What I expect: to be able to call this DFT inside a kernel running on the Intel GPU.

  4. Below is my working oneMKL DFT implementation on the CPU:

#include <iostream>
#include <mkl_dfti.h>
#include <complex>
#include <vector>
using namespace std;

int main() {

    // 48 samples: the same 16-value pattern repeated three times.
    complex<double> input[48] = {
        0.5, -0.991445, 0.258819, 0.793353, -0.866025, -0.130526, 0.965926, -0.608761,
        -0.5, 0.991445, -0.258819, -0.793353, 0.866025, 0.130526, -0.965926, 0.608761,
        0.5, -0.991445, 0.258819, 0.793353, -0.866025, -0.130526, 0.965926, -0.608761,
        -0.5, 0.991445, -0.258819, -0.793353, 0.866025, 0.130526, -0.965926, 0.608761,
        0.5, -0.991445, 0.258819, 0.793353, -0.866025, -0.130526, 0.965926, -0.608761,
        -0.5, 0.991445, -0.258819, -0.793353, 0.866025, 0.130526, -0.965926, 0.608761
    };
    complex<double> output[48]; // Output for complex-to-complex DFT;
    
    DFTI_DESCRIPTOR_HANDLE my_desc_handle = NULL; // Descriptor handle for complex to complex FFT;

    MKL_LONG status; // Variable to store command execution status;
    status = DftiCreateDescriptor(&my_desc_handle, DFTI_DOUBLE, DFTI_COMPLEX, 1, 48); // Create a descriptor;
    status = DftiSetValue(my_desc_handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE); // Set non-in-place operations;
    status = DftiSetValue(my_desc_handle, DFTI_NUMBER_OF_TRANSFORMS, 1); //Set the number of transformations to 1;
    status = DftiCommitDescriptor(my_desc_handle); // Submit the descriptor to make its configuration effective;
    status = DftiComputeForward(my_desc_handle, input, output); // Execute forward FFT;
    status = DftiFreeDescriptor(&my_desc_handle); // Release descriptor;

    cout << "FFT result:" << endl; //Output the FFT result from complex number to complex number
    for (int i = 0; i < 48; i++) {
        cout << output[i] << "\n";
    }
    cout << endl << endl;

    return 0;
}

Solution

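  • Intel oneMKL follows the UXL Foundation's oneAPI specification, as detailed here. Its DFT API is a host-side API: you do not call it from inside a SYCL kernel. Instead, you call it from ordinary host code, passing a sycl::queue that targets the GPU, and oneMKL submits its own kernels to that device.

    I haven't tested the following code, but I would expect it to be about right: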

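    // my_sycl_queue is a sycl::queue bound to the GPU; input and output must be
    // USM allocations (e.g. sycl::malloc_device) or sycl::buffers, not host stack arrays.
    oneapi::mkl::dft::descriptor<oneapi::mkl::dft::precision::DOUBLE,
                                 oneapi::mkl::dft::domain::COMPLEX>
        desc(static_cast<std::int64_t>(48));
    desc.set_value(oneapi::mkl::dft::config_param::PLACEMENT,
                   oneapi::mkl::dft::config_value::NOT_INPLACE);
    desc.set_value(oneapi::mkl::dft::config_param::NUMBER_OF_TRANSFORMS,
                   static_cast<std::int64_t>(1));
    desc.commit(my_sycl_queue);  // prepare the transform for the queue's device
    // Called from host code: oneMKL enqueues its own kernels on my_sycl_queue.
    auto compute_event = oneapi::mkl::dft::compute_forward(desc, input, output);
    // Wait for the event to complete (and rethrow any asynchronous errors).
    compute_event.wait_and_throw();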
    

    For a more complete example (including getting a queue, memory allocation, and both the buffer and USM APIs), look at the repository of the oneMKL Interfaces project (same interface as Intel oneMKL, but it dispatches to Intel, NVIDIA, AMD, or pure-SYCL libraries depending on the device used), or at the examples shipped with oneMKL in the Base Toolkit at <ONEMKL_BASE>/mkl/latest/share/doc/mkl/examples/examples_sycl.tgz. After decompressing that archive, the DFT examples are in <examples_sycl>/sycl/dft/.