Tags: python-3.x, cuda, complex-numbers, cupy

How to run complex exponential "cexpf" or "cexp" in the RawKernel of cupy?


As the title says, I am computing the exponential of an array of complex numbers inside a RawKernel provided by CuPy, but I don't know how to include or call "cexpf" or "cexp" correctly. The compiler always reports that "cexpf" is undefined. Does anybody know the correct way to call this function? Thanks in advance.

import cupy as cp
import time

add_kernel = cp.RawKernel(r'''
#include <cupy/complex.cuh>
#include <cupy/complex/cexpf.h>
extern "C" __global__
void test(double* x, double* y, complex<float>* z){
    int tId_x = blockDim.x*blockIdx.x + threadIdx.x;
    int tId_y = blockDim.y*blockIdx.y + threadIdx.y;
    
    complex<float> value = complex<float>(x[tId_x],y[tId_y]);

    z[tId_x*blockDim.y*gridDim.y+tId_y] = cexpf(value);
}''',"test")

x = cp.random.rand(1,8,4096,dtype = cp.float32)
#x = cp.arange(0,4096,dtype = cp.uint32)
y = cp.random.rand(1,8,4096,dtype = cp.float32)
#y = cp.arange(4096,8192,dtype = cp.uint32)
z = cp.zeros((4096,4096), dtype = cp.complex64)
t1 = time.time()
add_kernel((128,128),(32,32),(x,y,z))
print(time.time()-t1)
print(z)

Solution

  • Looking at the headers, it seems you are supposed to simply call exp. You also don't need to include cupy/complex/cexpf.h yourself, since cupy/complex.cuh already includes it.

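    add_kernel = cp.RawKernel(r'''
    #include <cupy/complex.cuh>
    extern "C" __global__
    // x and y are float32 arrays on the host, so they are taken as float*
    // here; the double* in the question would misinterpret their bytes.
    void test(const float* x, const float* y, complex<float>* z){
        // Global 2D thread index
        int tId_x = blockDim.x*blockIdx.x + threadIdx.x;
        int tId_y = blockDim.y*blockIdx.y + threadIdx.y;
        
        complex<float> value = complex<float>(x[tId_x],y[tId_y]);
    
        // exp is overloaded for complex<float>; no cexpf call is needed
        z[tId_x*blockDim.y*gridDim.y+tId_y] = exp(value);
    }''',"test")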
    

    Generally, CuPy's custom-kernel C++ complex number API is taken from Thrust, so you can consult the Thrust documentation. Just drop the thrust:: namespace prefix.

    Thrust's API in turn tries to implement the C++ std::complex API for the most part, so the C++ standard library documentation can also help when the Thrust documentation does not go deep enough. Just be careful: Thrust might not give all the same guarantees, in order to avoid performance problems on the GPU.
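
To check that exp on complex<float> gives the expected values, one option is to compare the kernel output with a host-side reference. The following is a minimal sketch, not a definitive implementation. The names KERNEL_SRC, cexp_grid, reference_exp_grid and gpu_exp_grid, the explicit nx/ny bounds check, and the 16x16 block size are all assumptions made for illustration and do not come from the question or the answer. CuPy is imported lazily so that the reference function can be used on a machine without a GPU.

```python
import cmath

# Hypothetical kernel: same idea as the answer (call exp, include only
# cupy/complex.cuh), plus a bounds check so any array size is safe.
KERNEL_SRC = r'''
#include <cupy/complex.cuh>
extern "C" __global__
void cexp_grid(const float* x, const float* y, complex<float>* z,
               int nx, int ny){
    int ix = blockDim.x * blockIdx.x + threadIdx.x;
    int iy = blockDim.y * blockIdx.y + threadIdx.y;
    if (ix < nx && iy < ny) {
        // z[ix, iy] = exp(x[ix] + i*y[iy]), row-major layout
        z[ix * ny + iy] = exp(complex<float>(x[ix], y[iy]));
    }
}'''


def reference_exp_grid(x, y):
    """Host reference using only the standard library:
    result[i][j] = exp(x[i] + 1j*y[j])."""
    return [[cmath.exp(complex(a, b)) for b in y] for a in x]


def gpu_exp_grid(x, y, block=(16, 16)):
    """Run the kernel; requires CuPy and a CUDA device."""
    import cupy as cp  # lazy import: only needed for the GPU path

    d_x = cp.asarray(x, dtype=cp.float32)
    d_y = cp.asarray(y, dtype=cp.float32)
    nx, ny = d_x.size, d_y.size
    d_z = cp.empty((nx, ny), dtype=cp.complex64)

    # Ceiling division so the grid covers every element
    grid = (-(-nx // block[0]), -(-ny // block[1]))
    kernel = cp.RawKernel(KERNEL_SRC, "cexp_grid")
    # Scalars are passed as int32 to match the `int` kernel parameters
    kernel(grid, block, (d_x, d_y, d_z, cp.int32(nx), cp.int32(ny)))
    return cp.asnumpy(d_z)
```

On a machine with a GPU, something like `numpy.allclose(gpu_exp_grid(x, y), reference_exp_grid(x, y), rtol=1e-5)` should then hold for small lists of floats x and y, since the kernel works in single precision while the reference uses double precision.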