Search code examples
pythoncudachainercupy

Passing structure to raw kernel in cupy


I have CUDA kernels that take structures such as float3, int2, etc. as arguments. I seem to be unable to properly pass an argument to these kernels through the cupy rawkernel interface. I have tried passing a 1d cupy array of 3 floats for a float3 parameter and the argument was not interpreted correctly in the kernel. I have tried passing a ctypes structure, but got back an unsupported type error. Is it possible to to send a custom struct to a raw kernel in cupy? If so, how?

I tried using ctype structs as follows:

class float3(ctypes.Structure): 
    fields = [ ("X", c_float), ("Y", c_float), ("Z", c_float)] 

from cupy.cuda.function import CPointer 

class CFloat3(CPointer): 
    def __init__(self, v): super().__init__(ctypes.addressof(v)) 
        self.val = v 

val= float3(1.5, 3, 5) 
cval= CFloat3(val) 

This bypassed cupy's type checking, but still didn't correctly pass the values to the kernel. It seems like it should work if you check look at the function module in cupy's source code. It just passes on the pointer of the struct. I also tried id(v) and ctypes.POINTER(float3)(v) instead of ctypes.addressof to get the address of the struct, but that didn't work either.

I can work around this by writing kernel wrappers that accept arrays as inputs and then convert the arrays to structs to call my regular kernels. This is ugly to me though. If this can't be done, it seems like a big oversight to not provide the ability to pass structs to kernels.


Solution

  • I'm in agreement with the comment; I was not able to find a way to make this work in the general case.

    A hacky method can perhaps be used to work with float2 or double2 by repurposing the np.complexXX datatypes. Here is an example:

    $ cat t19.py
    import numpy as np
    import cupy
    ddim = 64
    bdim = 32
    d = np.complex64(1+2j)
    i = cupy.ones((ddim*3), dtype=cupy.float32).reshape(ddim, 3)
    o = cupy.zeros((ddim*3), dtype = cupy.float32).reshape(ddim, 3)
    my_test = cupy.RawKernel(r'''
      extern "C" __global__
      void my_test(const float2 d, const  float3 * __restrict__  i, float3 * __restrict__ o, int dim) {
      int x = blockDim.x * blockIdx.x + threadIdx.x;
      if (x < dim){
        float3 temp = i[x];
        temp.x += d.x;
        temp.y += d.y;
        temp.z += d.x;
        o[x] = temp;}
      }
     ''', 'my_test')
    gdim = ddim//bdim + 1
    my_test((gdim,1), (bdim,1), (d, i,o,ddim))  # grid, block and arguments
    r_o = cupy.asnumpy(o)
    print(r_o)
    $ python t19.py
    [[2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]
     [2. 3. 2.]]
    $
    

    I had no luck with numpy structured data types, which seems like it would have been the logical path for this.