Tags: python, numpy, parameter-passing, opencl, pyopencl

Pass a vector (float4) kernel argument to OpenCL (Python)


Is there an easy way to pass a float4, or any other vector argument, to an OpenCL kernel? A scalar argument (int, float) can be passed directly when calling the kernel. An array argument first has to be copied to the GPU using cl.Buffer(), and then the buffer is passed as a pointer. It is probably possible to pass a float4 the same way as an array, but I am asking whether there is an easier and clearer way (especially using Python, numpy, and PyOpenCL).

I tried passing a numpy array of 4 x float32 as a float4, but it does not work. Is there some other way to do it?

For example, the kernel:

__kernel void myKernel( __global float  * myArray, float myFloat, float4 myFloat4 )

Python:

myFloat4   = numpy.array  ( [1.0 ,2.0 ,3.0], dtype=numpy.float32 ) 
myArray    = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=myArray_host)
kernelargs = ( myArray , numpy.float32(myFloat) , myFloat4) 
prg.myKernel(queue, myArray_host.shape , None, *(kernelargs) )

I got this error:

pyopencl.LogicError: when processing argument #2 (1-based): clSetKernelArg failed: invalid arg size
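
A likely contributor to this kind of error is a byte-size mismatch: clSetKernelArg rejects a by-value argument whose size differs from the size of the kernel parameter's type, and a float4 occupies 16 bytes. The array in the snippet above holds three float32 values. The sketch below uses only numpy to compare the sizes; `float4_like` is a hypothetical hand-built dtype used purely for illustration and is not PyOpenCL's own vector type.

```python
import numpy as np

# Size in bytes that an OpenCL float4 kernel argument occupies.
FLOAT4_BYTES = 16

# Three float32 values, like the array in the question: 3 * 4 = 12 bytes.
three_floats = np.array([1.0, 2.0, 3.0], dtype=np.float32)

# Hypothetical stand-in for a float4: one record with four float32 fields.
# The field names are illustrative assumptions, not taken from PyOpenCL.
float4_like = np.dtype([
    ("x", np.float32),
    ("y", np.float32),
    ("z", np.float32),
    ("w", np.float32),
])
one_float4 = np.zeros((), dtype=float4_like)

print(three_floats.nbytes)   # 12, too small for a float4 argument
print(one_float4.nbytes)     # 16, matches FLOAT4_BYTES
```

Whether a correctly sized plain array is accepted as a by-value argument may depend on the PyOpenCL version, so a dedicated vector dtype is the safer choice.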

The other possibility is to pass it as a set of scalar ints or floats, like this:

__kernel void myKernel( __global float  * myArray, float myFloat, float myFloat4_x, float myFloat4_y, float myFloat4_z  )

kernelargs = ( myArray , numpy.float32(myFloat) ,numpy.float32(myFloat4_x),numpy.float32(myFloat4_y),numpy.float32(myFloat4_z))

But this is not very convenient either: it is easy to get lost in the many variable names if you want to pass, for example, 4x float4 and 5x int3 to the kernel.

I think passing vectors (of 2, 3, or 4 components) of int and float must be quite common in OpenCL, for example for the size of 3D data grids. So I wonder whether it is really necessary to pass them as pointers using cl.Buffer().

I guess that a constant float4 argument is also faster than a float* (because it can be shared as a constant by all work-items).


Solution

  • I find this a nice way to create an array of float4 in Python:

    import numpy as np
    import pyopencl as cl
    import pyopencl.array as cl_array
    
    N = 10  # number of float4 elements
    data = np.zeros(N, dtype=cl_array.vec.float4)
    

    Edit: here is also a minimal working example (MWE):

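    import numpy as np
    import pyopencl as cl
    import pyopencl.array as cl_array

    # Select the first device of the first platform.
    deviceID = 0
    platformID = 0
    workGroup = (1, 1)

    # Host array of N float4 elements, initialized to zero.
    N = 10
    testData = np.zeros(N, dtype=cl_array.vec.float4)

    dev = cl.get_platforms()[platformID].get_devices()[deviceID]

    ctx = cl.Context([dev])
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags
    # Device buffer large enough for N float4 elements (16 bytes each).
    Data_In = cl.Buffer(ctx, mf.READ_WRITE, testData.nbytes)

    prg = cl.Program(ctx, """

    __kernel void   Pack_Cmplx( __global float4* Data_In, int  N)
    {
      int gid = get_global_id(0);

      // Set all four components of this element to 1.
      Data_In[gid] = (float4)(1.0f);
    }
     """).build()

    # The buffer is passed as a pointer, the int as a numpy scalar by value.
    prg.Pack_Cmplx(queue, (N, 1), workGroup, Data_In, np.int32(N))
    # Copy the result back to the host.
    cl.enqueue_copy(queue, testData, Data_In)

    print(testData)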
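
The example above stores float4 values in a buffer. To pass a single float4 by value, as the question asks, one option is `make_float4`. The sketch below assumes a PyOpenCL release that provides `pyopencl.cltypes` (older releases exposed the same names under `pyopencl.array.vec`) and an available OpenCL device. The kernel `add_offset` and the helpers `expected_on_host` and `run_on_device` are hypothetical names made up for this illustration.

```python
import numpy as np

# Hypothetical kernel: scales each float4 element and adds a float4 offset
# that is passed by value (no buffer needed for it).
KERNEL_SRC = """
__kernel void add_offset(__global float4 *data, float scale, float4 offset)
{
    int gid = get_global_id(0);
    data[gid] = data[gid] * scale + offset;
}
"""


def expected_on_host(n, scale, offset_xyzw):
    """CPU reference: n float4 elements of ones, scaled and offset."""
    ones = np.ones((n, 4), dtype=np.float32)
    return ones * np.float32(scale) + np.array(offset_xyzw, dtype=np.float32)


def run_on_device(n=4, scale=2.0, offset_xyzw=(1.0, 2.0, 3.0, 4.0)):
    """Run the kernel; requires PyOpenCL and an OpenCL device."""
    # Imported here so the host-side helper works without PyOpenCL installed.
    import pyopencl as cl
    import pyopencl.cltypes as cltypes

    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags

    # n float4 elements, filled with ones through a plain float32 view.
    host = np.zeros(n, dtype=cltypes.float4)
    host.view(np.float32)[:] = 1.0
    buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=host)

    # A single float4 value (16 bytes) that is passed directly as an argument.
    offset = cltypes.make_float4(*offset_xyzw)

    prg = cl.Program(ctx, KERNEL_SRC).build()
    prg.add_offset(queue, (n,), None, buf, np.float32(scale), offset)
    cl.enqueue_copy(queue, host, buf)

    # Return the result as an (n, 4) float32 array for easy comparison.
    return host.view(np.float32).reshape(n, 4)
```

On a machine with an OpenCL device, `np.allclose(run_on_device(), expected_on_host(4, 2.0, (1.0, 2.0, 3.0, 4.0)))` is a quick way to check the device result against the CPU reference.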