is there any easy way how to pass float4 or any other vector argument to OpenCL kernel? For scalar argument (int, float) you can pass it directly while calling kernel. For array argument you have to first copy it to GPU using cl.Buffer() and than pass pointer. Sure it is probably possible to pass float4 the same way as array. But I ask if there is any easier and more clear way. ( especially using Python, numpy, pyOpenCL)
I tried pass numpy array of size 4*float32 as float4 but it does not work. Is it possible to do it somehow else?
For example : kernnel:
__kernel void myKernel( __global float * myArray, float myFloat, float4 myFloat4 )
Python:
myFloat4 = numpy.array ( [1.0 ,2.0 ,3.0], dtype=np.float32 )
myArray = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=myArray_host)
kernelargs = ( myArray , numpy.float32(myFloat) , myFloat4)
prg.myKernel(queue, cl_myArray.shape() , None, *(kernelargs) )
I got error :
pyopencl.LogicError: when processing argument #2 (1-based): clSetKernelArg failed: invalid arg size
the other possibiliy is passing it as set of scalar int or float - like:
__kernel void myKernel( __global float * myArray, float myFloat, float myFloat4_x, float myFloat4_y, float myFloat4_z )
kernelargs = ( myArray , numpy.float32(myFloat) ,numpy.float32(myFloat4_x),numpy.float32(myFloat4_y),numpy.float32(myFloat4_z))
but this is also not very convenient - you can be easily lost in many variable names if you want for example pass 4x float4 and 5x int3 to the kernell.
I think passing vectors (2,3,4) of int and float must be quite common in OpenCL - for example the size of 3D data grids. So I wonder if it is really necessary to pass it using cl.Buffer() as pointers.
I guess that constant argument float4 is also faster than *float (because it can be shared as a constant by all workitems)
I find this a nice way to create a float4 in python:
import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array
data= np.zeros(N, dtype=cl_array.vec.float4)
Edit: To also give a MWE:
import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array
deviceID = 0
platformID = 0
workGroup=(1,1)
N = 10
testData = np.zeros(N, dtype=cl_array.vec.float4)
dev = cl.get_platforms()[platformID].get_devices()[deviceID]
ctx = cl.Context([dev])
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags
Data_In = cl.Buffer(ctx, mf.READ_WRITE, testData.nbytes)
prg = cl.Program(ctx, """
__kernel void Pack_Cmplx( __global float4* Data_In, int N)
{
int gid = get_global_id(0);
Data_In[gid] = 1;
}
""").build()
prg.Pack_Cmplx(queue, (N,1), workGroup, Data_In, np.int32(N))
cl.enqueue_copy(queue, testData, Data_In)
print testData