I'm almost done rewriting some CUDA code into OpenCL. But I get this terrible runtime error.
The kernel I call takes arguments like this:
__kernel void kernel_forwardProject(
__global float *proj_out,
__gloabl float *proj_in,
__global float *vol,
__read_only image3d_t tex_vol,
__constant float *transformMatrices,
__constant float *sourcePositions)
I am using the cl2.hpp
wrapper for OpenCL and when I call the equivalent of clSetKernelArg
for argument 0, i.e. proj_out
, CL_INVALID_MEM_OBJECT
is returned.
I also get the same result when switching around argument 0 and 1. I've tried the three ways I know of allocating the device buffers:
// 1)
auto dev_proj_out = cl::Buffer(queue, h_proj_out, h_proj_out + proj_size,
/*read_only*/false, /*useHostPtr*/true, &err);
// 2)
auto dev_proj_out = cl::Buffer(ctx, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR,
proj_size * sizeof(float), (void*)&h_proj_out[0], &err);
// 3)
auto dev_proj_out = cl::Buffer(ctx, CL_MEM_WRITE_ONLY,
req_dev_alloc, nullptr, &err);
queue.enqueueWriteBuffer(dev_proj_out, CL_TRUE, 0, 0, (void *)&h_proj_out[0]);
h_proj_out
is float*
, proj_size
is 64*64*16
in the test case.
I've tried all 4 combinations of false
and true
for read_only
and useHostPtr
.
I check err
after all OpenCL API calls, there is no errors before clSetKernelArg
.
I've stepped through the code with gdb for all the combinations, it's always at clSetKernelArg
for the first argument which gives the error.
I've tried both the Nvidia and Intel CPU OpenCL runtimes. (POCL doesn't support image types for nvidia gpus, so I can't use that)
The host code can be found here: https://gitlab.com/agravgaard/cbctrecon/blob/master/Library/CbctReconLib/rtkExtension/rtkOpenCLForwardProjectionImageFilter.cpp#L130
The OpenCL kernel: https://gitlab.com/agravgaard/cbctrecon/blob/master/Library/CbctReconLib/rtkExtension/forward_proj.cl#L71 The kernel compiles without any warnings using the Intel SDK for OpenCL offline compiler (with the same defines as given at runtime).
The error occurs on line 247 of the host code. The KernelFunctor calls setArgs<> which calls setArg of the kernel, which calls clSetKernelArg at line 5398 of cl2.hpp
The issue had two parts comes from mixing devices, contexts and queues and kernel management.
The cl2.hpp
uses the default device, context and queue if none is given. As I was not consistent with the default in the example above and thus different queues and contexts "owned" different objects.
I initially got rid of the CL_INVALID_MEM_OBJECT
by rewriting how I manage kernels including adding:
program.createKernels(&kernel_list)
And using that list to initialize the KernelFunctor
.