Kernel code that produces the error:
```
__kernel void testDynamic(__global int *data) {
    int id=get_global_id(0);
    // ...
}

__kernel void test(__global int * data) {
    int id=get_global_id(0);
    if (id == 0) {
        queue_t q = get_default_queue();
        ndrange_t ndrange = ndrange_1D(1,1);
        void (^my_block_A)(void) = ^{testDynamic(data);};
        enqueue_kernel(q, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange, my_block_A); // flag assumed
    }
}
```
I tested the code below to make sure the OpenCL 2.0 compiler is working:
```
__kernel void test2(__global int *data) {
    int id=get_global_id(0);
    // ... work-group scan whose results are written to data
}
```
The scan function outputs 0, 1, 3, 6, so the OpenCL 2.0 work-group functions (scan/reduce) are working.
Is dynamic parallelism an extension to OpenCL 2.0? If I remove the enqueue_kernel call, the results equal the expected values (without the child kernel's contribution).
Device: AMD RX 550, driver: 17.6.2
Is there a special command that has to run on the host side so that the child kernel can run on the get_default_queue() queue? For now, the command queue is created the OpenCL 1.2 way:
```
commandQueue = cl::CommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &err);
```
Does the queue returned by get_default_queue() have to be the same command queue that enqueues the parent kernel? I'm asking because I use the same command queue to upload data to the GPU and then download the results, with a single synchronization.
Moved solution from question to answer:
Edit: the API call below was the solution:
```
commandQueue = cl::CommandQueue(context, device, CL_QUEUE_ON_DEVICE | CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE_DEFAULT, &err);
```
After creating this queue (only one per device), I didn't use it for anything else, and the parent kernel is enqueued on a separate host queue, so it looks like the get_default_queue() queue does not have to be the queue that launched the parent kernel.
The documentation says CL_INVALID_QUEUE_PROPERTIES will be raised if CL_QUEUE_ON_DEVICE is specified, but on my machine dynamic parallelism works with it and that error is not raised (using the commandQueue constructor parameters above).