I only have a dual-core Core i3 CPU, so I can only work with the CPU, not a GPU. I want to test a simple OpenCL example with a vector-add kernel, but I have run into a problem:
After setting up the platform, the CPU device, etc., I do the following:
1) clEnqueueNDRangeKernel() enqueues a kernel task and, through its last parameter, associates an event with the completion of that task.
2) clSetEventCallback() with CL_COMPLETE registers the callback function on that event.
Normally, the callback function should be called when the task completes, but it isn't. In fact, the task is still INCOMPLETE at the end, even if the host has a lot of work to do before exiting. Could someone tell me why?
Here is my minimal code:
/** Simple add kernel */
private static String programSource0 =
"__kernel void vectorAdd(" +
" __global const float *a,"+
" __global const float *b, " +
" __global float *c)"+
"{"+
" int gid = get_global_id(0);"+
" c[gid] = a[gid]+b[gid];"+
"}";
/** The entry point of this sample */
public static void main(String args[])
{
/** Callback function */
EventCallbackFunction kernelCommandEvent = new EventCallbackFunction()
{
@Override
public void function(cl_event event, int event_status, Object user_data)
{
System.out.println("Callback: task COMPLETED");
}
};
// Initialize the input data
int n = 1000000;
float srcArrayA[] = new float[n];
float srcArrayB[] = new float[n];
float dstArray0[] = new float[n];
Arrays.fill(srcArrayA, 1.0f);
Arrays.fill(srcArrayB, 1.0f);
// .
// (hidden) Setup of my Intel platform, CPU device, kernel, commandQueue and memory buffers, setting the kernel arguments, etc.
// .
// Set work-item dimensions and execute the kernels
long globalWorkSize[] = new long[]{n};
// I pass an event that tracks the execution status of this kernel command.
cl_event[] myEventID = new cl_event[1];
myEventID[0] = new cl_event();
clEnqueueNDRangeKernel(commandQueue, kernel0, 1, null, globalWorkSize, null, 0, null, myEventID[0]);
// I link the event to the callback function "kernelCommandEvent", and pass 10 as parameter
clSetEventCallback(myEventID[0], CL_COMPLETE, kernelCommandEvent, Integer.valueOf(10));
// The host does some very long work here...
// ...so by now the device task should have completed
int[] ok = new int[1];
Arrays.fill(ok, 0);
clGetEventInfo(myEventID[0], CL_EVENT_COMMAND_EXECUTION_STATUS, Sizeof.cl_int, Pointer.to(ok), null);
if (ok[0] == CL_COMPLETE) System.out.println("Task COMPLETE");else System.out.println("Task INCOMPLETE");
}
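Enqueuing a command does not force it to execute; it only places it in the queue.
The commands are only guaranteed to be submitted for execution if you:
- call clFlush(),
- call clFinish(), which flushes implicitly and then waits, or
- make a blocking call, for example clEnqueueReadBuffer() with blocking_read set to CL_TRUE.
Some drivers may start working on a command even if you did not flush the queue, but that is implementation dependent. If you want to be sure, call clFlush(commandQueue); right after enqueuing the kernel and setting the callback. Note that clFlush() only guarantees that the commands are submitted to the device, not that they have finished; to wait for completion, use clFinish() or clWaitForEvents().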
Extra: The reason for this behaviour is that the overhead of submitting commands to the device can be large, and doing it on every enqueue call may be inefficient if enqueue is called many times in a loop. Instead, submission is deferred to the flush or a blocking call, so that commands can be batched.
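To make the deferral and batching idea concrete, here is a toy model in plain Java. It is not the OpenCL or JOCL API: ToyQueue, ToyEvent, enqueue(), flush() and submissions are hypothetical names made up for illustration. It is also a simplification: the toy flush() runs the commands synchronously, whereas a real clFlush() only submits them and they complete asynchronously on the device.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (NOT the OpenCL/JOCL API) of a command queue with deferred
 * submission: enqueue() only records a command, and flush() hands the
 * whole pending batch to the "device" in one submission.
 */
public class DeferredQueueDemo {

    /** Hypothetical stand-in for a cl_event. */
    static final class ToyEvent {
        boolean complete = false;
        private final List<Runnable> callbacks = new ArrayList<>();

        /** Register a completion callback (runs at once if already complete). */
        void setCallback(Runnable callback) {
            if (complete) {
                callback.run();
            } else {
                callbacks.add(callback);
            }
        }

        void markComplete() {
            complete = true;
            for (Runnable callback : callbacks) {
                callback.run();
            }
            callbacks.clear();
        }
    }

    /** Hypothetical stand-in for a cl_command_queue. */
    static final class ToyQueue {
        private final List<Runnable> pending = new ArrayList<>();
        private final List<ToyEvent> pendingEvents = new ArrayList<>();
        int submissions = 0; // number of batches handed to the "device"

        /** Only records the command; nothing executes here. */
        ToyEvent enqueue(Runnable command) {
            ToyEvent event = new ToyEvent();
            pending.add(command);
            pendingEvents.add(event);
            return event;
        }

        /**
         * Submits all pending commands as one batch. Simplification: this toy
         * runs them synchronously, while a real clFlush() only submits them.
         */
        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            List<Runnable> batch = new ArrayList<>(pending);
            List<ToyEvent> events = new ArrayList<>(pendingEvents);
            pending.clear();
            pendingEvents.clear();
            submissions++;
            for (int i = 0; i < batch.size(); i++) {
                batch.get(i).run();
                events.get(i).markComplete();
            }
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] b = {10f, 20f, 30f};
        float[] c = new float[3];
        ToyQueue queue = new ToyQueue();

        // Plays the role of clEnqueueNDRangeKernel + clSetEventCallback in the question.
        ToyEvent event = queue.enqueue(() -> {
            for (int i = 0; i < c.length; i++) {
                c[i] = a[i] + b[i];
            }
        });
        event.setCallback(() -> System.out.println("Callback: task COMPLETED"));
        System.out.println("After enqueue: " + (event.complete ? "COMPLETE" : "INCOMPLETE"));

        queue.flush(); // the callback fires only now
        System.out.println("After flush: " + (event.complete ? "COMPLETE" : "INCOMPLETE"));

        // Batching: five more enqueues cost a single additional submission.
        for (int k = 0; k < 5; k++) {
            queue.enqueue(() -> { });
        }
        queue.flush();
        System.out.println("Submissions: " + queue.submissions);
    }
}
```

Running main reports the event as INCOMPLETE after enqueue, prints the callback message only during flush(), and counts two submissions in total: the five extra commands are batched into one.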