
How to launch multiple kernels in OpenCL from within one program?


I'm trying to measure the performance of the OpenCL programming model on GPUs. To test it, I launch the kernel with clEnqueueNDRangeKernel(), and I call this function multiple times so that I can see how it performs when two or four kernels are launched concurrently.

I observe that the program takes the same amount of time as it does when launching one kernel. I'm assuming that it is just running the kernel once, because I don't see how running two or four concurrent kernels could take the same amount of time as running one.

Now I want to know how to launch multiple kernels on one GPU.

E.g., I want to launch something like:

clEnqueueNDRangeKernel()
clEnqueueNDRangeKernel()

How can I do this?


Solution

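  • First of all, check whether your device supports concurrent kernel execution. Recent AMD and Nvidia cards do.

    Then, create multiple command queues on the same device and enqueue each kernel into a different queue. If you enqueue kernels into the same in-order queue, they will be executed consecutively, one after another.

    Finally, check that the kernels were indeed executed in parallel. Use the profilers from the vendor SDK, or OpenCL events, to gather profiling info. Keep in mind that clEnqueueNDRangeKernel() only enqueues the work and returns immediately, so wait for the kernels to finish (for example with clFinish() on each queue) before reading a host-side timer.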
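The last step can be sketched in plain C. This is only an illustration, not a complete OpenCL host program: it assumes you already attached an event to each clEnqueueNDRangeKernel() call, created the queues with CL_QUEUE_PROFILING_ENABLE, and read each event's CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END values (64-bit nanosecond counters) with clGetEventProfilingInfo(). The kernel_span struct and the overlap_ns() helper are hypothetical names made up for this example, and the sketch assumes both timestamps come from the same device so that they are comparable.

```c
#include <stdint.h>

/* Start and end device timestamps of one kernel, in nanoseconds.
 * Hypothetical struct: in real host code the two fields would be filled
 * from clGetEventProfilingInfo() using CL_PROFILING_COMMAND_START and
 * CL_PROFILING_COMMAND_END on the kernel's event. */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} kernel_span;

/* Returns how many nanoseconds both kernels were running at the same time.
 * A result of 0 means the kernels ran one after another (serialized). */
static uint64_t overlap_ns(kernel_span a, kernel_span b)
{
    uint64_t latest_start = a.start_ns > b.start_ns ? a.start_ns : b.start_ns;
    uint64_t earliest_end = a.end_ns < b.end_ns ? a.end_ns : b.end_ns;

    /* The spans overlap only if the later start precedes the earlier end. */
    return earliest_end > latest_start ? earliest_end - latest_start : 0;
}
```

If overlap_ns() returns 0 for two kernels that were enqueued into different queues, the device (or driver) most likely serialized them, and the total run time should be roughly the sum of the individual kernel times.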