My goal is to count the kernel instances (work-items) whose calculation satisfies a condition. The condition could be, for example, value == 0.
I looked for a built-in semaphore function to accomplish this, but I couldn't find one for OpenCL. I do not want to copy the results back into CPU memory, as that would be inefficient.
Another thing I tried, or imagined could work, is to use an error to crash certain kernels and then use some debug code to retrieve the number of crashed kernels. I could then write something like 1/value, and every kernel where my condition is satisfied would crash. Unfortunately, I couldn't find an OpenCL (debug) function to retrieve this information.
I imagine that this approach may be a little slower than the one with semaphores, since it might trigger some fallback/cleanup code.
A further restriction is that the host code may only use the OpenCL headers for this.
Any ideas on how to retrieve the number of kernels that satisfy a condition without involving the CPU in the check?
As pmdj suggested:
OpenCL provides atomic operations, including an atomic increment for global (and local) buffers: atomic_inc. Its signature for global memory is:
unsigned int atomic_inc(volatile __global unsigned int *p)
It adds 1 to the value stored at p and returns the old value.
Even better, as of OpenCL 1.1 this function does not require any OpenCL extensions.
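A minimal sketch of how this could look, assuming the values to test are ints in a global buffer. The kernel source is embedded as a C string on the host, ready to be passed to clCreateProgramWithSource. The names count_zeros, values, counter, n and COUNT_ZEROS_KERNEL_SRC are made up for illustration and are not from the question.

```c
/* OpenCL C kernel source, embedded as a host-side string.
 * All identifiers here are illustrative assumptions. */
static const char *COUNT_ZEROS_KERNEL_SRC =
    "__kernel void count_zeros(__global const int *values,\n"
    "                          __global volatile unsigned int *counter,\n"
    "                          const unsigned int n)\n"
    "{\n"
    "    size_t gid = get_global_id(0);\n"
    "    /* Guard against a global size padded beyond n. */\n"
    "    if (gid < n && values[gid] == 0) {\n"
    "        /* Atomically add 1; the returned old value is not needed. */\n"
    "        atomic_inc(counter);\n"
    "    }\n"
    "}\n";
```

The host has to set the counter buffer to 0 before each launch, for example by writing a single zero with clEnqueueWriteBuffer. After the kernel finishes, the count can stay in device memory for a follow-up kernel, or be read back as one 4-byte value with clEnqueueReadBuffer, which is far cheaper than copying the whole result. If many work-items match, they all contend for the same address, so this may become a bottleneck; counting per work-group in local memory first is one common way to reduce that.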