I am looking for performance measurement between Python wrapper to OpenCL and Pure C OpenCL. Performance measurements can varies with time, memory, etc.. - Are there any benchmarks available? - What should be the expectation about the time performance differences? - What kind of tasks (parallel of course...) should make a difference?
It is likely that PyOpenCL is your best choice. I would choose to use C only in very specific situations (a super-critical need for speed/low-latency on the host). For most casual parallel programs, it is fine for the host side to have plenty of slack, because all the real work gets done on the device.
You can consider PyOpenCL and OpenCL to have identical performance on the device.
Maybe use C if you are, like... designing a self-driving car, and every millisecond/amp matters. But even in that situation, it is likely that Python could be used effectively.
The best way to figure out if your specific program is slowed down is to time your code. For PyOpenCL that means:
import time
and
cl.command_queue_properties.PROFILING_ENABLE
Many smart companies and individuals choose to code first in Python, because they can build a flexible, working prototype quickly. If they end up needing more host performance later, it is relatively easy to port Python to C.
Hope that helps!