I need to solve linear systems fast (I have a lot of them), and I plan to do it on the GPU using OpenCL.
I love dynamic languages such as Ruby and Python, and I've fallen out of the habit of using low-level languages like C.
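So I have two aims at once: fast solving on the GPU, and code written in a language I enjoy.
The ideal for me would be to write something close to Python and have it compile to OpenCL C with almost no performance loss.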
I've found these options: pure OpenCL C, PyOpenCL, and Clyther.
Which one should I start with?
My opinion is that trying to shoehorn a dynamic language into OpenCL is not worth the effort. You would lose most of what you like about Python and probably not save much time in the end.
But I am speaking only of writing OpenCL kernels in Python. There is also the host application, which prepares and submits the kernels. If you like Python, I suggest writing the host app in pure Python with a wrapper like PyOpenCL to access the OpenCL API. Then write your kernels in pure OpenCL C and have your Python app submit them as-is. I believe this will get you most of what you want from Python while costing almost nothing in performance, since the heavy computation runs inside the kernels and the host code only sets up buffers and launches work.