I'm looking for a Java library that can do fast computations with vectors (and maybe matrices too).
By fast I mean that it takes advantage of GPU processing and/or SSE instructions. I'd like something as portable as possible, though I recognize that the JVM provides a thick abstraction layer over the hardware.
I've come across JCUDA, but there's a drawback: on a computer without an Nvidia graphics card it would have to run in emulation mode (so I suspect it won't be as efficient as expected). Has anyone already tried it?
What about OpenCL? It should give you a good starting point for this kind of optimized operation.
There are several bindings for Java, starting with JOCL (but also take a look at JavaCL, or at LWJGL, which has supported OpenCL since 2.6).
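To give a rough idea of what you would hand to any of these bindings, here is a minimal sketch of an element-wise vector addition. The kernel is ordinary OpenCL C kept in a Java string; the class name `VectorAddSketch`, the kernel name `vec_add` and the helper `addOnCpu` are just names I made up for illustration. The host-side setup (platform, context, buffers, kernel launch) differs per binding, so it is left out, and a plain-Java loop stands in as a CPU reference you could check GPU results against.

```java
import java.util.Arrays;

class VectorAddSketch {

    // OpenCL C source for element-wise addition (hypothetical kernel name).
    // A binding would compile this string at run time and launch one
    // work-item per vector element.
    static final String KERNEL_SOURCE =
          "__kernel void vec_add(__global const float *a,\n"
        + "                      __global const float *b,\n"
        + "                      __global float *c)\n"
        + "{\n"
        + "    int i = get_global_id(0);\n"
        + "    c[i] = a[i] + b[i];\n"
        + "}\n";

    // Plain-Java reference computing the same result on the CPU.
    // Useful as a fallback and for validating what the device returns.
    static float[] addOnCpu(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have equal length");
        }
        float[] c = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] + b[i];
        }
        return c;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f};
        float[] b = {4f, 5f, 6f};
        // Only the CPU path runs here; no OpenCL device is touched.
        System.out.println(Arrays.toString(addOnCpu(a, b)));
    }
}
```

Keeping a CPU version next to the kernel also covers the portability concern: on a machine without a usable OpenCL device you can fall back to the loop.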