vectorization, opencl, gpgpu, amd-processor, amd-gpu

Should we use vector types if we want to write optimized code once for both CPU and GPU?


As known, OpenCL vector-type float16



As a result:

That is, vector types such as float16 do not matter much for the GPU, but are of great importance for the CPU.

Should we use vector types if we want to write optimized OpenCL code once for both architectures, CPU and GPU?


Conclusion:

Vector types are not much needed on a GPU or an Intel CPU, but they are needed on an AMD CPU.


Solution

  • In general, if performance is what you're concerned about, it is almost always a bad idea to use the same kernel for different architectures. Pre-GCN GPUs want vectors, GCN GPUs want scalars, and CPUs can handle both with Intel's driver, but only if you are aware of it; I don't know how AMD's driver behaves on a CPU. CPUs also need wider vectors than GPUs. CPUs rely on cache, while GPUs rely more on scratch memory. And GPUs have vastly more registers than CPUs could even dream of...

    On GCN, vector types mostly just make my code look nicer to me and save some typing and some mistakes. float v[4], float4 v, or even float v0, v1, v2, v3 doesn't make much difference most of the time.

    And as said before, Intel's CL driver can map a thread to a SIMD element, which lets one core run 8 CL threads at once.
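
One way to act on this advice, shown here only as a minimal sketch, is to keep a single kernel source but compile it separately for each device with a different vector width. The names `SAXPY_SRC`, `REAL`, and `saxpy_build_options` are hypothetical and not from the answer. The sketch assumes the width is chosen per device, for example from `clGetDeviceInfo` with `CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT` or from your own benchmarks, and that the resulting string is passed as the options argument of `clBuildProgram`. It only varies the vector width; the differences in cache, scratch memory, and register usage mentioned above may still call for genuinely separate kernels.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical OpenCL C kernel source. REAL is supplied at build time,
   e.g. "-D REAL=float4". With REAL=float each work-item handles one
   element; with REAL=floatN it handles N elements. */
static const char SAXPY_SRC[] =
    "__kernel void saxpy(float a, __global const REAL *x, __global REAL *y) {\n"
    "    size_t i = get_global_id(0);\n"
    "    y[i] = a * x[i] + y[i];\n"
    "}\n";

/* Hypothetical helper: writes the build-options string for the given
   vector width into buf (capacity n bytes).
   Returns 0 on success, -1 if the width is not handled here or if buf
   is too small to hold the whole string. */
static int saxpy_build_options(unsigned width, char *buf, size_t n)
{
    int written;

    switch (width) {
    case 1:
        /* Scalar variant: plain float, no width suffix. */
        written = snprintf(buf, n, "-D REAL=float");
        break;
    case 2: case 4: case 8: case 16:
        /* Vector variants: float2, float4, float8, float16. */
        written = snprintf(buf, n, "-D REAL=float%u", width);
        break;
    default:
        return -1;
    }
    return (written < 0 || (size_t)written >= n) ? -1 : 0;
}
```

With this layout, a GCN build might use width 1 and an AMD CPU build a wider type such as 8 or 16, and the global work size becomes the element count divided by the chosen width (assuming it divides evenly). Which width is actually fastest on a given driver is something to measure rather than assume.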