cuda multicore gpu gpgpu

Kernels that run fast on multicores but relatively slow on GPUs


Can someone suggest a list of algorithms for which multicore CPUs give superior performance compared to GPUs? I know that a hybrid approach will still be faster, but what I am really looking for is to understand the areas in which GPUs still lag behind multicores.


Solution

  • In order of suitability for GPUs, from least suitable to most suitable:

    • GPUs can only accelerate SIMD-type workloads, so they are no good for task-parallel operations (like make -jN).
    • GPUs don't have much cache, and their atomic ops are slow relative to a CPU's, so they are nowhere near as good as CPUs at working with pointer-based structures such as trees.
    • Workloads such as image processing or computer vision are in a gray area where the GPU advantages (texture mapping hardware, more cores) may be offset by the CPU advantages (better SIMD integer support, much higher clock rate). If the actual processing is done in floating point, it's probably a wash or a slight advantage to the GPU; if the processing is done in integer and can be mapped onto SSE2 instructions, the CPU will crush the GPU.
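
    To make the last point concrete, here is a minimal sketch of integer image processing that maps onto SSE2: a saturating add of two 8-bit grayscale images. The function name add_saturate_u8 and the image layout (flat arrays of pixels) are assumptions made for this illustration; the intrinsics themselves are standard SSE2 and require an x86 or x86-64 target.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86 / x86-64 only)
#include <cstddef>
#include <cstdint>

// Hypothetical helper: dst[i] = min(a[i] + b[i], 255) for n pixels.
void add_saturate_u8(const std::uint8_t* a, const std::uint8_t* b,
                     std::uint8_t* dst, std::size_t n) {
    std::size_t i = 0;
    // Main loop: a single SSE2 instruction adds 16 pixels with saturation.
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i),
                         _mm_adds_epu8(va, vb));
    }
    // Scalar tail for the remaining n % 16 pixels.
    for (; i < n; ++i) {
        unsigned sum = unsigned(a[i]) + unsigned(b[i]);
        dst[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
}
```

    Each loop iteration handles 16 pixels, and the data never leaves host memory, which is the kind of case where the CPU's integer SIMD support and clock rate count for the most.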

    GPUs excel at data-parallel workloads that use a lot of single-precision floating-point.
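
    For reference, SAXPY (y = a*x + y) is the textbook shape of such a workload. The version below is a plain CPU loop, shown only to illustrate the access pattern; it is a sketch, not a tuned implementation.

```cpp
#include <cstddef>
#include <vector>

// SAXPY: y[i] = a * x[i] + y[i] in single precision.
// Every iteration is independent of the others, so on a GPU each
// element could be assigned to its own thread.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```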

    Any workload offloaded to the GPU also incurs the cost of transferring its data between host and device memory.
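
    A rough way to reason about this is a break-even check: offloading only helps if the GPU kernel time plus the transfer time is less than the CPU time. The sketch below is a hypothetical cost model; the struct, its field names, and the simple bytes/bandwidth formula are assumptions for illustration, and it ignores transfer latency and any overlap of copies with computation.

```cpp
// Hypothetical cost model; all fields are values you would measure yourself.
struct OffloadEstimate {
    double cpu_seconds;         // time to run the kernel on the CPU
    double gpu_kernel_seconds;  // time for the GPU kernel alone
    double bytes_transferred;   // host-to-device plus device-to-host traffic
    double bus_bytes_per_sec;   // sustained host<->device bandwidth
};

// Total GPU cost = kernel time + time spent moving data over the bus.
double gpu_total_seconds(const OffloadEstimate& e) {
    return e.gpu_kernel_seconds + e.bytes_transferred / e.bus_bytes_per_sec;
}

// Offloading pays off only if the total GPU cost beats the CPU time.
bool offload_pays_off(const OffloadEstimate& e) {
    return gpu_total_seconds(e) < e.cpu_seconds;
}
```

    With made-up numbers, a kernel that takes 10 ms on the CPU and 1 ms on the GPU still loses on the GPU once the data it moves costs more than 9 ms on the bus.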