Tags: c, performance, opencl, intel, processor

What's the advantage of running OpenCL code on a CPU?


I am learning OpenCL programming and am noticing something weird.

Namely, when I list all OpenCL enabled devices on my machine (Macbook Pro), I get the following list:

  • Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
  • Iris Pro
  • GeForce GT 750M

The first is my CPU, the second is the onboard graphics solution by Intel and the third is my dedicated graphics card.

From what I've read, Intel has made its hardware OpenCL-compatible so that I can tap into the power of the onboard graphics unit, i.e. the Iris Pro.

With that in mind, what is the purpose of the CPU being OpenCL-compatible? Is it merely a convenience, so that kernels can fall back to the CPU if no other device is found? Or is there some speed advantage to running code as OpenCL kernels instead of as regular, well-threaded C programs on the CPU?


Solution

  • See https://software.intel.com/sites/default/files/m/d/4/1/d/8/Writing_Optimal_OpenCL_28tm_29_Code_with_Intel_28R_29_OpenCL_SDK.pdf for basic info.

    Basically, the Intel OpenCL compiler performs horizontal autovectorization for certain types of kernels: it packs consecutive work-items into SIMD lanes, so with 128-bit SSE4 four work-items (eight with 256-bit AVX) execute in lockstep on a single core, much as an Nvidia GPU runs 32 threads on a single 32-wide SIMD unit.

    There are two major benefits to this approach. First, suppose the vector width doubles again in a couple of years, to 16 lanes: because OpenCL kernels are compiled at run time by the device driver, the same kernel is instantly autovectorized 16 wide when you run it on that CPU, with no need to recompile your code. Second, it is far easier to write an OpenCL kernel that autovectorizes well than to write the equivalent in assembly, or in C while coaxing the compiler into producing efficient SIMD code.