OpenCL on CPU Device - what happens under the hood?

So if I run an OpenCL kernel on a CPU device rather than a GPU device, does the kernel automatically make use of all the cores on the CPU? For instance, my system reports 4 CPU cores; will the kernel use all 4 automatically?

If so, does that mean running an OpenCL kernel on a single CPU device is equivalent to using std::thread (assuming we're using C++) to carry out the same tasks?

I ask because on my current computer OpenCL seems to only be able to access one CPU device and no GPUs. So if I were to use OpenCL to parallelize my code, it seems like it might be overkill if it will essentially be doing the same thing as a std::thread-based implementation.

Solution

Without access to the OpenCL implementation you are actually using, it's hard to say exactly how it is implemented. I'm sure there is more than one way of solving this problem (depending somewhat on the OS and other factors), but the most likely scenario is that it uses the threading system of the host OS in some way or another. OpenCL implementations are typically not written in C++, so it may not be std::thread, but it is almost certainly built on the same fundamentals that underlie std::thread on that system.
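To make the comparison with std::thread concrete, here is a minimal sketch of what a hand-rolled equivalent might look like: a one-dimensional index space is split into contiguous chunks, one per hardware thread, and a "kernel" callable is invoked once per index. This is only an illustration under my own assumptions, not how any particular OpenCL runtime is implemented; run_on_all_cores and vector_add are hypothetical names made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not part of OpenCL): calls kernel(gid) for every
// gid in [0, global_size), splitting the range into contiguous chunks
// with one worker thread per chunk.
void run_on_all_cores(std::size_t global_size,
                      const std::function<void(std::size_t)>& kernel) {
    // hardware_concurrency() may return 0 when the count is unknown,
    // so fall back to a single worker in that case.
    std::size_t workers =
        std::max<std::size_t>(1, std::thread::hardware_concurrency());
    // Never start more workers than there are work items.
    workers = std::min(workers, std::max<std::size_t>(1, global_size));
    const std::size_t chunk = (global_size + workers - 1) / workers;

    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(global_size, begin + chunk);
        if (begin >= end) break;  // nothing left to hand out
        pool.emplace_back([begin, end, &kernel] {
            for (std::size_t gid = begin; gid < end; ++gid) kernel(gid);
        });
    }
    for (auto& t : pool) t.join();  // wait for every chunk to finish
}

// Hypothetical "kernel launch": element-wise addition, where gid plays
// the role that get_global_id(0) plays inside an OpenCL kernel.
// Assumes a and b have the same length.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    std::vector<float> c(a.size());
    run_on_all_cores(a.size(),
                     [&](std::size_t gid) { c[gid] = a[gid] + b[gid]; });
    return c;
}
```

Each worker writes to a disjoint slice of the output, so no locking is needed. A real CPU runtime would probably also reuse a thread pool between launches and might vectorize the work inside each chunk, which this sketch does not attempt.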

The purpose/benefit of using OpenCL when you have only a CPU device available would be that you can transparently transition from such a system to one that has a GPU device, without having two different sets of code.

It is also a little less difficult to debug misbehaving OpenCL kernels in this environment (I said less difficult, not easy, for a reason, before anyone complains), assuming of course that the behaviour is the same in both environments.