
Multi-thread programming with logical threads


The theory of multithreaded programming is usually explained in terms of the number of cores, but nowadays processors have more logical cores than physical ones. The question is: if a well-implemented parallel algorithm runs on a processor with 4 physical and 8 logical cores, will the speedup be 4 or 8 times? This assumes the best case, ignoring the cost of parallelism and other overhead.
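To make the question concrete, speedup and parallel efficiency are commonly defined as below, where $T_1$ is the single-thread run time and $T_p$ the run time with $p$ threads. The timings in the worked lines are hypothetical and are not taken from the measurements in this question.

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}

% hypothetical timings: T_1 = 8 s, T_4 = 2 s, T_8 = 1.9 s
S_4 = \frac{8}{2} = 4, \qquad E_4 = \frac{4}{4} = 1
S_8 = \frac{8}{1.9} \approx 4.21, \qquad E_8 = \frac{4.21}{8} \approx 0.53
```

On 4 physical cores, $S_8$ can only rise clearly above 4 if a single thread leaves a large share of each core's execution resources idle; otherwise the extra 4 logical threads mostly compete for the same units.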

For example, below you can see the results of image filtering on a CPU with 4 cores and 8 threads. The upper bound looks like a 4x speedup, yet using 8 threads still seems to give the best speedup of all the configurations.

[Figure: image filtering results on the 4-core, 8-thread CPU]


Solution

  • Logical cores are only useful if your code is latency bound. This is the case when there are stalls (e.g. cache misses) or when instructions are heavily serialized (e.g. a loop of dependent divisions). In such a case, the processor can truly execute 2 threads in parallel on the same physical core (using two logical cores), filling the cycles that one thread alone would leave idle. A relatively good example is a naive matrix transposition. Logical cores do not help much on many optimized codes, because optimized code rarely stalls and generally exposes a lot of instruction-level parallelism (e.g. thanks to loop unrolling).

    When you are measuring a speedup, it is generally not relevant to use logical cores unless you know that the workload can benefit from them (i.e. it is inherently latency bound) or the processor is designed for them to be used (e.g. Xeon Phi or POWER processors). I expect logical cores not to be useful for an optimized image filtering workload.

    Note that logical cores tend to make benchmark results harder to interpret.
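A minimal sketch of that measurement advice, written in Python for illustration: time the whole workload with 1 worker, with the physical core count, and with the logical core count, then compare. All function names here are hypothetical helpers. `run(p)` is a callable you supply (e.g. your image filter) that processes the full workload with `p` workers. The physical core count is an assumption you pass in yourself, because the standard library only reports logical CPUs via `os.cpu_count()`. In standard CPython builds, `run` should use processes or native code that releases the GIL, since pure-Python threads would not scale at all.

```python
import os
import time
from typing import Callable, Dict, Sequence

# Hypothetical workload type: run(p) must process the whole workload using p workers.
Workload = Callable[[int], None]


def logical_cpus() -> int:
    """Logical CPUs as reported by the OS (os.cpu_count() may return None)."""
    return os.cpu_count() or 1


def best_time(run: Workload, workers: int, repeats: int = 3) -> float:
    """Fastest wall-clock time in seconds of run(workers) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run(workers)
        best = min(best, time.perf_counter() - start)
    return best


def speedups(run: Workload, worker_counts: Sequence[int], repeats: int = 3) -> Dict[int, float]:
    """Speedup S_p = T_1 / T_p for each worker count p, relative to a 1-worker baseline."""
    t1 = best_time(run, 1, repeats)
    return {p: t1 / best_time(run, p, repeats) for p in worker_counts}


def smt_gain(s: Dict[int, float], physical: int, logical: int) -> float:
    """S_logical / S_physical: a value close to 1.0 means the extra logical cores barely helped."""
    return s[logical] / s[physical]
```

For the CPU in the question you would call `s = speedups(run, [1, 4, 8])` and then `smt_gain(s, physical=4, logical=8)`. A value clearly above 1 suggests the workload is latency bound enough to benefit from the logical cores. Keeping the minimum over several repeats reduces noise from other processes.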