Search code examples
multithreadingperformancearchitectureparallel-processinghyperthreading

Level of Parallelism present in multiple threads per core


So i have been looking into some of the technologies that implement multiple threads per core (like intel's hyperthreading) and I am wondering whats the extent of parallelism in these kinds of technologies. Is it true parallelism or just more effective concurrency? It seems they still share the same execution units and core resources, basically seems like its just virtualizing the usage. So I am unsure how true parallelism could occur. And if this is the case then what is the benefit? You can achieve concurrency through effective thread context switching.


Solution

  • So there are a lot of factors that determine the benefits present in Hyperthreading. First off since they are sharing resources there is obviously no true parallelism but their is some increase in concurrency depending on the type of processor.

    There are three types of hardware threading. Fine grained which switches threads in a round robin fashion with the goal of increased throughput, at the cost of increased individual thread latency. Switching is done on a clock to clock basis. There is course grained which is more like a context switch, where the processor switching the thread when a stall or some sort of memory fetching occurs. Then there is simultaneous in which thread switching occurs in the same clock, meaning there is multiple thread data in the Reorder Buffer and Pipeline at the same time. They are depicted as follows.

    Shades correspond to threads of execution

    Hyperthreading corresponds to the SMT in this diagram. And as seen, the effectiveness of the design depends primarily on one thing: how busy the pipeline is. In dynamically scheduled processors, where the goal is to keep the pipeline and execution units as busy as possible, the advantages see diminishing returns of around 0 to 5 percent from what I have seen. For statically scheduled processors, where the pipeline has a lot of stalls the benefits are much more prevalent and see gains of around 20 to 40% depending on the capabilities of the compiler reordering the instructions.