c++ performance optimization x86

Use _mm_pause() or _tpause() for busy-spin loop?


My spinlock has a busy-spin loop that runs while the lock cannot be acquired:

while(try_lock() == false)
{
    // Use _mm_pause() or _tpause() here?
}

I noticed I don't have _mm_pause() inside the loop. As I understand it, omitting it can degrade performance because of memory-ordering effects (barriers/fences)?

Before adding _mm_pause() I discovered _tpause():

https://www.felixcloutier.com/x86/tpause

However, judging from the Intel Intrinsics Guide, its usage seems slightly more complicated.

I would like to maximize performance and am not concerned with power consumption.

Which should I use, and if it's _tpause(), how is it used correctly? I cannot find any example usage, even on GitHub.

Architecture will be 2022+ Intel Xeon models.

EDIT:

I've just noticed _mm_pause() latency is 140 cycles?!


Unfortunately I didn't see a latency for _tpause().


Solution

  • From this Linux patch:

    /*
     * On Intel the TPAUSE instruction waits until any of:
     * 1) the TSC counter exceeds the value provided in EDX:EAX
     * 2) global timeout in IA32_UMWAIT_CONTROL is exceeded
     * 3) an external interrupt occurs
     */
    

    So it looks like TPAUSE is intended for the power-optimized sleep case, not for low-latency spinning. You should use PAUSE for that.
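
As a rough illustration of the PAUSE approach, here is a minimal test-and-test-and-set spinlock sketch. The `SpinLock` class and its member names are hypothetical and are not the asker's code; only `std::atomic` and `_mm_pause()` from `<immintrin.h>` are real APIs. It assumes an x86 target.

```cpp
#include <atomic>
#include <immintrin.h>  // _mm_pause

// Hypothetical spinlock for illustration only.
class SpinLock {
public:
    // Returns true if the lock was acquired by this call.
    bool try_lock() noexcept {
        // exchange() returns the previous value: false means it was free.
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void lock() noexcept {
        while (!try_lock()) {
            // Wait on a read-only load so that spinning cores do not keep
            // requesting the cache line exclusively, and issue PAUSE on
            // every iteration of the wait loop.
            while (locked_.load(std::memory_order_relaxed)) {
                _mm_pause();
            }
        }
    }

    void unlock() noexcept {
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};
```

Because it exposes `lock()` and `unlock()`, this sketch should also work with `std::lock_guard<SpinLock>`. Whether the inner read-only loop helps in practice depends on contention, so it is worth measuring on the target machine.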

    Also, PAUSE latency and behavior depend heavily on the microarchitecture, so you should check and benchmark against your real target. If your CPU is a 2022+ Xeon, it is very unlikely to be based on the Skylake microarchitecture (which was introduced in ~2015).
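
Since the question also asks how `_tpause()` is called: the intrinsic takes a control word and an absolute TSC deadline, which corresponds to condition 1 in the kernel comment quoted above. The following is only a sketch, assuming GCC or Clang on x86-64; `cpu_has_waitpkg`, `tpause_for` and `relax` are made-up helper names, and the fallback to `_mm_pause()` is an assumption about what a caller might want, not something taken from the patch.

```cpp
#include <cpuid.h>      // __get_cpuid_count (GCC/Clang only)
#include <immintrin.h>  // _mm_pause, _tpause
#include <x86intrin.h>  // __rdtsc

// CPUID.(EAX=7,ECX=0):ECX bit 5 reports WAITPKG (UMONITOR/UMWAIT/TPAUSE).
bool cpu_has_waitpkg() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;  // CPUID leaf 7 is not available
    }
    return ((ecx >> 5) & 1u) != 0;
}

// WAITPKG is enabled for this one function only, so the rest of the file
// does not need -mwaitpkg. Call it only if cpu_has_waitpkg() returned true,
// otherwise the instruction faults.
__attribute__((target("waitpkg")))
unsigned char tpause_for(unsigned long long tsc_ticks) {
    // Bit 0 of the control word selects the state:
    // 1 = C0.1 (faster wakeup), 0 = C0.2 (more power saving, slower wakeup).
    const unsigned int control = 1;
    // The second argument is an absolute TSC deadline, not a duration.
    return _tpause(control, __rdtsc() + tsc_ticks);
}

// Hypothetical helper: use TPAUSE where available, else a single PAUSE.
void relax(unsigned long long tsc_ticks) {
    static const bool has_waitpkg = cpu_has_waitpkg();
    if (has_waitpkg) {
        (void)tpause_for(tsc_ticks);
    } else {
        _mm_pause();
    }
}
```

The return value of `_tpause()` is the carry flag, which is set when the wait ended because the OS time limit in IA32_UMWAIT_CONTROL expired rather than because the deadline was reached. Note that `tsc_ticks` is in TSC units, not core cycles, and that the wait can also end early on an interrupt, so a caller still has to re-check its condition in a loop.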