My spinlock obviously has a busy-spin loop whilst the lock fails to be acquired:
while(try_lock() == false)
{
// Use _mm_pause() or _tpause() here?
}
I noticed I don't have _mm_pause() inside the loop. I understand omitting this can cause performance degradation regarding memory barriers/fences/ordering?
Before adding _mm_pause() I discovered _tpause():
https://www.felixcloutier.com/x86/tpause
However, from the Intel Intrinsics Guide its usage seems slightly more complicated.
I would like to maximize performance and am not concerned with power consumption.
Which should I use, and if it's _tpause(), how is it used correctly? I cannot find any example usage, even on GitHub.
Architecture will be 2022+ Intel Xeon models.
EDIT:
I've just noticed _mm_pause() latency is 140 cycles?!
Unfortunately I didn't see a latency for _tpause().
From this Linux patch:
/*
* On Intel the TPAUSE instruction waits until any of:
* 1) the TSC counter exceeds the value provided in EDX:EAX
* 2) global timeout in IA32_UMWAIT_CONTROL is exceeded
* 3) an external interrupt occurs
*/
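To make the first condition in that comment concrete, here is a minimal sketch of how the intrinsic form is called. It assumes GCC or Clang on x86-64 (for <cpuid.h> and the target attribute). The names has_waitpkg, tpause_for and backoff are hypothetical helpers, and passing 1 as the control word (bit 0 set, requesting the lighter C0.1 state) is an assumption about what a latency-sensitive caller would pick.

```cpp
#include <cpuid.h>      // __get_cpuid_count (GCC/Clang specific)
#include <immintrin.h>  // _tpause, _mm_pause
#include <x86intrin.h>  // __rdtsc

// True if CPUID.(EAX=7,ECX=0):ECX bit 5 (WAITPKG) is set.
// Executing TPAUSE without this feature raises an invalid-opcode fault.
inline bool has_waitpkg() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return (ecx & (1u << 5)) != 0;
}

// Wait until roughly `tsc_cycles` TSC ticks from now, or until the OS limit
// or an interrupt ends the wait earlier. Returns the carry flag: 1 if the
// wait ended because the OS-imposed time limit expired, 0 otherwise.
__attribute__((target("waitpkg")))
inline unsigned char tpause_for(unsigned long long tsc_cycles) {
    const unsigned int c0_1 = 1;  // bit 0 = 1: C0.1; bit 0 = 0 would request C0.2
    return _tpause(c0_1, __rdtsc() + tsc_cycles);
}

// Hypothetical back-off helper: TPAUSE when available, PAUSE otherwise.
inline void backoff(unsigned long long tsc_cycles) {
    static const bool waitpkg = has_waitpkg();
    if (waitpkg) {
        tpause_for(tsc_cycles);
    } else {
        _mm_pause();
    }
}
```

Note that the second argument is an absolute TSC deadline, not a duration, which is why the sketch adds the requested wait to __rdtsc().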
So it looks like TPAUSE is intended for the power-optimized sleep case, not for low-latency spinning. You should use PAUSE for that.
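As a rough illustration of PAUSE in the spin loop, here is a minimal test-and-test-and-set sketch. It is not the asker's actual lock: the SpinLock class and its std::atomic<bool> member are assumptions, and only the try_lock() name comes from the question.

```cpp
#include <atomic>
#include <immintrin.h>  // _mm_pause

class SpinLock {
public:
    // exchange() returns the previous value, so false means we took the lock.
    bool try_lock() noexcept {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void lock() noexcept {
        while (!try_lock()) {
            // Wait on a plain load so that waiters do not hammer the cache
            // line with atomic writes; PAUSE marks this as a spin-wait loop.
            while (locked_.load(std::memory_order_relaxed)) {
                _mm_pause();
            }
        }
    }

    void unlock() noexcept {
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};
```

The inner read-only loop is a design choice rather than a requirement: a single loop that calls _mm_pause() after each failed try_lock() also works, but it retries the atomic exchange on every iteration.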
Also, PAUSE latency and behavior depend heavily on the microarchitecture, so you should check / benchmark against your real target. If your CPU is a 2022+ Xeon, it is very unlikely to be based on the Skylake microarchitecture (which was introduced in ~2015), so a latency figure quoted for Skylake may not apply to it.
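A quick way to check this on the real target is to time a run of back-to-back PAUSE instructions. The sketch below is only a rough measurement, and pause_cost_tsc is a hypothetical helper name. It reports TSC ticks, which run at a fixed reference frequency and are not the same as core clock cycles, and it includes a small amount of loop overhead.

```cpp
#include <cstdint>
#include <immintrin.h>  // _mm_pause
#include <x86intrin.h>  // __rdtsc

// Average cost of one _mm_pause() in TSC ticks over `iters` back-to-back calls.
inline double pause_cost_tsc(std::uint64_t iters) {
    const std::uint64_t start = __rdtsc();
    for (std::uint64_t i = 0; i < iters; ++i) {
        _mm_pause();
    }
    const std::uint64_t end = __rdtsc();
    return static_cast<double>(end - start) / static_cast<double>(iters);
}
```

Calling it with a large iteration count (for example 1000000) on a pinned, otherwise idle core should give a more stable number than a short run.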