Why does hint::spin_loop use ISB on aarch64?

In std::hint there's a spin_loop function with the following definition in its documentation:

Emits a machine instruction to signal the processor that it is running in a busy-wait spin-loop (“spin lock”).

Upon receiving the spin-loop signal the processor can optimize its behavior by, for example, saving power or switching hyper-threads.

Depending on the target architecture, this compiles to either:

_mm_pause, A.K.A. the pause intrinsic on x86
yield instruction on 32-bit arm
ISB SY on 64-bit arm (aarch64)
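
For context, here is a minimal sketch of where the hint usually sits in a busy-wait loop. The `SpinLock` type, `with_lock`, and `run_demo` are hypothetical names made up for this illustration; only `std::hint::spin_loop` and the atomics come from the standard library. It is not meant as a production lock (for instance, the lock stays held if the closure panics).

```rust
use std::cell::UnsafeCell;
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical minimal spin lock, for illustration only.
pub struct SpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// Safety: access to `value` is serialized by the `locked` flag.
unsafe impl<T: Send> Sync for SpinLock<T> {}

impl<T> SpinLock<T> {
    pub fn new(value: T) -> Self {
        Self {
            locked: AtomicBool::new(false),
            value: UnsafeCell::new(value),
        }
    }

    /// Runs `f` with exclusive access to the protected value.
    pub fn with_lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // Try to take the lock with a compare-and-swap.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            // While the lock is held, poll with plain loads and tell the
            // CPU we are busy-waiting. This is the call the question is about.
            while self.locked.load(Ordering::Relaxed) {
                hint::spin_loop();
            }
        }
        // Safety: we hold the lock, so no other thread touches `value`.
        let result = f(unsafe { &mut *self.value.get() });
        self.locked.store(false, Ordering::Release);
        result
    }
}

/// Four threads each increment a shared counter 1000 times.
pub fn run_demo() -> u64 {
    let lock = Arc::new(SpinLock::new(0u64));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let lock = Arc::clone(&lock);
            thread::spawn(move || {
                for _ in 0..1000 {
                    lock.with_lock(|n| *n += 1);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    lock.with_lock(|n| *n)
}
```

Calling `run_demo()` should return 4000 on any target; what differs per architecture is only the instruction that `hint::spin_loop()` compiles to inside the inner loop.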

That last one has got my head spinning a little bit (😉). I thought ISB was an expensive operation, which would mean that, inside a spin lock, the thread is slower to notice that the lock has been released, with hardly any benefit in return.

What are the advantages of using ISB SY instead of a NOP in a spin loop on aarch64?

Solution

I had to dig into the Rust repository history to get to this answer:

The yield was replaced with isb in commit c064b6560b7c:

On arm64 we have seen on several databases that ISB (instruction synchronization barrier) is better to use than yield in a spin loop. The yield instruction is a nop. The isb instruction puts the processor to sleep for some short time. isb is a good equivalent to the pause instruction on x86.

[...]

So essentially, the time an ISB takes to complete acts as a short pause between polls of the lock, so the core wastes less power than it would re-checking the lock in a tight loop.
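
To make this concrete, here is a rough sketch of a hand-written "relax" helper that issues `isb sy` directly on aarch64 and falls back to `std::hint::spin_loop` elsewhere. The names `cpu_relax` and `wait_until_set` are hypothetical, and the inline assembly is my own approximation of the idea, not the code the standard library actually contains.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical helper: on aarch64, issue `isb sy` by hand.
#[cfg(target_arch = "aarch64")]
#[inline(always)]
pub fn cpu_relax() {
    // ISB is permitted in user mode; it forces later instructions to be
    // refetched, which works as a brief pause in practice.
    unsafe {
        std::arch::asm!("isb sy", options(nostack, preserves_flags));
    }
}

/// On every other target, defer to the standard library's hint.
#[cfg(not(target_arch = "aarch64"))]
#[inline(always)]
pub fn cpu_relax() {
    std::hint::spin_loop();
}

/// Busy-waits until `flag` becomes true and returns how many times
/// the loop had to pause before that happened.
pub fn wait_until_set(flag: &AtomicBool) -> u64 {
    let mut spins = 0u64;
    while !flag.load(Ordering::Acquire) {
        cpu_relax();
        spins += 1;
    }
    spins
}
```

In real code there is no reason to write this by hand: `std::hint::spin_loop()` already picks the instruction for the target, so the sketch only shows what the pause between two loads of the flag amounts to.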

Peter Cordes explained it nicely in one of his comments:

ISB SY doesn't stall for long, just saves a bit of power vs. spamming loads in a tight loop.