Tags: assembly, x86, intel, cpu-cache, rdtsc

Is there a cheaper serializing instruction than cpuid?


I have seen the related questions, including here and here, but it seems that the only instruction ever mentioned for serializing rdtsc is cpuid.

Unfortunately, cpuid takes roughly 1000 cycles on my system, so I am wondering if anyone knows of a cheaper (fewer cycles and no read or write to memory) serializing instruction?

I looked at iret, but that seems to change control flow, which is also undesirable.

I have actually looked at the whitepaper linked in Alex's answer about rdtscp, but it says:

The RDTSCP instruction waits until all previous instructions have been executed before reading the counter. However, subsequent instructions may begin execution before the read operation is performed.

That second point seems to make it less than ideal.


Solution

  • Have you looked at the rdtscp instruction? This is the read-serialized version of rdtsc: it waits for all previous instructions to execute before reading the counter.

    For benchmarking I would recommend reading this whitepaper. It provides a couple of best practices for measuring clock ticks.

    Alex(Intel)
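
One way to address the concern that later instructions may start before rdtscp reads the counter, without paying for cpuid, is to pair the timestamp reads with lfence, as Intel's instruction reference suggests for rdtsc and rdtscp. The following is only a minimal sketch under some assumptions: GCC or Clang on x86, the helper names tsc_begin and tsc_end are made up for illustration, and on AMD parts lfence only orders instruction dispatch when the relevant MSR bit is set (operating systems generally enable it). Note that lfence is not a fully serializing instruction in the architectural sense; it does not drain the store buffer, for example.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, __rdtscp, _mm_lfence (GCC/Clang, x86 only) */

/* Hypothetical helper: timestamp taken before the code under test.
 * The first LFENCE keeps RDTSC from executing until earlier instructions
 * have completed; the second keeps the timed code from starting before
 * the counter has been read. */
static inline uint64_t tsc_begin(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Hypothetical helper: timestamp taken after the code under test.
 * RDTSCP itself waits for earlier instructions to execute; the trailing
 * LFENCE keeps later instructions from starting before the read. */
static inline uint64_t tsc_end(void)
{
    unsigned int aux;  /* receives IA32_TSC_AUX (typically a CPU id); unused here */
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();
    return t;
}
```

A measurement would call tsc_begin(), run the code under test, call tsc_end(), and subtract the two values. Neither helper reads or writes memory beyond the compiler's own handling of the aux output.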