
How to ensure that RDTSC is accurate?


I've read that RDTSC can give false readings and should not be relied upon.
Is this true, and if so, what can be done about it?


Solution

  • Very old CPUs have an RDTSC that is accurate.

    The problem

    However, newer CPUs have a problem.
    Engineers decided that RDTSC would be great for telling time.
    However, if a CPU throttles its frequency, a TSC that counts core clock cycles is useless for telling time.
    The aforementioned braindead engineers then decided to 'fix' this problem by having the TSC always run at the same frequency, even if the CPU slows down.

    This has the 'advantage' that the TSC can be used for telling elapsed (wall clock) time. However, it makes the TSC less useful for profiling, because it no longer counts the cycles the core actually executed.

    How to tell if your CPU is not broken

    You can tell if your CPU is fine by reading the TSC_invariant bit in CPUID.

    Set EAX to 80000007H, execute CPUID, and read bit 8 of EDX.
    If it is 0, your CPU is fine.
    If it is 1, your CPU is broken and you need to make sure you profile whilst running the CPU at full throttle.

    function IsTimerBroken: boolean;
    {$ifdef CPUX86}
    asm
      //Make sure RDTSC measures CPU cycles, not wall clock time.
      push ebx           //CPUID clobbers EBX, which must be preserved
      mov eax,$80000007  //Has TSC Invariant support?
      cpuid
      pop ebx
      xor eax,eax        //Assume no
      and edx,$100       //test TSC_invariant bit (bit 8 = $100)
      setnz al           //if set, return true, your PC is broken.
    end;
    {$endif}
    {$ifdef CPUX64}
    asm
      //Make sure RDTSC measures CPU cycles, not wall clock time.
      mov r8,rbx         //CPUID clobbers RBX, which must be preserved
      mov eax,$80000007  //Has TSC Invariant support?
      cpuid
      mov rbx,r8
      xor eax,eax        //Assume no
      and edx,$100       //test TSC_invariant bit (bit 8 = $100)
      setnz al           //if set, return true, your PC is broken.
    end;
    {$endif}
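
The bit test described above can also be sketched in plain Pascal, which makes the mask explicit: bit 8 corresponds to $100 (1 shl 8). This is only a minimal sketch, assuming the EDX value has already been obtained from CPUID leaf 80000007H; `IsInvariantTSC` and `TSC_INVARIANT_MASK` are hypothetical names, not part of any library.

```pascal
const
  // Bit 8 of the EDX value returned by CPUID leaf $80000007 is the
  // invariant-TSC flag. 1 shl 8 = $100.
  TSC_INVARIANT_MASK = Cardinal(1) shl 8;

// Hypothetical helper: decodes an EDX value that was already read
// from CPUID leaf $80000007. It does not execute CPUID itself.
function IsInvariantTSC(EdxValue: Cardinal): Boolean;
begin
  Result := (EdxValue and TSC_INVARIANT_MASK) <> 0;
end;
```

Keeping the decode step separate from the CPUID call makes it easy to check the mask against known values without needing a particular CPU.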
    

    How to fix out of order execution issues

    See: http://www.intel.de/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf

    Use the following code:

    function RDTSC: int64;
    {$IFDEF CPUX64}
    asm
      {$IFDEF AllowOutOfOrder}
      rdtsc
      {$ELSE}
      rdtscp        // On x64 we can use RDTSCP, which waits for earlier instructions to finish
      push rbx      // Serialize the code after, to avoid OoO sneaking in
      push rax      // later instructions before the RDTSCP runs.
      push rdx      // See: http://www.intel.de/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf
      xor eax,eax
      cpuid         // Serializing instruction; clobbers EAX, EBX, ECX and EDX
      pop rdx
      pop rax
      pop rbx
      {$ENDIF}
      shl rdx,32    // Combine EDX:EAX into a single 64-bit result in RAX
      or rax,rdx
    end;
    {$ELSE}
    {$IFDEF CPUX86}
    asm
      {$IFNDEF AllowOutOfOrder}
      xor eax,eax
      push ebx
      cpuid         // On x86 we can't assume the existence of RDTSCP
      pop ebx       // so use CPUID to serialize
      {$ENDIF}
      rdtsc         // The int64 result is returned in EDX:EAX
    end;
    {$ELSE}
    {$MESSAGE ERROR 'RDTSC: unsupported CPU'}
    {$ENDIF}
    {$ENDIF}
    

    How to run RDTSC on a broken CPU

    The trick is to force the CPU to run at full clock speed.
    This is usually done by running the sample code many, many times in a row.
    I usually use 1,000,000 runs to start with.
    I then time those 1 million runs 10 times and take the lowest time of those attempts.

    Comparisons with theoretical timings show that this gives very accurate results.
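
The run-many-times, keep-the-lowest approach described above might be sketched as follows. This is a minimal sketch, not a definitive implementation: `MinTicks`, `TTickSource` and `TCodeUnderTest` are hypothetical names, and the tick source is passed in as a parameter so that an RDTSC-style function can be supplied by the caller.

```pascal
type
  // Hypothetical types: a function returning the current tick count
  // (for example an RDTSC wrapper) and the routine being profiled.
  TTickSource = function: Int64;
  TCodeUnderTest = procedure;

// Hypothetical helper: times Runs consecutive calls of Code, repeats
// that measurement Attempts times, and returns the lowest tick count
// seen. The loop overhead is included in the result.
function MinTicks(Code: TCodeUnderTest; Ticks: TTickSource;
  Runs, Attempts: Integer): Int64;
var
  a, i: Integer;
  Start, Elapsed: Int64;
begin
  Result := High(Int64);
  for a := 1 to Attempts do
  begin
    Start := Ticks();
    for i := 1 to Runs do
      Code();
    Elapsed := Ticks() - Start;
    if Elapsed < Result then
      Result := Elapsed;   // keep the fastest attempt
  end;
end;
```

With the numbers from the text, a call would look like `Best := MinTicks(MyRoutine, RDTSC, 1000000, 10);`, where `MyRoutine` stands in for the code being profiled. Dividing `Best` by the number of runs gives an approximate cost per call.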