gcc assembly x86-64 inline-assembly

Which inline assembly code is correct for rdtscp?


Disclaimer: Words cannot describe how much I detest AT&T-style syntax.

I have a problem that I hope is caused by register clobbering. If not, I have a much bigger problem.

The first version I used was

static unsigned long long rdtscp(void)
{
    unsigned int hi, lo;
    __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi));
    return (unsigned long long)lo | ((unsigned long long)hi << 32);
}

I notice there is no 'clobbering' stuff in this version. Whether or not this is a problem I don't know... I suppose it depends if the compiler inlines the function or not. Using this version causes me problems that aren't always reproducible.

The next version I found is

static unsigned long long rdtscp(void)
{
    unsigned long long tsc;
    __asm__ __volatile__(
        "rdtscp;"
        "shl $32, %%rdx;"
        "or %%rdx, %%rax"
        : "=a"(tsc)
        :
        : "%rcx", "%rdx");

    return tsc;
}

This is reassuringly unreadable and official-looking, but like I said, my issue isn't always reproducible, so I'm merely trying to rule out one possible cause of my problem.

The reason I believe the first version is a problem is that it is overwriting a register that previously held a function parameter.

What's correct... version 1, or version 2, or both?


Solution

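  • Version 1 is not safe: RDTSCP also writes ECX (the aux value), and version 1 neither declares ECX as an output nor lists it as a clobber, so the compiler may assume that a value it keeps in RCX survives the instruction. Version 2 lists both RCX and RDX as clobbers and is correct. Here's C++ code that will return the TSC and store the auxiliary 32 bits (Processor ID) into the reference parameter: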

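    #include <cstdint>

    static inline uint64_t rdtscp( uint32_t & aux )
    {
        uint64_t rax,rdx;
        // RDTSCP writes EAX (low half), EDX (high half) and ECX (aux value).
        // Declaring all three as outputs tells the compiler they are overwritten.
        // In 64-bit mode the upper 32 bits of RAX and RDX are cleared.
        asm volatile ( "rdtscp\n" : "=a" (rax), "=d" (rdx), "=c" (aux) : : );
        return (rdx << 32) + rax;
    }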
    

    It is better to do the shift and add that merge the two 32-bit halves in a C++ statement rather than inside the inline assembly; this lets the compiler schedule those instructions as it sees fit.
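
    When the aux value is not needed, as in the question's two versions, a minimal sketch along the following lines keeps the merge in C++ while still telling the compiler that RCX is overwritten. It assumes GCC or Clang targeting x86-64, and the name `rdtscp_no_aux` is made up for illustration. It is essentially the question's version 1 with the missing clobber added.

    ```cpp
    #include <cstdint>

    // Hypothetical helper: reads the TSC and discards the aux value.
    // RDTSCP still overwrites ECX, so RCX must be listed as a clobber;
    // otherwise the compiler may keep a live value (for example an
    // inlined function argument) in that register across the instruction.
    static inline std::uint64_t rdtscp_no_aux()
    {
        std::uint32_t lo, hi;
        asm volatile("rdtscp" : "=a"(lo), "=d"(hi) : : "rcx");
        // Merge the halves in C++ so the compiler can schedule the shift/or.
        return (static_cast<std::uint64_t>(hi) << 32) | lo;
    }
    ```

    Whether the function is inlined does not change the requirement: the clobber is part of the asm statement's contract with the compiler either way.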

    Update, about aux: The RDTSCP instruction returns the TSC (in two registers) and the Processor ID (aux) in a third register (unlike the RDTSC instruction, which only returns the TSC). The Processor ID is held in an MSR (Model-Specific Register), IA32_TSC_AUX, which must be initialized by privileged system software. Its purpose is to identify which "core" is executing the instruction, so the value is O/S dependent.

    See http://felixcloutier.com/x86/rdtscp
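
One way the aux value might be used is to compare it across two reads, to notice when a thread moved to another core in the middle of a measurement. The sketch below assumes GCC or Clang on x86-64 and assumes the O/S has initialized the aux MSR with a distinct value per core. `Interval` and `time_ticks` are hypothetical names introduced here, and `rdtscp` is repeated from above so the block stands alone.

```cpp
#include <cstdint>

// Same wrapper as in the answer: returns the TSC, stores the aux value.
static inline std::uint64_t rdtscp(std::uint32_t& aux)
{
    std::uint64_t rax, rdx;
    asm volatile("rdtscp" : "=a"(rax), "=d"(rdx), "=c"(aux));
    return (rdx << 32) + rax;
}

// Hypothetical result type: elapsed ticks plus a flag saying whether both
// reads reported the same aux value (i.e. probably the same core).
struct Interval {
    std::uint64_t ticks;
    bool same_cpu;
};

// Hypothetical helper: times `work()` in TSC ticks.
template <typename F>
static inline Interval time_ticks(F&& work)
{
    std::uint32_t aux_start = 0, aux_stop = 0;
    const std::uint64_t start = rdtscp(aux_start);
    work();
    const std::uint64_t stop = rdtscp(aux_stop);
    return {stop - start, aux_start == aux_stop};
}
```

A caller would typically discard or repeat a measurement whose `same_cpu` flag is false, since the two TSC reads may then come from different cores.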