Search code examples
clinuxpointersassemblyprocessor

C and inline asm bug


I'm working on a Linux device driver where I meet an annoying bug that I've reduced to the userland code below.

The purpose is to read the number of Cores in the Processor through the cpuid instruction

It appears that a core dump is produced in the third phase only and I'm stuck with an explanation ?

Output:

Phase 1: FYI Proc=0x1ac1010
Phase 2: CPU count=8
Segmentation fault (core dumped)

This code has been written for Linux 4.0.7-2 64 bits, compiled with gcc version 5.1.0

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    unsigned long CPU_count;
} PROC;

PROC *Proc=NULL;

unsigned long CPU_count;

unsigned long CPU_Count(void)
{
    unsigned long c=0;

    __asm__ volatile
    (
        "movq   $0x4, %%rax     \n\t"
        "xorq   %%rcx, %%rcx    \n\t"
        "cpuid                  \n\t"
        "shr    $26, %%rax      \n\t"
        "and    $0x3f, %%rax    \n\t"
        "inc    %%rax           \n\t"
        "movq   %%rax, %0"
        : "=m" (c)
        :
        : "rax", "memory"
    );
    return(c);
}

int main(int argc, char *argv[])
{
    if((Proc=malloc(sizeof(PROC))) != NULL)
    {
        printf("Phase 1: FYI Proc=%p\n", Proc);

        CPU_count=CPU_Count();
        printf("Phase 2: CPU count=%lu\n", CPU_count);

        Proc->CPU_count=CPU_Count();
        printf("Phase 3: CPU count=%lu\n", Proc->CPU_count);

        free(Proc);
        return(0);
    }
    else
        return(-1);
}

Solution

  • As mentioned by Carl Norum since you're asm statement doesn't fully describe its effect on the machine state likely what's happening is that you're clobbering a register GCC is using to store some value. Like say the Proc pointer.

    The solution is to define all the inputs and outputs of your assembly instruction, along with anything else it clobbers. In this case it's also a good idea to do as little as possible in inline assembly, and leave everything else to the compiler. In your example only the cpuid instruction needs to be in inline assembly. So try this instead:

    void
    cpuid(unsigned value, unsigned leaf,
          unsigned *eax, unsigned *ebx, unsigned *ecx, unsigned *edx) {
        asm volatile("cpuid"
                 : "=a" (*eax), "=b"(*ebx), "=c"(*ecx), "=d"(*edx)
                 : "a" (value), "c" (leaf));
    }
    
    unsigned long int CPU_Count(void)
    {
        unsigned eax, ebx, ecx, edx;
        cpuid(4, 0, &eax, &ebx, &ecx, &edx);
        return ((eax >> 26) & 0x3F) + 1;
    }
    

    Note your method of determining the number of CPU cores won't work reliably. On my 4 core CPU your program says I have 8 cores. This is because the field you're extracting gives the maximum APIC id, not the actual number of cores. There's a thread on Intel's web forum that addresses this issue.