
How are percpu pointers implemented in the Linux kernel?


On a multiprocessor system, each core can have its own variables. I assume these are distinct variables at different addresses, even though they belong to the same process and share the same name.

But how does the kernel implement this? Does it set aside a region of memory to hold all the per-CPU data, and redirect each access to the right address by adding an offset or something similar?


Solution

  • Normal global variables are not per-CPU. Automatic variables live on the stack, and different CPUs use different stacks, so they naturally get separate copies.

    I guess you're referring to Linux's per-CPU variable infrastructure.
    Most of the magic is here (asm-generic/percpu.h):

    extern unsigned long __per_cpu_offset[NR_CPUS];
    
    #define per_cpu_offset(x) (__per_cpu_offset[x])
    
    /* Separate out the type, so (int[3], foo) works. */
    #define DEFINE_PER_CPU(type, name) \
        __attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
    
    /* var is in discarded region: offset to particular copy we want */
    #define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))
    #define __get_cpu_var(var) per_cpu(var, smp_processor_id())
    

    The macro RELOC_HIDE(ptr, offset) simply advances ptr by the given offset in bytes (regardless of the pointer type).
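
    As a rough user-space illustration of "advance by bytes, regardless of the pointer type", the sketch below contrasts a byte-wise shift with ordinary element-wise pointer arithmetic. The helper names shift_bytes and shift_elems are hypothetical and are not kernel APIs; the real RELOC_HIDE is a compiler-specific macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for RELOC_HIDE: move ptr forward by off bytes.
 * Casting to char * makes the addition count bytes, not elements. */
static void *shift_bytes(void *ptr, size_t off)
{
    assert(ptr != NULL);
    return (char *)ptr + off;
}

/* Ordinary pointer arithmetic, for contrast: moves by n elements. */
static int *shift_elems(int *ptr, size_t n)
{
    return ptr + n;
}
```

    With an int array a, shift_bytes(a, sizeof(int)) and shift_elems(a, 1) yield the same address: the byte offset has to be scaled by hand in the first case, which is exactly what a table of byte offsets such as __per_cpu_offset requires.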

    What does this code do?

    1. When defining DEFINE_PER_CPU(int, x), an integer per_cpu__x is created in the special .data.percpu section.
    2. When the kernel is loaded, this section is loaded multiple times, once per CPU (this part of the magic isn't in the code above).
    3. The __per_cpu_offset array is filled with the distance from CPU 0's copy to each CPU's copy. For example, if 1000 bytes of per-CPU data are used, __per_cpu_offset[n] would contain 1000*n.
    4. The symbol per_cpu__x will be relocated, during load, to CPU 0's per_cpu__x.
    5. __get_cpu_var(x), when running on CPU 3, will translate to *RELOC_HIDE(&per_cpu__x, __per_cpu_offset[3]). This starts with the address of CPU 0's x, adds the offset between CPU 0's data and CPU 3's, and finally dereferences the resulting pointer.
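
    The steps above can be mimicked in plain user-space C. This is only a sketch under stated assumptions, not kernel code: the names sim_section, sim_copies, sim_offset, sim_setup_offsets and sim_per_cpu_x are all hypothetical, a struct stands in for the .data.percpu section, and a static array stands in for the per-CPU copies that the kernel would set up at load time.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NR_CPUS 4   /* hypothetical CPU count for the simulation */

/* Stands in for the .data.percpu section: every per-CPU variable lives here. */
struct sim_section {
    int x;
    long hits;
};

/* One copy of the section per CPU; copy 0 plays the role of CPU 0's data. */
static struct sim_section sim_copies[SIM_NR_CPUS];

/* Byte distance from copy 0 to each CPU's copy (like __per_cpu_offset). */
static size_t sim_offset[SIM_NR_CPUS];

/* Fill the offset table, roughly what step 3 describes. */
static void sim_setup_offsets(void)
{
    for (size_t cpu = 0; cpu < SIM_NR_CPUS; cpu++)
        sim_offset[cpu] = cpu * sizeof(struct sim_section);
}

/* Like per_cpu(x, cpu): start at copy 0's x, then add that CPU's byte offset. */
static int *sim_per_cpu_x(size_t cpu)
{
    assert(cpu < SIM_NR_CPUS);
    char *base = (char *)sim_copies + offsetof(struct sim_section, x);
    return (int *)(base + sim_offset[cpu]);
}
```

    After calling sim_setup_offsets(), a write through sim_per_cpu_x(3) lands in sim_copies[3].x and leaves the other copies untouched. The single symbol (copy 0's x) plus a per-CPU byte offset is enough to reach every copy, which is why one offset table serves all per-CPU variables.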