Tags: linux, linux-kernel, x86-64, cpu-architecture, interrupt

Is x86_64 IDT shared between CPUs in Linux kernel?


TLDR:

Q1: Does the Intel x86_64 architecture have a per-CPU IDTR? If so, should the IDT then be loaded N times, where N is the number of CPUs? I mean once per CPU, not N times on one CPU.

Q2: I found that the IDT is shared between CPUs on x86_64, while a comment in the Linux kernel says the opposite ("x86_64 has per CPU IDT tables"). Which is correct?

Lengthy description

I'm investigating the IDT (Interrupt Descriptor Table) setup in Linux, and I found the following comment in arch/x86/include/asm/irq_vectors.h:

/*
 * Linux IRQ vector layout.
 *
 * There are 256 IDT entries (per CPU - each entry is 8 bytes) which can
 * be defined by Linux. They are used as a jump table by the CPU when a
 * given vector is triggered - by a CPU-external, CPU-internal or
 * software-triggered event.
 *
 * Linux sets the kernel code address each entry jumps to early during
 * bootup, and never changes them. This is the general layout of the
 * IDT entries:
 *
 *  Vectors   0 ...  31 : system traps and exceptions - hardcoded events
 *  Vectors  32 ... 127 : device interrupts
 *  Vector  128         : legacy int80 syscall interface
 *  Vectors 129 ... LOCAL_TIMER_VECTOR-1
 *  Vectors LOCAL_TIMER_VECTOR ... 255 : special interrupts
 *
 * 64-bit x86 has per CPU IDT tables, 32-bit has one shared IDT table.
 *
 * This file enumerates the exact layout of them:
 */

The layout of the IDT is clear, but one line confused me:

64-bit x86 has per CPU IDT tables, 32-bit has one shared IDT table.

The reason for my confusion is the following: AFAIK the "main" IDT (not the IVT/early IDT) is loaded in:

void __init idt_setup_apic_and_irq_gates(void)
{
    /* Prepare interrupt gates and idt_descr */
    ...
    /* Map IDT into CPU entry area and reload it. */
    idt_map_in_cea();
    load_idt(&idt_descr);
    ...
}
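
Note that load_idt() only changes the IDTR of the CPU that actually executes it. In the non-paravirt case it boils down to roughly the following (simplified from arch/x86/include/asm/desc.h):

static __always_inline void native_load_idt(const struct desc_ptr *dtr)
{
    /* lidt loads the IDTR of the executing CPU only */
    asm volatile("lidt %0" :: "m" (*dtr));
}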

So, looking at idt_map_in_cea:

static void __init idt_map_in_cea(void)
{
    /*
     * Set the IDT descriptor to a fixed read-only location in the cpu
     * entry area, so that the "sidt" instruction will not leak the
     * location of the kernel, and to defend the IDT against arbitrary
     * memory write vulnerabilities.
     */
    cea_set_pte(CPU_ENTRY_AREA_RO_IDT_VADDR, __pa_symbol(idt_table),
            PAGE_KERNEL_RO);
    idt_descr.address = CPU_ENTRY_AREA_RO_IDT;
}

Here I see that the IDT is mapped at CPU_ENTRY_AREA_RO_IDT, which equals 0xfffffe0000000000 (according to the Linux virtual memory map this is indeed the start of the CPU entry area), and that address is then loaded with lidt in the load_idt() function.
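
For context, CPU_ENTRY_AREA_RO_IDT is simply the base of the cpu_entry_area; the definitions look roughly like this (arch/x86/include/asm/cpu_entry_area.h, exact form may differ between kernel versions):

#define CPU_ENTRY_AREA_RO_IDT        CPU_ENTRY_AREA_BASE
#define CPU_ENTRY_AREA_RO_IDT_VADDR  ((void *)CPU_ENTRY_AREA_RO_IDT)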

At first I thought that "since this is a virtual address, each CPU could have different page tables and therefore its own physical instance of the IDT". Dumping IDTR with sidt (actually the store_idt function) gives this virtual address (0xfffffe0000000000) as expected, but walking the page tables with for_each_online_cpu yields the same physical address on every CPU. Using gdb/QEMU I verified that this address is correct and corresponds to the idt_table symbol (the actual IDT that was loaded after idt_map_in_cea). This is what confused me: I can see that the IDT is shared between CPUs. Is that so, or am I missing something?

Also, when I copy the IDT into my own LKM's memory and reload it with lidt (actually the load_idt function), walking the page tables with for_each_online_cpu gives the same LKM address on every CPU, even though I did not execute lidt on each CPU to load the new IDT.
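
For reference, here is a minimal sketch of the kind of page-table walk I mean (the helper name idt_virt_to_phys is made up; it assumes the mapping is a regular 4K page, which holds for the cpu_entry_area IDT page):

static phys_addr_t idt_virt_to_phys(unsigned long vaddr)
{
    unsigned int level;
    pte_t *pte = lookup_address(vaddr, &level);

    /* Not mapped */
    if (!pte || !pte_present(*pte))
        return 0;

    /* Physical frame from the PTE plus the offset within the page */
    return (pte_pfn(*pte) << PAGE_SHIFT) | (vaddr & ~PAGE_MASK);
}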

EDIT 1:

I use smp_call_function_single to read the IDTR of a specific CPU:

static void smp_get_idtr(void *info)
{
    struct desc_ptr *idt_ptr = info;
    store_idt(idt_ptr);
}

static void idtr_per_cpu_show(void)
{
    int cpu;

    for_each_online_cpu(cpu) {
        struct desc_ptr *idt_ptr = kzalloc(sizeof(*idt_ptr), GFP_KERNEL);
        
        /* ... */
        
        smp_call_function_single(cpu, smp_get_idtr, idt_ptr, 1);
        
        /* print address/base */
        pr_info("CPU %d: IDT base=0x%lx limit=0x%hx\n",
                cpu, idt_ptr->address, idt_ptr->size);

        kfree(idt_ptr);
    }
}

EDIT 2:

I found my mistake: I used the same scheme as above to load the newly copied IDT, so all CPUs ended up with the same table (yes, I was lazy and just copy-pasted the function). After fixing that, before reloading I still see the same physical address of the initial IDT on every CPU, but once I reload the table with my own copy on, say, CPU 0, only that CPU reports my IDT's address; the others still report the initial physical and virtual address (e.g. as below for a 4-socket VM):

CPU 0:
    base   = 0xffff8b3c484dc000
    limit  = 0xfff

    phys   = 0x00000001084dc000

CPU 1:
    base   = 0xfffffe0000000000
    limit  = 0xfff

    phys   = 0x000000007659b000

CPU 2:
    base   = 0xfffffe0000000000
    limit  = 0xfff

    phys   = 0x000000007659b000

CPU 3:
    base   = 0xfffffe0000000000
    limit  = 0xfff

    phys   = 0x000000007659b000
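
For completeness, the fixed per-CPU reload looks roughly like this (my_idt_copy is a buffer in my LKM holding a copy of the original table; the names are made up for illustration):

static struct desc_ptr new_idtr;

static void smp_load_new_idt(void *info)
{
    /* Runs on the target CPU; lidt only changes that CPU's IDTR */
    load_idt(&new_idtr);
}

static int reload_idt_on_cpu(int cpu, void *my_idt_copy, u16 limit)
{
    new_idtr.address = (unsigned long)my_idt_copy;
    new_idtr.size    = limit;

    return smp_call_function_single(cpu, smp_load_new_idt, NULL, 1);
}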

Solution

  • The IDT IS shared between CPUs on x86_64 (the "64-bit x86 has per CPU IDT tables" comment in irq_vectors.h appears to be outdated)

    During the initialization of a secondary CPU, start_secondary calls cpu_init_exception_handling:

    /*
     * Activate a secondary processor.
     */
    static void notrace start_secondary(void *unused)
    {
        /* ... */
    
        cpu_init_exception_handling();
    
        /* ... */
    }

    At this point the "main" IDT has already been set up by CPU 0, and idt_table hasn't been changed since, so cpu_init_exception_handling loads that same IDT address:

    /*
     * Setup everything needed to handle exceptions from the IDT, including the IST
     * exceptions which use paranoid_entry().
     */
    void cpu_init_exception_handling(void)
    {
        /* ... */
    
        /* Finally load the IDT */
        load_current_idt();
    }
    

    which expands to:

    void load_current_idt(void)
    {
        /* ... */
    
        load_idt(&idt_descr);
    }
    

    and since idt_descr hasn't been changed since CPU 0's initialization (and the IDT itself is mapped read-only), it loads the same IDT on all the remaining CPUs.
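
    There is only one idt_table in the kernel, and idt_descr is a single global descriptor pointing to it (roughly, from arch/x86/kernel/idt.c):

    /* Must be page-aligned because it is mapped into the cpu entry area */
    static gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;

    static struct desc_ptr idt_descr __ro_after_init = {
        .size    = IDT_TABLE_SIZE - 1,
        .address = (unsigned long) idt_table,
    };

    So despite the comment in irq_vectors.h, all CPUs share one table: the IDTR register is per CPU, but it points at the same IDT everywhere.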