Search code examples
kernelx86-64pagingosdevpage-tables

Why does QEMU return the wrong addresses when filling the higher half of the PML4?


I'm writing a small x86-64 OS I boot with UEFI. I am trying to make the kernel a higher half kernel by shifting the executable of the kernel to 0x800000000000. This address should be halfway through the PML4. Basically, I should fill entry 256 of the PML4 to address this higher half. I tried to do that but my code was triple faulting. Since I test the kernel on QEMU and debug with GDB I thus used monitor info mem in GDB to see the virtual to physical address mapping. It returned the following:

(gdb) monitor info mem
0000000000000000-0000000000400000 0000000000400000 -rw
ffff800000000000-ffff800000c00000 0000000000c00000 -rw

Instead of having mapped 0x800000000000 it mapped ffff800000000000. This is probably why the code is triple faulting when I jump to the higher half. Here's a small example of the code I have:

typedef unsigned char UINT8;
typedef unsigned short UINT16;
typedef unsigned int UINT32;
typedef unsigned long UINT64;

struct GDT{
    UINT64 nullDescriptor;
    
    UINT16 codeLimit;
    UINT16 codeBaseLow;
    UINT8 codeBaseMid;
    UINT8 codeFlags;
    UINT8 codeLimitMid;
    UINT8 codeBaseHigh;
    
    UINT16 dataLimit;
    UINT16 dataBaseLow;
    UINT8 dataBaseMid;
    UINT8 dataFlags;
    UINT8 dataLimitMid;
    UINT8 dataBaseHigh;
}__attribute__((packed));

struct GDTR{
    UINT16 size;
    GDT* address;
}__attribute__((packed));

void main(){
    //Identity mapping for the first 4MB
    UINT64* pml4Ptr = (UINT64*)0x200000;
    *pml4Ptr = 0x20101b;
    
    UINT64* pdpPtr = (UINT64*)0x201000;
    *pdpPtr = 0x20201b;
    
    UINT64* pdPtr = (UINT64*)0x202000;
    *pdPtr = 0x20301b;
    *(pdPtr + 1) = 0x20401b;
    
    UINT64* ptPtr = (UINT64*)0x203000;
    UINT64 physAddr = 0x1b;
    for (UINT32 i = 0; i < 2 * 512; i++){
        *(ptPtr + i) = physAddr;
        physAddr += 0x1000;
    }
    
    //Kernel mapping for the higher half
    *(pml4Ptr + 256) = 0x20601b; //When this is less then 256 I get the right addresses
    
    pdpPtr = (UINT64*)0x206000;
    *pdpPtr = 0x20701b;
    
    pdPtr = (UINT64*)0x207000;
    *pdPtr = 0x20801b;
    *(pdPtr + 1) = 0x20901b;
    *(pdPtr + 2) = 0x20a01b;
    *(pdPtr + 3) = 0x20b01b;
    *(pdPtr + 4) = 0x20c01b;
    *(pdPtr + 5) = 0x20d01b;
    
    ptPtr = (UINT64*)0x208000;
    physAddr = 0x1b;
    for (UINT32 i = 0; i < 6 * 512; i++){
        *(ptPtr + i) = physAddr;
        physAddr += 0x1000;
    }

    asm volatile(
    "movq $0x200018, %rax\n\t"
    "mov %rax, %cr3\n\t"
    "movq $0x375000, %rsp\n\t"
    );
        
    GDT gdt = {
        .nullDescriptor = 0,
        
        .codeLimit = 0x0000,
        .codeBaseLow = 0,
        .codeBaseMid = 0,
        .codeFlags = 0x9a,
        .codeLimitMid = 0xaf,
        .codeBaseHigh = 0,
        
        .dataLimit = 0x0000,
        .dataBaseLow = 0,
        .dataBaseMid = 0,
        .dataFlags = 0x92,
        .dataLimitMid = 0x00,
        .dataBaseHigh = 0
    };
    
    GDT* gdtAddr = &gdt;
    GDTR gdtr = { 23, gdtAddr };
    GDTR* gdtrAddr = &gdtr;
    
    asm volatile("lgdt (%0)" : : "r"(gdtrAddr));
    
    asm volatile(
    "sub $16, %rsp\n\t"
    "movq $8, 8(%rsp)\n\t"
    "movabsq $fun, %rax\n\t"
    "mov %rax, (%rsp)\n\t"
    "lretq\n\t"
    "fun:\n\t"
    "movq $0x10, %rax\n\t"
    "mov %ax, %ss\n\t"
    "mov %ax, %es\n\t"
    "mov %ax, %ds\n\t"
    "mov %ax, %gs\n\t"
    "mov %ax, %fs\n\t"
    "hlt"
    );
}

I have a pointer to the PML4 address then I do *(pml4Ptr + 256) = 0x20601b;. This should map entry 256 of PML4 to 0x206000 without caching. The rest of the small code snippet under the kernel mapping section should map 12MB of data in the higher half of the virtual address starting from physical address 0. Instead, I get the addresses above which seems weird.

If I set entry 255 of the PML4 with the same code (*(pml4Ptr + 255) = 0x20601b;), I get the result below:

(gdb) monitor info mem
0000000000000000-0000000000400000 0000000000400000 -rw
00007f8000000000-00007f8000c00000 0000000000c00000 -rw

I actually get the right addresses!? Is there a known bug that QEMU doesn't address the PML4 properly when you fill the higher half or is there something I overlooked in my code?

I also looked at the page tables directly (since they are placed at static positions in RAM). I get the following result:

user@user-System-Product-Name:~$ hexdump -C result.bin
00000000  3b 10 20 00 00 00 00 00  00 00 00 00 00 00 00 00  |;. .............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000800  1b 60 20 00 00 00 00 00  00 00 00 00 00 00 00 00  |.` .............|
00000810  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  3b 20 20 00 00 00 00 00  00 00 00 00 00 00 00 00  |;  .............|
00001010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002000  1b 30 20 00 00 00 00 00  3b 40 20 00 00 00 00 00  |.0 .....;@ .....|
00002010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00003000  1b 00 00 00 00 00 00 00  1b 10 00 00 00 00 00 00  |................|
00003010  1b 20 00 00 00 00 00 00  1b 30 00 00 00 00 00 00  |. .......0......|
00003020  1b 40 00 00 00 00 00 00  1b 50 00 00 00 00 00 00  |[email protected]......|
00003030  1b 60 00 00 00 00 00 00  1b 70 00 00 00 00 00 00  |.`.......p......|
00003040  1b 80 00 00 00 00 00 00  1b 90 00 00 00 00 00 00  |................|
00003050  1b a0 00 00 00 00 00 00  1b b0 00 00 00 00 00 00  |................|
00003060  1b c0 00 00 00 00 00 00  1b d0 00 00 00 00 00 00  |................|
00003070  1b e0 00 00 00 00 00 00  1b f0 00 00 00 00 00 00  |................|
00003080  1b 00 01 00 00 00 00 00  1b 10 01 00 00 00 00 00  |................|
00003090  1b 20 01 00 00 00 00 00  1b 30 01 00 00 00 00 00  |. .......0......|
000030a0  1b 40 01 00 00 00 00 00  1b 50 01 00 00 00 00 00  |[email protected]......|
000030b0  1b 60 01 00 00 00 00 00  1b 70 01 00 00 00 00 00  |.`.......p......|
000030c0  1b 80 01 00 00 00 00 00  1b 90 01 00 00 00 00 00  |................|
000030d0  1b a0 01 00 00 00 00 00  1b b0 01 00 00 00 00 00  |................|
000030e0  1b c0 01 00 00 00 00 00  1b d0 01 00 00 00 00 00  |................|
000030f0  1b e0 01 00 00 00 00 00  1b f0 01 00 00 00 00 00  |................|
00003100  1b 00 02 00 00 00 00 00  1b 10 02 00 00 00 00 00  |................|
00003110  1b 20 02 00 00 00 00 00  1b 30 02 00 00 00 00 00  |. .......0......|
00003120  1b 40 02 00 00 00 00 00  1b 50 02 00 00 00 00 00  |[email protected]......|
00003130  1b 60 02 00 00 00 00 00  1b 70 02 00 00 00 00 00  |.`.......p......|
00003140  1b 80 02 00 00 00 00 00  1b 90 02 00 00 00 00 00  |................|
00003150  1b a0 02 00 00 00 00 00  1b b0 02 00 00 00 00 00  |................|
00003160  1b c0 02 00 00 00 00 00  1b d0 02 00 00 00 00 00  |................|
00003170  1b e0 02 00 00 00 00 00  1b f0 02 00 00 00 00 00  |................|
00003180  1b 00 03 00 00 00 00 00  1b 10 03 00 00 00 00 00  |................|
00003190  1b 20 03 00 00 00 00 00  1b 30 03 00 00 00 00 00  |. .......0......|
000031a0  1b 40 03 00 00 00 00 00  1b 50 03 00 00 00 00 00  |[email protected]......|
000031b0  1b 60 03 00 00 00 00 00  1b 70 03 00 00 00 00 00  |.`.......p......|
000031c0  1b 80 03 00 00 00 00 00  1b 90 03 00 00 00 00 00  |................|
000031d0  1b a0 03 00 00 00 00 00  1b b0 03 00 00 00 00 00  |................|
000031e0  1b c0 03 00 00 00 00 00  1b d0 03 00 00 00 00 00  |................|
000031f0  1b e0 03 00 00 00 00 00  1b f0 03 00 00 00 00 00  |................|
00003200  1b 00 04 00 00 00 00 00  1b 10 04 00 00 00 00 00  |................|
00003210  1b 20 04 00 00 00 00 00  1b 30 04 00 00 00 00 00  |. .......0......|
00003220  1b 40 04 00 00 00 00 00  1b 50 04 00 00 00 00 00  |[email protected]......|
00003230  1b 60 04 00 00 00 00 00  1b 70 04 00 00 00 00 00  |.`.......p......|
00003240  1b 80 04 00 00 00 00 00  1b 90 04 00 00 00 00 00  |................|
00003250  1b a0 04 00 00 00 00 00  1b b0 04 00 00 00 00 00  |................|
00003260  1b c0 04 00 00 00 00 00  1b d0 04 00 00 00 00 00  |................|
00003270  1b e0 04 00 00 00 00 00  1b f0 04 00 00 00 00 00  |................|
00003280  1b 00 05 00 00 00 00 00  1b 10 05 00 00 00 00 00  |................|
...
...
00004fa0  1b 40 3f 00 00 00 00 00  1b 50 3f 00 00 00 00 00  |.@?......P?.....|
00004fb0  1b 60 3f 00 00 00 00 00  1b 70 3f 00 00 00 00 00  |.`?......p?.....|
00004fc0  1b 80 3f 00 00 00 00 00  1b 90 3f 00 00 00 00 00  |..?.......?.....|
00004fd0  1b a0 3f 00 00 00 00 00  1b b0 3f 00 00 00 00 00  |..?.......?.....|
00004fe0  1b c0 3f 00 00 00 00 00  1b d0 3f 00 00 00 00 00  |..?.......?.....|
00004ff0  1b e0 3f 00 00 00 00 00  1b f0 3f 00 00 00 00 00  |..?.......?.....|
00005000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00006000  1b 70 20 00 00 00 00 00  00 00 00 00 00 00 00 00  |.p .............|
00006010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00007000  1b 80 20 00 00 00 00 00  1b 90 20 00 00 00 00 00  |.. ....... .....|
00007010  1b a0 20 00 00 00 00 00  1b b0 20 00 00 00 00 00  |.. ....... .....|
00007020  1b c0 20 00 00 00 00 00  1b d0 20 00 00 00 00 00  |.. ....... .....|
00007030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00008000  1b 00 00 00 00 00 00 00  1b 10 00 00 00 00 00 00  |................|
00008010  1b 20 00 00 00 00 00 00  1b 30 00 00 00 00 00 00  |. .......0......|
00008020  1b 40 00 00 00 00 00 00  1b 50 00 00 00 00 00 00  |[email protected]......|
00008030  1b 60 00 00 00 00 00 00  1b 70 00 00 00 00 00 00  |.`.......p......|
00008040  1b 80 00 00 00 00 00 00  1b 90 00 00 00 00 00 00  |................|
00008050  1b a0 00 00 00 00 00 00  1b b0 00 00 00 00 00 00  |................|
00008060  1b c0 00 00 00 00 00 00  1b d0 00 00 00 00 00 00  |................|
00008070  1b e0 00 00 00 00 00 00  1b f0 00 00 00 00 00 00  |................|
00008080  1b 00 01 00 00 00 00 00  1b 10 01 00 00 00 00 00  |................|
00008090  1b 20 01 00 00 00 00 00  1b 30 01 00 00 00 00 00  |. .......0......|
000080a0  1b 40 01 00 00 00 00 00  1b 50 01 00 00 00 00 00  |[email protected]......|
000080b0  1b 60 01 00 00 00 00 00  1b 70 01 00 00 00 00 00  |.`.......p......|
000080c0  1b 80 01 00 00 00 00 00  1b 90 01 00 00 00 00 00  |................|
000080d0  1b a0 01 00 00 00 00 00  1b b0 01 00 00 00 00 00  |................|
000080e0  1b c0 01 00 00 00 00 00  1b d0 01 00 00 00 00 00  |................|
000080f0  1b e0 01 00 00 00 00 00  1b f0 01 00 00 00 00 00  |................|
00008100  1b 00 02 00 00 00 00 00  1b 10 02 00 00 00 00 00  |................|
00008110  1b 20 02 00 00 00 00 00  1b 30 02 00 00 00 00 00  |. .......0......|
00008120  1b 40 02 00 00 00 00 00  1b 50 02 00 00 00 00 00  |[email protected]......|
00008130  1b 60 02 00 00 00 00 00  1b 70 02 00 00 00 00 00  |.`.......p......|
00008140  1b 80 02 00 00 00 00 00  1b 90 02 00 00 00 00 00  |................|
00008150  1b a0 02 00 00 00 00 00  1b b0 02 00 00 00 00 00  |................|
00008160  1b c0 02 00 00 00 00 00  1b d0 02 00 00 00 00 00  |................|
00008170  1b e0 02 00 00 00 00 00  1b f0 02 00 00 00 00 00  |................|
00008180  1b 00 03 00 00 00 00 00  1b 10 03 00 00 00 00 00  |................|
00008190  1b 20 03 00 00 00 00 00  1b 30 03 00 00 00 00 00  |. .......0......|
000081a0  1b 40 03 00 00 00 00 00  1b 50 03 00 00 00 00 00  |[email protected]......|
000081b0  1b 60 03 00 00 00 00 00  1b 70 03 00 00 00 00 00  |.`.......p......|
000081c0  1b 80 03 00 00 00 00 00  1b 90 03 00 00 00 00 00  |................|
000081d0  1b a0 03 00 00 00 00 00  1b b0 03 00 00 00 00 00  |................|
000081e0  1b c0 03 00 00 00 00 00  1b d0 03 00 00 00 00 00  |................|
000081f0  1b e0 03 00 00 00 00 00  1b f0 03 00 00 00 00 00  |................|
00008200  1b 00 04 00 00 00 00 00  1b 10 04 00 00 00 00 00  |................|
00008210  1b 20 04 00 00 00 00 00  1b 30 04 00 00 00 00 00  |. .......0......|
00008220  1b 40 04 00 00 00 00 00  1b 50 04 00 00 00 00 00  |[email protected]......|
00008230  1b 60 04 00 00 00 00 00  1b 70 04 00 00 00 00 00  |.`.......p......|
00008240  1b 80 04 00 00 00 00 00  1b 90 04 00 00 00 00 00  |................|
00008250  1b a0 04 00 00 00 00 00  1b b0 04 00 00 00 00 00  |................|
00008260  1b c0 04 00 00 00 00 00  1b d0 04 00 00 00 00 00  |................|
00008270  1b e0 04 00 00 00 00 00  1b f0 04 00 00 00 00 00  |................|
00008280  1b 00 05 00 00 00 00 00  1b 10 05 00 00 00 00 00  |................|
00008290  1b 20 05 00 00 00 00 00  1b 30 05 00 00 00 00 00  |. .......0......|
000082a0  1b 40 05 00 00 00 00 00  1b 50 05 00 00 00 00 00  |[email protected]......|
000082b0  1b 60 05 00 00 00 00 00  1b 70 05 00 00 00 00 00  |.`.......p......|
000082c0  1b 80 05 00 00 00 00 00  1b 90 05 00 00 00 00 00  |................|
000082d0  1b a0 05 00 00 00 00 00  1b b0 05 00 00 00 00 00  |................|
000082e0  1b c0 05 00 00 00 00 00  1b d0 05 00 00 00 00 00  |................|
000082f0  1b e0 05 00 00 00 00 00  1b f0 05 00 00 00 00 00  |................|
00008300  1b 00 06 00 00 00 00 00  1b 10 06 00 00 00 00 00  |................|
00008310  1b 20 06 00 00 00 00 00  1b 30 06 00 00 00 00 00  |. .......0......|
00008320  1b 40 06 00 00 00 00 00  1b 50 06 00 00 00 00 00  |[email protected]......|
00008330  1b 60 06 00 00 00 00 00  1b 70 06 00 00 00 00 00  |.`.......p......|
00008340  1b 80 06 00 00 00 00 00  1b 90 06 00 00 00 00 00  |................|
00008350  1b a0 06 00 00 00 00 00  1b b0 06 00 00 00 00 00  |................|
00008360  1b c0 06 00 00 00 00 00  1b d0 06 00 00 00 00 00  |................|
00008370  1b e0 06 00 00 00 00 00  1b f0 06 00 00 00 00 00  |................|
00008380  1b 00 07 00 00 00 00 00  1b 10 07 00 00 00 00 00  |................|
00008390  1b 20 07 00 00 00 00 00  1b 30 07 00 00 00 00 00  |. .......0......|
000083a0  1b 40 07 00 00 00 00 00  1b 50 07 00 00 00 00 00  |[email protected]......|
000083b0  1b 60 07 00 00 00 00 00  1b 70 07 00 00 00 00 00  |.`.......p......|
000083c0  1b 80 07 00 00 00 00 00  1b 90 07 00 00 00 00 00  |................|
000083d0  1b a0 07 00 00 00 00 00  1b b0 07 00 00 00 00 00  |................|
000083e0  1b c0 07 00 00 00 00 00  1b d0 07 00 00 00 00 00  |................|

This dump seems correct. It starts at 0x200000. It means that address 0 in the dump is 0x200000, address 0x1000 is 0x201000 etc.

I compile the code with this script:

g++ -fomit-frame-pointer --static -ffreestanding -nostdlib -mgeneral-regs-only -mno-red-zone -c -m64 Startup/Source/Main.cpp -oStartup/Object/Main.o
ld -entry main --oformat elf64-x86-64 --no-dynamic-linker -static -nostdlib -Ttext-segment=300000 Startup/Object/Main.o -ostartup.elf

Can anyone pinpoint any problem with the code or anything I overlooked?


Solution

  • 0x800000000000 isn't a canonical address, thus there is no place in the page directory layout for it. (And attempts to dereference it will #GP fault instead of triggering a TLB miss => page walk.)

    You have PML4 so you the top 16 virtual address bits must be copies of bit #47. i.e. the pointer must be representable as a 48-bit value sign-extended to 64-bit.
    i.e. ((int64_t)addr << 16) >> 16 == addr must be true.
    (With the right shift using sar arithmetic right shift).
    (x86-64 canonical address?).

    The high half of the usable (canonical) range does in fact start at ffff800000000000, and the PML4E for it does come right after the one for the top of the low half, 00007f8000000000.

    Address canonical form and pointer arithmetic includes an ASCII diagram of the hole in virtual address space.

    You're correct about where the top of the low half is, but you forgot to sign-extend 0x800000000000 to 64-bit to reach the bottom of the high half. Pointers are still nominally 64-bit, not just truncated to 48-bit. That's why addresses like ffff800000000000 can exist at all.


    If you had PML5 enabled (e.g. Ice Lake hardware or a software emulation of that feature), an extra level of page directories would get you up to 57 bits of virtual address space, making it possible to address 0x800000000000.

    See also x86-64: canonical addresses and actual available range