Tags: assembly, linux-kernel, x86, real-mode, protected-mode

Transition from real to protected mode in the Linux kernel


I am currently studying the low-level organization of operating systems. To that end, I am trying to understand how the Linux kernel is loaded.

A thing that I cannot comprehend is the transition from 16-bit (real mode) to 32-bit (protected mode). It happens in this file.

The protected_mode_jump function performs various auxiliary calculations for the 32-bit code that is executed later, then sets the PE bit in the CR0 register:

    movl    %cr0, %edx
    orb $X86_CR0_PE, %dl    # Protected mode
    movl    %edx, %cr0

and after that performs a long jump to the 32-bit code:

    # Transition to 32-bit mode
    .byte   0x66, 0xea      # ljmpl opcode
2:  .long   in_pm32         # offset
    .word   __BOOT_CS       # segment

As far as I understand, in_pm32 is the address of the 32-bit function that is declared right below protected_mode_jump:

    .code32
    .section ".text32","ax"
GLOBAL(in_pm32)
    # some code
    # ...
    # some code
ENDPROC(in_pm32)

The __BOOT_CS segment base is 0 (the GDT is set beforehand here), so the offset should essentially be the absolute address of the in_pm32 function.

That's the issue. During machine code generation the assembler/linker cannot know the absolute address of the in_pm32 function, because it does not know where the code will be loaded in memory in real mode (various bootloaders can occupy various amounts of space, and the real mode kernel is loaded just after the bootloader).

Moreover, the linker script (setup.ld in the same folder) sets the origin of the code to 0, so it seems that the in_pm32 address will be an offset from the beginning of the real mode kernel. That works just fine for 16-bit code because the CS register is set properly, but when the long jump happens the CPU is already in protected mode, so a relative offset should not work.

So my question: Why does the long jump in protected mode (.byte 0x66, 0xea) land at the proper code position if the offset (.long in_pm32) is relative?

Seems like I am missing something really important.


Solution

  • It appears that your question really is about how the offset stored at the following line can possibly work since it is relative to the start of the segment, not necessarily the start of memory:

     2:  .long   in_pm32         # offset
    

    It is true that in_pm32 is relative to the offset the linker script uses. In particular the linker script has:

    . = 0;
    .bstext     : { *(.bstext) }
    .bsdata     : { *(.bsdata) }
    
    . = 495;
    .header     : { *(.header) }
    .entrytext  : { *(.entrytext) }
    .inittext   : { *(.inittext) }
    .initdata   : { *(.initdata) }
    __end_init = .;
    
    .text       : { *(.text) }
    .text32     : { *(.text32) } 
    

    The Virtual Memory Address is set to zero (and subsequently 495), so one would think that anything in the .text32 section will have to be fixed in low memory. This would be a correct observation had it not been for these instructions in protected_mode_jump:

        xorl    %ebx, %ebx
        movw    %cs, %bx
        shll    $4, %ebx
        addl    %ebx, 2f
    
    [snip]
    
        # Transition to 32-bit mode
        .byte   0x66, 0xea      # ljmpl opcode
    2:  .long   in_pm32         # offset
        .word   __BOOT_CS       # segment
    

    There is a manually encoded FAR JMP at the end that is used to set the CS selector to a 32-bit code descriptor to finalize the transition to 32-bit protected mode. But the key thing to observe is in these lines specifically:

        xorl    %ebx, %ebx
        movw    %cs, %bx
        shll    $4, %ebx
        addl    %ebx, 2f
    

    This takes the value in CS, shifts it left by 4 bits (multiplies it by 16), and then adds it to the value stored at label 2f. This is the way you take a real mode segment:offset pair and convert it into a linear address (which is the same as a physical address in this case). 2f is a forward reference to local label 2, which marks the offset in_pm32 in this line:

    2:  .long   in_pm32         # offset
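
The segment-to-linear conversion described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not kernel code; the function name `real_mode_linear` and the segment values used below are assumptions chosen for the example.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Linear address of a real mode segment:offset pair.

    Mirrors the arithmetic of `shll $4, %ebx` followed by an add:
    the 16-bit segment is multiplied by 16 and the offset is added.
    (The 1 MiB wrap-around of an 8086 is ignored in this sketch.)
    """
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


# Hypothetical example: code running with CS = 0x9020, offset 0.
print(hex(real_mode_linear(0x9020, 0x0000)))  # 0x90200
```

Because paging is not enabled at this point of the boot, the linear address computed this way is also the physical address.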
    

    When those instructions complete, the long word value in_pm32 in the FAR JMP will have been adjusted (at run time) by adding the linear address of the current real mode code segment to the value in_pm32. This .long (DWORD) value will be replaced with (CS<<4)+in_pm32.
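
To make the run-time patch concrete, here is a small Python simulation of the hand-encoded far jump and of the `addl %ebx, 2f` that modifies its operand. The selector value `BOOT_CS = 0x10`, the link-time offset `IN_PM32_LINK = 0x0F4C`, and the CS value `0x9020` are assumptions for illustration only; the helper names are hypothetical too.

```python
import struct

BOOT_CS = 0x10         # assumed selector value, for illustration only
IN_PM32_LINK = 0x0F4C  # hypothetical link-time offset of in_pm32


def build_far_jump(offset: int, selector: int) -> bytearray:
    """Encode: 0x66 0xEA, 32-bit offset, 16-bit selector (little-endian)."""
    return bytearray(b"\x66\xea" + struct.pack("<IH", offset, selector))


def patch_far_jump(code: bytearray, cs: int) -> None:
    """Simulate: xorl %ebx,%ebx; movw %cs,%bx; shll $4,%ebx; addl %ebx,2f."""
    ebx = (cs & 0xFFFF) << 4
    (old,) = struct.unpack_from("<I", code, 2)   # label 2: is at byte 2
    struct.pack_into("<I", code, 2, (old + ebx) & 0xFFFFFFFF)


def decode_far_jump(code: bytearray) -> tuple:
    """Return the (offset, selector) pair currently stored in the jump."""
    return struct.unpack_from("<IH", code, 2)


code = build_far_jump(IN_PM32_LINK, BOOT_CS)
patch_far_jump(code, cs=0x9020)
offset, selector = decode_far_jump(code)
print(hex(offset), hex(selector))  # 0x9114c 0x10
```

Only the four offset bytes change; the opcode bytes and the selector stay exactly as the assembler emitted them.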

    This code was designed to be relocatable to any real mode segment. The final linear address is computed at run time before the FAR JMP. This is in effect self-modifying code.
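
The relocatability claim can be checked with a short Python sketch that compares where an unpatched and a patched jump would land in a flat (base 0) code segment. The link-time offset and the two load segments are hypothetical values, and `jump_targets` is a name invented for this example.

```python
def flat_target(descriptor_base: int, offset: int) -> int:
    """Linear address reached by a protected mode far jump."""
    return (descriptor_base + offset) & 0xFFFFFFFF


def jump_targets(cs: int, link_offset: int) -> tuple:
    """Return (wanted, unpatched, patched) linear addresses.

    wanted:    where in_pm32 really sits in memory
    unpatched: where the jump lands if the link-time offset is used as-is
    patched:   where it lands after (CS << 4) has been added at run time
    """
    load_base = (cs & 0xFFFF) << 4
    wanted = load_base + link_offset
    unpatched = flat_target(0, link_offset)
    patched = flat_target(0, link_offset + load_base)
    return wanted, unpatched, patched


IN_PM32_LINK = 0x0F4C  # hypothetical link-time offset of in_pm32

for cs in (0x1000, 0x9020):  # two hypothetical real mode load segments
    wanted, unpatched, patched = jump_targets(cs, IN_PM32_LINK)
    print(hex(cs), hex(wanted), hex(unpatched), hex(patched))
```

In both cases the patched target equals the real location of in_pm32, while the unpatched one would point into low memory regardless of where the setup code was loaded.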