assembly x86 memory-alignment osdev memory-segmentation

Is the address checked by the memory alignment check mechanism an effective address, a linear address or a physical address?


I am studying alignment checking, but I don't know whether the processor checks the effective address, the linear address, the physical address, or all of them.

For example, suppose the effective address of a datum is aligned, but the linear address formed by adding the base address from the segment descriptor is no longer aligned. Does the processor raise an #AC exception in that case?


Solution

  • TL;DR

    I think it's the linear address.

    The test result is A B B A C B (by row)

    Keep reading for the test methodology and the test code.


    It's not the effective address (aka the offset)

    To test this it suffices to use a segment with a base that is not aligned.
    In my test, I've used a 32-bit data segment with a base of 1.

    The test is a "simple" legacy (i.e. non-UEFI) bootloader that will create said descriptor and test accessing the offsets 0x7000 and 0x7003 with DWORD width.
    The former will generate an #AC, the latter won't.

    This demonstrates that it's not the offset alone that is checked, because 0x7000 is an aligned offset that still faults with a base of 1.

    This is expected.
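
    The arithmetic behind this first test can be sketched in C. This is only a model of the address calculation, not of the CPU: the function names `linear_address` and `is_aligned` are made up for illustration, and the sketch assumes a 32-bit flat calculation (base + offset, wrapping at 4 GiB).

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Linear address = segment base + effective address (offset), modulo 2^32. */
    uint32_t linear_address(uint32_t base, uint32_t offset)
    {
        return base + offset; /* unsigned addition wraps at 32 bits */
    }

    /* True if an access of `size` bytes (a power of two) at `addr` is naturally aligned. */
    bool is_aligned(uint32_t addr, uint32_t size)
    {
        assert(size != 0 && (size & (size - 1)) == 0);
        return (addr & (size - 1)) == 0;
    }
    ```

    With a base of 1, `linear_address(1, 0x7000)` is 0x7001 (misaligned for a DWORD) while `linear_address(1, 0x7003)` is 0x7004 (aligned), which is the opposite of what the bare offsets would suggest.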

    I have a tradition of using a minimal output for the tests, so an explanation is mandatory.

    First, six blue As are written in six consecutive rows in the VGA buffer.
    Then before executing a load, a pointer is set to each of these As.
    The #AC handler will increment the pointed-to byte.
    So, if a row contains a B, the access generated an #AC.

    The first four rows are used for:

    1. Access using a segment with base 0 and offset 0x7000. As expected, no #AC.
    2. Access using a segment with base 0 and offset 0x7001 (the misaligned offset used in the test code). As expected, #AC.
    3. Access using a segment with base 1 and offset 0x7000. This does generate an #AC, demonstrating that it's either the linear or the physical address that's checked.
    4. Access using a segment with base 1 and offset 0x7003. This doesn't generate an #AC, confirming point 3.

    The next two rows are used to check the linear address vs the physical address.

    It's not the physical address: #AC instead of #PF

    The #AC check only tests alignments up to 16 bytes, but a linear address and its physical address share the same alignment up to at least 4KiB (the page offset is not translated).
    We would need a memory access that requires a data structure aligned on at least 8KiB to tell whether it's the physical or the linear address that's used for the check.
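
    To make the "same alignment up to 4KiB" point concrete, here is a small C sketch of 4KiB page translation. It is an assumption-laden model (the name `translate_4k` is hypothetical, and large pages are ignored): translation replaces the page number and copies the low 12 bits unchanged, so any alignment of 4KiB or less is identical before and after.

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define PAGE_MASK_4K 0xFFFu

    /* Model of 4KiB paging: keep the 12-bit page offset, swap in the frame base. */
    uint32_t translate_4k(uint32_t linear, uint32_t frame_base)
    {
        assert((frame_base & PAGE_MASK_4K) == 0); /* frames are 4KiB aligned */
        return frame_base | (linear & PAGE_MASK_4K);
    }
    ```

    For example, with page 9xxx mapped to 0b8xxxh as in the test, `translate_4k(0x9003, 0xB8000)` gives 0xB8003: the low four bits, which are all a 16-byte alignment check can look at, did not change.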

    Unfortunately, there is no such access (yet).

    I thought I could still gather some insight by checking which exception is generated when a misaligned load targets an unmapped page.
    If a #PF is generated, the CPU translates the linear address first and checks the alignment afterwards. Conversely, if an #AC is generated, the CPU checks before translating (remember that the page is not mapped).

    I modified the test to enable paging, map the minimum number of pages and handle a #PF by incrementing the byte under the pointer by two.
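
    The three page table entries written by the test code (7007h, 8006h and 0b801fh) can be decoded with a few lines of C. This is a sketch assuming classic 32-bit, non-PAE paging with 4KiB pages; the helper names are mine, not part of any API.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define PTE_PRESENT 0x001u      /* bit 0: page is present */
    #define PTE_FRAME   0xFFFFF000u /* bits 31:12: physical frame base */

    bool pte_present(uint32_t pte)
    {
        return (pte & PTE_PRESENT) != 0;
    }

    /* Physical frame base of a present entry. */
    uint32_t pte_frame(uint32_t pte)
    {
        assert(pte_present(pte)); /* the frame field is meaningless otherwise */
        return pte & PTE_FRAME;
    }
    ```

    Under these assumptions 7007h is a present identity mapping of page 7xxx, 8006h has the present bit clear (so any access to 8xxx raises a #PF), and 0b801fh is present and points at 0b8000h.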

    When a load is executed, the corresponding A will either become a B if an #AC is generated or a C if a #PF is generated.
    Note that both are faults (eip on the stack points to the offending instruction) but both handlers resume from the next instruction (so each load is executed only once).

    This is the meaning of the last two rows:

    5. Access to an unmapped page using a segment with base 1 and offset 0x8003. This generates a #PF as expected (the access is aligned, so the only exception possible here is a #PF).
    6. Access to an unmapped page using a segment with base 1 and offset 0x8000. This generates an #AC, therefore the CPU checks the alignment before attempting to translate the address.

    Point 6 seems to suggest that the CPU will perform the check on the linear address since no access to the page table is done.
    In point 6 both exceptions could be generated, the fact that #PF is not generated means that the CPU hasn't attempted translating the address when the alignment check is performed. (Or that #AC logically takes precedence. But likely the hardware wouldn't do a page walk before taking the #AC exception, even if it did probe the TLB after doing the base+offset calculation.)
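
    The whole experiment can be summarised as a tiny C model that predicts the marker for each row under the two competing hypotheses. It is a sketch, not a description of the hardware: `marker` is a hypothetical helper, it only handles DWORD loads, and it hard-codes the test's mapping (page 8xxx is the only not-present page that is touched).

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Marker left in a row after one DWORD load: 'A' no fault, 'B' #AC, 'C' #PF. */
    char marker(uint32_t base, uint32_t offset, bool ac_before_translation)
    {
        uint32_t linear = base + offset;
        bool misaligned = (linear & 3u) != 0;
        bool unmapped = (linear >> 12) == 8u; /* page 8xxx is not present */

        /* The model assumes the access does not cross a page boundary. */
        assert(((linear + 3u) >> 12) == (linear >> 12));

        if (ac_before_translation)
            return misaligned ? 'B' : (unmapped ? 'C' : 'A');
        return unmapped ? 'C' : (misaligned ? 'B' : 'A');
    }
    ```

    Feeding it the six accesses of the test with `ac_before_translation` set to true yields A B B A C B, the observed result. With it set to false only the last row differs (C instead of B), which is why row 6 is the one that discriminates between the two orderings.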

    Test code

    The code is messy and more cumbersome than one may expect.
    The main hindrance is #AC only working at CPL=3.
    So we need to create the CPL=3 descriptor, plus a TSS segment and a TSS descriptor.
    To handle the exception we need an IDT and we also need paging.
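
    The reason for the CPL=3 gymnastics is that alignment checking is only active when three conditions hold at once. A minimal C sketch of that condition (the function name is hypothetical; the bit positions are the architectural ones, bit 18 in both CR0 and EFLAGS):

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define CR0_AM    (1u << 18) /* Alignment Mask bit in CR0 */
    #define EFLAGS_AC (1u << 18) /* Alignment Check flag in EFLAGS */

    /* #AC can only be raised when CR0.AM = 1, EFLAGS.AC = 1 and CPL = 3. */
    bool alignment_check_active(uint32_t cr0, uint32_t eflags, unsigned cpl)
    {
        assert(cpl <= 3);
        return (cr0 & CR0_AM) != 0 && (eflags & EFLAGS_AC) != 0 && cpl == 3;
    }
    ```

    The test code below sets both bits early, but no misaligned access can fault until the `retf` drops to ring 3.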

    BITS 16
    ORG 7c00h
    
      ;Skip the BPB (my BIOS actively overwrites it)
      jmp SHORT __SKIP_BPB__
    
      ;I eyeballed the BPB size (at least the part that may be overwritten)
      TIMES 40h db 0
    
    __SKIP_BPB__:
      ;Set up the segments (including CS)
      xor ax, ax
      mov ds, ax
      mov es, ax      ;ES:DI is the destination of the rep stosd used to zero the TSS
      mov ss, ax
      xor sp, sp
      jmp 0:__START__
    
    __START__:
      ;Clear and set the video mode (before we switch to PM)
      mov ax, 03h
      int 10h
      
      ;Disable the interrupts and load the GDT and IDT
      cli
      lgdt [GDT]
      lidt [IDT]
      
      ;Enable PM
      mov eax, cr0
      or al, 1
      mov cr0, eax
      
    
      ;Write a TSS segment: zero 104h DWORDs and set only the SS0:ESP0 fields
      mov di, 7000h
      mov cx, 104h
      xor eax, eax    ;stosd stores all of EAX, and its upper half still holds CR0 bits
      rep stosd
      
      mov DWORD [7004h], 7c00h    ;ESP0
      mov WORD [7008h], 10h       ;SS0
      
      
      ;Set AC in EFLAGS
      pushfd
      or DWORD [esp], 1 << 18 
      popfd
      
      ;Set AM in CR0
      mov eax, cr0
      or eax, 1<<18
      mov cr0, eax
    
      ;OK, let's go in PM for real
      jmp 08h:__32__
      
    __32__:
      BITS 32
    
      ;Set the stack and DS
      mov ax, 10h 
      mov ss, ax 
      mov esp, 7c00h
      mov ds, ax
      
      ;Set the #AC handler
      mov DWORD [IDT+8+17*8], ((AC_handler-$$+7c00h) & 0ffffh) | 00080000h
      mov DWORD [IDT+8+17*8+4], 8e00h | (((AC_handler-$$+7c00h) >> 16) << 16)
      ;Set the #PF handler
      mov DWORD [IDT+8+14*8], ((PF_handler-$$+7c00h) & 0ffffh) | 00080000h
      mov DWORD [IDT+8+14*8+4], 8e00h | (((PF_handler-$$+7c00h) >> 16) << 16)
    
      ;Set the TSS
      mov ax, 30h
      ltr ax
    
      ;Paging is:
      ;7xxx -> Identity mapped (contains code and all the stacks and system structures)
      ;8xxx -> Not present
      ;9xxx -> Mapped to the VGA text buffer (0b8xxxh)
      ;Note that the paging structures are at 6000h and 5000h, this is OK as these are physical addresses
    
      ;Set the Page Directory at 6000h
      mov eax, 6000h
      mov cr3, eax
      ;Set the Page Directory Entry 0 (for 00000000h-003fffffh) to point to a Page Table at 5000h 
      mov DWORD [eax], 5007h
      ;Set the Page Table Entry 7 (for 00007xxxh) to identity map and Page Table Entry 8 (for 00008xxxh) to be not present
      mov eax, 5000h + 7*4
      mov DWORD [eax], 7007h
      mov DWORD [eax+4], 8006h
      ;Map page 9000h to 0b8000h
      mov DWORD [eax+8],  0b801fh
    
      ;Enable paging
      mov eax, cr0 
      or eax, 80000000h
      mov cr0, eax
    
      ;Change privilege (goto CPL=3)
      push DWORD 23h            ;SS3
      push DWORD 07a00h         ;ESP3
      push DWORD 1bh            ;CS3
      push DWORD __32user__     ;EIP3
      retf 
    
    __32user__:
    
      ; 
      ;Here we are at CPL=3
      ;
    
      ;Set DS to segment with base 0 and ES to one with base 1
      mov ax, 23h
      mov ds, ax
      mov ax, 2bh
      mov es, ax
    
      ;Write six As in six consecutive rows (starting from the 4th)
      xor ecx, ecx 
      mov ecx, 6
      mov ebx, 9000h + 80*2*3   ;Points to 4th row in the VGA text framebuffer
    .init_markers:
      mov WORD [ebx], 0941h
      add bx, 80*2
      dec ecx 
      jnz .init_markers
    
      ;ebx points to the first A
      sub ebx, 80*2 * 6
    
      ;Base 0 + Offset 0 = 0, Should not fault (marker stays A)
      mov eax, DWORD [ds:7000h]
    
      ;Base 0 + Offset 1 = 1, Should fault (marker becomes B)
      add bx, 80*2
      mov eax, DWORD [ds:7001h]
    
      ;Base 1 + Offset 0 = 1, Should fault (marker becomes B)
      add bx, 80*2
      mov eax, DWORD [es:7000h]
    
      ;Base 1 + Offset 3 = 4, Should not fault (marker stays A)
      add bx, 80*2
      mov eax, DWORD [es:7003h]
    
      ;Base 1 + Offset 3 = 4 but page not mapped, should #PF (marker becomes C)
      add bx, 80*2
      mov eax, DWORD [es:8003h]
    
      ;Base 1 + Offset 0 = 1 but page not mapped, if #PF the marker becomes C, if #AC the marker becomes B
      add bx, 80*2
      mov eax, DWORD [es:8000h]
    
      ;Loop forever (cannot use HLT at CPL=3)
      jmp $
      
    
    ;#PF handler
    ;Increment the byte pointed by ebx by two
    PF_handler:
      add esp, 04h        ;Remove the error code
      add DWORD [esp], 6  ;Skip the current instruction
      add BYTE [ebx], 2   ;Increment
    
      iret 
    
    ;#AC handler
    ;Same as the #PF handler but increment by one
    AC_handler:
      add esp, 04h
      add DWORD [esp], 6
      inc BYTE [ebx]
    
      iret
      
    
      ;The GDT (entry 0 is used as the content for GDTR)
      GDT dw GDT.end-GDT - 1
          dd GDT
          dw 0
          
          dd 0000ffffh, 00cf9a00h   ;08 Code, 32, DPL 0
          dd 0000ffffh, 00cf9200h       ;10 Data, 32, DPL 0
          
          dd 0000ffffh, 00cffa00h       ;18 Code, 32, DPL 3
          dd 0000ffffh, 00cff200h       ;20 Data, 32, DPL 3
          dd 0001ffffh, 00cff200h       ;28 Data, 32, DPL 3, Base = 1
    
          dd 7000ffffh, 00cf8900h       ;30 Data, 32, 0 (TSS)
    
          .end: 
    
      ;The IDT, to save space the entries are set dynamically      
      IDT dw 18*8-1
          dd IDT+8
          dw 0
          
    
      ;Signature
      TIMES 510-($-$$) db 0
      dw 0aa55h
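
    The GDT entries above are written as raw DWORD pairs, which hides where the base of 1 actually lives. A small C sketch that pulls the base and DPL back out of such a pair may help when adapting the test. The helper names are mine, and the layout assumed is the standard 8-byte legacy segment descriptor.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Segment base: bits 31:16 of the low DWORD, bits 7:0 and 31:24 of the high DWORD. */
    uint32_t descriptor_base(uint32_t low, uint32_t high)
    {
        assert((high & (1u << 15)) != 0); /* only decode present descriptors */
        return (low >> 16) | ((high & 0xFFu) << 16) | (high & 0xFF000000u);
    }

    /* Descriptor privilege level: bits 14:13 of the high DWORD. */
    unsigned descriptor_dpl(uint32_t high)
    {
        return (high >> 13) & 3u;
    }
    ```

    For selector 28h (`dd 0001ffffh, 00cff200h`) this gives a base of 1 and a DPL of 3, and for the TSS descriptor at 30h it gives a base of 7000h, matching the comments in the table.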
    

    Does it make sense to check the linear address?

    I don't think it's particularly relevant. As noted above, a linear and a physical address share the same alignment up to 4KiB.
    So, for now, it doesn't matter at all.
    Right now, accesses wider than 64 bytes would still need to be performed in chunks and this limit is set deep in the microarchitectures of the x86 CPUs.