I can't seem to get this 16-bit memory detection assembly code to work


A while ago I was working on a simple bootloader project, and I decided to start working on it again. I'm trying to detect memory using BIOS INT 15h with EAX=E820h. I call the interrupt in a loop and allocate space for a memory map to hold all the entries. Now I'm trying to parse the entries, starting from the last one. My goal is to find the highest 1MB area I can use to hold a file I'm reading from the disk.

This is my code so far. It's being tested on Bochs 2.6.11, with 32MB of RAM and everything else set to default settings. Of course, it's 16-bit real mode code.

[bits 16]

BlEntry:
    
    [...] ; above here all I did was set all segments to 0, except DS which is set to 7C0h
    
    ;
    ; Build the system's memory map
    ;

    xor ebx, ebx ; HANDLE for next call
    mov edx, 534D4150h ; magic
    mov ecx, 20 ; buffer size
    sub sp, cx ; allocate 20 bytes
    mov di, sp ; di points to buffer
BlpBuildMmBegin:
    mov eax, 0E820h ; magic
    mov [di+20], DWORD 1 ; Request ACPI compat-entry
    int 15h
    jc BlInt10Failure
    cmp eax, edx ; magics should match
    jne BlInt10Failure
    test ebx, ebx ; are we finished?
    je BlpFindEiArea ; jump out
    sub sp, cx ; allocate 20 bytes again
    mov di, sp ; setup di again
    jmp BlpBuildMmBegin ; do it all again
BlpBuildMmEnd:
     
    ;
    ; Load the EI into memory
    ;
    
    ; try to find the highest address we can map the EI to
BlpFindEiAreaStart:
    add di, 20 ; move onto the next entry
    cmp di, bp
    je BlNoMemoryFailure
BlpFindEiArea:
    cmp [di+16], DWORD 1 ; check if we can use this memory region
    jne BlpFindEiAreaStart ; if not, try again
    cmp [di+8], DWORD 100000h ; 1MB EI file size
    jge BlpFindEiAreaEnd
    jmp BlpFindEiAreaStart
BlpFindEiAreaEnd:
    mov eax, DWORD [di]
    cli
    hlt ;hang the system for now, I still need to add more functionality here

As the code shows, once I've traversed everything, execution jumps to BlNoMemoryFailure, which simply uses teletype output to print "No Memory!" and then hangs the system. That's the problem: I can't get this code to stop printing the message. Do I have the structures wrong? I'm referencing this website as I write my code: http://www.uruk.org/orig-grub/mem64mb.html


Solution

  • Originally int 0x15, eax=0xE820 returned a 20-byte structure. This was extended to 24 bytes by a version of ACPI (I think it was ACPI 3.0 but didn't check and could be wrong) that introduced a new "flags" field to the structure.

    This code allocates space on the stack for the 20 byte structure (without the extra flags field):

        mov ecx, 20 ; buffer size
        sub sp, cx ; allocate 20 bytes
    

    The instruction mov [di+20], DWORD 1 (commented "Request ACPI compat-entry") pre-sets the extra flags field that isn't in the original 20 bytes. Because only 20 bytes were allocated, that write lands outside the buffer and corrupts the stack.

    The int 15h causes the value in cx (maximum buffer size) to be replaced with "size of data actually returned".

    The sub sp, cx at the end of that loop then allocates however many bytes the int 0x15 returned, which may be completely different from the maximum buffer size that was originally requested. Since ecx is never reloaded, the returned size also becomes the buffer size passed to the next call. This can corrupt the stack a second time (especially if you try to fix the previous problem by replacing the mov ecx, 20 with a mov ecx, 24).

    Also note that there are two different ways that a BIOS can handle "end of list reached", and returning ebx = 0 for the last entry in the list is only one of them. The other possibility is that the BIOS returns a non-zero value in ebx for the last entry, and then returns an error when you try to use that value to get the entry after the last entry. For this reason you can't just do jc BlInt10Failure.

    To detect "end of list reached" reliably, I'd recommend doing an initial int 0x15 to get the first entry that does do jc BlInt10Failure, followed by a loop to get any remaining entries that does jc BlpFindEiArea instead (in other words, "failure" after the first entry is treated as "end of list" and not as failure).

    Note that, if you do use an initial int 0x15 to get the first entry, it can also tell you whether you're working with a BIOS that returns a 20-byte structure or a 24-byte (or larger) structure. That means you can have two separate loops, neither of which bothers pre-setting the extra flags field: one because it knows the field isn't used, the other because it knows the BIOS will set it. The initial call can also be useful if your code is "very defensive" and checks that the data returned is sane (e.g. that the "type" field isn't an impossible value), and/or if you want to keep track of where the memory map came from (e.g. using some kind of enum that says "newer 0xE820", "older 0xE820", or any of about 8 other older alternatives that were used on BIOS, or UEFI).
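
    A minimal sketch of the "first call fails hard, later calls end the list" idea is shown below. It takes a simpler route than the two-loop variant: every entry gets a fixed 24-byte slot, and the flags field is forced to 1 whenever fewer than 24 bytes come back. The names MMAP_BUF, MMAP_SLOT, MMAP_MAX, BlpBuildMm and BlpE820Call are hypothetical, the buffer address 0000:8000h is only assumed to be free, and BlInt10Failure here is a stand-in for the question's handler. Memory accesses use explicit es: overrides because the question sets DS to 7C0h while the BIOS writes through ES:DI.

    ```nasm
    [bits 16]

    MMAP_BUF    equ 8000h           ; hypothetical free buffer at ES:8000h (assumption)
    MMAP_SLOT   equ 24              ; fixed slot size, large enough for the 24-byte form
    MMAP_MAX    equ 64              ; arbitrary cap on stored entries (assumption)

    ; Out: BP = number of entries stored at ES:MMAP_BUF, MMAP_SLOT bytes apart
    ; Assumes ES = 0, as in the question's setup.
    BlpBuildMm:
        mov di, MMAP_BUF            ; ES:DI -> first slot
        xor bp, bp                  ; BP = entries stored so far
        xor ebx, ebx                ; continuation value, 0 = start of list

        call BlpE820Call            ; first call: failure means E820 is unusable
        jc BlInt10Failure
        jmp BlpBuildMmStore

    BlpBuildMmNext:
        call BlpE820Call
        jc BlpBuildMmDone           ; failure after the first entry = end of list

    BlpBuildMmStore:
        inc bp                      ; keep the entry just written
        add di, MMAP_SLOT           ; advance to the next slot
        test ebx, ebx
        jz BlpBuildMmDone           ; EBX = 0: that was the last entry
        cmp bp, MMAP_MAX
        jb BlpBuildMmNext           ; room left, fetch the next entry
    BlpBuildMmDone:
        ret

    ; In:  EBX = continuation value, ES:DI -> MMAP_SLOT-byte slot
    ; Out: CF clear on success (EBX updated), CF set on failure or end of list
    BlpE820Call:
        push bp                     ; do not rely on the BIOS preserving BP or DI
        push di
        mov dword [es:di+20], 1     ; pre-set flags inside the 24-byte slot
        mov eax, 0E820h
        mov edx, 534D4150h          ; 'SMAP', reloaded on every call
        mov ecx, MMAP_SLOT          ; buffer size, reloaded because ECX is overwritten
        int 15h
        pop di                      ; POP leaves the flags from INT 15h intact
        pop bp
        jc .fail
        cmp eax, 534D4150h          ; also catches "carry clear, AH = error code"
        jne .fail
        cmp ecx, 20
        jb .fail                    ; fewer than 20 bytes is not a valid entry
        cmp ecx, 24
        jae .ok
        mov dword [es:di+20], 1     ; 20-byte entry: make sure flags still read as valid
    .ok:
        clc
        ret
    .fail:
        stc
        ret

    BlInt10Failure:                 ; stand-in for the question's error handler
        cli
        hlt
        jmp BlInt10Failure
    ```

    Keeping the entries in a fixed buffer instead of on the stack means the slot size never depends on what the BIOS returns in ecx, which avoids the allocation mismatch described above.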

    Once you have a list of entries you probably shouldn't trust it blindly. While sorting the list (so it's not in random order), it's a good idea to check for unknown values in the "type" field (and replace them with a single "unknown/reserved" value), to detect whether any of the reported areas overlap (and have code to handle "overlapping but reported as different types" cases by finding the least dangerous alternative "type" to use for each possibility), and to discard any entries that have "size = 0 bytes" (which can happen - e.g. BIOS using statically defined entry numbers and..). Note that different computers have different bugs, and in some cases int 0x15 will be "hooked" by something else (e.g. PXE/network boot ROMs typically redirect int 0x15 to their own code to hide the memory that the ROM itself is using).
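
    As a partial illustration of these checks, the sketch below handles only the two simplest ones: it drops entries whose size is zero and replaces unknown types with "reserved". Sorting and overlap handling are left out. The layout is an assumption (entries packed in 24-byte slots at ES:SI, count in CX), and BlpSanitizeMm, MMAP_SLOT, MMAP_TYPE_MAX and MMAP_TYPE_RSVD are hypothetical names. MMAP_TYPE_MAX = 5 reflects the types this hypothetical loader understands, not a limit of the interface.

    ```nasm
    [bits 16]

    MMAP_SLOT       equ 24          ; assumed slot size per entry
    MMAP_TYPE_MAX   equ 5           ; highest type this loader understands (assumption)
    MMAP_TYPE_RSVD  equ 2           ; E820 type 2 = reserved

    ; In:  ES:SI -> first entry, CX = number of entries
    ; Out: CX = number of entries kept, packed at the start of the buffer
    BlpSanitizeMm:
        mov di, si                  ; DI = write position, SI = read position
        xor dx, dx                  ; DX = entries kept so far
        jcxz .done
    .next:
        mov eax, [es:si+8]          ; size, low dword
        or eax, [es:si+12]          ; size, high dword
        jz .skip                    ; size = 0: discard the entry
        mov eax, [es:si+16]         ; type
        dec eax                     ; 1..MAX -> 0..MAX-1, type 0 wraps to FFFFFFFFh
        cmp eax, MMAP_TYPE_MAX
        jb .copy_entry              ; known type, keep it as is
        mov dword [es:si+16], MMAP_TYPE_RSVD
    .copy_entry:
        push cx
        mov cx, MMAP_SLOT / 4       ; copy the slot one dword at a time
    .copy:
        mov eax, [es:si]
        mov [es:di], eax
        add si, 4
        add di, 4
        dec cx
        jnz .copy
        pop cx
        inc dx
        jmp .advance
    .skip:
        add si, MMAP_SLOT           ; step over the discarded entry
    .advance:
        dec cx
        jnz .next
    .done:
        mov cx, dx                  ; return the number of entries kept
        ret
    ```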

    I also wouldn't trust that int 0x15, eax=0xE820 leaves various values unmodified. For example, I wouldn't assume that it doesn't modify the value in ebp or edx (even though it shouldn't), or that (if the buffer size is larger) it won't overwrite the value at [es:di+20] but still only return 20 bytes (even though it shouldn't), or that it won't return carry = clear, ah = error code because the function failed (even though it shouldn't).

    Finally, when you're searching through the resulting (nicely sorted and sanity checked) memory map, the "area address" and "area size" fields are 64-bit, so you cannot just compare the lowest 32 bits (and you should not use jge, which is for signed numbers - use jae for unsigned numbers instead). In other words, this:

     cmp [di+8], DWORD 100000h 
     jge BlpFindEiAreaEnd
    

    ..should probably be:

     cmp dword [di+8+4], 0          ;Is high dword of size non-zero (size >= 4 GiB)?
     ja BlpFindEiAreaEnd            ; yes, so size >= 1 MiB
     cmp dword [di+8], 0x00100000   ;Is size >= 1 MiB?
     jae BlpFindEiAreaEnd           ; yes
                                    ; no
    

    ..but I don't know why you were checking for "size >= 1 MiB" when you said you were looking for RAM at the highest usable address (and suspect that's another bug).
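
    If the intent really is "the highest 1 MiB inside a usable area", one possible sketch is to combine the size check with a 64-bit calculation of where the top-most 1 MiB of that area starts. This is only a guess at what the question wants. BlpTopOfArea and EI_SIZE are hypothetical names, the entry is assumed to be at ES:DI, and the routine assumes base + size does not wrap past 64 bits.

    ```nasm
    [bits 16]

    EI_SIZE equ 100000h             ; 1 MiB, the file size stated in the question

    ; In:  ES:DI -> one entry (base at +0, size at +8, type at +16)
    ; Out: CF clear and EDX:EAX = start of the top-most EI_SIZE bytes of the area,
    ;      CF set if the entry is not usable RAM or is smaller than EI_SIZE
    BlpTopOfArea:
        cmp dword [es:di+16], 1     ; type 1 = usable RAM
        jne .no
        cmp dword [es:di+12], 0     ; high dword of size non-zero: size >= 4 GiB
        ja .yes
        cmp dword [es:di+8], EI_SIZE
        jb .no                      ; unsigned compare: smaller than EI_SIZE
    .yes:
        mov eax, [es:di]            ; base, low dword
        mov edx, [es:di+4]          ; base, high dword
        add eax, [es:di+8]
        adc edx, [es:di+12]         ; EDX:EAX = first byte after the area
        sub eax, EI_SIZE
        sbb edx, 0                  ; EDX:EAX = end of area - EI_SIZE
        clc
        ret
    .no:
        stc
        ret
    ```

    A caller would run this over every entry and keep the largest EDX:EAX it can actually reach; a result with EDX non-zero lies above 4 GiB.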