assembly x86 x86-16 bootloader real-mode

call in x86 real mode does not save return address


I'm trying to write a real mode bootloader and I'm currently having problems trying to enable the A20 line. Here's my code so far, I'm assembling with NASM:

[bits 16]

[global _start]

jmp _start

bios_print:
 lodsb
 test al, al
 jz bios_print_done
 mov ah, 0x0E
 mov bh, 0
 int 0x10
 jmp bios_print

bios_print_done:
 ret

a20_is_enabled:
 push ds
 push si
 push es
 push di

 xor ax, ax
 mov ds, ax
 mov si, BOOT_ID_OFFS

 mov ax, BOOT_ID_OFFS_PLUS_1MB_SEGM
 mov es, ax
 mov di, BOOT_ID_OFFS_PLUS_1MB_OFFS

 cmp word [es:di], BOOT_ID

 mov ax, 1
 jne a20_is_enabled_done

 mov ax, word [ds:si]
 xor ax, ax
 mov [ds:si], ax

 cmp word [es:di], BOOT_ID

 push ax
 xor ax, ax
 mov [ds:si], ax
 pop ax

 mov ax, 1
 jne a20_is_enabled_done

 mov ax, 0

a20_is_enabled_done:
 pop di
 pos es
 pop si
 pop ds

 ret

a20_enable_bios:
 mov ax, 0x2403
 int 0x15
 jc a20_enable_bios_failure
 test ah, ah
 jnz a20_enable_bios_failure

 mov ax, 0x2401
 int 0x15
 jc a20_enable_bios_failure
 test ah, ah
 jnz a20_enable_bios_failure

 mov ax, 1
 jmp a20_enable_bios_done

a20_enable_bios_failure:
 mov ax, 0

a20_enable_bios_done:
 ret

a20_enable:

 push si

 mov si, word MSG_A20_TRY_BIOS
 call bios_print

 pop si

 call a20_enable_bios

 test ax, ax
 jz a20_enable_failure

 call a20_is_enabled

 test ax, ax
 jnz a20_enable_success

a20_enable_failure:

 push si

 mov si, word MSG_A20_FAILURE
 call bios_print

 pop si

 mov ax, 0
 jmp a20_enable_done

a20_enable_success:

 push si

 mov si, word MSG_A20_SUCCESS
 call bios_print

 pop si

 mov ax, 1

a20_enable_done:
 ret

_start:
 xor ax, ax
 mov ds, ax

 cld

 cli

 push si

 mov si, word MSG_GREETING
 call bios_print

 pop si

 call a20_enable

 test ax, ax
 jz boot_error

 ; TODO

boot_error:
 jmp boot_error

BOOT_ID equ 0xAA55
BOOT_ID_OFFS equ 0x7DFE
BOOT_ID_OFFS_PLUS_1MB_SEGM equ 0xFFFF
BOOT_ID_OFFS_PLUS_1MB_OFFS equ BOOT_ID_OFFS + (0x1 << 20) - (BOOT_ID_OFFS_PLUS_1MB_SEGM << 4)

MSG_GREETING db 'Hello from the bootloader', 0xA, 0xD, 0
MSG_A20_TRY_BIOS db 'Trying to enable A20 line via BIOS interrupt', 0xA, 0xD, 0
MSG_A20_SUCCESS db 'Successfully enabled A20 line', 0xA, 0xD, 0
MSG_A20_FAILURE db 'Failed to enable A20 line', 0xA, 0xD, 0

times 510-($-$$) db 0
dw BOOT_ID

The problem is the function a20_is_enabled, which is supposed to check whether the A20 line is enabled after a20_enable_bios has activated it via a BIOS interrupt (I know this is not foolproof; more code will follow here). When I debug the code, everything seems to be fine until call a20_is_enabled. The processor does perform a near call to the correct address, but no return address is pushed onto the stack (which I have verified with gdb). So when ret is executed in a20_is_enabled, the instruction pointer is set to some garbage address. Why is this?

EDIT: note that there is no ORG 0x7C00 at the beginning of my assembly code. This is because I first create an ELF file so that I can debug my code with gdb, and that doesn't play well with ORG. So I actually do this:

nasm -f elf32 -g -F dwarf boot.asm -o boot.o
ld -Ttext=0x7c00 -melf_i386 boot.o -o boot.elf
objcopy -O binary boot.elf boot.bin

Solution

  • Normally one might close this question as caused by a typographical error, but the error isn't necessarily obvious at first. One has to pay close attention in a debugger to the instructions that are actually being executed.

    This had me scratching my head since when I looked in the debugger the sequence:

     push ds
     push si
     push es
     push di
    
     ; Snip other code
    
     pop di
     pos es
     pop si
     pop ds
     ret
    

    only showed the processor executing 3 POPs and a ret, when the source clearly contains 4 POP instructions. Because the processor executes one POP too few, ret pops the wrong word as the return address, jumps to the wrong part of memory, and causes unexpected behavior.

    The problem is rather trivial: by a stroke of bad luck, the typo assembles without error but doesn't produce the instruction you want. If you look closely, this is the culprit:

     pos es
    

    There is a typo: POS should be POP. My brain didn't catch it at first. NASM treats pos as a label, and es is a segment override prefix, which can appear on a line by itself. The prefix then attaches to the following instruction, so the assembler produces es pop si instead of pop es followed by pop si.

    Clearly the fix is to change it to:

     pop es
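
    As a rough side-by-side sketch of the explanation above, the listing below contrasts what the assembler effectively produced from the typo with the corrected epilogue. The labels typo_epilogue and fixed_epilogue are made up for this illustration, and the byte values in the comments are the standard 16-bit encodings, not output copied from the asker's build.

    ```nasm
    [bits 16]

    ; What the mistyped epilogue effectively assembles to.
    ; The stack holds (top first): DI, ES, SI, DS, return address.
    typo_epilogue:              ; hypothetical label, for illustration only
        pop di                  ; 5F  restores DI
    pos:                        ; `pos` is parsed as a label, not a mnemonic
        es                      ; 26  lone ES segment-override prefix
        pop si                  ; 5E  prefix attaches here; SI gets the saved ES
        pop ds                  ; 1F  DS gets the saved SI
        ret                     ; C3  pops the saved DS as the return address

    ; Corrected epilogue: four pops mirror the four pushes in reverse order.
    fixed_epilogue:             ; hypothetical label, for illustration only
        pop di                  ; 5F
        pop es                  ; 07
        pop si                  ; 5E
        pop ds                  ; 1F
        ret                     ; C3  now pops the real return address
    ```

    Read this way, the return address was pushed by call after all. It sits one word deeper than ret expects, because only three words are popped after four were pushed.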