I am currently following a tutorial on OS development, which includes a discussion on bootloaders.
My bootloader currently runs in 16-bit real mode, so I can use the interrupts the BIOS provides (e.g. the VGA video interrupts).
The BIOS provides the video interrupt 0x10. Its function 0x0E (video teletype output) lets me print a character to the screen.
Here is the basic bootloader:
org 0x7c00 ; Set program start (origin) address location at 0x7c00.
; This program is loaded by the BIOS at 0x7c00.
bits 16 ; We live in 16-bit Real Mode.
start:
jmp loader
bootmsg db "Welcome to my Operating System!", 0 ; My data string.
;-------------------------------------------------------
; Description: Print a null-terminated string
;-------------------------------------------------------
print:
lodsb ; Load string byte at address DS:SI and place in AL.
; Then, increment/decrement SI as defined by the Direction Flag (DF) in FLAGS.
or al, al ; Set the zero flag - is AL zero?
jz printdone ; Check if this is the null byte
mov ah, 0eh
int 10h
jmp print
printdone:
ret
loader:
;|---------- Related to my question ----------|
xor ax, ax
mov ds, ax
mov es, ax
;|--------------------------------------------|
mov si, bootmsg
call print
cli ; Clear the interrupt flag (disable maskable interrupts).
hlt ; Halts the system.
times 510 - ($-$$) db 0 ; Make sure our bootloader is 512 bytes large.
dw 0xAA55 ; Boot signature - stored little-endian, so byte 511 is 0x55 and byte 512 is 0xAA, indicating a bootable disk.
As shown in the above code, I have highlighted the following three lines:
xor ax, ax
mov ds, ax
mov es, ax
The original source says the following about them:
Setup segments to insure they are 0. Remember that we have ORG 0x7c00. This means all addresses are based from 0x7c00:0. Because the data segments are within the same code segment, null em.
I am a bit confused. From my understanding, the org instruction tells the loader to load this program at address 0x7c00. Why don't we take this as our start address then? That would mean our two overlapping data and code segments are not located at a base address of zero; the segment base should be 0x07c0. Why does the author set the base address to 0x0 instead of doing the following?
mov ax, 07c0h
mov ds, ax
mov es, ax
I have been looking into the org directive and other documentation some more, and I now understand what is going on.
According to the NASM documentation on the org directive (short for origin):
The function of the ORG directive is to specify the origin address which NASM will assume the program begins at when it is loaded into memory. [...] NASM's ORG does exactly what the directive says: origin. Its sole function is to specify one offset which is added to all internal address references within the section.
Therefore, the NASM assembler assumes that the program will be loaded at the address specified with the org directive. The directive itself does not load anything; it only tells NASM which address to assume. The BIOS is what actually places the program there. In the words of one description of the boot process, once the BIOS finds a boot sector that contains a valid boot signature, the bootloader will be "loaded into memory at 0x0000:0x7c00 (segment 0, address 0x7c00)."
In the quote above, "internal address references" means every reference to a concrete memory location in the code (e.g. a label). For example, in the bootloader above, the line mov si, bootmsg resolves bootmsg to 0x7c00 + offset, where the offset is the position of the first byte of my string bootmsg (i.e. 'W') within the binary. Because the resulting value already includes the 0x7c00 load address, DS must be 0 for DS:SI to point at the string.
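To make the arithmetic concrete: in real mode the CPU computes a physical address as segment * 16 + offset. With org 0x7c00 and DS = 0, bootmsg is at 0x0000 * 16 + 0x7c02 = 0x7c02, which is where the BIOS put it. Loading DS with 0x07c0 while keeping org 0x7c00 would instead give 0x07c0 * 16 + 0x7c02 = 0xf802, which is the wrong place. A segment of 0x07c0 only works together with org 0, as in this minimal sketch (the _alt label names and the shortened message are made up for illustration and are not from the tutorial):

```nasm
; Sketch only: the same boot sector idea, but with org 0 and DS = 0x07c0.
; All *_alt labels are hypothetical names, not taken from the tutorial.
org 0                       ; labels are plain offsets from the start of the sector
bits 16

start_alt:
    jmp loader_alt          ; 2-byte short jump, so the string starts at offset 0x0002

msg_alt db "Hi", 0          ; shortened message for the sketch

loader_alt:
    mov ax, 0x07c0
    mov ds, ax              ; DS:SI = 0x07c0:0x0002 -> 0x07c0 * 16 + 0x0002 = 0x7c02
    mov si, msg_alt         ; assembles as mov si, 0x0002
    cld                     ; clear DF so lodsb walks forward through the string
.next:
    lodsb                   ; AL = byte at DS:SI, then SI = SI + 1
    or al, al               ; is AL the terminating zero?
    jz .done
    mov ah, 0x0e            ; BIOS video function: teletype output
    int 0x10
    jmp .next
.done:
    cli                     ; disable maskable interrupts
    hlt                     ; stop the CPU

times 510 - ($-$$) db 0     ; pad up to the signature
dw 0xAA55                   ; boot signature
```

Both pairings reach the same physical byte. What matters is that the org value and the segment registers agree with each other.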
With my code above, if I disassemble the bin file using the ndisasm utility, I see the following:
00000000 EB2C jmp short 0x2e
00000002 57
00000003 656C
00000005 636F6D
00000008 6520746F
0000000C 206D79
0000000F 204F70
00000012 657261
00000015 7469
00000017 6E
00000018 67205379
0000001C 7374
0000001E 656D
00000020 2100
00000022 AC lodsb
00000023 08C0 or al,al
00000025 7406 jz 0x2d
00000027 B40E mov ah,0xe
00000029 CD10 int 0x10
0000002B EBF5 jmp short 0x22
0000002D C3 ret
0000002E 31C0 xor ax,ax
00000030 8ED8 mov ds,ax
00000032 8EC0 mov es,ax
00000034 BE027C mov si,0x7c02
00000037 E8E8FF call 0x22
0000003A FA cli
0000003B F4 hlt
00000... ... ...
(I removed the generated instructions from 0x00000002 to 0x00000020, because those bytes are my bootmsg string and represent data, not code.)
As we can see from the disassembly, at address 0x00000034 my bootmsg has been replaced with 0x7c02 (i.e. 0x7c00 + offset 0x02).
Michael Petch provided some very solid insight too. It is a common misconception that the bootloader is loaded at 0x07c0:0x0000 (segment 0x07c0, offset 0). That pair and 0x0000:0x7c00 both refer to the same physical address, 0x7c00, so one could technically use either, but the convention is to use a segment of zero. (A good practice is to enforce CS:IP at the very start of your boot sector.) As Michael mentioned, if one wants more information, look at section 4 of the linked guide on segment:offset addressing.
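As a sketch of what enforcing CS:IP can look like, assuming the org 0x7c00 layout from my bootloader: a far jump as the very first instruction reloads CS with a known value, whichever segment:offset pair the BIOS used to reach physical address 0x7c00. The label name set_cs is hypothetical.

```nasm
; Sketch only: normalize CS:IP before anything else runs.
org 0x7c00                  ; labels assume segment 0 and a load address of 0x7c00
bits 16

start:
    jmp 0x0000:set_cs       ; far jump (5 bytes): CS = 0x0000, IP = set_cs = 0x7c05

set_cs:
    xor ax, ax
    mov ds, ax              ; data segments now match the org 0x7c00 assumption
    mov es, ax
    cli                     ; disable maskable interrupts
    hlt                     ; stop the CPU

times 510 - ($-$$) db 0     ; pad up to the signature
dw 0xAA55                   ; boot signature
```

The far jump uses an absolute segment:offset target, so it lands on the same physical instruction no matter what CS held on entry.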