nasm x86-16 cpu-registers segment memory-segmentation

Setting segment registers after ORG instruction

I am currently following a tutorial on OS development, which includes a discussion on bootloaders.

My bootloader runs in 16-bit real mode, so I can use the interrupts the BIOS provides (e.g. the VGA video services).

The BIOS provides video services through interrupt 0x10. Its function 0x0E (teletype output), selected by loading 0x0E into AH, prints the character in AL to the screen.

Here is my basic bootloader:

org     0x7c00              ; Set program start (origin) address location at 0x7c00.
                            ; This program is loaded by the BIOS at 0x7c00.
bits    16                  ; We live in 16-bit Real Mode.

start:  
        jmp loader

bootmsg     db      "Welcome to my Operating System!", 0        ; My data string.

;-------------------------------------------------------
;   Description:    Print the null-terminated string at DS:SI
;-------------------------------------------------------
print:
    lodsb                   ; Load string byte at address DS:SI and place in AL.
                            ; Then, increment/decrement SI as defined by the Direction Flag (DF) in FLAGS.
    or      al, al          ; Sets ZF if AL is zero (the null terminator).
    jz      printdone       ; If so, the whole string has been printed.
    mov     ah, 0eh
    int     10h
    jmp     print
printdone:
    ret

loader:
    ;|---------- Related to my question ----------|
        xor     ax, ax
        mov     ds, ax
        mov     es, ax
    ;|--------------------------------------------|

    mov     si, bootmsg
    call    print

    cli                     ; Disable maskable interrupts (clear IF).
    hlt                     ; Halts the system.

times 510 - ($-$$) db 0    ; Pad with zeros to 510 bytes; the 2-byte signature brings it to 512.

dw      0xAA55              ; Boot signature, stored little-endian: byte 511 is 0x55 and byte 512 is 0xAA, marking the disk as bootable.
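The `dw 0xAA55` line is easy to misread. x86 is little-endian, so NASM emits the low byte 0x55 first, at offset 510, and 0xAA last, at offset 511. The C sketch below checks a 512-byte boot-sector image for that layout. `put_le16` and `has_boot_signature` are hypothetical helpers for illustration, not part of NASM or any BIOS tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Store a 16-bit value little-endian, the way NASM's `dw` does on x86. */
static void put_le16(uint8_t *dst, uint16_t value)
{
    dst[0] = (uint8_t)(value & 0xFFu);        /* low byte first   */
    dst[1] = (uint8_t)((value >> 8) & 0xFFu); /* high byte second */
}

/* Return 1 if `img` is exactly one sector ending in the bytes 0x55 0xAA. */
static int has_boot_signature(const uint8_t *img, size_t len)
{
    assert(img != NULL);
    return len == SECTOR_SIZE && img[510] == 0x55 && img[511] == 0xAA;
}
```

The `times 510 - ($-$$) db 0` padding is what guarantees that the two signature bytes land at offsets 510 and 511, whatever the size of the code before them.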

In the code above, I have marked the following three lines:

xor     ax, ax
mov     ds, ax
mov     es, ax

The original source says the following:

Setup segments to insure they are 0. Remember that we have ORG 0x7c00. This means all addresses are based from 0x7c00:0. Because the data segments are within the same code segment, null em.

I am a bit confused. From my understanding, the org instruction tells the loader to load this program at address 0x7c00. Why don't we use that as our starting point, then? In other words, our two overlapping data and code segments are not located at a base address of zero. The segment value should be 0x07c0 (base address 0x7c00). Why does the author set the segment registers to 0x0 instead of doing the following?

mov ax, 07c0h
mov ds, ax
mov es, ax

Solution

I have looked further into the org directive and other documentation, and I now understand what is going on.

According to the NASM documentation on the org directive, short for origin:

The function of the ORG directive is to specify the origin address which NASM will assume the program begins at when it is loaded into memory. [...] NASM's ORG does exactly what the directive says: origin. Its sole function is to specify one offset which is added to all internal address references within the section.

Therefore, NASM assumes that the program will be loaded at the address specified with the org directive, and the BIOS does exactly this. According to the boot-process description I was reading, once the BIOS finds a boot sector with a valid boot signature, the bootloader is "loaded into memory at 0x0000:0x7c00 (segment 0, address 0x7c00)."

When the NASM documentation quoted above says "internal address references," it means every place in the code that resolves to an absolute address (e.g. a reference to a label). For example, in the bootloader above, mov si, bootmsg resolves bootmsg to 0x7c00 + offset. The offset is the position, within the assembled binary, of the first byte of the string bootmsg (i.e. 'W').
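To see where the offset comes from, the label positions can be worked out by hand:

- `jmp loader` assembles to a 2-byte short jump.
- The string is 31 characters plus a null byte.
- `print` is 12 bytes long. This figure is taken from the ndisasm listing in this post.

The C sketch below adds the ORG value to each file offset, mirroring what NASM does for absolute references. The helper names and size constants are illustrative assumptions, not NASM internals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ORG            0x7c00u /* value given to `org` */
#define JMP_SHORT_SIZE 2u      /* `jmp loader` assembles to EB 2C here */
#define PRINT_SIZE     12u     /* lodsb .. ret, per the ndisasm listing */

static const char bootmsg[] = "Welcome to my Operating System!";

/* NASM adds ORG to a label's offset within the output file. */
static uint16_t org_address(uint32_t file_offset)
{
    assert(ORG + file_offset <= 0xFFFFu); /* must fit in a 16-bit offset */
    return (uint16_t)(ORG + file_offset);
}

/* File offsets of the labels, derived from the size of what precedes them. */
static uint32_t bootmsg_offset(void) { return JMP_SHORT_SIZE; }

static uint32_t print_offset(void)
{
    /* +1 for the terminating null byte emitted by `, 0` */
    return bootmsg_offset() + (uint32_t)strlen(bootmsg) + 1u;
}

static uint32_t loader_offset(void) { return print_offset() + PRINT_SIZE; }
```

The three results (0x7c02, 0x7c22 and 0x7c2e) match the listing: 0x22 and 0x2e as file offsets, and 0x7c02 as the immediate in `mov si`.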

If I disassemble the .bin file produced from the code above using the ndisasm utility, I see the following:

00000000  EB2C              jmp short 0x2e
00000002  57                
00000003  656C              
00000005  636F6D            
00000008  6520746F          
0000000C  206D79            
0000000F  204F70            
00000012  657261            
00000015  7469              
00000017  6E                
00000018  67205379          
0000001C  7374              
0000001E  656D              
00000020  2100              
00000022  AC                lodsb
00000023  08C0              or al,al
00000025  7406              jz 0x2d
00000027  B40E              mov ah,0xe
00000029  CD10              int 0x10
0000002B  EBF5              jmp short 0x22
0000002D  C3                ret
0000002E  31C0              xor ax,ax
00000030  8ED8              mov ds,ax
00000032  8EC0              mov es,ax
00000034  BE027C            mov si,0x7c02
00000037  E8E8FF            call 0x22
0000003A  FA                cli
0000003B  F4                hlt
00000...  ...               ...

(I removed the decoded instructions from 0x00000002 to 0x00000020, because those bytes are my bootmsg string: data, not code.)
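It is also worth noticing which bytes in the listing do not depend on ORG. The jumps and the call (`EB 2C`, `74 06`, `EB F5`, `E8 E8 FF`) encode a displacement relative to the next instruction, so their bytes are the same for any ORG value. Only absolute references such as `mov si, bootmsg` carry the 0x7c00. ndisasm assumes an origin of 0 unless told otherwise (its `-o` option changes that), which is why it prints the branch targets as file offsets. The C sketch below decodes those displacements. `branch_target` is a hypothetical helper that handles only the three opcodes in this listing.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a relative jmp/jz/call: address of the next instruction plus
 * the signed displacement. Handles only the opcodes used in the listing. */
static uint16_t branch_target(uint16_t addr, const uint8_t *code)
{
    switch (code[0]) {
    case 0xEB: /* jmp short rel8 */
    case 0x74: /* jz rel8 */
        return (uint16_t)(addr + 2 + (int8_t)code[1]);
    case 0xE8: /* call rel16, displacement stored little-endian */
        return (uint16_t)(addr + 3 + (int16_t)(code[1] | (code[2] << 8)));
    default:
        assert(0 && "opcode not handled by this sketch");
        return 0;
    }
}
```

Decoding the call at file offset 0x37 gives 0x22. Decoding the same bytes at 0x7c37 gives 0x7c22, so the relative branches work wherever the code is placed.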

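As the output shows, the reference to bootmsg in the instruction at address 0x00000034 has been replaced with 0x7c02 (i.e. 0x7c00 + offset 0x02). In real mode the CPU forms the physical address as DS * 16 + SI. With DS = 0, this reads from physical address 0x7c02, exactly where the BIOS placed the string. Had DS been set to 0x07c0, the same instruction would read from 0x07c0 * 16 + 0x7c02 = 0xF802, which is wrong. That is why the segment registers are zeroed: the 0x7c00 is already part of every offset.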
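The rule is that the segment value and the ORG value must agree, and there are two consistent pairings:

- `org 0x7c00` with DS = 0, which is the tutorial's choice.
- `org 0` with DS = 0x07c0.

The C sketch below checks both, plus the mixed case. `phys` is an illustrative helper that ignores the 1 MiB wrap-around (A20) behaviour. `bootmsg_offset` hard-codes the string's file offset of 2 from the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode physical address: segment * 16 + offset.
 * Sketch only: ignores wrap-around above 1 MiB (A20). */
static uint32_t phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Offset NASM assigns to `bootmsg` (file offset 2) under a given ORG. */
static uint16_t bootmsg_offset(uint16_t org)
{
    return (uint16_t)(org + 2u);
}
```

The consistent pairings both land on 0x7c02, while `org 0x7c00` combined with DS = 0x07c0 lands on 0xF802.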

Michael Petch also provided some very solid insight. It is a common misconception that the bootloader is loaded at 0x07c0:0x0000 (segment 0x07c0, offset 0). Both 0x07c0:0x0000 and 0x0000:0x7c00 refer to the same physical address, 0x7c00, and one could technically use either. The common convention, however, is segment 0 with offset 0x7c00. Not every BIOS enters the boot sector with the same CS:IP, so a good practice is to set CS:IP explicitly at the very start of the boot sector. As Michael mentioned, section 4 of the guide he linked on segment:offset addressing has more information.
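A physical address is segment * 16 + offset, so many segment:offset pairs name the same byte. 0x0000:0x7c00, 0x07c0:0x0000 and 0x0700:0x0c00 all refer to physical 0x7c00, but they leave different values in IP. Code assembled with `org 0x7c00` expects the first of these. In the boot sector, a common idiom for enforcing it is a far jump such as `jmp 0x0000:next` as the first instruction, where `next` is a hypothetical label. The C sketch below shows the aliasing and computes the normalized form, with the offset below 16. `far_ptr`, `phys` and `normalize` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t segment;
    uint16_t offset;
} far_ptr;

/* Real-mode physical address (ignores A20 wrap-around). */
static uint32_t phys(far_ptr p)
{
    return ((uint32_t)p.segment << 4) + p.offset;
}

/* Normalized pointer: largest possible segment, offset in 0..15.
 * Only meaningful for physical addresses below 1 MiB. */
static far_ptr normalize(uint32_t physical)
{
    assert(physical < 0x100000u);
    far_ptr p = { (uint16_t)(physical >> 4), (uint16_t)(physical & 0xFu) };
    return p;
}
```

Normalizing 0x7c00 yields 0x07c0:0x0000, which is why that form is often quoted. It is only one of many equivalent spellings, not the pair every BIOS uses.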