I'm building a mini-OS by following a PDF I found online. So far I've made decent progress, but the PDF has reached a concept it doesn't explain well.
The reader is given four examples of loading a character of a string into a register with the intention of printing it. The fourth example is where things become confusing. Using Figure 3.6 below, we find out that the string sits 30 bytes into the boot sector, at offset 0x1e.
Now, I understand why we are at 0x7cXX, as 0x7c00 is the address the boot sector is loaded at. What I do NOT understand is how we know that its last two digits are 1e.
What I think the answer is: well, we store the value 0x0e in the ah register, right? And the interrupt command is 0x10, which is the low order bit of ah. The thing is, that doesn't really explain why that makes such sweeping changes to the binary version of the program in Figure 3.6, and I got that from my intuition rather than any salient logic.
Why do we know the string variable is at 0x7c1e?
mov ah, 0x0e ; int 10/ah = 0eh -> scrolling teletype BIOS routine
mov al, [0x7c1e]
int 0x10 ; Does this print an X?
jmp $ ; Jump forever.
the_secret:
db "X "
; Padding and magic BIOS number.
times 510 - ($ - $$) db 0
dw 0xaa55
Figure 3.6
00 7c 8a 07 cd 10 a0 1e 7c cd 10 e9 fd ff 58 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa
Using the label is the proper way to do this, as suggested in the comments. You also will have to tell the assembler what your origin point (org) is to have it emit the correct address. Besides, you should initialise the data segment (ds) register because your memory load uses it implicitly.
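To see why both settings matter: in real mode the CPU forms a physical address as segment * 16 + offset, and the BIOS loads the boot sector at physical address 7C00h. With ds = 0 the offsets themselves have to start at 7C00h, which is what org 7C00h arranges. As a rough illustration of the same arithmetic (a sketch of the other common pairing, not something your PDF uses as far as I know), ds = 07C0h combined with org 0 should reach the same byte:

```nasm
org 0                    ; label offsets now count from 0
mov ax, 0x07C0
mov ds, ax               ; ds * 16 = 0x7C00, the load address
mov ah, 0x0E             ; int 10/ ah = 0eh -> scrolling teletype BIOS routine
mov al, [the_secret]     ; physical address = 0x7C00 + offset of the_secret
int 0x10
jmp $                    ; Jump forever.
the_secret:
db "X"
; Padding and magic BIOS number.
times 510 - ($ - $$) db 0
dw 0xAA55
```

Either pairing works as long as ds and org agree with each other. What goes wrong is mixing them, or leaving ds at whatever value the BIOS happened to put there.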
I am assuming you're using NASM because the times directive is a NASMism. To fix your code and have it use a label like you should:
org 7C00h
xor ax, ax ; set ax to zero (well-known zeroing idiom)
mov ds, ax ; set ds base to zero
mov ah, 0x0E ; int 10/ ah = 0eh -> scrolling teletype BIOS routine
mov al, byte [the_secret]
int 0x10 ; This prints an 'X'
jmp $ ; Jump forever.
the_secret:
db "X "
; Padding and magic BIOS number.
times 510 - ($ - $$) db 0
dw 0xAA55
Assembling this as the full source for NASM results in loading from the byte at address 7C0Dh. To get back to your question: How do we know that the last two nybbles (8 bits, 2 hexadecimal digits) are any particular value?
We can count the preceding instructions' sizes, which in this case is 2 bytes for every one of the 6 instructions, except 3 bytes for the load to al. That is 2 * 5 + 3 = 13 = 0Dh. However, you would have to recompute the address whenever the preceding code is changed.
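To make the counting concrete, here is the same program with each line annotated with its address and machine code bytes. The bytes in the comments are the encodings I would expect from a current NASM with its default optimisation; treat them as an assumption and check them against a listing file, for example from `nasm -f bin boot.asm -o boot.bin -l boot.lst` (the file names are just placeholders).

```nasm
org 7C00h
xor ax, ax               ; 7C00: 31 C0      (2 bytes)
mov ds, ax               ; 7C02: 8E D8      (2 bytes)
mov ah, 0x0E             ; 7C04: B4 0E      (2 bytes)
mov al, byte [the_secret] ; 7C06: A0 0D 7C  (3 bytes, address stored low byte first)
int 0x10                 ; 7C09: CD 10      (2 bytes)
jmp $                    ; 7C0B: EB FE      (2 bytes, short jump to itself)
the_secret:              ; 7C0D: 13 bytes after the origin
db "X "                  ; 7C0D: 58 20
; Padding and magic BIOS number.
times 510 - ($ - $$) db 0
dw 0xAA55
```

Note that the dump in your Figure 3.6 shows `e9 fd ff` for the jump, which is a 3-byte near jump. That is one more reason hand-counted addresses are fragile: the same source can assemble to different sizes depending on the assembler version and settings.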
This is exactly a task for which we use an assembler, instead of just entering numeric values for everything. The numbers to enter would be machine code bytes for instructions and relative or absolute addresses for references. Using mnemonic instructions lets the assembler determine which machine codes to emit. Using a label lets the assembler compute the references into numerical addresses.
Only very limited assemblers do not allow symbolic references. If you find yourself using one of the 86-DOS debuggers it may limit you to explicit numeric addresses. But you shouldn't use these for nontrivial assembling.
In conclusion, use labels to let the assembler do its job.