I am learning assembly for fun and it's just my third day. Perhaps I have misunderstood the location counter in linker scripts. As I understand it, the location counter defines the address in memory (physical or virtual) at which the sections must be loaded.
However, the following linker script, taken from this SO post, seems to alter the resulting image itself (it puts the magic number in the last 2 bytes of the resulting MBR image).
link.ld
SECTIONS
{
/* The BIOS loads the code from the disk to this location.
* We must tell that to the linker so that it can properly
* calculate the addresses of symbols we might jump to.
*/
. = 0x7c00;
.text :
{
__start = .;
*(.text)
/* Place the magic boot bytes at the end of the first 512 sector. */
. = 0x1FE;
SHORT(0xAA55)
}
}
My code is:
main.S
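.code16
mov $msg, %si      /* SI = address of msg; assumes DS = 0 so that DS:SI points at msg */
mov $0x0e, %ah     /* BIOS int 0x10 teletype output function */
loop:
lodsb              /* AL = byte at DS:SI, then advance SI */
or %al, %al        /* stop at the terminating NUL */
jz halt
int $0x10          /* print the character in AL */
jmp loop
halt:
hlt                /* note: after an interrupt, execution continues into msg */
msg:
.asciz "hello world"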
I assemble, link and run it with:
as -g -o main.o main.S
ld --oformat binary -o main.img -T link.ld main.o
qemu-system-x86_64 -hda main.img
Soon I realized that the option --oformat binary has something to do with this, as leaving it out does not produce a 512-byte image. Maybe I should be looking into ELF vs. binary format? Can someone please help me understand why the binary format was used and how it interprets the location counter (it should have done something with . = 0x7C00 as well)?
A hexdump of the resulting 512-byte hello world image gives me this:
00000000 bf 0f 7c b4 0e ac 08 c0 74 04 cd 10 eb f7 f4 68 |..|.....t......h|
00000010 65 6c 6c 6f 20 77 6f 72 6c 64 00 66 2e 0f 1f 84 |ello world.f....|
00000020 00 00 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 |.....f.........f|
00000030 2e 0f 1f 84 00 00 00 00 00 66 2e 0f 1f 84 00 00 |.........f......|
00000040 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 2e 0f |...f.........f..|
00000050 1f 84 00 00 00 00 00 66 2e 0f 1f 84 00 00 00 00 |.......f........|
00000060 00 66 2e 0f 1f 84 00 00 00 00 00 66 2e 0f 1f 84 |.f.........f....|
00000070 00 00 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 |.....f.........f|
00000080 2e 0f 1f 84 00 00 00 00 00 66 2e 0f 1f 84 00 00 |.........f......|
00000090 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 2e 0f |...f.........f..|
000000a0 1f 84 00 00 00 00 00 66 2e 0f 1f 84 00 00 00 00 |.......f........|
000000b0 00 66 2e 0f 1f 84 00 00 00 00 00 66 2e 0f 1f 84 |.f.........f....|
000000c0 00 00 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 |.....f.........f|
000000d0 2e 0f 1f 84 00 00 00 00 00 66 2e 0f 1f 84 00 00 |.........f......|
000000e0 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 2e 0f |...f.........f..|
000000f0 1f 84 00 00 00 00 00 66 2e 0f 1f 84 00 00 00 00 |.......f........|
00000100 00 66 2e 0f 1f 84 00 00 00 00 00 66 2e 0f 1f 84 |.f.........f....|
00000110 00 00 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 |.....f.........f|
00000120 2e 0f 1f 84 00 00 00 00 00 66 2e 0f 1f 84 00 00 |.........f......|
00000130 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 2e 0f |...f.........f..|
00000140 1f 84 00 00 00 00 00 66 2e 0f 1f 84 00 00 00 00 |.......f........|
00000150 00 66 2e 0f 1f 84 00 00 00 00 00 66 2e 0f 1f 84 |.f.........f....|
00000160 00 00 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 |.....f.........f|
00000170 2e 0f 1f 84 00 00 00 00 00 66 2e 0f 1f 84 00 00 |.........f......|
00000180 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 2e 0f |...f.........f..|
00000190 1f 84 00 00 00 00 00 66 2e 0f 1f 84 00 00 00 00 |.......f........|
000001a0 00 66 2e 0f 1f 84 00 00 00 00 00 66 2e 0f 1f 84 |.f.........f....|
000001b0 00 00 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 |.....f.........f|
000001c0 2e 0f 1f 84 00 00 00 00 00 66 2e 0f 1f 84 00 00 |.........f......|
000001d0 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 2e 0f |...f.........f..|
000001e0 1f 84 00 00 00 00 00 66 2e 0f 1f 84 00 00 00 00 |.......f........|
000001f0 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 aa |.f............U.|
00000200
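The last row of the dump is the part the firmware cares about: most BIOSes check that bytes 0x1FE and 0x1FF of the first sector are 55 AA before jumping to it. A minimal Python sketch of that check, assuming the raw image is read from disk as bytes (the function name has_boot_signature is only illustrative):

```python
SECTOR_SIZE = 512
SIGNATURE_OFFSET = 0x1FE        # where link.ld places SHORT(0xAA55)
BOOT_SIGNATURE = b"\x55\xaa"    # 0xAA55 stored little-endian


def has_boot_signature(image: bytes) -> bool:
    """True if the first sector of `image` ends with the 55 AA signature."""
    if len(image) < SECTOR_SIZE:
        return False
    return image[SIGNATURE_OFFSET:SECTOR_SIZE] == BOOT_SIGNATURE
```

For example, has_boot_signature(open("main.img", "rb").read()) should return True for the 512-byte image dumped above, since it ends in 55 aa at offset 0x1FE.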
I don't understand the impact of . = 0x7C00 here. Is that information lost? (Maybe it isn't needed, because the BIOS would load the sector at 0x7C00 anyway.)
. = 0x7c00;
.text :
{
__start = .;
*(.text)
/* Place the magic boot bytes at the end of the first 512 sector. */
. = 0x1FE;
SHORT(0xAA55)
}
With . = 0x7C00 you are telling the linker (this is linker script language, not assembly language, by the way; the two are unrelated) that you want the next thing to be at address 0x7C00 in the processor's address space. With .text below it, that means the .text code is linked starting at address 0x7C00, so anything position-specific (such as the address of msg that gets loaded into a register) is computed from that address.
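__start = . gives you the address at this point (within .text, so 0x7C00).
*(.text) puts the .text sections of all the input files here.
. = 0x1FE moves the location counter to offset 0x1FE within .text. Inside an output section description, . is relative to the start of that section, so this is address 0x7C00 + 0x1FE = 0x7DFE, not absolute address 0x1FE.
SHORT(0xAA55) places these two bytes at offsets 0x1FE and 0x1FF in .text (little-endian, hence the 55 aa at the end of your hexdump).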
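So, assuming the code fits, this produces a 0x200-byte (512-byte) blob that is meant to be loaded at 0x7C00 in the address space. (If the code were longer than 0x1FE bytes, ld would complain that it cannot move the location counter backwards.)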
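The arithmetic in the steps above can be modelled in a few lines of Python. This is a toy model, not how ld works internally: it assumes a single output section starting at 0x7C00, treats the code as opaque bytes, and pads the gap with a fill byte that defaults to zero (your hexdump shows ld actually filled the gap with x86 NOP instruction patterns). The names build_boot_sector and link_address are hypothetical.

```python
import struct

SECTION_START = 0x7C00    # ". = 0x7c00;" just before .text
SIGNATURE_OFFSET = 0x1FE  # ". = 0x1FE;" inside .text, relative to the section start


def build_boot_sector(code: bytes, fill: int = 0x00) -> bytes:
    """Toy model of the .text output section that link.ld describes."""
    if len(code) > SIGNATURE_OFFSET:
        # In this case ld reports that it cannot move the location counter backwards.
        raise ValueError("code does not fit before offset 0x1FE")
    padding = bytes([fill]) * (SIGNATURE_OFFSET - len(code))
    signature = struct.pack("<H", 0xAA55)  # SHORT(0xAA55), little-endian on x86
    return code + padding + signature


def link_address(offset: int) -> int:
    """Address the linker assigns to the byte at `offset` within .text."""
    return SECTION_START + offset


sector = build_boot_sector(b"\xf4")  # a lone hlt byte as stand-in code
```

Here len(sector) is 0x200, the last two bytes are 55 aa, and link_address(0x1FE) is 0x7DFE: file offset and link address differ by exactly the 0x7C00 that a raw file never records.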
Now when you run objcopy -O binary hello.elf hello.bin (ld --oformat binary, as in your command, produces the same kind of output directly), the tool takes the loadable section with the lowest address and makes it the start of the output file. If .text is the only thing you have, those 0x200 bytes become the whole of hello.bin.
The information that 0x7C00 is where the processor needs to find this code is lost in the -O binary format. The ELF file has it, and other formats such as Intel HEX or S-records carry addresses too, but a raw binary does not.
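What the raw file does keep is the effect of . = 0x7C00 on the code itself. In the hexdump, the first instruction bf 0f 7c is a one-byte 16-bit mov-immediate opcode followed by a little-endian 16-bit value, and that value is the address of the "hello world" string as it will sit in memory once the sector is loaded at 0x7C00. A small Python sketch, using bytes copied from the first row of the dump (the helper name mov_immediate is invented for illustration):

```python
# First 16 bytes of main.img, copied from the hexdump in the question.
FIRST_ROW = bytes.fromhex("bf0f7cb40eac08c07404cd10ebf7f468")
LINK_ADDRESS = 0x7C00  # from ". = 0x7c00;" in link.ld


def mov_immediate(code: bytes) -> int:
    """Decode the 16-bit little-endian immediate after a one-byte mov opcode."""
    return int.from_bytes(code[1:3], "little")


msg_address = mov_immediate(FIRST_ROW)  # address baked into the instruction
msg_offset = FIRST_ROW.index(b"h")      # where "hello world" starts in the file
```

msg_address comes out as 0x7C0F and msg_offset as 0x0F, so the linker added 0x7C00 to the string's file offset. Had the same code been linked with . = 0, those bytes would read bf 0f 00 and, assuming DS = 0, the loop would read from address 0x000F instead of the string.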
Further, if you had these 0x200 bytes at 0x7C00 and another 2 bytes at 0x8000, the -O binary output would be 0x402 bytes long. The first 0x200 bytes would come from .text at 0x7C00, the lowest loadable thing, followed by 0x200 bytes of padding so that the next two bytes land in the right place relative to the beginning of the file: if you were to take hello.bin and put it at 0x7C00, those two bytes would end up at 0x8000.
If you had these 0x200 bytes at 0x7C00 and added another item to the linker script with 0x02 bytes at 0x7000, then hello.bin would start with those two bytes, followed by 0xBFE bytes of padding, then the 0x200 bytes of .text, so that when the bin file is loaded into memory at 0x7000, the two bytes and the 0x200 bytes are each at the proper address.
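The padding arithmetic in these two examples can be checked with a toy model of a raw binary image. flatten is a hypothetical helper, not part of objcopy: it assumes non-overlapping sections given as {address: bytes}, zero-fills the gaps, and ignores everything else real objcopy takes into account (section flags, separate load addresses, the --gap-fill option).

```python
def flatten(sections: dict[int, bytes]) -> bytes:
    """Toy raw-binary layout: the file starts at the lowest address and
    each section lands at (address - lowest address), zero-filling gaps.
    Overlapping sections are not handled."""
    base = min(sections)
    end = max(addr + len(data) for addr, data in sections.items())
    image = bytearray(end - base)  # bytearray(n) is n zero bytes
    for addr, data in sections.items():
        start = addr - base
        image[start:start + len(data)] = data
    return bytes(image)


text = b"\xf4" * 0x200   # stand-in for the 0x200-byte .text at 0x7C00
extra = b"\x12\x34"      # hypothetical extra 2-byte item

after = flatten({0x7C00: text, 0x8000: extra})   # extra item at 0x8000
before = flatten({0x7000: extra, 0x7C00: text})  # extra item at 0x7000
```

Here len(after) is 0x402 (0x200 + 0x200 + 2) and len(before) is 0xE00 (2 + 0xBFE + 0x200). In both cases nothing in the output says which address the first byte belongs at.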
So objcopy -O binary essentially creates a memory image of what needs to be loaded, sometimes with padding, but without any information about the start address of that load. That you just have to know.
The ELF file will contain the 0xAA55 in some form as well. I would assume the whole 0x200 bytes show up as one thing in .text, but the linker may have broken it into two items; which way it goes, and what the padding is, depends on the tool that created the ELF.