org 0x7c00 is the normal way to get correct absolute addresses in a flat binary, but I was curious about a different way which I expected to work.
I tried using section boot vstart=0x7c00 align=1 to tell YASM the right memory address, with symbol in another section that uses start=300.
With the line mov [symbol+$$], register in the source, yasm -fbin boot.asm gives error: effective address too complex on that line.
From my understanding, symbol+$$ should be able to be processed into a number (instead of a segment+offset), right? If I am wrong, please tell me, but if I am right then why does YASM tell me that the address is too complex?
Is there another way to use start= and/or vstart= instead of org and still get correct absolute addressing?
Using [symbol] doesn't work; that assembles to an absolute address of [0000].
The reason I wanted to do this is that I have binary machine code for a boot loader that relocates itself, but it stores a few values in some symbols before it relocates (for example, the boot drive, which is passed in dl).
YASM supports a binary program with "sections" that can have different addressing offsets. So I set the code up so that the MBR code was the first 300 bytes of the first sector, and the variables were stored after those 300 bytes and before the 446th byte. I wanted to use this method so that I can use variables that are technically in other sections but get copied relative to the current section's offset.
Here is a simplified example of what I am trying to do:
; example.asm
; yasm -fbin example.asm
%define virtual(_name, _offset) section _name vstart=_offset align=1
%define absolute(_name, _offset) section _name start=_offset align=1
virtual(boot, 0x7c00) ; Virtual Offset of 0x7c00 (in-file offset of 0)
start:
; This is just an example
; There isn't going to be much here.
mov [boot_drive+$$], dl
cli
hlt
absolute(vars, 300) ; Virtual AND in-file offset of 300
boot_drive db 0
Your basic problem is that you're not actually adding two numbers, you're adding two symbols, and assemblers don't generally allow this. This is because object file formats don't have any way to represent the addition of two symbols as a relocation, and that's because it doesn't really make much sense to add two symbols. While in this case you're generating a binary file which doesn't support relocations, and so the assembler could invent its own virtual relocations that handle this, apparently this hasn't been implemented in YASM as an exception to the general rule.
The reason why the addition of two symbols doesn't make sense in the general case, when object files may be generated, is that symbols are more than just numbers. They also refer to a section, and sections can end up living anywhere in memory. Your [boot_drive + $$] expression is saying to take the actual address of boot_drive as loaded in memory, and add it to the actual address of the start of the current section. When generating object files, an assembler has no idea what these actual addresses will be, because the sections the symbols belong to could be put anywhere. Even the linker may not know: if it's generating a relocatable executable, the addresses depend on where the operating system loads the executable.
(This ignores the fact that you've told the assembler that boot_drive should be treated as having a different actual address than the assembler would otherwise think it has. This is also something that your assembler doesn't support in the usual case of outputting an object file.)
Now, in the case of generating a binary file, there's no linker involved, so YASM could know that boot_drive has an "actual" address of 300 and that $$ has an actual address of 0x7c00. But this would require that the assembler make an exception when evaluating effective addresses, one it would have to propagate to the backend that generates binary files. That exception hasn't been implemented in your assembler, and you may have a hard time convincing the YASM (or NASM) developers to do so.
Your difficulty convincing them would come from the fact that even with binary files it doesn't really make sense to add two symbols, even if you could. Your example code would only work because the address of boot_drive isn't its actual address. Indeed, the reason why you're adding $$ to it is to calculate its actual address. Since your example use case is contrived and unnecessary (there are better ways to write a bootloader that relocates itself), it doesn't make a good argument for why it can make sense to add two symbols.
As for a workaround, I can't really think of any direct solution that would still involve using boot_drive and $$. When someone tries to add two symbols, there's often a way it can be rewritten in a form that works, usually by subtracting two symbols. Subtracting two symbols that are in the same section is supported by assemblers, as it removes the common section from the equation. So for example, [foo + bar_begin - bar_end] could be written as [foo + (bar_begin - bar_end)]. However, I'm not sure what there is that you can subtract from boot_drive and $$ to remove either of their sections from the equation.
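To illustrate the subtraction rule on its own, here is a minimal sketch, assuming NASM syntax (which YASM also accepts). The file name and the labels msg and msg_end are made up for illustration and are not part of the question's code.

```nasm
; sub_example.asm (hypothetical file name)
; assemble with: yasm -fbin sub_example.asm   (or nasm -f bin sub_example.asm)
BITS 16
ORG 0x7c00

start:
    mov si, msg                 ; a single symbol: an ordinary address
    mov cx, msg_end - msg       ; two symbols from the same section: the section
                                ; cancels out and a plain number (5) is left
    ; mov ax, msg + msg_end     ; sum of two symbols: left commented out because
                                ; an assembler will generally refuse it
    cli
    hlt

msg:     db "hello"             ; 5 bytes of data
msg_end:                        ; label marking the first byte after msg
```

The difference works because both labels move together wherever the section ends up, so their distance is fixed at assembly time. A sum has no such property, which is why it cannot be reduced to a number in the same way.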
While I'm sure there's some other way of solving your problem that would still let you accomplish what you want using the section directives you're using, I'm not going to bother trying to figure out what that might be. Instead I'm going to suggest a workaround that you've said you don't want, if not for your own benefit then for the benefit of others that might come to this post in a similar situation.
My solution is to not use section directives to solve the problem of a boot sector living at two different addresses. Instead, you can use an ORG that reflects where the majority of the code lives after being copied. The small amount of code that needs to be executed at the original location can easily be made position independent, so it doesn't care what ORG is used.
The following is the framework of a self-relocating MBR boot block. Most of the code necessary for implementing an MBR has been left out for brevity.
BITS 16
RELOC_OFFSET EQU 0x600
ORG RELOC_OFFSET
start:
xor ax, ax
mov ds, ax
mov es, ax
mov ss, ax
mov sp, 0x7c00
mov di, RELOC_OFFSET
mov si, 0x7c00
mov cx, 512 / 2
cld
rep movsw
jmp 0:relocated_entry
relocated_entry:
mov [boot_drive], dl
; ...
mov dl, [boot_drive]
jmp 0:0x7c00
boot_drive DB 0
TIMES 446 - ($ - $$) DB 0
partition_table:
DB 0x80, 0x01, 0x00, 0x05, 0x17, 0x01, 0x03, 0x01, 0x04, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00
; ...
TIMES 510 - ($ - $$) DB 0
DB 0x55, 0xaa
The key thing here is that boot_drive is only accessed after the code has been moved.
There's no need to save DL any earlier because the initial code doesn't need to change DL. Indeed, it may be possible to eliminate saving DL altogether, as it's generally not necessary to modify DL in an MBR boot sector. The TIMES directive is used to ensure that the partition table and magic number are where they need to be.
Here's the output of objdump -D -b binary -m i8086 -M intel --adjust-vma=0x600:
600: 31 c0 xor ax,ax
602: 8e d8 mov ds,ax
604: 8e c0 mov es,ax
606: 8e d0 mov ss,ax
608: bc 00 7c mov sp,0x7c00
60b: bf 00 06 mov di,0x600
60e: be 00 7c mov si,0x7c00
611: b9 00 01 mov cx,0x100
614: fc cld
615: f3 a5 rep movs WORD PTR es:[di],WORD PTR ds:[si]
617: ea 1c 06 00 00 jmp 0x0:0x61c
61c: 88 16 29 06 mov BYTE PTR ds:0x629,dl
620: 8a 16 29 06 mov dl,BYTE PTR ds:0x629
624: ea 00 7c 00 00 jmp 0x0:0x7c00
...
7bd: 00 80 01 00 add BYTE PTR [bx+si+0x1],al
7c1: 05 17 01 add ax,0x117
7c4: 03 01 add ax,WORD PTR [bx+di]
7c6: 04 00 add al,0x0
7c8: 00 00 add BYTE PTR [bx+si],al
7ca: 04 00 add al,0x0
...
7fc: 00 00 add BYTE PTR [bx+si],al
7fe: 55 push bp
7ff: aa stos BYTE PTR es:[di],al