Tags: assembly, x86-16, memory-address, bootloader, yasm

YASM [symbol+$$] Effective Address is Too Complex in a flat binary


org 0x7c00 is the normal way to get correct absolute addresses in a flat binary, but I was curious about a different way which I expected to work.

I tried using section boot vstart=0x7c00 align=1 to tell YASM the right memory address, with symbol in another section that uses start=300.

mov [symbol+$$], register

yasm -fbin boot.asm gives error: effective address too complex on that line.

From my understanding, symbol+$$ should be able to be processed into a number (instead of a segment+offset), right? If I am wrong, please tell me, but if I am right then why does YASM tell me that the address is too complex?

Is there another way to use start= and/or vstart= instead of org and still get correct absolute addressing?

Using [symbol] doesn't work; that assembles to an absolute address of [0000]


The reason I wanted to do this is that I have binary machine code for a boot loader that relocates itself, but it stores a few values in some symbols before it relocates (for example, the boot drive, which is passed in dl).

YASM supports a binary program with "sections" that can have different addressing offsets. So I set up the code with the MBR code in the first 300 bytes of the first sector and the variables stored after those 300 bytes and before the 446th byte. I wanted to use this method so that I can use variables that technically belong to other sections but get copied relative to the current section's offset.

Here is a simplified example of what I am trying to do:

; example.asm
; yasm -fbin example.asm

%define virtual(_name, _offset) section _name vstart=_offset align=1
%define absolute(_name, _offset) section _name start=_offset align=1

virtual(boot, 0x7c00) ; Virtual Offset of 0x7c00 (in-file offset of 0)
start:
    ; This is just an example
    ; There isn't going to be much here.
    mov [boot_drive+$$], dl

    cli 
    hlt

absolute(vars, 300) ; Virtual AND in-file offset of 300

boot_drive db 0

Solution

  • Your basic problem is that you're not actually adding two numbers, you're adding two symbols, and assemblers don't generally allow this. This is because object file formats don't have any way to represent the addition of two symbols as a relocation, and that's because it doesn't really make much sense to add two symbols. While in this case you're generating a binary file which doesn't support relocations, and so the assembler could invent its own virtual relocations that handle this, apparently this hasn't been implemented in YASM as an exception to the general rule.

    Why assemblers don't allow adding symbols

    The reason why adding two symbols doesn't make sense in the general case, when object files may be generated, is that symbols are more than just numbers. They also refer to a section, and sections can end up living anywhere in memory. Your [boot_drive + $$] expression says to take the actual address of boot_drive as loaded in memory and add it to the actual address of the start of the current section. When generating object files, an assembler has no idea what these actual addresses will be; the sections the symbols belong to could be put anywhere. Even the linker may not know: if it's generating a relocatable executable, the addresses depend on where the operating system loads the executable.

    (This ignores the fact that you've told the assembler that boot_drive should be treated as having a different actual address than the assembler would otherwise think it has. This is also something that your assembler doesn't support in the usual case of outputting an object file.)

    Binary files could be an exception, but aren't

    Now, in the case of generating a binary file, there's no linker involved, so YASM could know that boot_drive has an "actual" address of 300 and that $$ has an actual address of 0x7c00. But this would require the assembler to make an exception when evaluating effective addresses, one it would have to propagate to the backend that generates binary files. That exception hasn't been implemented in your assembler, and you may have a hard time convincing the YASM (or NASM) developers to do so.

    Your difficulty convincing them would come from the fact that even with binary files it doesn't really make sense to add two symbols, even if you could. Your example code would only work because the address of boot_drive isn't its actual address. Indeed, the reason you're adding $$ to it is to calculate its actual address. Since your example use case is contrived and unnecessary (there are better ways to write a bootloader that relocates itself), it doesn't make a good argument for why adding two symbols can make sense.
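
    To illustrate the difference between symbols and plain numbers, here is a minimal sketch that avoids symbol addition entirely by writing both values as numeric constants. The names LOAD_ADDR, VARS_OFFSET and BOOT_DRIVE_ADDR are hypothetical and not part of the question's code; the sketch assumes the variable sits at the very start of the vars section, so the offset 300 has to be kept in sync with the section directive by hand.

    ```nasm
    ; constants.asm (hypothetical file name)
    ; yasm -fbin constants.asm
        BITS    16

    LOAD_ADDR       EQU 0x7c00                      ; where the BIOS loads the sector
    VARS_OFFSET     EQU 300                         ; in-file offset of the variables
    BOOT_DRIVE_ADDR EQU LOAD_ADDR + VARS_OFFSET     ; number + number: a plain constant

    section boot vstart=0x7c00 align=1
    start:
        mov [BOOT_DRIVE_ADDR], dl   ; absolute address built from numbers only
        cli
        hlt

    section vars start=300 align=1
    boot_drive db 0                 ; first byte of vars, i.e. file offset 300
    ```

    Because an EQU of two numbers carries no section information, the effective address is an ordinary constant. The cost is that boot_drive itself is no longer referenced, so moving the variable means updating the constants manually.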

    There's probably no direct workaround

    As for a workaround, I can't really think of any direct solution that would still involve using boot_drive and $$. When someone tries to add two symbols, there's often a way to rewrite the expression in a form that works, usually by subtracting two symbols. Subtracting two symbols that are in the same section is supported by assemblers, as it removes the common section from the equation. So, for example, [foo + bar_begin - bar_end] could be written as [foo + (bar_begin - bar_end)]. However, I'm not sure what you could subtract from boot_drive and $$ to remove either of their sections from the equation.
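
    As a small sketch of the subtraction case described above: the labels msg_begin, msg_end and table are made up for illustration, and everything lives in one section so that the difference of the two labels is a plain number.

    ```nasm
    ; subtract.asm (hypothetical file name)
    ; yasm -fbin subtract.asm
        BITS    16
        ORG     0x7c00

    start:
        mov si, msg_begin                       ; a single symbol is fine
        mov cx, msg_end - msg_begin             ; same-section difference: the constant 5
        mov bx, [table + (msg_end - msg_begin)] ; one symbol plus a constant
        cli
        hlt

    msg_begin:  db "hello"
    msg_end:
    table:      times 8 dw 0
    ```

    The parenthesized difference is reduced to a number before it is added to table, so the effective address still contains only one symbol.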

    While I'm sure there's some other way of solving your problem that would still let you accomplish what you want using the section directives you're using, I'm not going to bother trying to figure out what that might be. Instead I'm going to suggest a workaround that you've said you don't want, if not for your own benefit then for the benefit of others that might come to this post in a similar situation.

    My solution, even if it's not what you want

    My solution is to not use section directives to solve the problem of a boot sector living at two different addresses. Instead, you can use an ORG that reflects where the majority of the code lives after being copied. The small amount of code that needs to be executed at the original location can easily be made position independent so it doesn't care what ORG is used.

    The following is the framework of a self-relocating MBR boot block. Most of the code necessary for implementing an MBR has been left out for brevity.

        BITS    16
    
    RELOC_OFFSET EQU 0x600
    
        ORG RELOC_OFFSET
    
    start:
        xor ax, ax
        mov ds, ax
        mov es, ax
        mov ss, ax
        mov sp, 0x7c00
    
        mov di, RELOC_OFFSET
        mov si, 0x7c00
        mov cx, 512 / 2
        cld
        rep movsw
        jmp 0:relocated_entry
    
    relocated_entry:
        mov [boot_drive], dl
        ; ...
        mov dl, [boot_drive]
        jmp 0:0x7c00
    
    boot_drive DB   0
    
        TIMES   446 - ($ - $$) DB 0
    partition_table:
        DB  0x80, 0x01, 0x00, 0x05, 0x17, 0x01, 0x03, 0x01, 0x04, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00
        ; ...
    
        TIMES   510 - ($ - $$) DB 0
        DB  0x55, 0xaa
    

    The key thing here is that boot_drive is only accessed after the code has been moved. There's no need to save DL any earlier because the initial code doesn't change DL. Indeed, it may be possible to eliminate saving DL altogether, as it's generally not necessary to modify DL in an MBR boot sector. The TIMES directives ensure that the partition table and magic number are where they need to be.

    Here's the output of objdump -D -b binary -m i8086 -M intel --adjust-vma=0x600:

     600:   31 c0                   xor    ax,ax
     602:   8e d8                   mov    ds,ax
     604:   8e c0                   mov    es,ax
     606:   8e d0                   mov    ss,ax
     608:   bc 00 7c                mov    sp,0x7c00
     60b:   bf 00 06                mov    di,0x600
     60e:   be 00 7c                mov    si,0x7c00
     611:   b9 00 01                mov    cx,0x100
     614:   fc                      cld    
     615:   f3 a5                   rep movs WORD PTR es:[di],WORD PTR ds:[si]
     617:   ea 1c 06 00 00          jmp    0x0:0x61c
     61c:   88 16 29 06             mov    BYTE PTR ds:0x629,dl
     620:   8a 16 29 06             mov    dl,BYTE PTR ds:0x629
     624:   ea 00 7c 00 00          jmp    0x0:0x7c00
        ...
     7bd:   00 80 01 00             add    BYTE PTR [bx+si+0x1],al
     7c1:   05 17 01                add    ax,0x117
     7c4:   03 01                   add    ax,WORD PTR [bx+di]
     7c6:   04 00                   add    al,0x0
     7c8:   00 00                   add    BYTE PTR [bx+si],al
     7ca:   04 00                   add    al,0x0
        ...
     7fc:   00 00                   add    BYTE PTR [bx+si],al
     7fe:   55                      push   bp
     7ff:   aa                      stos   BYTE PTR es:[di],al
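
    On the point about not needing a boot_drive variable at all: if some code after relocation does have to clobber DL, one option is to keep the value on the stack instead of in memory. This is only a sketch under that assumption, not part of the framework above; it relies on the stack already having been set up as in the framework.

    ```nasm
    ; keep_dl.asm (hypothetical file name)
    ; yasm -fbin keep_dl.asm
        BITS    16
        ORG     0x600

    relocated_entry:
        push dx                 ; DL holds the BIOS boot drive number on entry
        ; ... hypothetical code that modifies DL goes here ...
        pop dx                  ; restore DL for the next boot stage
        jmp 0:0x7c00            ; chain to the loaded boot sector, as in the framework
    ```

    If nothing between the entry point and the final jump touches DL, even the push and pop can be dropped.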