
ARM microcontroller loading raw binary


I am learning how bare metal programming on ARM works and I am having difficulties understanding how the addresses defined in the linker script are used.

This is my linker script:

ENTRY(ResetHandler)

MEMORY
{
    ROM (rx) : ORIGIN = 0x08000000, LENGTH = 512K
    RAM (rwx) : ORIGIN = 0x20000000, LENGTH = 128K
}

SECTIONS
{
    .text :
    {
        KEEP(*(.isr_vector))
        *(.text)
        *(.text.*)
        *(.rodata)
        *(.rodata.*)
        . = ALIGN(4);
        _etext = .;
    }>ROM AT>ROM

    .data :
    {
        _sdata = .;
        *(.data)
        *(.data.*)
        . = ALIGN(4);
        _edata = .;
    }>RAM AT>ROM

    .bss (NOLOAD) :
    {
        _sbss = .;
        *(.bss)
        *(.bss.*)
        *(COMMON)
        . = ALIGN(4);
        _ebss = .;
    }>RAM AT>ROM
}

The addresses of all sections are described in the linker file. What I don't understand is that my final compilation result is a raw binary containing only code and data, no addresses. When this binary file is loaded, how are the sections placed at the correct addresses defined in the linker file, when I do not specify anything but the binary file during loading? All the information about the LMA and VMA from the linker script is lost. Is all this performed by a bootloader?


Solution

  • No, there is no bootloader; you have to conform to the rules of the hardware/logic. To use an ELF version of the binary (technically they are all binary files) you would need some software to parse that file, just like when you run a program from a command line or GUI on an operating system.

    Starting with this

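    .thumb
    
    .word 0x20000000    @ initial stack pointer value
    .word reset         @ reset vector (bit 0 is set at link time because of .thumb_func)
    
    .thumb_func
    reset:
        b .             @ spin forever
    
    .align
    .word somedata      @ address of somedata, to see where .data gets linked
    
    
    .section .data
    somedata: .word 0x12345678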
    

    Something minimal for demonstration purposes. You can disassemble the object to see what kind of data we are looking for in the binary.

    Disassembly of section .text:
    
    00000000 <reset-0x8>:
       0:   20000000
       4:   00000000
    
    00000008 <reset>:
       8:   e7fe        b.n 8 <reset>
       a:   46c0        nop         ; (mov r8, r8)
       c:   00000000
    
    Disassembly of section .data:
    
    00000000 <somedata>:
       0:   12345678
    

    The addresses are all zero based because it is not yet linked.

    Starting with this linker script

    MEMORY
    {
        rom : ORIGIN = 0x00001000, LENGTH = 0x100
        ram : ORIGIN = 0x00002000, LENGTH = 0x100
    }
    SECTIONS
    {
        .rom : { *(.text) } > rom
        .ram : { *(.data) } > ram
    }
    

    And linking gives

    Disassembly of section .rom:
    
    00001000 <reset-0x8>:
        1000:   20000000 
        1004:   00001009 
    
    00001008 <reset>:
        1008:   e7fe        b.n 1008 <reset>
        100a:   46c0        nop         ; (mov r8, r8)
        100c:   00002000 
    
    Disassembly of section .ram:
    
    00002000 <somedata>:
        2000:   12345678 
    

    I am not using real addresses here; there is no need to fill up my hard drive... This easily demonstrates how it works.

    To prepare the data for the flash on the MCU you use objcopy with -O binary. It looks at the loadable sections of the ELF file, starts with the lowest address, and pads the file so that the other sections land in the right place. This form of binary is a memory image, if you will, but neither the base address nor any other information is in the file format. The user has to know it.

    hexdump -C so.bin
    00000000  00 00 00 20 09 10 00 00  fe e7 c0 46 00 20 00 00  |... .......F. ..|
    00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00001000  78 56 34 12                                       |xV4.|
    00001004
    

    So this is showing us that this binary file starts with the bytes that we defined at our lowest address 0x1000 and then it pads the file so that the bytes we wanted at 0x2000 are 0x1000 bytes into the file. The file is 0x1004 bytes.
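    A minimal Python model of what objcopy -O binary is described as doing here, assuming each loadable section can be reduced to a (load address, bytes) pair; the real tool gets these from the ELF headers, and the zero fill matches the hexdump above. The section bytes are copied from that hexdump.

```python
def flat_binary(sections):
    """Model of `objcopy -O binary`: place sections by load address,
    starting at the lowest one, and zero-fill any gaps between them."""
    base = min(addr for addr, _ in sections)
    end = max(addr + len(data) for addr, data in sections)
    image = bytearray(end - base)          # gaps stay 0x00
    for addr, data in sections:
        offset = addr - base               # file offset = load address - base
        image[offset:offset + len(data)] = data
    return bytes(image), base              # base is NOT stored in the file itself

# .rom linked at 0x1000 and .ram (.data) at 0x2000, as in the example above.
rom = bytes.fromhex("00000020 09100000 fee7c046 00200000")
ram = bytes.fromhex("78563412")
image, base = flat_binary([(0x1000, rom), (0x2000, ram)])
print(hex(len(image)))  # 0x1004
```

    Only the caller knows that offset 0 means 0x1000; nothing in the output records it.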

    rom : ORIGIN = 0x08000000, LENGTH = 0x800
    ram : ORIGIN = 0x20000000, LENGTH = 0x800
    

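    With real addresses like these, the padded -O binary file would span 0x20000000 - 0x08000000 = 0x18000000 bytes plus the size of your data. Your MCU does not have that much flash, and even if that region were flash your code would not work, because .data has to be writable. Your read-write SRAM is at 0x20000000, and flash is at 0x08000000 and is some fixed size.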
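    A quick check of that arithmetic in Python, assuming the code starts at 0x08000000 and the 4-byte .data from the example is linked and loaded at 0x20000000 (no AT >rom). Only the size is computed; actually building a file that big is the point to avoid.

```python
FLASH_BASE = 0x08000000   # lowest load address: start of .rom
SRAM_BASE = 0x20000000    # .data load address when there is no AT >rom

def padded_size(lowest, highest, last_len):
    """Size of an -O binary file running from the lowest load address
    to the end of the highest loadable section."""
    return highest - lowest + last_len

size = padded_size(FLASH_BASE, SRAM_BASE, 4)     # 4 bytes of .data
print(hex(size), size // (1024 * 1024), "MiB")   # 0x18000004 384 MiB
```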

    So what has to happen is that you first link everything based on the final (run-time) address definitions, and then pack all that data together into something that lands in the flash.

    MEMORY
    {
        rom : ORIGIN = 0x00001000, LENGTH = 0x100
        ram : ORIGIN = 0x00002000, LENGTH = 0x100
    }
    SECTIONS
    {
        .rom : { *(.text) } > rom
        .ram : { *(.data) } > ram AT >rom
    }
    

    The AT >rom (AT has to be uppercase, for whatever reason) does this. It says: link .data for the ram address space definition (its VMA, where it lives at run time), but load it based on the rom address space definition (its LMA, where it is stored in the image).

    Now we get this

    00000000  00 00 00 20 09 10 00 00  fe e7 c0 46 00 20 00 00  |... .......F. ..|
    00000010  78 56 34 12                                       |xV4.|
    00000014
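    A small Python model of the two addresses the linker now tracks for .data, assuming (as the hexdump shows) that with AT >rom the load address is simply the next free address in rom while the run address stays in ram. Alignment handling is deliberately left out because .rom is already 16 bytes here.

```python
def link_with_at_rom(rom_origin, text, ram_origin, data):
    """Place .text in rom, link .data for ram, but store it in rom."""
    data_vma = ram_origin                  # address the code uses at run time
    data_lma = rom_origin + len(text)      # AT >rom: stored right after .text
    # -O binary writes sections by LMA, so both are now contiguous in rom.
    image = text + data
    return image, data_vma, data_lma

text = bytes.fromhex("00000020 09100000 fee7c046 00200000")
data = bytes.fromhex("78563412")
image, vma, lma = link_with_at_rom(0x1000, text, 0x2000, data)

# The pointer to somedata at 0x100c still holds the VMA (0x2000)...
pointer = int.from_bytes(image[0x0c:0x10], "little")
# ...while the bytes of somedata sit at the LMA, file offset 0x10.
print(hex(len(image)), hex(vma), hex(lma), hex(pointer))
```

    The code refers to 0x2000, but nothing is there until something copies it from 0x1010.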
    

    It is up to you, the programmer, to copy .data to RAM before you start executing compiled code, that is, before your C entry point (usually main(), but using the name main can and does pull extra, unused stuff into your binary, consuming precious resources; you do not have to use main, any valid function name will do).

    How do you know how much and where? Let the tools do the work for you.

    .thumb
    
    .word 0x20000000
    .word reset
    
    .thumb_func
    reset:
        b .
    
    .align
    stuff:
    .word somedata
    .word __one__
    .word __two__
    .word __three__
    .word __four__
    
    .section .data
    somedata: .word 0x12345678
    

    and

    MEMORY
    {
        rom : ORIGIN = 0x08000000, LENGTH = 0x100
        ram : ORIGIN = 0x20000000, LENGTH = 0x100
    }
    SECTIONS
    {
        .rom : { *(.text) } > rom
        __one__ = .;
        .ram : 
        { 
            __two__ = .;
            *(.data) 
            . = ALIGN(4);
            __three__ = .;
        } > ram AT >rom
        __four__ = .;
    
    }
    

    This is already pretty ugly, which is why I do not use .data in my code nor expect .bss to be zeroed; that keeps it all nice and pretty, and it works just as well.

    Disassembly of section .rom:
    
    08000000 <reset-0x8>:
     8000000:   20000000
     8000004:   08000009
    
    08000008 <reset>:
     8000008:   e7fe        b.n 8000008 <reset>
     800000a:   46c0        nop         ; (mov r8, r8)
    
    0800000c <stuff>:
     800000c:   20000000
     8000010:   08000020
     8000014:   20000000
     8000018:   20000004
     800001c:   20000004
    
    Disassembly of section .ram:
    
    20000000 <__two__>:
    20000000:   12345678
    

    And my so.bin file is 36 bytes (note that I changed to real STM32F103 addresses):

    00000000  00 00 00 20 09 00 00 08  fe e7 c0 46 00 00 00 20  |... .......F... |
    00000010  20 00 00 08 00 00 00 20  04 00 00 20 04 00 00 20  | ...... ... ... |
    00000020  78 56 34 12                                       |xV4.|
    00000024
    

    You can create labels/addresses (__one__, etc.) in the linker script that can be used/seen in the code. The tools do the work for you:

     8000010:   08000020    .word __one__
     8000014:   20000000    .word __two__
     8000018:   20000004    .word __three__
     800001c:   20000004    .word __four__

    So you can take __three__ minus __two__ to get the number of bytes to copy, or you can loop from __two__ while less than __three__, copying from __one__. Something like this, perhaps:

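    ldr r0,=__one__     @ source: load address (LMA) of .data in flash
    ldr r1,=__two__     @ destination: start of .data in ram
    ldr r2,=__three__   @ end of .data in ram
    datacopy:
        cmp r1,r2       @ done once the destination reaches the end
        bhs datadone    @ (also covers an empty .data)
        ldr r3,[r0]     @ read a word from flash
        str r3,[r1]     @ write it to ram
        add r0,#4
        add r1,#4
        b datacopy
    datadone: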
    

    (If you align to 8 bytes on both ends you can use ldm/stm to move two words per iteration, or four words with 16-byte alignment, making the copy faster.)
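    To check the copy logic without hardware, here is a Python model of that loop (read a word at r0 = __one__, write it at r1 = __two__, stop when r1 reaches r2 = __three__) run against the 36-byte so.bin from the hexdump above. The flash/RAM arrays and the read32/write32 helpers are assumptions of this sketch; on the chip, the symbol values come from the literal pool that `ldr rX,=symbol` creates, just as they are read from the `stuff` table here.

```python
FLASH_BASE, SRAM_BASE = 0x08000000, 0x20000000

# The 36-byte so.bin from the hexdump above.
so_bin = bytes.fromhex(
    "00000020 09000008 fee7c046 00000020"
    "20000008 00000020 04000020 04000020"
    "78563412"
)
flash = bytearray(so_bin)                 # flash holds the image as programmed
sram = bytearray(b"\xaa" * 0x100)         # RAM contents are unknown after reset

def read32(mem, base, addr):
    off = addr - base
    return int.from_bytes(mem[off:off + 4], "little")

def write32(mem, base, addr, value):
    off = addr - base
    mem[off:off + 4] = value.to_bytes(4, "little")

# __one__, __two__, __three__ as stored in the `stuff` table at 0x08000010.
one, two, three = (read32(flash, FLASH_BASE, FLASH_BASE + o) for o in (0x10, 0x14, 0x18))

r0, r1, r2 = one, two, three
while r1 < r2:                            # stop once the destination reaches __three__
    r3 = read32(flash, FLASH_BASE, r0)    # ldr r3,[r0]  (source in flash)
    write32(sram, SRAM_BASE, r1, r3)      # str r3,[r1]  (destination in RAM)
    r0 += 4
    r1 += 4

somedata = read32(sram, SRAM_BASE, SRAM_BASE)
print(hex(one), hex(three - two), hex(somedata))
```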

    Yes this is correct

     8000004:   08000009
    

    It is the address of the reset handler ORRed with one (do not think add, think OR). The lsbit has to be set, so 0x08000008 | 1 = 0x08000009. When you build a new setup like the one above, check the vector table before you go and try to run it. It will crash if the vector table is not right, and depending on your board layout and chip that might mean a bricked board, or at least more wires or solder to unbrick it.
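    That check is easy to automate on the host. A minimal Python sketch, assuming a little-endian -O binary image and the flash/SRAM bases used above; the region sizes are parameters (0x100 here, from the example linker script), so take real ones from your part's datasheet.

```python
def check_vector_table(image, flash_base, flash_size, ram_base, ram_size):
    """Return a list of problems with the first two vector table entries
    of a raw -O binary image linked for flash_base (empty list = looks OK)."""
    if len(image) < 8:
        return ["image is shorter than two words"]
    problems = []
    sp = int.from_bytes(image[0:4], "little")      # initial stack pointer
    reset = int.from_bytes(image[4:8], "little")   # reset vector
    # The initial SP should point into RAM and be word aligned. (It is normally
    # the top of RAM; 0x20000000 only works here because `b .` never pushes.)
    if not (ram_base <= sp <= ram_base + ram_size) or sp % 4:
        problems.append(f"initial SP {sp:#010x} is not a word-aligned RAM address")
    # Thumb bit: the handler address must be ORed with 1.
    if not reset & 1:
        problems.append(f"reset vector {reset:#010x} has bit 0 clear (not Thumb)")
    handler = reset & ~1
    if not (flash_base <= handler < flash_base + min(len(image), flash_size)):
        problems.append(f"reset handler {handler:#010x} is outside the image in flash")
    return problems

good = bytes.fromhex("00000020 09000008 fee7c046")           # start of so.bin
zero_based = bytes.fromhex("00000020 09000000 fee7c046")     # linked at 0x0
missing_thumb = bytes.fromhex("00000020 08000008 fee7c046")  # bit 0 not set
for img in (good, zero_based, missing_thumb):
    print(check_vector_table(img, 0x08000000, 0x100, 0x20000000, 0x100))
```

    The zero_based image is the "second word is zero based" case; it may run through the alias on some boards, but this check flags it.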

    This lsbit requirement is in the ARM docs, which you should have before starting any of this work. The same docs indicate where execution starts, but technically, if you follow it through, ARM does not document that as 0x00000000 (the default VTOR); it is actually defined by signals/straps at the edge of ARM's logic within the chip vendor's logic (ST in this case), and the vendor does not have to use 0x00000000. They might have angry customers if they did not... but wait, they did not, right?

    0x08000000 is not 0x00000000, after all. Except that is not actually the case. Search for boot0 in the ST documentation, which is also required before doing any of this work: the REFERENCE manual, not the programming manual. From ST you need the datasheet and the reference manual, and if the board is theirs (a Nucleo, for example) the user manual for that board; in general no other docs from ST. From ARM you need, in this case, the Cortex-M3 Technical Reference Manual (ARM TRM), and in that you see ARMv7-M, so you also need the ARMv7-M Architecture Reference Manual (ARM ARM) (there are many TRMs and ARMs from ARM). You DO NOT need their programmer's reference manual; it only makes your understanding worse, not better.

    You will find that the ST documentation says the boot0/boot1 pin combination at reset determines what address 0x00000000 is aliased to. For normal operation it points at the application flash, which is at 0x08000000. So when the logic reads 0x00000000 and 0x00000004, ST aliases those reads to 0x08000000 and 0x08000004, and the core finds

    08000000 <reset-0x8>:
     8000000:   20000000
     8000004:   08000009
    

    It writes 0x20000000 into the stack pointer and then fetches its first instruction from 0x08000008. By linking for 0x08000000 rather than 0x00000000 (which will technically work... on some boards), you are not running out of the alias from the start; you are in the correct address space. On some parts the aliased window is smaller than the flash, so code linked for 0x00000000 would run into a problem. The debugger firmware on some of the newer Nucleo boards will no longer load a binary if it sees that the second word is zero based; it declares an error.
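    The reset sequence described above can be modeled in a few lines of Python. This assumes the boot0/boot1 setting that aliases 0x00000000 to main flash, and it simplifies the alias to a plain offset covering the whole image; the real window size and the other boot modes are in ST's reference manual.

```python
FLASH_BASE = 0x08000000

def bus_read32(image, addr):
    """Read a little-endian word the way the core sees it, assuming the
    boot pins alias the low addresses onto main flash at FLASH_BASE."""
    if addr < FLASH_BASE:
        addr += FLASH_BASE            # simplified alias: 0x0000_xxxx -> 0x0800_xxxx
    off = addr - FLASH_BASE
    return int.from_bytes(image[off:off + 4], "little")

def core_reset(image):
    """Cortex-M reset with the vector table at 0x00000000:
    SP comes from address 0x0, the reset vector from address 0x4."""
    sp = bus_read32(image, 0x00000000)
    vector = bus_read32(image, 0x00000004)
    thumb = bool(vector & 1)          # bit 0 must be 1 (Thumb state)
    pc = vector & ~1                  # first fetch is from the address with bit 0 clear
    return sp, pc, thumb

so_bin = bytes.fromhex("00000020 09000008 fee7c046")   # first 12 bytes of so.bin
sp, pc, thumb = core_reset(so_bin)
print(hex(sp), hex(pc), thumb)
```

    The vectors are fetched through the alias, but because the reset vector holds 0x08000009, execution continues at the real flash address rather than inside the alias window.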

    Selecting another boot0/boot1 combination will, as documented, boot the internal factory bootloader, with which, on this part, you can use the UART and ST's UART protocol to program the flash, then change the boot0/boot1 inputs and reset to boot into your firmware. You can also choose a boot0/boot1 combination that boots from SRAM: you link and place your program at 0x20000000 (with a vector table), download it to SRAM over an SWD debugger, then do a reset, and it will boot your program from SRAM. Yes, of course you could just load the program into SRAM and start it from the debugger without a vector table, etc., but they chose to add this feature. Some of their parts with USB can download over USB when the internal bootloader is selected; the F103 has USB, but it does not have this feature. Where you see USB download capability for a Blue Pill, it is because there is a bootloader in the application flash, and if you use the Arduino sandbox and Blue Pill based infrastructure your application will contain this bootloader. Some Blue Pills ship without the right loader and you have to use the UART or SWD; some ship with it and you can jump right in with the Arduino sandbox. I am not sure whether you have a Blue Pill, and whether you are using the sandbox for the moment or are asking this because you want to leave it.


    Short answer: yes, all the address information is lost in the -O binary file. But that is fine. A properly built binary is one chunk destined for the flash address space, per the rules of the logic (vector table up front). And per the hardware spec for THIS chip, that image loads/lives in the address space starting at 0x08000000, so the first word in your binary is the word at 0x08000000. The linker script, which is married to the bootstrap (an inseparable pair), exports the addresses the toolchain worked out for you, so that your bootstrap can copy .data from flash to RAM and zero .bss before you call your entry point.
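    The zeroing half of that job is just as mechanical. A Python sketch using the symbol names from the question's linker script (_sbss, _ebss); their values and the 0xAA "garbage" fill are made up for illustration. Since .bss contributes no bytes to the image, a store loop in the bootstrap is the only thing that makes it zero.

```python
SRAM_BASE = 0x20000000
sram = bytearray(b"\xaa" * 0x20)      # stand-in for RAM contents after power-up

# Hypothetical values the linker would export for the question's script.
_sbss, _ebss = 0x20000004, 0x20000010

def zero_bss(mem, base, start, end):
    """Zero [start, end) a word at a time, as a bootstrap store loop would.
    Assumes both ends are 4-byte aligned."""
    for addr in range(start, end, 4):
        off = addr - base
        mem[off:off + 4] = bytes(4)

zero_bss(sram, SRAM_BASE, _sbss, _ebss)
print(sram.hex())
```

    The word before _sbss (where .data would be) is untouched; copying .data is a separate step.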

    If you were making a binary file for an operating system, that operating system has rules about the construction of that binary and you would need to conform to them (a hello_world.c for Linux and one for Windows follow different rules, as, I assume, do Linux and BSD). For an MCU you have to conform to that specific chip's rules (note: within STM32, so far all parts support 0x08000000, but on some, the Cortex-M7 based ones for example, you want to use 0x00200000 instead).

    Your linker script does not "need" an entry point, as there is no operating system to observe it. You, the programmer, have to work with the linker to put the "entry" point in place, and from the hardware's point of view that is the vector table, not the reset handler. That said, if you are using link time optimization to remove unused code/data from the binary (not a bad idea for resource restricted MCU work), it works by following the code path, so it needs a starting point. It may not even complain if you do not have an ENTRY; instead it will create an empty binary (your objdump will look good, but the -O binary will be empty or will fail). If you have the linker display what it removed (for example GNU ld's --print-gc-sections, which goes with --gc-sections; a very good idea to add, and it should have been the default), you will see everything get removed, including your reset vector, which makes you realize you did something wrong. And if you want to run your ELF file on QEMU, what we learned here on SO is that you also need an entry point. I am not sure whether it should be the reset handler or the vector table; basically it needs to be an address ORed with one to tell QEMU whether this is an ARM entry or Cortex-M (even though you specified the machine on the command line!). That is how they designed it... sigh... Otherwise you do not need an entry point.

    Also, BTW, you do not need _start (and, as already mentioned, you do not need main()). _start and main come from the default linker script used when you do not supply your own: the default linker script associated with your toolchain (and/or C library) has ENTRY(_start), which you can easily grep for, and the C library bootstrap that goes with that linker script calls main(). You do not want to use the toolchain bootstrap anyway. You often see _start because folks don't understand why it is there; I use it as well, more out of habit, even though it is not used. You can see in my output above:

    08000000 <reset-0x8>:
     8000000:   20000000    andcs   r0, r0, r0
     8000004:   08000009    stmdaeq r0, {r0, r3}
    

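    Having _start, or some other label, there would have given the table its own label rather than an offset from a nearby label (reset-0x8), so _start or whatever else you like makes that a cleaner read. (The andcs/stmdaeq text is just objdump decoding the two data words as ARM instructions; ignore it.)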

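    You do not need an isr_vector or any special section for the vector table; an elementary understanding of the tools is all that is required (put the object containing the vector table first on the linker command line, so its .text lands first at 0x08000000). My STM32 linker scripts for my projects look like this: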

    MEMORY
    {
        rom : ORIGIN = 0x08000000, LENGTH = 0x1000
        ram : ORIGIN = 0x20000000, LENGTH = 0x1000
    }
    SECTIONS
    {
        .text   : { *(.text*)   } > rom
        .rodata : { *(.rodata*) } > rom
        .bss    : { *(.bss*)    } > ram
    }
    

    And I use global variables (of course, this is a resource restricted embedded platform!) and it all works great. Not complex stuff. I find the (rwx) stuff creates more problems than solutions so I very much avoid that.

    IMO less is more. Most folks like to solve all their worldly problems in a crazy (note: toolchain specific) linker script which is delicate and complicated. I will never understand that mentality.

    Summary

    The addressing is lost, yes, that is correct. A properly built binary is meant to load into the application flash on the part and be prepared to work with the chip logic (basically, vector table up front and linked properly). The -O binary output of objcopy starts at the lowest loadable address defined, 0x08000000 in your case, and is then padded as necessary to line everything up (a properly built one would have no padding other than whatever alignment the tool did or you asked for).

    So the addressing information from the elf file is lost. But both the programmer and the logic know the rules for that platform. And the .data and .bss knowledge are put in the image/code by the programmer, ideally with the assistance of the toolchain. Basically the other address spaces ARE in the file, but as part of the code not as part of a binary file format.

    (Even if you take someone else's code or you play in someone's sandbox, you are still the programmer responsible for making sure the binary is generated properly for this target).