arm gcc Linker and .word


I am trying to understand my startup file and linker file. Since I don't really know assembly and its quirks (I only know some basics, honestly), I am stuck on a simple .word directive.

My assembly line:

.word _sdata

And then in the linker script I have

/* inside the .data section */
. = ALIGN(4);
_sdata = .;

I have figured out what my linker script does, more or less. I am still wondering about the ALIGN part, and why, if I leave it out, _sdata is still created in the right position but relates to my text section.

Anyway, the question is what exactly .word does. I know it refers to _sdata, because later in my startup file the symbol is used like ldr r1, =_sdata. Basically, I want to know in detail what .word _sdata does.


Solution

  • You are confusing assembly language with a toolchain-specific linker script.

    .word simply means place a value in the program at this location.

    It is not an instruction; it is a directive, but it is part of the assembly language for that assembler. Assembly language is defined by the assembler (the specific tool), not by the target processor architecture nor by some spec. There are many x86 assembly languages, even before counting the AT&T vs Intel split. ARM, MIPS, etc. also have many different, usually incompatible, assembly languages. Most of the differences are in the directive syntax, labels, comments, and other items like that; sometimes they are in the instructions themselves.

    .globl _start
    _start:
        ldr r0,next_add
        bx r0
    next:
        bx lr
    
    .word 1,2,3
    next_add:
        .word next
    .word 0x12345678
    

    Assemble and link and disassemble:

    Disassembly of section .text:
    
    08000000 <_start>:
     8000000:   e59f0010    ldr r0, [pc, #16]   ; 8000018 <next_add>
     8000004:   e12fff10    bx  r0
    
    08000008 <next>:
     8000008:   e12fff1e    bx  lr
     800000c:   00000001    andeq   r0, r0, r1
     8000010:   00000002    andeq   r0, r0, r2
     8000014:   00000003    andeq   r0, r0, r3
    
    08000018 <next_add>:
     8000018:   08000008    stmdaeq r0, {r3}
     800001c:   12345678    eorsne  r5, r4, #120, 12    ; 0x7800000
    

    I used a disassembler to see what happened, so ignore the disassembled mnemonics starting at address 800000c; those words are the data we are after. The 32-bit numbers are the items we asked to be placed in the program using the .word directive.

    As an example of why you might want to do something like that: you might want the address of some label that the linker will fill in later. You don't have to hand count instructions or bytes to figure that out for yourself; let the tools do the job.
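
    To make that bookkeeping concrete, here is a rough Python model of what happens to the location counter in the listing above. It is only a sketch, not how the GNU assembler is implemented: the Section class and its method names are made up for illustration, the instructions are entered as their 32-bit encodings copied from the disassembly, and little-endian byte order is an assumption (the usual default for this target).

```python
import struct


class Section:
    """Toy model of one section: a location counter plus a label table."""

    def __init__(self, base):
        self.base = base
        self.data = bytearray()
        self.labels = {}

    @property
    def here(self):
        # Current location: base address plus the bytes emitted so far.
        return self.base + len(self.data)

    def label(self, name):
        # A label takes no space; it only records the current address.
        self.labels[name] = self.here

    def word(self, *values):
        # .word emits each value as a 32-bit quantity (little-endian assumed).
        for value in values:
            self.data += struct.pack("<I", value)

    def read_word(self, addr):
        return struct.unpack_from("<I", self.data, addr - self.base)[0]


text = Section(0x08000000)
text.label("_start")
text.word(0xE59F0010, 0xE12FFF10)  # ldr r0,next_add / bx r0 (encodings from the listing)
text.label("next")
text.word(0xE12FFF1E)              # bx lr
text.word(1, 2, 3)                 # .word 1,2,3
text.label("next_add")
text.word(text.labels["next"])     # .word next: the address of the label
text.word(0x12345678)

for name, addr in text.labels.items():
    print(f"{name:9s} = {addr:#010x}")
print(f"word at next_add = {text.read_word(text.labels['next_add']):#010x}")
```

    The point of the model is that a label costs nothing and a .word costs four bytes, so the word stored at next_add ends up holding the address of next without anyone counting bytes by hand.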

    The real question, I suspect, is about linker scripts, and this also looks to be GNU tools.

    so.s

    .text
    .globl _start
    _start:
        bx lr
    
    .data
    .word _tdata
    .word _pdata
    .word _sdata
    

    so.ld

    MEMORY
    {
        rom : ORIGIN = 0x08000000, LENGTH = 0x1000
        ram : ORIGIN = 0x20000000, LENGTH = 0x1000
    }
    
    SECTIONS
    {
        .text : { *(.text*) } > rom
        .data :
        {
            _tdata = .;
            *(.data*)
            _pdata = .;
            . = ALIGN(8);
            _sdata = .;
        } > ram
    }
    

    Assemble, link, disassemble:

    Disassembly of section .text:
    
    08000000 <_start>:
     8000000:   e12fff1e    bx  lr
    
    Disassembly of section .data:
    
    20000000 <_tdata>:
    20000000:   20000000    andcs   r0, r0, r0
    20000004:   2000000c    andcs   r0, r0, r12
    20000008:   20000010    andcs   r0, r0, r0, lsl r0
    

    This time I used .word in the .data section, not .text. It can go anywhere; it is one of a handful of ways of placing bits of information in the program where you want those bits.

    Same here: all of the disassembly in the .data section is to be ignored. It is data, not instructions; the disassembler is just trying to do its job because it doesn't know data from instructions.

    A line like _sdata = .; in the linker script means the linker is creating a variable, and . means the current location. So I am creating a variable that the linker will fill in with the address in the program at that point in the memory map definition (the linker script).

    You can see I placed a number of them in there.

    .data :
    {
        _tdata = .;
        *(.data*)
        _pdata = .;
        . = ALIGN(8);
        _sdata = .;
    

    _tdata should be set to the address of the start of .data, which I have defined here as 0x20000000 (it is the first item in the script using the ram address space). But like a label in assembly language, this is just a value; it does not allocate space for the item. Internally the tool has a table with a name and a value and, as with a label, that value is something we can ask for in the code.

    Starting at 0x20000000 we want the .data items to be placed, so the three .words I have requested go at 0x20000000, 0x20000004 and 0x20000008.

    The first data item I asked for is the label/address _tdata, which we know is the start of .data, or 0x20000000. _pdata is the address after the .data items are placed, so that is 0x2000000c, and we see the linker generating that for us. Because these are .words and already aligned, and the tool generally aligns on a word boundary anyway for this target, I changed the alignment to ALIGN(8) so that it has a visible effect.

    Now the . is on the left, which says: make the current address equal to the thing on the right. So . = ALIGN(8); means take the current address, find the next address that has that alignment (8 bytes, 64 bits), even if that is the current address, and change the address pointer to that value. The line after that assigns the value of the address pointer to this label/variable.

    So after 0x2000000c the ALIGN(8) caused the address to change to 0x20000010, then _sdata = .; caused the variable/label _sdata to equal 0x20000010. The linker sees that someone asked for that global label/variable and fills in its value to complete the link.
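
    The same walk through the .data output section can be reproduced with a few lines of Python. This is a host-side sketch of the arithmetic only: align_up, dot and symbols are illustrative names, not anything the linker exposes, and the round-up formula assumes the boundary is a power of two.

```python
def align_up(addr, boundary):
    """Next address >= addr that is a multiple of boundary (a power of two)."""
    return (addr + boundary - 1) & ~(boundary - 1)


dot = 0x20000000           # location counter at the start of .data (ORIGIN of ram)
symbols = {}               # the linker's name/value table, modeled as a dict

symbols["_tdata"] = dot    # _tdata = .;
dot += 3 * 4               # *(.data*): three .word items, 4 bytes each
symbols["_pdata"] = dot    # _pdata = .;
dot = align_up(dot, 8)     # . = ALIGN(8);
symbols["_sdata"] = dot    # _sdata = .;

for name, value in symbols.items():
    print(f"{name} = {value:#010x}")
```

    Note that align_up leaves an address alone when it already has the requested alignment, which is why an ALIGN(4) after word-sized data would change nothing.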

    It is quite common for linker scripts to have these kinds of things. You will often see a variable/label placed before and after a section so that some code can know where that section starts and how big it is. For example, a C programmer expects the .bss data to be zeroed, so one very common way is for the bootstrap to zero that memory. To know where it is, the code asks the linker, by creating these variables in the linker script and then using them in the program. Sometimes you will see bss_size = bss_end - bss_start; in the linker script, with the other two placed before and after .bss. You will also see ALIGNs used so that the code that zeros memory can make assumptions about alignment and use a simpler/faster fill routine. (No, you don't use memset(); that doesn't make sense. You can't use C until you bootstrap it, and you can't bootstrap it using a C function that can't be used until you bootstrap C.)
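
    To show why the alignment matters for the fill routine, here is a host-side Python model of a word-at-a-time zero fill. On the target this would be a few assembly instructions in the startup file using the linker-provided symbols; nothing below runs on the chip. The addresses, the 0xAA fill pattern and the zero_bss name are all made up for illustration, and both boundaries are assumed to be word aligned by an ALIGN(4) in the linker script.

```python
import struct

RAM_BASE = 0x20000000              # hypothetical RAM origin
ram = bytearray(b"\xAA" * 64)      # stale RAM contents after reset (made-up pattern)

# Values the linker script would provide; hypothetical, assumed word aligned.
bss_start = 0x20000010
bss_end = 0x20000028
bss_size = bss_end - bss_start


def zero_bss(mem, base, start, end):
    """Store a 32-bit zero at every word from start up to (not including) end."""
    assert start % 4 == 0 and end % 4 == 0, "fill loop relies on word alignment"
    addr = start
    stores = 0
    while addr < end:
        struct.pack_into("<I", mem, addr - base, 0)  # one word store
        addr += 4
        stores += 1
    return stores


stores = zero_bss(ram, RAM_BASE, bss_start, bss_end)
print(f"bss_size = {bss_size} bytes, cleared with {stores} word stores")
```

    Because both ends are word aligned, the loop needs no byte-sized head or tail handling, which is the simplification the ALIGN directives buy.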

    As I have demonstrated here, it is relatively easy to use the tools, particularly GNU, to see what is going on. It might be easier not to mess with linker script nuances like this until you have a better handle on the language (either one). You don't need to run any code, so you don't even need a functioning program. Just use the tools and examine the outputs.

    You don't need a linker script initially; you can do this:

    arm-none-eabi-as so.s -o so.o
    arm-none-eabi-ld -Ttext=0x1000 -Tdata=0x2000 so.o -o so.elf
    arm-none-eabi-objdump -D so.elf
    

    Then later complicate things with a linker script, start simple and work your way up if desired. Most linker scripts you will find in the wild are overly and unnecessarily complicated. The places where you are likely messing with linker scripts yourself don't need all that mess.