
GNU as: how to load a .bss/.data symbol to a register?


My problem is very basic. I'm making my first bare-metal program in assembler. The architecture is ARMv7-M and I'm using GNU as and I'm writing in UAL.

I have a variable in .bss (or .data, doesn't matter) declared as follows:

.lcomm a_variable, 4

Then I want to read its value somewhere in the program. For that I first load its address into a register and then load the value of the variable itself into another register:

adr     r0, a_variable
ldr     r1, [r0, #0]

So far so good. The compiled object contains my a_variable symbol:

00000000 b a_variable

And the generated instructions look like this:

0:  f2af 0004   subw    r0, pc, #4
4:  6801        ldr     r1, [r0, #0]

The problem begins when I want to link the object into the resulting image. ld relocates the a_variable symbol into the final .bss section at a new address:

20001074 b a_variable

But the final code remains the same, and the program really tries to read a_variable from address 0x0 rather than from 0x20001074.

I expect that ld somehow substitutes the new address because it seems to do so when you link objects compiled by GCC. I mean if I write a piece of C code doing something similar:

static int a_variable;
void foo(void)
{
    a_variable = 5;
}

...then I get the following instructions in the object file:

0:  f240 0300   movw    r3, #0
4:  f2c0 0300   movt    r3, #0
8:  2005        movs    r0, #5
a:  6018        str     r0, [r3, #0]

...but the final image looks like this:

800c:       f242 338c       movw    r3, #9100       ; 0x238c
8010:       f2c0 0301       movt    r3, #1
8014:       2005            movs    r0, #5
8016:       6018            str     r0, [r3, #0]

So ld appears to have substituted the real address for the placeholder which was left.

My question is: why doesn't this work in the case of hand-written assembler code? What am I missing?


Solution

  • The ADR instruction only works when used with a nearby symbol (+/- 4095 in Thumb2 mode) defined in the same section and source file. The GNU assembler should have given an error for referencing a symbol in a different section. In ARM mode your code generates an "Error: symbol .bss is in a different section" error, but there's apparently a bug in how GAS handles the ADR instruction in Thumb mode that causes it to silently accept it.

    Instead you can use either the LDR pseudo-instruction or the MOVW/MOVT instructions to load an arbitrary 32-bit constant, including an address, into a register. LDR puts the address into a constant pool and loads it from there, while MOVW/MOVT form the constant in two steps, just like your compiler does. The former takes only 6 bytes (2 for the instruction, 4 for the constant); the latter two instructions take 8 bytes. For example:

        .syntax unified
        .arch armv7-m
        .code 16
    
        .bss
        .lcomm a_variable, 4
    
        .text
    
        ldr     r1, =a_variable             @ address is loaded from a literal pool
        movw    r2, #:lower16:a_variable    @ low 16 bits of the address
        movt    r2, #:upper16:a_variable    @ high 16 bits of the address
    

    When assembled, linked, and disassembled, this gives:

    $ arm-linux-gnueabi-as -o test.o test.s
    $ arm-linux-gnueabi-ld -Tbss=f0000000 test.o
    arm-linux-gnueabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000010074
    $ arm-linux-gnueabi-objdump -d a.out
    ...    
    00010074 <.text>:
       10074:       4902            ldr     r1, [pc, #8]    ; (10080 <__bss_start-0x10f80>)
       10076:       f240 0200       movw    r2, #0
       1007a:       f2cf 0200       movt    r2, #61440      ; 0xf000
       1007e:       0000            movs    r0, r0
       10080:       f0000000        .word   0xf0000000
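
To make the address arithmetic above concrete, here is a small C sketch of what each instruction computes. It is only an illustration of the rules as I understand them (in Thumb state PC reads as the instruction's address plus 4, and ADR and literal loads use that value aligned down to 4 bytes); the function names are hypothetical and are not part of binutils or any library. The values in `show_examples` are taken from the listings in the question and in this answer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Base used by Thumb PC-relative instructions: Align(PC, 4),
 * where PC reads as the instruction's own address plus 4. */
static uint32_t thumb_pc_base(uint32_t insn_addr)
{
    return (insn_addr + 4u) & ~UINT32_C(3);
}

/* Result of "adr rd, label" when encoded as "subw rd, pc, #imm".
 * It depends only on where the instruction sits and on the encoded
 * immediate, so moving .bss at link time cannot change it. */
static uint32_t thumb_adr_sub(uint32_t insn_addr, uint32_t imm)
{
    return thumb_pc_base(insn_addr) - imm;
}

/* Address of the literal-pool word read by "ldr rt, [pc, #imm]". */
static uint32_t thumb_ldr_literal_addr(uint32_t insn_addr, uint32_t imm)
{
    return thumb_pc_base(insn_addr) + imm;
}

/* The halves selected by #:lower16: and #:upper16:. */
static uint16_t lower16(uint32_t addr) { return (uint16_t)(addr & 0xFFFFu); }
static uint16_t upper16(uint32_t addr) { return (uint16_t)(addr >> 16); }

/* Register value after "movw rd, #lo" followed by "movt rd, #hi". */
static uint32_t movw_movt(uint16_t lo, uint16_t hi)
{
    uint32_t reg = lo;                              /* movw zero-extends lo */
    reg = (reg & 0xFFFFu) | ((uint32_t)hi << 16);   /* movt replaces top half */
    return reg;
}

/* Hypothetical driver: prints the values for the listings above. */
void show_examples(void)
{
    printf("adr at 0x0, imm 4       -> 0x%08lx\n",
           (unsigned long)thumb_adr_sub(0x0, 4));
    printf("ldr at 0x10074, imm 8   -> pool word at 0x%08lx\n",
           (unsigned long)thumb_ldr_literal_addr(0x10074, 8));
    printf("0xf0000000 split        -> movw #0x%04x, movt #0x%04x\n",
           (unsigned)lower16(0xf0000000u), (unsigned)upper16(0xf0000000u));
    printf("movw #0x238c, movt #1   -> 0x%08lx\n",
           (unsigned long)movw_movt(0x238c, 1));
}
```

Calling `show_examples()` from a `main` prints the four values. The point of the sketch is that the ADR result has no input for the symbol's final address, whereas the literal-pool word and the MOVW/MOVT immediates are plain data fields the linker can fill in through relocations.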