My problem is very basic. I'm making my first bare-metal program in assembler. The architecture is ARMv7-M and I'm using GNU as and I'm writing in UAL.
I have a variable in .bss (or .data, doesn't matter) declared as follows:
.lcomm a_variable, 4
Then I want to read its value somewhere in the program. For that I first load its address into a register and then load the value of the variable itself into another register:
adr r0, a_variable
ldr r1, [r0, #0]
So far so good. The compiled object contains my a_variable symbol:
00000000 b a_variable
And the generated instructions look like this:
0: f2af 0004 subw r0, pc, #4
4: 6801 ldr r1, [r0, #0]
The problem begins when I link the object into the final image. ld relocates the a_variable symbol into the final .bss section at a new address:
20001074 b a_variable
But the final code remains the same, and the program tries to read a_variable from address 0x0 rather than from 0x20001074.
I expected ld to substitute the new address, because it seems to do so when linking objects compiled by GCC. For example, if I write a piece of C code that does something similar:
static int a_variable;
void foo(void)
{
    a_variable = 5;
}
...then I get the following instructions in the object file:
0: f240 0300 movw r3, #0
4: f2c0 0300 movt r3, #0
8: 2005 movs r0, #5
a: 6018 str r0, [r3, #0]
...but the final image looks like this:
800c: f242 338c movw r3, #9100 ; 0x238c
8010: f2c0 0301 movt r3, #1
8014: 2005 movs r0, #5
8016: 6018 str r0, [r3, #0]
So ld appears to have substituted the real address for the placeholder that was left in the object file.
My question is: why doesn't this work for hand-written assembler code? What am I missing?
The ADR instruction only works with a nearby symbol (within +/- 4095 bytes in Thumb-2 mode) defined in the same section and source file. The GNU assembler should have reported an error for a reference to a symbol in a different section. In ARM mode your code produces the error "Error: symbol .bss is in a different section", but there is apparently a bug in how GAS handles the ADR instruction in Thumb mode that causes it to accept the reference silently.
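To see why the ADR form can never yield 0x20001074, here is a minimal C sketch of what the `subw r0, pc, #4` from your listing computes. It assumes the usual Thumb rule that PC reads as the address of the current instruction plus 4, and that ADR uses that value aligned down to 4 bytes. The helper names `thumb_pc` and `adr_sub` are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Value the PC reads as in Thumb state: current instruction address + 4. */
static uint32_t thumb_pc(uint32_t insn_addr)
{
    return insn_addr + 4u;
}

/* Result of `subw rd, pc, #imm12`, the ADR encoding seen in the question:
 * Align(PC, 4) - imm12. Only the code address and a small constant are
 * involved; the symbol's final address never enters the calculation. */
static uint32_t adr_sub(uint32_t insn_addr, uint32_t imm12)
{
    assert(imm12 <= 4095u);               /* Thumb-2 ADR range limit */
    return (thumb_pc(insn_addr) & ~3u) - imm12;
}
```

With the instruction at offset 0, `adr_sub(0x0, 4)` gives 0x0, and if the same code were placed at, say, 0x8000 it would give 0x8000. The result follows the location of the code, not the location of a_variable.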
Instead you can use either the LDR pseudo-instruction (ldr rN, =symbol) or a MOVW/MOVT pair to load an arbitrary 32-bit constant, including an address, into a register. LDR puts the address into a constant pool and loads it from there, while MOVW/MOVT build the constant in two steps, just as your compiler does. The former takes only 6 bytes (2 for the instruction, 4 for the constant); the latter two instructions take 8 bytes. For example:
.syntax unified
.arch armv7-m
.code 16
.bss
.lcomm a_variable, 4
.text
ldr r1, =a_variable
movw r2, #:lower16:a_variable
movt r2, #:upper16:a_variable
When assembled, linked and disassembled, this gives:
$ arm-linux-gnueabi-as -o test.o test.s
$ arm-linux-gnueabi-ld -Tbss=f0000000 test.o
arm-linux-gnueabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000010074
$ arm-linux-gnueabi-objdump -d a.out
...
00010074 <.text>:
10074: 4902 ldr r1, [pc, #8] ; (10080 <__bss_start-0x10f80>)
10076: f240 0200 movw r2, #0
1007a: f2cf 0200 movt r2, #61440 ; 0xf000
1007e: 0000 movs r0, r0
10080: f0000000 .word 0xf0000000
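The immediates in the MOVW/MOVT pair above can be checked by hand. The following is a small C sketch of the split that `:lower16:` and `:upper16:` perform and of how the two instructions recombine the halves. The function names `lower16`, `upper16` and `movw_movt` are illustrative only and are not part of any toolchain API.

```c
#include <assert.h>
#include <stdint.h>

/* Low and high halfwords of a 32-bit address, as selected by
 * #:lower16:sym and #:upper16:sym. */
static uint16_t lower16(uint32_t addr) { return (uint16_t)(addr & 0xFFFFu); }
static uint16_t upper16(uint32_t addr) { return (uint16_t)(addr >> 16); }

/* Model of the register contents after movw followed by movt. */
static uint32_t movw_movt(uint32_t addr)
{
    uint32_t reg = lower16(addr);                  /* movw: low half, top half zeroed */
    reg = (reg & 0xFFFFu)
        | ((uint32_t)upper16(addr) << 16);         /* movt: replace top half only */
    assert(reg == addr);                           /* the pair rebuilds the address */
    return reg;
}
```

For the .bss address 0xf0000000 used in the link command, `lower16` is 0 and `upper16` is 0xf000 (61440), which matches the disassembly. For 0x20001074 from the question, the halves would be 0x1074 and 0x2000.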