I'm trying to build a bare-metal application (no OS, no bootloader) and run it in QEMU, and I'm seeing some weird behavior: the str instruction doesn't seem to do anything.
For some context, I just want to inject my program directly into RAM and run it. I'm using a modified bare-metal linker script and startup.S from an example to lay out the memory and set up the C environment. I don't really care which ARM platform I'm using, so I used the same one as their example: the vexpress-a9 with the cortex-a9 processor.
I modified the start-up file so that execution starts directly at the reset exception vector at 0x0 (which I'm treating as ROM, even though I know it's not). The idea is that the .text section gets put here, some set-up code initializes .data, .bss and the stack, and then I branch to main.
MEMORY
{
ROM (rx) : ORIGIN = 0x00000000, LENGTH = 1M
RAM (rwx): ORIGIN = 0x00400000, LENGTH = 4M
}
This actually works in that I can start QEMU, attach a gdb session, and step through the initialization code, but for the set-up that should happen in "RAM" (starting at 0x00400000) nothing gets initialized at all.
For the bit of assembly below, the idea is to fill the FIQ stack section with 0xFEFEFEFE. I set r1 to the start of the stack and sp to the end, and while r1 < sp I store the value in r0 at the address held in r1 and increment r1 by 4 bytes.
Reset_Handler:
/* FIQ stack */
msr cpsr_c, MODE_FIQ
ldr r1, =_fiq_stack_start
ldr sp, =_fiq_stack_end
movw r0, #0xFEFE
movt r0, #0xFEFE
fiq_loop:
cmp r1, sp
strlt r0, [r1], #4 <<<< ISSUE HERE
blt fiq_loop
This does loop correctly for the right number of iterations (the size of the stack), but the strlt r0, [r1], #4 instruction has no visible effect on memory.
If I inspect the state before the str instruction, r1 points to the start of the stack and the memory there is 0x0:
>>> p/x $r1
$2 = 0x400008
>>> x/2hx $r1
0x400008: 0x0000 0x0000
After I step over the str instruction, r1 has advanced by 4 bytes, but the memory at the start of the stack is still 0x0:
>>> p/x $r1
$3 = 0x40000c
>>> x/2hx 0x400008
0x400008: 0x0000 0x0000
The memory doesn't get updated, but I can directly set values there so I know that it can be updated:
>>> set *(0x400008)=0x12345678
>>> x/2hx 0x400008
0x400008: 0x5678 0x1234
I'm starting qemu with:
qemu-system-arm \
-nographic \
-s \
-S \
--no-reboot \
-machine vexpress-a9 \
-cpu cortex-a9 \
-m 12M \
-device loader,file=out.elf
I've compiled with the -mcpu=cortex-a9 option, and I believe I've provided QEMU with enough RAM. I'm really lost as to what's happening here; any help is appreciated.
Per request, I've also added clarification on the state of the following entities:
What is the value of _fiq_stack_start?
0x00400008 <- This is what I expect, as the FIQ stack should start right after the .data section, which holds 8 bytes.
What is the value of _fiq_stack_end?
0x00401008 <- This is what I expect, as I specified the stack to be 4096 bytes.
What are the contents of r1 at the moment of the cmp instruction?
r1 = 0x00400008 <- This is what I expect, as r1 should contain the start of the stack.
What are the contents of the sp register?
0x00401008 <- This is what I expect, as this should be the end of the stack.
What are the condition code bits at the moment the strlt starts?
Before the compare, CPSR = 0x40000111, and after the compare, CPSR = 0x80000111. This is expected because the value in r1 is less than the value of sp, so the subtraction gives a negative result and the N flag (bit 31) is set.
What are the contents of r0?
0xfefefefe <- This is what I expect based on the two move instructions (movw and movt) that fill the r0 register with the value I want to be in the stack.
What happens if you change the strlt to str?
I actually tested this already, and I got the same behavior.
I've also tried these simple instructions:
mov r0, #0x1
mov r3, #0x2
str r0, [r3] /* Store value of R0 into addr at r3 */
After stepping over each instruction, I would expect the value 0x1 held in r0 to be stored at the memory address 0x2 held in r3. But after inspection it isn't:
>>> p/x $r0
$7 = 0x1
>>> p/x $r3
$8 = 0x2
>>> x/2hx $r3
0x2 <_Reset+2>: 0xea00 0x0041
It's as if the str instruction is completely ignored.
"When I try to store to memory it's as if nothing happens" almost always means "I'm trying to store to somewhere where there isn't actually RAM". Sometimes this is "nothing's there", sometimes this is "there's flash or ROM there so the write is ignored". The root cause is usually "program linked to the wrong addresses".
These addresses:
ROM (rx) : ORIGIN = 0x00000000, LENGTH = 1M
RAM (rwx): ORIGIN = 0x00400000, LENGTH = 4M
don't match what QEMU models for the vexpress-a9. The first is OK (address 0 is the remappable area, which QEMU models as "always the flash memory, not guest runtime configurable"), but there is no RAM at 0x00400000 -- this address is inside the flash memory so it is not writeable like RAM.
You should use a linker map which puts the RAM area in what the memory map calls "Local DDR2", which starts at 0x6000_0000. The linker script in the tutorial you started from gets this (sort of) right because it uses 0x6000_0000 and 0x7000_0000 -- though note that that only works because the documentation in that tutorial says to use -m 512M which provides enough RAM to get up to the 0x7000_0000 memory range.
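As a rough sketch of what that could look like for your script: the MEMORY block below keeps your ROM region at address 0 and only moves the RAM region to the start of the DDR2 area. The region names and sizes are carried over from your question, and the 4M length is an assumption that relies on the -m 12M you already pass to QEMU. I haven't run this against your startup code, so treat it as a starting point rather than a verified fix.

```ld
/* Sketch only: region names and lengths are taken from the question. */
MEMORY
{
    /* Address 0 is modelled by QEMU as flash on vexpress-a9:
       code can be read and executed from here, but str has no effect. */
    ROM (rx)  : ORIGIN = 0x00000000, LENGTH = 1M

    /* "Local DDR2" RAM starts at 0x60000000. With -m 12M there are 12M
       of RAM from this address, so a 4M region fits (assumption). */
    RAM (rwx) : ORIGIN = 0x60000000, LENGTH = 4M
}
```

If the rest of the script is unchanged, symbols such as _fiq_stack_start should then resolve to addresses in the 0x6000_0000 range, and you can check in gdb that the stores in fiq_loop now show up in memory.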
The reason your linker script happens to work on the xilinx-zynq-a9 board is that that board puts its block of RAM at address 0.
The overall thing here that I think is important to understand is that when you're writing a linker script you don't get to choose the addresses arbitrarily. The linker script must be written to match the address map of the board you're going to run the resulting bare-metal binary on, and then that binary won't run on a different board type.
(Incidentally, there is a QEMU bug on this board where we try to map both RAM and flash at address 0 -- I think effectively the flash "wins", so the effect is that the low address area is flash.)