I've been trying to get into embedded C programming: bare metal, toolchain only, no IDE. To that end I've also been writing my own linker script. The issue is that when my C code contains any function other than main, the PC and SP (program counter and stack pointer registers) are messed up.
I'll provide the code:
Linker:
ENTRY(start)
MEMORY {
FLASH (rw) : ORIGIN = 0x0000000000000000, LENGTH = 0x000000000000BFFF
RAM (rwx) : ORIGIN = 0x0000000040000000, LENGTH = 0x000000001FFFFFFF
}
_estack = ORIGIN(RAM) + LENGTH(RAM);
SECTIONS {
.text : {
. = ALIGN(4);
*(.text)
} > FLASH
.data : {
. = ALIGN(4);
_sdata = .;
*(.data)
*(.data*)
. = ALIGN(4);
_edata = .;
} > RAM AT> FLASH
.bss : {
_sbss = .;
*(.bss)
*(.bss*)
. = ALIGN(4);
_ebss = .;
} > RAM
}
ASM file (I'll add more stuff later):
.syntax unified
.cpu cortex-a8
.global start
.word _estack
.word _sbss
.word _ebss
.word _sdata
.word _edata
.section .text
start:
ldr sp, =_estack
b main
Also compile flags are:
CFLAGS := -c \
-g \
-march=armv7-a \
-nostdlib
QEMU_FLAGS := -m $(MEMORY) \
-S \
-machine cubieboard \
-cpu cortex-a8 \
-gdb tcp::$(PORT)
Problem demonstration:
C Code:
int main(void) {
volatile int a = 0, b = 1;
for(int i = 2; i < 4; i++) {
int k = a + b;
a = b;
b = k;
}
return 0;
}
GDB (set architecture to "arm") output:
start () at start.s:16
16 ldr sp, =_estack
(gdb) load
Loading section .text, size 0x94 lma 0x0
Start address 0x00000088, load size 148
Transfer rate: 1184 bits in <1 sec, 148 bytes/write.
(gdb) info reg
...
sp 0x0 0x0 <main>
lr 0x0 0
pc 0x88 0x88 <start>
cpsr 0x400001d3 1073742291
fpscr 0x0 0
fpsid 0x410330c0 1090728128
...
Quit
(gdb) step
start () at start.s:18
18 b main
(gdb) step
main () at main.c:1
1 int main(void) {
(gdb) step
3 volatile int a = 0, b = 1;
(gdb) info reg
...
r11 0x5ffffffb 1610612731
r12 0x0 0
sp 0x5fffffe7 0x5fffffe7
lr 0x0 0
pc 0xc 0xc <main+12>
cpsr 0x400001d3 1073742291
fpscr 0x0 0
fpsid 0x410330c0 1090728128
...
i.e. all good.
When I have functions:
C code:
void my_func(void) {
int volatile c = 1000;
}
int main(void) {
volatile int a = 0, b = 1;
for(int i = 2; i < 4; i++) {
int k = a + b;
a = b;
b = k;
}
my_func();
return 0;
}
GDB Output:
start () at start.s:16
16 ldr sp, =_estack
(gdb) load
Loading section .text, size 0xbc lma 0x0
Start address 0x000000b0, load size 188
Transfer rate: 1504 bits in <1 sec, 188 bytes/write.
(gdb) step
start () at start.s:18
18 b main
(gdb) step
main () at main.c:5
5 int main(void) {
(gdb) step
my_func () at main.c:3
3 }
(gdb) print a
No symbol "a" in current context.
(gdb) info regs
Undefined info command: "regs". Try "help info".
(gdb) info reg
...
sp 0x0 0x0 <my_func>
lr 0x2c 44
pc 0x14 0x14 <my_func+20>
cpsr 0x400001d7 1073742295
...
(gdb) disas my_func
Dump of assembler code for function my_func:
0x00000000 <+0>: push {r11} @ (str r11, [sp, #-4]!)
0x00000004 <+4>: add r11, sp, #0
0x00000008 <+8>: sub sp, sp, #12
0x0000000c <+12>: mov r3, #1000 @ 0x3e8
0x00000010 <+16>: str r3, [r11, #-8]
=> 0x00000014 <+20>: nop @ (mov r0, r0)
0x00000018 <+24>: add sp, r11, #0
0x0000001c <+28>: pop {r11} @ (ldr r11, [sp], #4)
0x00000020 <+32>: bx lr
End of assembler dump.
(gdb) disas main
Dump of assembler code for function main:
0x00000024 <+0>: push {r11, lr}
0x00000028 <+4>: add r11, sp, #4
0x0000002c <+8>: sub sp, sp, #16
0x00000030 <+12>: mov r3, #0
0x00000034 <+16>: str r3, [r11, #-16]
0x00000038 <+20>: mov r3, #1
0x0000003c <+24>: str r3, [r11, #-20] @ 0xffffffec
0x00000040 <+28>: mov r3, #2
0x00000044 <+32>: str r3, [r11, #-8]
0x00000048 <+36>: b 0x78 <main+84>
0x0000004c <+40>: ldr r2, [r11, #-16]
0x00000050 <+44>: ldr r3, [r11, #-20] @ 0xffffffec
0x00000054 <+48>: add r3, r2, r3
0x00000058 <+52>: str r3, [r11, #-12]
0x0000005c <+56>: ldr r3, [r11, #-20] @ 0xffffffec
0x00000060 <+60>: str r3, [r11, #-16]
0x00000064 <+64>: ldr r3, [r11, #-12]
0x00000068 <+68>: str r3, [r11, #-20] @ 0xffffffec
0x0000006c <+72>: ldr r3, [r11, #-8]
0x00000070 <+76>: add r3, r3, #1
0x00000074 <+80>: str r3, [r11, #-8]
0x00000078 <+84>: ldr r3, [r11, #-8]
0x0000007c <+88>: cmp r3, #3
0x00000080 <+92>: ble 0x4c <main+40>
0x00000084 <+96>: bl 0x0 <my_func>
0x00000088 <+100>: mov r3, #0
0x0000008c <+104>: mov r0, r3
0x00000090 <+108>: sub sp, r11, #4
0x00000094 <+112>: pop {r11, lr}
0x00000098 <+116>: bx lr
(gdb) print &c
$1 = (volatile int *) 0xfffffff8
0xfffffff8 lies outside the RAM region defined in the linker script.
Can someone please explain what is going wrong and how I prevent it?
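You have the wrong values for the RAM and FLASH lengths in your linker script. Memory sizes in hardware are round numbers, typically a power of two or a small multiple of one, never odd values like 0xBFFF or 0x1FFFFFFF. It looks as if you have written the last valid offset rather than the size.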
This mistake has resulted in your setting the initial SP value to 0x5fffffff (ORIGIN(RAM) + LENGTH(RAM) = 0x40000000 + 0x1FFFFFFF). This is not permitted by the Arm Procedure Calling Standard, which says:
SP mod 4 = 0. The stack must at all times be aligned to a word boundary.
and that at a public interface (i.e. when you call into a function compiled by the C compiler):
SP mod 8 = 0. The stack must be double-word aligned.
(https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst)
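To make the arithmetic concrete, here is a small host-side C sketch that mirrors the `_estack = ORIGIN(RAM) + LENGTH(RAM)` expression and checks the result against the two alignment rules quoted above. The function names are made up for illustration, and the "round" length of 0x20000000 (512 MiB) is only an assumption about what was intended; check the real size for your board.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the MEMORY block in the question. */
#define RAM_ORIGIN       0x40000000u
#define RAM_LENGTH_ODD   0x1FFFFFFFu  /* as posted */
/* Assumption: the intended size is one byte more, i.e. 512 MiB. */
#define RAM_LENGTH_ROUND 0x20000000u

/* Mirrors the linker expression _estack = ORIGIN(RAM) + LENGTH(RAM). */
uint32_t stack_top(uint32_t origin, uint32_t length)
{
    assert(length <= UINT32_MAX - origin);  /* no 32-bit wraparound */
    return origin + length;
}

/* AAPCS32: SP mod 4 == 0 at all times. */
bool sp_word_aligned(uint32_t sp)
{
    return (sp % 4u) == 0u;
}

/* AAPCS32: SP mod 8 == 0 at a public interface (e.g. when calling main). */
bool sp_ok_for_call(uint32_t sp)
{
    return (sp % 8u) == 0u;
}
```

With the posted length, `stack_top(RAM_ORIGIN, RAM_LENGTH_ODD)` is 0x5fffffff, which fails both checks. With the rounded length it is 0x60000000, which passes both.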
If you break these rules then the behaviour is not defined, which is to say that anything in theory could happen. The exact specifics of what happens will depend on what code the C compiler has emitted.
In this case we can see that the compiler has emitted LDR and STR instructions to access memory in the stack. In the mode in which the CPU starts execution, those will take a Data Abort exception if you try to use them on an unaligned address. Exceptions on Arm are handled by the CPU starting execution at the appropriate entry point in low memory. Your program doesn't set up any code at the exception vectors, it just puts your startup routine there, so if an exception happens at any point execution will jump to partway through that startup routine.
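The register dumps in the question appear consistent with this. The low five bits of cpsr hold the processor mode, and they change from 0x13 (Supervisor) in the first dump to 0x17 (Abort) in the failing one. The pc is just past offset 0x10, which is the Data Abort vector when the vectors are at address 0, and lr = 0x2c is 8 past 0x24, the `push {r11, lr}` at the start of main, which is what a Data Abort on that instruction would leave in the Abort-mode lr. The sp reads 0 because sp is banked per mode. A minimal sketch of the mode decoding, with hypothetical function names, using the ARMv7-A mode encodings:

```c
#include <stdint.h>

/* Extracts the mode field, CPSR bits [4:0]. */
uint32_t cpsr_mode(uint32_t cpsr)
{
    return cpsr & 0x1Fu;
}

/* Returns a readable name for the ARMv7-A processor mode in a CPSR value. */
const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr_mode(cpsr)) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x16: return "Monitor";
    case 0x17: return "Abort";
    case 0x1A: return "Hyp";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "invalid";
    }
}
```

Feeding it the two values from the GDB output, 0x400001d3 decodes to Supervisor and 0x400001d7 to Abort.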
I would suggest that you put a proper set of exception vectors at the start of your program, even if the code at each entry point is just "branch to same address", i.e. a tight loop. That way if you have a bug in your code then you'll see that you've taken an exception (because the CPU will start looping at one of the exception vector addresses), rather than having weird behaviour.
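To show what such a table contains, the sketch below builds the eight vector words on the host: entry 0 (reset) branches to the startup code and every other entry is a "branch to same address". In a real project you would more naturally write this as eight `b` instructions at the top of start.s and have the linker script place them first in FLASH; the C form is used here only to make the contents visible. The function names and the 0x20 start address (the first word after the table) are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_VECTORS 8u  /* reset, undef, SVC, prefetch abort, data abort,
                           reserved, IRQ, FIQ: offsets 0x00 to 0x1C */

/* Encodes an unconditional ARM-state `b target` located at address `from`.
 * In ARM state the PC reads as the instruction address plus 8, and the
 * 24-bit immediate is a word offset. Range checking is omitted. */
uint32_t encode_arm_branch(uint32_t from, uint32_t target)
{
    assert((from % 4u) == 0u && (target % 4u) == 0u);
    uint32_t offset = target - (from + 8u);
    return 0xEA000000u | ((offset >> 2) & 0x00FFFFFFu);
}

/* Fills a table meant to live at address 0: reset jumps to start_addr,
 * all other vectors loop in place so a fault is easy to spot in GDB. */
void fill_vector_table(uint32_t table[NUM_VECTORS], uint32_t start_addr)
{
    table[0] = encode_arm_branch(0u, start_addr);
    for (uint32_t i = 1; i < NUM_VECTORS; i++) {
        uint32_t addr = 4u * i;
        table[i] = encode_arm_branch(addr, addr);
    }
}
```

With this layout, a Data Abort would leave the pc looping at 0x10 instead of running into whatever code happens to be linked at the bottom of FLASH.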
You should also probably have a look through the procedure calling standard, because it specifies all the things your code needs to have set up to be able to call a C function.