test.c (bare metal)
#include <stdio.h>
int add1(int a, int b)
{
    int c;
    c = a + b;
    return c;
}
int main()
{
    int x, y, z;
    x = 3;
    y = 4;
    z = add1(x,y);
    printf("z = %d\n", z);
}
I run aarch64-none-elf-gcc test.c -specs=rdimon.specs
and get a.out. Then I run aarch64-none-elf-objdump -d a.out
and get the assembly code. Here are the add1 and main functions.
00000000004002e0 <add1>:
4002e0: d10083ff sub sp, sp, #0x20 <-- reduce sp by 0x20 (just above it are saved fp and lr of main)
4002e4: b9000fe0 str w0, [sp, #12] <-- save first param x at sp + 12
4002e8: b9000be1 str w1, [sp, #8] <-- save second param y at sp + 8
4002ec: b9400fe1 ldr w1, [sp, #12] <-- load w1 with x
4002f0: b9400be0 ldr w0, [sp, #8] <-- load w0 with y
4002f4: 0b000020 add w0, w1, w0 <-- w0 = w1 + w0
4002f8: b9001fe0 str w0, [sp, #28] <-- store w0 to sp+28
4002fc: b9401fe0 ldr w0, [sp, #28] <-- load w0 with the result (seems redundant)
400300: 910083ff add sp, sp, #0x20 <-- increment sp by 0x20
400304: d65f03c0 ret
0000000000400308 <main>:
400308: a9be7bfd stp x29, x30, [sp, #-32]! <-- save x29(fp) and x30(lr) at sp - 0x20
40030c: 910003fd mov x29, sp <-- set fp to new sp, the base of stack growth(down)
400310: 52800060 mov w0, #0x3 // #3
400314: b9001fe0 str w0, [sp, #28] <-- x is assigned in sp + #28
400318: 52800080 mov w0, #0x4 // #4
40031c: b9001be0 str w0, [sp, #24] <-- y is assigned in sp + #24
400320: b9401be1 ldr w1, [sp, #24] <-- load func param for y
400324: b9401fe0 ldr w0, [sp, #28] <-- load func param for x
400328: 97ffffee bl 4002e0 <add1> <-- call add1 (args are in w0, w1)
40032c: b90017e0 str w0, [sp, #20] <-- store w0(result z) to sp+20
400330: b94017e1 ldr w1, [sp, #20] <-- load w1 with the result (why? seems redundant. it's already in w0)
400334: d0000060 adrp x0, 40e000 <__sfp_handle_exceptions+0x28>
400338: 91028000 add x0, x0, #0xa0 <-- looks like loading param x0 for printf
40033c: 940000e7 bl 4006d8 <printf>
400340: 52800000 mov w0, #0x0 // #0 <-- for main's return value..
400344: a8c27bfd ldp x29, x30, [sp], #32 <-- recover x29 and x30 (looks like the values in x29, x30 were used in the function that called main)
400348: d65f03c0 ret
40034c: d503201f nop
I added my understanding with <-- marks. Could someone look at the code and give me some corrections? Any small comment will be appreciated. (Please start from <main>.)
ADD: Thanks for the comments. I think I forgot to ask my real questions. At the start of main, the program that called main should have put its return address (the point to resume after main) in x30. And since main calls another function itself, it will modify x30, so it saves x30 on its stack. But why does it store it at sp - 0x20? And why are the variables x, y, z stored at sp + 28, sp + 24, sp + 20? If the main function calls printf, I guess sp and x29 will be decremented by some amount. Is this amount dependent on how much stack area the called function (here printf) uses, or is it constant? And how is the x29, x30 storage location in main determined? Is it chosen so that those two values sit just above the called function's (printf's) stack area? Sorry for so many questions.
In laying out the stack for main, the compiler has to satisfy the following constraints:
- x29 and x30 need to be saved on the stack. They occupy 8 bytes each.
- The local variables x,y,z need stack space, 4 bytes each. (If you were optimizing, you'd see them kept in registers instead, or optimized completely out of existence.) That brings us to a total of 8+8+4+4+4=28 bytes.
- The stack pointer sp must always be kept aligned to 16 bytes; this is an architectural and ABI constraint (the OS can choose to relax this requirement but normally doesn't). So we can't just subtract 28 from sp; we must round up to the next multiple of 16, which is 32.
So that's where the 32 or 0x20 that you mention comes from. Note that it is entirely for stack memory used by main itself. It's not a universal constant; you would see it change if you added or removed enough local variables from main.
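The arithmetic above can be sketched in C. This is only an illustration of the rounding rule, not what GCC does internally; the helper names round_up and main_frame_bytes are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align.
 * Assumption for this sketch: align is a nonzero power of two. */
size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Stack bytes main needs at -O0 in the question's build:
 * saved x29 and x30 (8 bytes each) plus x, y, z (4 bytes each). */
size_t main_frame_bytes(void)
{
    size_t raw = 8 + 8 + 4 + 4 + 4;  /* 28 bytes actually used */
    return round_up(raw, 16);        /* sp must stay 16-byte aligned */
}
```

Here main_frame_bytes() returns 32. Adding a fourth int local would still fit in 32 bytes (it would take the spare slot), while a fifth would push the raw total to 36 and the frame to 48.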
It has nothing to do with whatever printf needs. If printf needs stack space for its own local variables, then the code within printf will have to take care of adjusting the stack pointer accordingly. The compiler, when compiling main, does not know how much space that would be, and does not care.
Now the compiler needs to organize these five objects x29, x30, x, y, z within the 32 bytes of stack space that it will create for itself. The choice of what to put where could be almost completely arbitrary, except for the following point.
The function's prologue needs to both subtract 32 from the stack pointer, and store the registers x29, x30 somewhere within the allocated space. This can all be done in a single instruction with the pre-indexed store-pair instruction stp x29, x30, [sp, #-32]!. It subtracts 32 from sp, then stores x29 and x30 in the 16 bytes starting at the address where sp now points. So in order to use this instruction, we have to accept placing x29 and x30 at the bottom of the allocated space, at offsets [sp+0] and [sp+8] relative to the new value of sp. Putting them anywhere else would require extra instructions and be less efficient.
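To make the pre-indexed behaviour concrete, here is a toy C model of the prologue's stp and the epilogue's matching ldp x29, x30, [sp], #32. The struct toy_stack and both function names are invented for this sketch; real hardware obviously does not work on a byte array, and the model hard-codes the offset 32 from the question.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stack: a byte array plus an offset that plays the role of sp. */
struct toy_stack {
    uint8_t mem[256];
    size_t  sp;
};

/* Model of: stp x29, x30, [sp, #-32]!
 * Pre-indexed: sp is decremented first, then the pair is stored
 * at the new sp (x29 at [sp+0], x30 at [sp+8]). */
void stp_preindex_32(struct toy_stack *s, uint64_t x29, uint64_t x30)
{
    s->sp -= 32;
    memcpy(&s->mem[s->sp], &x29, 8);
    memcpy(&s->mem[s->sp + 8], &x30, 8);
}

/* Model of: ldp x29, x30, [sp], #32
 * Post-indexed: the pair is loaded from the current sp,
 * then sp is incremented, releasing the whole frame. */
void ldp_postindex_32(struct toy_stack *s, uint64_t *x29, uint64_t *x30)
{
    memcpy(x29, &s->mem[s->sp], 8);
    memcpy(x30, &s->mem[s->sp + 8], 8);
    s->sp += 32;
}
```

Starting with sp at 256, the store leaves sp at 224 with the two registers in bytes 224..239; the load reads them back and returns sp to 256. The 16 bytes from offset 240 up to 255 are what remains for the locals.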
(In fact, because this is the most convenient way to do it, the ABI requires that stack frames be set up this way, with x29, x30 contiguous on the stack in that order, when they are used at all; see section 5.2.3 of the ABI.)
We still have 16 bytes starting at [sp+16] to play with, in which x,y,z must be placed. The compiler has chosen to put them at addresses [sp+28], [sp+24], [sp+20] respectively. The 4 bytes at [sp+16] remain unused, but remember, we had to waste 4 bytes somewhere in order to achieve the proper stack alignment. The choice of arranging these objects, and which slot to leave unused, was completely arbitrary and any other arrangement would have worked just as well.
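One way to picture the resulting frame is as a C struct whose members sit at the same offsets from sp. This is purely a reading aid: the struct main_frame and its member names are hypothetical, and the compiler does not actually define such a type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical picture of main's 32-byte frame, lowest address first. */
struct main_frame {
    uint64_t saved_x29;  /* [sp+0]  caller's frame pointer */
    uint64_t saved_x30;  /* [sp+8]  return address of main */
    uint32_t unused;     /* [sp+16] the 4 bytes lost to alignment */
    int32_t  z;          /* [sp+20] */
    int32_t  y;          /* [sp+24] */
    int32_t  x;          /* [sp+28] */
};

/* Compile-time checks (C11) that the picture matches the disassembly. */
_Static_assert(sizeof(struct main_frame) == 32, "frame is 32 bytes");
_Static_assert(offsetof(struct main_frame, saved_x30) == 8, "x30 at sp+8");
_Static_assert(offsetof(struct main_frame, z) == 20, "z at sp+20");
_Static_assert(offsetof(struct main_frame, y) == 24, "y at sp+24");
_Static_assert(offsetof(struct main_frame, x) == 28, "x at sp+28");
```

Swapping the order of unused, z, y and x in this struct would describe an equally valid frame; only the position of the first two members is pinned down by the stp instruction.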