When I compare the stack usage reported by gcc's -fstack-usage option for two similar functions, one that makes a function call and one that does not, I get different results.
Let's consider this piece of code:
void donothing(void) {}

void leaf(void) {
    int i = 0;
}

void noleaf(void) {
    int i = 0;
    donothing();
}

int main(void) {
    leaf();
    noleaf();
    return 0;
}
I would like to compare the stack usage of leaf and noleaf. Naively, I would say the stack size is equal for both functions, as they have the same local variables. However, when I use the -fstack-usage option of gcc, I get different results:
$ gcc -fstack-usage -o exe leaf.c
$ cat leaf.su
leaf.c:1:6:donothing 16 static
leaf.c:3:6:leaf 16 static
leaf.c:7:6:noleaf 32 static
leaf.c:12:5:main 16 static
We can see that leaf has the same stack size as donothing. This suggests that the local variable is not taken into account.
When I look at the assembly code, I can see that the stack is not manipulated the same way in leaf and noleaf:
000000000000112c <leaf>:
112c: 55 push %rbp
112d: 48 89 e5 mov %rsp,%rbp
1130: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
1137: 90 nop
1138: 5d pop %rbp
1139: c3 retq
000000000000113a <noleaf>:
113a: 55 push %rbp
113b: 48 89 e5 mov %rsp,%rbp
113e: 48 83 ec 10 sub $0x10,%rsp
1142: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
1149: e8 d7 ff ff ff callq 1125 <donothing>
114e: 90 nop
114f: c9 leaveq
1150: c3 retq
In noleaf, space is allocated on the stack with sub $0x10,%rsp, but in leaf the local variable is stored directly on the stack with movl $0x0,-0x4(%rbp), with no "preallocation".
However, even if no "preallocation" is done, the local variable of leaf is still on the stack, right? As the stack is also used for the local variable in leaf, I would expect a stack usage of 32 bytes for this function too. Could I say that the output of -fstack-usage is wrong?
EDIT: some comments and answers suggest that the difference in stack usage is due to callq and the alignment requirements. Let's consider a modified version of the source file:
void donothing(void) {}

void leaf(void) {
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
}

void noleaf(void) {
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    donothing();
}

int main(void) {
    leaf();
    noleaf();
    return 0;
}
If we agree that the difference in stack usage is explained only by the callq and the alignment, we should not observe a difference greater than 16 bytes between the two functions. However, here are the results of -fstack-usage:
su.c:1:6:donothing 16 static
su.c:3:6:leaf 16 static
su.c:10:6:noleaf 80 static
su.c:18:5:main 16 static
We can see that noleaf uses 80 bytes, which seems normal to me (64 bytes for the local variables and 16 bytes for the saved rip and rbp). However, the stack size for leaf is still 16 bytes. The local variables are not taken into account.
Here is the assembly code:
000000000000112c <leaf>:
112c: 55 push %rbp
112d: 48 89 e5 mov %rsp,%rbp
1130: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
1137: c7 45 f8 00 00 00 00 movl $0x0,-0x8(%rbp)
113e: c7 45 f4 00 00 00 00 movl $0x0,-0xc(%rbp)
1145: c7 45 f0 00 00 00 00 movl $0x0,-0x10(%rbp)
114c: c7 45 ec 00 00 00 00 movl $0x0,-0x14(%rbp)
1153: c7 45 e8 00 00 00 00 movl $0x0,-0x18(%rbp)
115a: c7 45 e4 00 00 00 00 movl $0x0,-0x1c(%rbp)
1161: c7 45 e0 00 00 00 00 movl $0x0,-0x20(%rbp)
1168: c7 45 dc 00 00 00 00 movl $0x0,-0x24(%rbp)
116f: c7 45 d8 00 00 00 00 movl $0x0,-0x28(%rbp)
1176: c7 45 d4 00 00 00 00 movl $0x0,-0x2c(%rbp)
117d: c7 45 d0 00 00 00 00 movl $0x0,-0x30(%rbp)
1184: c7 45 cc 00 00 00 00 movl $0x0,-0x34(%rbp)
118b: c7 45 c8 00 00 00 00 movl $0x0,-0x38(%rbp)
1192: c7 45 c4 00 00 00 00 movl $0x0,-0x3c(%rbp)
1199: c7 45 c0 00 00 00 00 movl $0x0,-0x40(%rbp)
11a0: 90 nop
11a1: 5d pop %rbp
11a2: c3 retq
00000000000011a3 <noleaf>:
11a3: 55 push %rbp
11a4: 48 89 e5 mov %rsp,%rbp
11a7: 48 83 ec 40 sub $0x40,%rsp
11ab: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
11b2: c7 45 f8 00 00 00 00 movl $0x0,-0x8(%rbp)
11b9: c7 45 f4 00 00 00 00 movl $0x0,-0xc(%rbp)
11c0: c7 45 f0 00 00 00 00 movl $0x0,-0x10(%rbp)
11c7: c7 45 ec 00 00 00 00 movl $0x0,-0x14(%rbp)
11ce: c7 45 e8 00 00 00 00 movl $0x0,-0x18(%rbp)
11d5: c7 45 e4 00 00 00 00 movl $0x0,-0x1c(%rbp)
11dc: c7 45 e0 00 00 00 00 movl $0x0,-0x20(%rbp)
11e3: c7 45 dc 00 00 00 00 movl $0x0,-0x24(%rbp)
11ea: c7 45 d8 00 00 00 00 movl $0x0,-0x28(%rbp)
11f1: c7 45 d4 00 00 00 00 movl $0x0,-0x2c(%rbp)
11f8: c7 45 d0 00 00 00 00 movl $0x0,-0x30(%rbp)
11ff: c7 45 cc 00 00 00 00 movl $0x0,-0x34(%rbp)
1206: c7 45 c8 00 00 00 00 movl $0x0,-0x38(%rbp)
120d: c7 45 c4 00 00 00 00 movl $0x0,-0x3c(%rbp)
1214: c7 45 c0 00 00 00 00 movl $0x0,-0x40(%rbp)
121b: e8 05 ff ff ff callq 1125 <donothing>
1220: 90 nop
1221: c9 leaveq
1222: c3 retq
Therefore, I think the callq instruction is not enough to explain the difference in stack usage between the two functions.
The examples shown are consistent with GCC counting each routine as using 16 bytes (the return address pushed by the call that enters it, plus the initial push %rbp in each routine) plus the amount by which it decrements the stack pointer.
Use of the red zone (the portion of the stack that may be used but is below where %rsp points) is not counted.
This is a reasonable accounting method since the total change in %rsp between the initial start routine and any other routine will equal the sum of the stack use GCC reports for each of the routines in the call stack.