
Is -fstack-usage wrong for leaf functions?


When I compare the stack usage that GCC's -fstack-usage option reports for two similar functions, one that makes a function call and one that does not, I get different results.

Let's consider this piece of code:

void donothing(void) {}

void leaf(void) {
    int i = 0;
}

void noleaf(void) {
    int i = 0;
    donothing();
}

int main(void) {
    leaf();
    noleaf();

    return 0;
}

I would like to compare the stack usage of leaf and noleaf. Naively, I would expect both functions to use the same amount of stack, since they have the same local variables. However, when I use GCC's -fstack-usage option, I get different results:

$ gcc -fstack-usage -o exe leaf.c
$ cat leaf.su 
leaf.c:1:6:donothing    16  static
leaf.c:3:6:leaf         16  static
leaf.c:7:6:noleaf       32  static
leaf.c:12:5:main        16  static
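To compare entries like these programmatically, each .su line can be split into the function name, the byte count and the qualifier. This is a minimal sketch that assumes whitespace-separated fields and a C function name containing no ':'. C++ names may contain ':' and spaces, so they would need different handling. struct su_entry and parse_su_line are hypothetical names, not part of GCC.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One record from a GCC .su file (hypothetical type for this sketch). */
struct su_entry {
    char function[128]; /* function name: the last ':'-separated field */
    long bytes;         /* stack usage in bytes as reported by GCC */
    char qualifier[32]; /* e.g. "static" or "dynamic" */
};

/* Parse a line such as "leaf.c:3:6:leaf\t16\tstatic".
 * Assumes whitespace-separated fields and a function name without ':'.
 * Returns 1 on success and 0 if the line does not have that shape. */
static int parse_su_line(const char *line, struct su_entry *out)
{
    char location[256];
    const char *name;

    assert(line != NULL && out != NULL);
    if (sscanf(line, "%255s %ld %31s", location, &out->bytes, out->qualifier) != 3)
        return 0;

    name = strrchr(location, ':'); /* the function name follows the last ':' */
    if (name == NULL || strlen(name + 1) >= sizeof out->function)
        return 0;

    strcpy(out->function, name + 1);
    return 1;
}
```

Calling parse_su_line on each line of leaf.su, for example one line at a time with fgets, yields the name and size pairs listed above.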

We can see that leaf has the same stack size as donothing, which suggests that its local variable is not taken into account.

When I look at the assembly code, I can see that leaf and noleaf do not handle the stack the same way:

000000000000112c <leaf>:
    112c:   55                      push   %rbp
    112d:   48 89 e5                mov    %rsp,%rbp
    1130:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
    1137:   90                      nop
    1138:   5d                      pop    %rbp
    1139:   c3                      retq   

000000000000113a <noleaf>:
    113a:   55                      push   %rbp
    113b:   48 89 e5                mov    %rsp,%rbp
    113e:   48 83 ec 10             sub    $0x10,%rsp
    1142:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
    1149:   e8 d7 ff ff ff          callq  1125 <donothing>
    114e:   90                      nop
    114f:   c9                      leaveq 
    1150:   c3                      retq

In noleaf, the space is reserved on the stack with sub $0x10,%rsp. In leaf, however, the local variable is stored directly on the stack with movl $0x0,-0x4(%rbp), without any "preallocation".
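The sub $0x10,%rsp for a single 4-byte int is consistent with the locals area being rounded up to a multiple of 16 in a function that makes calls. On x86-64 (System V ABI), %rsp must be 16-byte aligned at each call instruction. After the return address and push %rbp it already is, so whatever is subtracted must itself be a multiple of 16. A leaf function makes no calls and can keep its locals in the red zone below %rsp instead. The sketch below models only that arithmetic for the functions in this question. It is an assumption-laden approximation of the listings, not GCC's actual frame-layout algorithm, and round_up and modeled_rsp_adjust are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_ALIGN   16u  /* x86-64 SysV: %rsp is 16-byte aligned at each call */
#define RED_ZONE_SIZE 128u /* bytes below %rsp usable without moving %rsp */

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical model of the "sub $N,%rsp" seen in these -O0 listings. */
static size_t modeled_rsp_adjust(size_t locals_bytes, bool makes_calls)
{
    if (!makes_calls) {
        /* Leaf: locals live in the red zone, %rsp is never moved.
         * Leaf frames larger than the red zone are not modeled here. */
        assert(locals_bytes <= RED_ZONE_SIZE);
        return 0;
    }
    /* Non-leaf: keep %rsp 16-byte aligned for the upcoming call. */
    return round_up(locals_bytes, STACK_ALIGN);
}
```

For the first example, modeled_rsp_adjust(4, true) gives the 0x10 seen in noleaf and modeled_rsp_adjust(4, false) gives the 0 seen in leaf.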

However, even without this "preallocation", the local variable of leaf is still on the stack, right? Since leaf also uses the stack for its local variable, I would expect a stack usage of 32 bytes for this function too. Can I conclude that the output of -fstack-usage is wrong?


EDIT: Some comments and answers suggest that the difference in stack usage is due to the callq and the alignment requirements. Let's consider a modified version of the source file:

void donothing(void) {}

void leaf(void) {
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
}

void noleaf(void) {
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    donothing();
}

int main(void) {
    leaf();
    noleaf();

    return 0;
}

If the difference in stack usage were explained only by the callq and the alignment, we should not observe a difference greater than 16 bytes between the two functions. However, here are the results of -fstack-usage:

su.c:1:6:donothing  16  static
su.c:3:6:leaf       16  static
su.c:10:6:noleaf    80  static
su.c:18:5:main      16  static

We can see that noleaf uses 80 bytes, which seems normal to me: 64 bytes for the local variables (16 ints of 4 bytes each) and 16 bytes for the saved return address (rip) and the saved rbp. However, the stack size reported for leaf is still 16 bytes, so its local variables are not taken into account.
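The 80-byte breakdown for noleaf can be written out explicitly. This small sketch assumes x86-64 with a 4-byte int and 8-byte stack slots for the return address and the saved %rbp. frame_footprint is a hypothetical helper that counts every byte a frame touches, whether or not %rsp was moved to cover it.

```c
#include <stddef.h>

/* x86-64: callq pushes an 8-byte return address and the
 * prologue's push %rbp stores another 8 bytes. */
#define RETURN_ADDRESS_BYTES 8u
#define SAVED_RBP_BYTES      8u

/* Bytes of stack a frame touches, counted from the caller's %rsp
 * down to the lowest local stored below %rbp. */
static size_t frame_footprint(size_t locals_bytes)
{
    return RETURN_ADDRESS_BYTES + SAVED_RBP_BYTES + locals_bytes;
}

/* 4 blocks of 4 ints each, as in the modified leaf() and noleaf(). */
static const size_t locals_in_example = 4 * 4 * sizeof(int);
```

By this count leaf touches the same 80 bytes as noleaf, since its listing also stores down to -0x40(%rbp).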

Here is the assembly code:

000000000000112c <leaf>:
    112c:   55                      push   %rbp
    112d:   48 89 e5                mov    %rsp,%rbp
    1130:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
    1137:   c7 45 f8 00 00 00 00    movl   $0x0,-0x8(%rbp)
    113e:   c7 45 f4 00 00 00 00    movl   $0x0,-0xc(%rbp)
    1145:   c7 45 f0 00 00 00 00    movl   $0x0,-0x10(%rbp)
    114c:   c7 45 ec 00 00 00 00    movl   $0x0,-0x14(%rbp)
    1153:   c7 45 e8 00 00 00 00    movl   $0x0,-0x18(%rbp)
    115a:   c7 45 e4 00 00 00 00    movl   $0x0,-0x1c(%rbp)
    1161:   c7 45 e0 00 00 00 00    movl   $0x0,-0x20(%rbp)
    1168:   c7 45 dc 00 00 00 00    movl   $0x0,-0x24(%rbp)
    116f:   c7 45 d8 00 00 00 00    movl   $0x0,-0x28(%rbp)
    1176:   c7 45 d4 00 00 00 00    movl   $0x0,-0x2c(%rbp)
    117d:   c7 45 d0 00 00 00 00    movl   $0x0,-0x30(%rbp)
    1184:   c7 45 cc 00 00 00 00    movl   $0x0,-0x34(%rbp)
    118b:   c7 45 c8 00 00 00 00    movl   $0x0,-0x38(%rbp)
    1192:   c7 45 c4 00 00 00 00    movl   $0x0,-0x3c(%rbp)
    1199:   c7 45 c0 00 00 00 00    movl   $0x0,-0x40(%rbp)
    11a0:   90                      nop
    11a1:   5d                      pop    %rbp
    11a2:   c3                      retq   

00000000000011a3 <noleaf>:
    11a3:   55                      push   %rbp
    11a4:   48 89 e5                mov    %rsp,%rbp
    11a7:   48 83 ec 40             sub    $0x40,%rsp
    11ab:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
    11b2:   c7 45 f8 00 00 00 00    movl   $0x0,-0x8(%rbp)
    11b9:   c7 45 f4 00 00 00 00    movl   $0x0,-0xc(%rbp)
    11c0:   c7 45 f0 00 00 00 00    movl   $0x0,-0x10(%rbp)
    11c7:   c7 45 ec 00 00 00 00    movl   $0x0,-0x14(%rbp)
    11ce:   c7 45 e8 00 00 00 00    movl   $0x0,-0x18(%rbp)
    11d5:   c7 45 e4 00 00 00 00    movl   $0x0,-0x1c(%rbp)
    11dc:   c7 45 e0 00 00 00 00    movl   $0x0,-0x20(%rbp)
    11e3:   c7 45 dc 00 00 00 00    movl   $0x0,-0x24(%rbp)
    11ea:   c7 45 d8 00 00 00 00    movl   $0x0,-0x28(%rbp)
    11f1:   c7 45 d4 00 00 00 00    movl   $0x0,-0x2c(%rbp)
    11f8:   c7 45 d0 00 00 00 00    movl   $0x0,-0x30(%rbp)
    11ff:   c7 45 cc 00 00 00 00    movl   $0x0,-0x34(%rbp)
    1206:   c7 45 c8 00 00 00 00    movl   $0x0,-0x38(%rbp)
    120d:   c7 45 c4 00 00 00 00    movl   $0x0,-0x3c(%rbp)
    1214:   c7 45 c0 00 00 00 00    movl   $0x0,-0x40(%rbp)
    121b:   e8 05 ff ff ff          callq  1125 <donothing>
    1220:   90                      nop
    1221:   c9                      leaveq 
    1222:   c3                      retq   

Therefore, I think the callq instruction alone is not enough to explain the difference in stack usage between the two functions.


Solution

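  • The examples shown are consistent with GCC counting each routine as using 16 bytes (the 8-byte return address pushed by the call into it plus the 8-byte push %rbp at the start of each routine) plus the amount by which the routine decrements the stack pointer.

    Use of the red zone (the 128-byte area below where %rsp points, which the x86-64 System V ABI lets a function use without moving %rsp) is not counted. That is why leaf, which keeps all of its locals there, is reported as 16 bytes.

    This is a reasonable accounting method, since the total change in %rsp between the initial start routine and any other routine equals the sum of the stack usage GCC reports for each routine in the call stack. It does mean that the true worst-case stack depth can exceed that sum by up to the 128 bytes of red zone the currently executing routine may use.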
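The accounting described above can be checked against every number in the question. This minimal model assumes the -O0 x86-64 code shown: 16 fixed bytes per routine, plus whatever the routine subtracts from %rsp, plus red-zone bytes that are touched but not reported. reported_usage, actual_footprint and chain_usage are hypothetical helpers, not GCC interfaces.

```c
#include <stddef.h>

/* 8-byte return address pushed by callq + 8-byte push %rbp in the prologue. */
#define CALL_AND_PROLOGUE_BYTES 16u

/* What -fstack-usage reports under this model: the fixed 16 bytes plus
 * the amount the routine subtracts from %rsp. Red-zone use is ignored. */
static size_t reported_usage(size_t rsp_adjust)
{
    return CALL_AND_PROLOGUE_BYTES + rsp_adjust;
}

/* Bytes the routine actually touches, including locals kept in the
 * red zone below %rsp without moving %rsp. */
static size_t actual_footprint(size_t rsp_adjust, size_t red_zone_bytes)
{
    return reported_usage(rsp_adjust) + red_zone_bytes;
}

/* Sum of reported usage along a call chain, outermost routine first.
 * This equals how far %rsp has moved from just before the outermost
 * call to the innermost routine's %rsp after its prologue. */
static size_t chain_usage(const size_t *rsp_adjusts, size_t depth)
{
    size_t total = 0;
    for (size_t i = 0; i < depth; i++)
        total += reported_usage(rsp_adjusts[i]);
    return total;
}
```

Under this model, reported_usage(0x40) gives noleaf's 80 bytes and actual_footprint(0, 0x40) gives the same 80 bytes for leaf in the second example. The 64 bytes leaf keeps in the red zone are simply not part of its reported figure. For the chain main -> noleaf -> donothing in that example, chain_usage sums 16 + 80 + 16 = 112 bytes.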