Why adding one integer to a function does not cut recursion stack depth in half

Edit1: As advised by some comments, I printed the address of variable.

Edit2: As advised by some comments, I added a bit manipulation so that compilers cannot simply throw my variable away

Edit3: As advised by one answer, I revised the way a variable's is printed.

Edit4: Added quite a bit of non-sense to make life difficult for gcc

Edit5: Add a snippet three to test @4386427's theory--the result looks to support his thought that a compiler may reserve 32 Bytes by default. Therefore, we may need to define at least 5 variable to see the difference.

I have some basic understanding of stack memory and heap memory. Take C as an example, if I define a local variable in a function, it occupies stack memory; if I define a pointer and allocate a few memory blocks to it, these memory blocks occupy heap memory. If a function calls itself recursively, the stack will be full and overflow will occur. So I did a simple test and the only difference between snippets one and two is that snippet two has one more integer defined:

snippet one:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int function(int depth) {
  int tmp = rand() % 65536;
  tmp = tmp - 1;
  printf("val: %d; addr: %p; depth: %d\n", tmp, (void*)&tmp, depth);
  tmp = function(++depth) + 1;
  return tmp;
}
int main() {
  srand(time(NULL));
  int res = function(0);
  printf("%d\n", res);
  return 0;
}

output one:

...
val: 57227; addr: 0x7fff00dff78c; depth: 174626
val: 8288; addr: 0x7fff00dff75c; depth: 174627
val: 24194; addr: 0x7fff00dff72c; depth: 174628
Segmentation fault

snippet two:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int function(int depth) {
  int tmp0 = rand() % 65536;
  int tmp1 = rand() % 65536;
  tmp0 = tmp0 - 1;
  printf("val: %d, %d; addr: %p, %p; depth: %d\n", tmp0, tmp1, (void*)&tmp0, (void*)&tmp1, depth);
  tmp1 = function(++depth);
  return tmp1 - tmp0;
}
int main() {
  srand(time(NULL));
  int res = function(0);
  printf("%d\n", res);
  return 0;
}

output two:

...
val: 40745, 32446; addr: 0x7ffcb80b079c, 0x7ffcb80b0798; depth: 174528
val: 34014, 57470; addr: 0x7ffcb80b076c, 0x7ffcb80b0768; depth: 174529
val: 56801, 34478; addr: 0x7ffcb80b073c, 0x7ffcb80b0738; depth: 174530
Segmentation fault

I compiled both code using gcc and as expected both of them cause stack overflow. However, what I initially expected is that the depth of snippet two will be much shallower given that function in snippet two uses 2x memory. However, while snippet two does segfault a bit earlier, the depth of two stacks are actually very close...

If everything works as my naive theory, function in snippet one calls itself 174,616 times, it needs to occupy 4 Bytes * 174,616 / 1,024 = 682 KBytes; function in snippet two calls itself 174,539 times, it need to occupy (4 + 4) Bytes * 174,539 = 1,363 KBytes.

So why is it like this?

snippet three

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int function(int depth) {
  int tmp0 = rand() % 65536;
  int tmp1 = rand() % 65536;
  int tmp2 = rand() % 65536;
  int tmp3 = rand() % 65536;
  int tmp4 = rand() % 65536;
  int tmp5 = rand() % 65536;
  tmp0 = tmp0 - 1;
  tmp1 = tmp1 + 1;
  tmp2 = tmp2 - 2;
  tmp3 = tmp3 + 2;
  tmp4 = tmp4 - 3;
  tmp5 = tmp5 + 3;
  printf("val: %d, %d, %d; addr: %p, %p, %p; depth: %d\n", tmp0, tmp1, tmp2, (void*)&tmp0, (void*)&tmp1, (void*)&tmp2, depth);
  tmp1 = function(++depth);
  return tmp0 - tmp1 + tmp2 - tmp3 + tmp4;
}
int main() {
  srand(time(NULL));
  long res = function(0);
  printf("%d\n", res);
  return 0;
}

output three

val: 9366, 56113, 48970; addr: 0x7fff063fe830, 0x7fff063fe82c, 0x7fff063fe828; depth: 130920
val: 11924, 11633, 26004; addr: 0x7fff063fe7f0, 0x7fff063fe7ec, 0x7fff063fe7e8; depth: 130921
val: 13316, 42397, 45027; addr: 0x7fff063fe7b0, 0x7fff063fe7ac, 0x7fff063fe7a8; depth: 130922
val: 4285, 58053, 21693; addr: 0x7fff063fe770, 0x7fff063fe76c, 0x7fff063fe768; depth: 130923
Segmentation fault

Solution

So why is it like this?

Well, what you describe is in principle correct but... You are forgetting compiler optimization. The compiler is allowed to do all kind of optimizations as long as it doesn't change the observable behavior of your program.

For instance the compiler could decide to keep all your tmp variables in cpu registers. In that case there would be no stack memory assigned to them.

You can try to force them to memory by printing their address like:

printf("%p\n", (void*)&tmp);

Another thing that may happen is that the compiler changes the stack pointer with the same amount for both functions. On my system the compiler reserves 32 bytes in both cases. I had to use 5 tmp variables before the compiler change it to 48. In other words - don't expect that the function reserves exactly what it needs. It is allowed to reserve more. On my system it seems that it always change the stack pointer by N x 16.

To get a better understanding, you need to look at the generated machine code. For that you can use gcc -S

You can also check this https://godbolt.org/z/x5c3bj7M9

Both function starts with:

    push    rbp
    mov     rbp, rsp
    sub     rsp, 32   <------ Stack pointer change, i.e. reserving memory
    mov     DWORD PTR [rbp-20], edi

so they use the same amount of stack for both calls.

Same code compiled with -O2 gives a completely different result https://godbolt.org/z/f6jMEqaPd

The first function gives:

    push    rbx
    mov     ebx, edi
    sub     rsp, 48   <------ Stack pointer change, i.e. reserving memory

but the second gives:

    push    rbx
    mov     ebx, edi
    sub     rsp, 16   <------ Stack pointer change, i.e. reserving memory

The conclusion is: You can't just look at the C code and figure out what will happen. You need the machine code.