Question regarding tail call optimization

As far as I know, there is a prerequisite for performing tail call optimization is that the recursion point should be the last sentence in the function, and the result of the recursive call should be returned immediately. But why?

Here is a valid example for TCO:

int factorial(int num) {
  if (num == 1 || num == 0)
    return 1;
  return num * factorial(num - 1);
}

So, with the rule, can the below code be optimized too? Why not?

#include <stdio.h>
int factorial(int num) {
  if (num == 1 || num == 0)
    return 1;
  int temp = num * factorial(num - 1);
  printf("%d", temp);
  return temp;
}

I want to know how should I explain to others why the above rule is necessary for having a TCO. But not just simply follow.

Solution

the result of the recursive call should be returned immediately. But why?

That's because in order to optimize a tail call you need to convert the final recursive call into a simple "jump" instruction. When you do this, you are merely "replacing" function arguments and then re-starting the function.

This is only possible if you can "throw away" the current stack frame and re-use it for the same function again, possibly overwriting it. If you need to remember a value to do more calculations and then return, you cannot use the same stack frame for the recursive call (i.e. cannot turn the "call" into a "jump"), as it could possibly erase/modify the value you wanted to remember before returning.

Furthermore, if your function is very simple (like yours) chances are that it could be written without using the stack at all (except for the return address maybe), and only store data in registers. Even in this case, you don't want to make a jump to the same function (that uses the same registers) if you need to remember one of the values before returning.

Here is a valid example for TCO:

int factorial(int num) {
  if (num == 1 || num == 0)
    return 1;
  return num * factorial(num - 1);
}

This is not valid for TCO! You are doing return num * <recursive-call>. The recursive call is not the last thing that the function does, there is a multiplication before returning. It's the same as writing:

int factorial(int num) {
    if (num == 1 || num == 0)
        return 1;
    int tmp = factorial(num - 1);
    tmp *= num;
    return tmp;
}

can the below code be optimized too?

Nope! Again there simply is no tail call there, and it's even more obvious. You are first doing the recursive call, then some other stuff (multiplication and printf), and then returning. This cannot be optimized as a tail call by the compiler.

On the other hand, the following code can be optimized as a tail call:

int factorial(int n, int x) {
    if (n == 1)
        return x;
    int tmp = factorial(n - 1, n * x);
    return tmp;
}

You don't necessarily have to make the recursive call right on the last line of the function. The important thing is that you don't do work in the middle (between the recursive call and the return statement), like for example calling other functions or doing additional calculations.

IMPORTANT: note that just the fact that a classical TCO cannot be performed does not mean that the compiler will not be able to optimize your code in some other way. In fact, your first function is so simple that when compiled with GCC on x86-64 with at least -O2 it just gets converted from recursive to iterative (it basically becomes a single loop). The same goes for my example above, the compiler just doesn't care to do TCO, it sees an even better optimization to make in this case.

Here's the assembler dump of your first function generated by GCC 11 on x86-64 (Godbolt link if you want to play with it). In case you are not familiar with x86: the num argument is in edi, and eax is used for the return value.

factorial:
        mov     eax, 1
        cmp     edi, 1
        jbe     .L1
.L2:
        mov     edx, edi
        sub     edi, 1
        imul    eax, edx
        cmp     edi, 1
        jne     .L2
.L1:
        ret