recursion erlang bytecode tail-recursion beam

Tail-call recursive behavior of the BEAM bytecode instruction call_last

We were recently reading the BEAM Book as part of a reading group.

In appendix B.3.3 it states that the call_last instruction has the following behavior

Deallocate Deallocate words of stack, then do a tail recursive call to the function of arity Arity in the same module at label Label

Based on our current understanding, tail-recursive would imply that the memory allocated on the stack can be reused from the current call.

As such, we were wondering what is being deallocated from the stack.

Additionally, we were also wondering why there is a need to deallocate from the stack before doing the tail-recursive call, instead of directly doing the tail recursive call.

Solution

In asm for CPUs, an optimized tailcall is just a jump to the function entry point. I.e. running the whole function as a loop body in the case of tail-recursion. (Without pushing a return address, so when you reach the base-case it's just one return back to the ultimate parent.)

I'm going to take a wild guess that Erlang / BEAM bytecode is remotely similar, even though I know nothing about it specifically.

When execution reaches the top of a function, it doesn't know whether it got there by recursion or a call from another function and would thus have to allocate more space if it needed any.

If you want to reuse already-allocated stack space, you'd have to further optimize the tail-recursion into an actual loop inside the function body, not recursion at all anymore.

Or to put it another way, to tailcall anything, you need the callstack in the same state it was in on function entry. Jumping instead of calling loses the opportunity to do any cleanup after the called function returns, because it returns to your caller, not to you.

But can't we just put the stack-cleanup in the recursion base-case that does actually return instead of tailcalling? Yes, but that only works if the "tailcall" is to a point in this function after allocating new space is already done, not the entry point that external callers will call. Those 2 changes are exactly the same as turning tail-recursion into a loop.