c#f#tail-recursion cil tail-call-optimization

What is the purpose of the extra ldnull and tail. in F# implementation vs C#?

The following C# function:

T ResultOfFunc<T>(Func<T> f)
{
    return f();
}

compiles unsurprisingly to this:

IL_0000:  ldarg.1     
IL_0001:  callvirt    05 00 00 0A 
IL_0006:  ret

But the equivalent F# function:

let resultOfFunc func = func()

compiles to this:

IL_0000:  nop         
IL_0001:  ldarg.0     
IL_0002:  ldnull      
IL_0003:  tail.       
IL_0005:  callvirt    04 00 00 0A 
IL_000A:  ret

(Both are in release mode). There's an extra nop at the beginning which I'm not too curious about, but the interesting thing is the extra ldnull and tail. instructions.

My guess (probably wrong) is that ldnull is necessary in case the function is void so it still returns something (unit), but that doesn't explain what is the purpose of the tail. instruction. And what happens if the function does push something on the stack, isn't it stuck with an extra null that doesn't get popped?

Solution

The C# and F# versions have an important distinction: C# function doesn't have any parameters, but F# version has one parameter of type unit. That unit value is what shows up as ldnull (because null is being used as representation of the only unit value, ()).

If you were to translate the second function to C#, it would look like this:

T ResultOfFunc<T>( Func<Unit, T> f ) {
   return f( null );
}

As for the .tail instruction - that is so called "tail call optimization".
During a regular function call, a return address gets pushed on the stack (the CPU stack) and then the function is called. When the function is done, it executes the "return" instruction, which pops the return address off the stack and transfers control there.
However, when function A calls function B, and then immediately returns function B's return value, without doing anything else, the CPU can skip pushing the extra return address on the stack, and perform a "jump" to B instead of a "call". That way, when B executes the "return" instruction, the CPU will pop return address from the stack, and that address will point not to A, but to whoever called A in the first place.
Another way to think about it is: function A calls function B not before returning, but instead of returning, and thus delegates the honor of returning to B.

So in effect, this magic technique allows us to make a call without consuming a spot on the stack, which means that you can perform arbitrarily many such calls without risking stack overflow. This is very important in functional programming, because it allows to efficiently implement recursive algorithms.

It is called "tail call", because call to B happens, so to say, at the tail of A.