Tags: c#, .net, yield-return, tail-call-optimization

Will a properly implemented recursive lazy iterator function never stack overflow?


tl;dr:

In C#, is there any guarantee that a lazy iterator function that calls nothing but itself, and has a valid recursion exit condition, will not cause a stack overflow?


Detailed question:

I know that, as a rule, there is no guarantee that the C# compiler (or the JIT) will emit the tail call instruction, so while you may get Tail Call Optimization (TCO), there are no guarantees.

Given that TCO is not guaranteed, I'm wondering about lazy iterator functions (using yield return etc.): because of their nature as coroutines, does each tail call in one even take up stack space? My intuition about coroutines, given their re-entrancy, is that each tail call is optimized by default, since jumping out of the function and into the next one from the parent's frame, instead of creating a new frame, seems natural.

Is this the behaviour in C#, or does a recursive call in a C# iterator function create a new frame on top of the current one, rather than popping back out to the parent frame and re-entering with the new parameters?


Example:

public static IEnumerable<IEnumerable<T>> GeneratePermutations<T>(this IEnumerable<T> choices, int numberToChoose)
{
    // Base case: each remaining choice is a one-element permutation.
    if (numberToChoose == 1)
    {
        foreach (var choice in choices)
            yield return new T[] { choice };
        yield break;
    }

    // Recursive case: pick each element, then lazily permute the remaining elements.
    var subPermutations = choices.SelectMany(choice =>
        choices.Where(elem => !EqualityComparer<T>.Default.Equals(elem, choice))
            .GeneratePermutations(numberToChoose - 1)
            .Select(permutation => (new T[] { choice }).Concat(permutation)));
    foreach (var perm in subPermutations)
        yield return perm;
}
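
For reference, a minimal call site looks something like this (assuming the extension method above is declared in a static class that is in scope):

using System;

// Lazily prints every 2-element permutation of { "a", "b", "c" }:
// a, b / a, c / b, a / b, c / c, a / c, b
var letters = new[] { "a", "b", "c" };
foreach (var permutation in letters.GeneratePermutations(2))
    Console.WriteLine(string.Join(", ", permutation));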

My intuition is based on the fact that, in the above example, subPermutations is simply a heap-allocated computation. It seems that, upon the call to iterate it, the runtime can know it is a heap-allocated computation (it is part of the function's signature that it is an iterator function), and could therefore immediately jump out of its current frame and expand the heaped computation into a new frame, costing no extra stack space over what was there before the recursive call was attempted...

This intuition may be totally unfounded...


Solution

  • So, let's open with an example method, so that we have something to reference:

    public static IEnumerable<int> Foo()
    {
        yield return 1;
        foreach (var n in Foo())
            yield return n;
    }
    

    Here's our recursive iterator block. I just want to take a moment to call out a few properties of this method that may (or may not) end up being relevant.

    • There is a recursive call, but the recursive call is after a yield.

    • When we do reach our recursive call, the only thing we do after that point is yield all of its results. There is no projection on each item, no finally block, nothing after those yields, etc.

    So, what happens when some code goes and writes the following?

    foreach(var n in Foo())
        Console.WriteLine(n);
    

    Well, the first thing that happens when we reach this statement is to evaluate Foo() to a value. In this case, this creates the state machine that represents the generator of the sequence. We've not actually executed any of the code in the method body though.

    Next, we call MoveNext. We hit our first yield statement, yield a value, and print it out.
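
    As a side note, that consuming foreach is roughly equivalent to the following hand-written enumeration (a simplified sketch, not the exact code the compiler emits):

    // Foo() only allocates the state machine; none of the iterator body runs yet.
    using (var e = Foo().GetEnumerator())
    {
        // Each MoveNext call resumes the iterator body until the next yield.
        while (e.MoveNext())
            Console.WriteLine(e.Current);
    }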

    After that, the outermost level calls MoveNext again. Here our state machine's MoveNext method reaches its own foreach loop. It will, like the Main method, evaluate Foo() to a value, creating a second state machine. It will then immediately call MoveNext on that state machine. That second state machine will reach its first yield, it will yield the value to the first iterator, which will yield it back up to the main method, which will print it.

    Then the main method calls MoveNext again. The first iterator asks the second iterator for its second item, the second iterator reaches its foreach loop, creates a third iterator, and gets a value from it. The value gets passed all the way up.
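
    To make that nesting concrete, here is a simplified, hand-written sketch of the kind of state machine the compiler generates for Foo (an illustration only; the real generated type also implements IEnumerable<int> and handles a few more details). The important part is that its MoveNext calls the inner iterator's MoveNext, which is exactly where the frames pile up:

    using System;
    using System.Collections;
    using System.Collections.Generic;

    sealed class FooStateMachine : IEnumerator<int>
    {
        private int _state;                  // which point in Foo's body we're paused at
        private IEnumerator<int> _inner;     // enumerator for the recursive Foo() call

        public int Current { get; private set; }
        object IEnumerator.Current => Current;

        public bool MoveNext()
        {
            switch (_state)
            {
                case 0:
                    Current = 1;             // "yield return 1"
                    _state = 1;
                    return true;
                case 1:
                    // The recursive call allocates a fresh state machine...
                    if (_inner == null)
                        _inner = new FooStateMachine();
                    // ...and this nested MoveNext call is the extra stack frame
                    // added for every level of recursion.
                    if (_inner.MoveNext())
                    {
                        Current = _inner.Current;
                        return true;
                    }
                    return false;
                default:
                    return false;
            }
        }

        public void Reset() => throw new NotSupportedException();
        public void Dispose() => _inner?.Dispose();
    }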

    As we can see, each time we ask our top-level iterator for another item, the stack ends up one level deeper than before. Despite the fact that we're using state machines, and that creating the iterators doesn't consume a lot of stack space, getting the next item in the sequence consumes more and more stack space, until we run out.

    When running the code we can see that things work out exactly as described here, and the stack will overflow.
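
    One way to see this for yourself is to report the stack depth at the moment each value is produced, while every enclosing MoveNext call is still on the stack (a small diagnostic sketch, not part of the original code):

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    static class StackDepthDemo
    {
        static IEnumerable<int> Foo()
        {
            // Measured here, every enclosing MoveNext call is still on the stack,
            // so the reported frame count grows with each item.
            Console.WriteLine($"yielding with ~{new StackTrace().FrameCount} frames on the stack");
            yield return 1;
            foreach (var n in Foo())
                yield return n;
        }

        static void Main()
        {
            // Take only a few items; enumerating Foo() indefinitely would
            // eventually throw StackOverflowException rather than terminate.
            foreach (var n in Foo().Take(5))
                Console.WriteLine(n);
        }
    }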

    So, how could this possibly be optimized?

    Well, what we're hoping for here is that the top-level iterator realizes, when it gets to the foreach, that "from now on, the rest of the items in my sequence are identical to all of the items in the recursive call". This does sound a lot like a typical TCO situation.

    So at this point we have two issues to solve. First, if we recognize that we're in this situation, can we actually avoid the creation of additional state machines, and thus the continually increasing stack usage? It wouldn't be all that easy, likely not quite as easy as traditional non-iterator-block TCO. You'd need to set all of the instance fields of the state machine to whatever the instance fields would be of the state machine that would be created if you had called Foo. I'm just going to wave my hands at this point and say that this sounds possible, but not exactly super easy.

    Then we have the other problem: how can we recognize that we're actually in a position where TCO is valid? We need to be recursively calling ourselves, we need to be doing nothing with that method call other than iterating the whole thing and yielding each item exactly as it stands, we can't be inside a try or using block (else the finally block would be lost), and there can't be any code after that iteration.

    Now, if there were a yield foreach operator then this wouldn't be so bad. You'd just set up the rule that if the very last statement in the iterator block is a yield foreach with a recursive call to the method, apply TCO. Sadly, in C# (unlike some other .NET languages) we have no yield foreach operator. We need to type out the whole foreach loop, while also not doing anything other than yielding each item exactly as it stands. That seems...a bit awkward.
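
    To make the shape of that rule concrete, here is what the two forms look like side by side. The yield foreach line is hypothetical syntax that C# does not have; only the plain foreach form below it compiles today:

    public static IEnumerable<int> Foo()
    {
        yield return 1;

        // Hypothetical (not valid C#): a single statement that says
        // "the rest of my sequence is exactly the recursive call".
        // yield foreach Foo();

        // What we actually have to write, and what the compiler would have
        // to pattern-match in order to recognise the same intent:
        foreach (var n in Foo())
            yield return n;
    }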

    To recap:

    • Is it possible for the compiler to use Tail Call Optimization for recursive iterator blocks?
      • Most likely.
    • Is it done by the compiler ever?
      • It doesn't appear so.
    • Would it be particularly feasible to add this support into the compiler?
      • Probably not.