c# compilation iterator state-machine

Why does the compiler-generated state machine repeatedly restore the state to -1?


I am trying to understand how iterators work internally, to mitigate some concerns I have about thread-safety. Let's consider for example the following simple iterator:

using System.Collections.Generic;

public class MyClass
{
    public static IEnumerable<int> MyMethod()
    {
        yield return 10;
        yield return 20;
        yield return 30;
    }
}

I can see the compiler-generated state machine that is created behind the scenes, after copy-pasting this code to SharpLab.io. It is a class that implements the interfaces IEnumerable<int> and IEnumerator<int>, and contains the MoveNext method below:

private bool MoveNext()
{
    switch (<>1__state)
    {
        default:
            return false;
        case 0:
            <>1__state = -1;
            <>2__current = 10;
            <>1__state = 1;
            return true;
        case 1:
            <>1__state = -1;
            <>2__current = 20;
            <>1__state = 2;
            return true;
        case 2:
            <>1__state = -1;
            <>2__current = 30;
            <>1__state = 3;
            return true;
        case 3:
            <>1__state = -1;
            return false;
    }
}

The identifiers <>1__state and <>2__current are private fields of this class:

private int <>1__state;
private int <>2__current;

I noticed a pattern in this code. First the <>1__state field is reset to -1, then <>2__current is assigned the next iteration value, then <>1__state is advanced to the next state. My question is: what is the purpose of the <>1__state = -1; line? I compiled this code (after painfully renaming all the illegal identifiers) and confirmed that this line can be commented out without affecting the functionality of the class. I don't believe that the C# compiler team just forgot this seemingly purposeless piece of code and left it hanging around. Surely there must be a purpose for its existence, and I would like to know what this purpose is.


Solution

  • There isn't one definitive answer as to why the state is set to -1 each time a case of the switch statement is entered, but I can think of one scenario where that assignment really matters.

    As I said in the comment section, the compiler doesn't know, and doesn't really care, what the expression assigned to <>2__current does.

    It might be a long-running web request that downloads a file, the result of a calculation, or just an integer as in your example. Here lies the problem: because the compiler can't know what your code does, it has to assume that the code might throw an exception. Let's look at what would happen if the <>1__state = -1; assignment were omitted and an exception were thrown while trying to download something.

    1) MoveNext is called.
    2) this.<>2__current = WebRequest.GetFileAsync() throws HttpRequestException.
    3) The exception is caught somewhere and the execution of the program is resumed.
    4) The caller invokes the MoveNext method again.
    5) this.<>2__current = WebRequest.GetFileAsync() throws HttpRequestException again.
    

    So in this case we would be stuck in a loop: the state would change only after the data had been downloaded successfully, so every call to MoveNext would retry the same failing step.
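The scenario above can be reproduced with a hand-written enumerator. This is only a minimal sketch, not the compiler's output: the class name NoResetMachine, its field names, and the Download helper (which stands in for the failing web request and throws InvalidOperationException instead of HttpRequestException to avoid extra dependencies) are all made up for illustration.

```csharp
using System;

// Hand-written stand-in for the generated class, with the
// "state = -1" assignments deliberately left out.
public class NoResetMachine
{
    private int state;      // plays the role of <>1__state
    public int Current;     // plays the role of <>2__current
    public int Attempts;    // counts how often the failing step ran

    // Hypothetical stand-in for a web request that always fails.
    private int Download()
    {
        Attempts++;
        throw new InvalidOperationException("simulated download failure");
    }

    public bool MoveNext()
    {
        switch (state)
        {
            case 0:
                Current = 10;
                state = 1;
                return true;
            case 1:
                // No "state = -1" here: if Download() throws,
                // state is still 1 when the caller tries again.
                Current = Download();
                state = 2;
                return true;
            default:
                return false;
        }
    }
}
```

The first MoveNext call succeeds. Every later call re-enters case 1 and throws again, so Attempts keeps growing for as long as the caller keeps retrying.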

    With the <>1__state = -1; assignment in place, the result looks very different.

    1) MoveNext is called.
    2) <>1__state is set to -1, then this.<>2__current = WebRequest.GetFileAsync() throws HttpRequestException.
    3) The exception is caught somewhere and execution of the program is resumed.
    4) The caller invokes the MoveNext method again.
    5) Since there’s no switch case for -1, the default block is reached, which signals the end of the sequence.
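The steps above can be checked against a real compiler-generated iterator. The following is a small sketch under stated assumptions: ThrowingIteratorDemo, Values, Run and the Download helper are hypothetical names, and Download simply throws InvalidOperationException to simulate the failed request.

```csharp
using System;
using System.Collections.Generic;

public static class ThrowingIteratorDemo
{
    // Hypothetical stand-in for a failing download.
    private static int Download()
    {
        throw new InvalidOperationException("simulated download failure");
    }

    public static IEnumerable<int> Values()
    {
        yield return 10;
        yield return Download(); // throws after the state was set to -1
        yield return 30;         // never reached
    }

    // Calls MoveNext three times and records what each call did.
    public static List<string> Run()
    {
        var log = new List<string>();
        using (IEnumerator<int> e = Values().GetEnumerator())
        {
            bool first = e.MoveNext();
            log.Add("first: " + first + ", current " + e.Current);

            try
            {
                e.MoveNext();
                log.Add("second: no exception");
            }
            catch (InvalidOperationException)
            {
                log.Add("second: threw");
            }

            // The enumerator is now finished; it does not retry Download().
            bool third = e.MoveNext();
            log.Add("third: " + third);
        }
        return log;
    }
}
```

Run() reports that the first call yields 10, the second call throws, and the third call returns false without invoking Download() again, so 30 is never produced.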