c#, language-design, backwards-compatibility, yield-return

Could using yield as a contextual keyword ever cause an issue?


In Essential C# it states:

After C# 1.0, no new reserved keywords were introduced to C#. However, some constructs in later versions use contextual keywords, which are significant only in specific locations. Outside these designated locations, contextual keywords have no special significance.* By this method, most C# 1.0 code is compatible with the later standards.

*For example, early in the design of C# 2.0, the language designers designated yield as a keyword, and Microsoft released alpha versions of the C# 2.0 compiler, with yield as a designated keyword, to thousands of developers. However, the language designers eventually determined that by using yield return rather than yield, they could ultimately avoid adding yield as a keyword because it would have no special significance outside its proximity to return.

Now I don't understand this, as before C# 2.0, every method returning an IEnumerable would have had to have a return statement in it, whereas yield could be used as a contextual keyword only inside a method that returned an IEnumerable but had no return statement, e.g.:

public IEnumerable<int> GetInts()
{
    for (int i = 0; i < 1000; i++)
        yield i;
}

Since this method would not have compiled pre-C# 2.0, I don't see how this could break backwards compatibility.

So my question is:

Are there any situations where using yield instead of yield return in C# would have broken backwards compatibility, or otherwise caused issues?


Solution

  • Problem

    for (int i = 0; i < 1000; i++)
        yield i;
    

    This is indeed not valid without a yield keyword, but what if we add parentheses around the i?

    for (int i = 0; i < 1000; i++)
        yield (i);
    

    Now this is a perfectly valid call of a method named yield. So if we interpreted yield (i); as a use of the contextual keyword yield, the meaning of this valid code would change, breaking backwards compatibility.
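To make the point concrete, here is a minimal sketch of code that uses yield purely as an identifier. The class name YieldAsIdentifier, the Seen list and the Run method are made up for illustration; the only thing that matters is that a method named yield can be declared and called.

```csharp
using System;
using System.Collections.Generic;

public static class YieldAsIdentifier
{
    // Hypothetical helper state so the calls leave something observable behind.
    public static readonly List<int> Seen = new List<int>();

    // "yield" is an ordinary method name here, nothing more.
    static void yield(int value)
    {
        Seen.Add(value);
    }

    public static void Run()
    {
        for (int i = 0; i < 3; i++)
            yield (i);   // a plain method call, not a yield statement
    }

    public static void Main()
    {
        Run();
        Console.WriteLine(string.Join(", ", Seen)); // prints "0, 1, 2"
    }
}
```

If yield (i); were reinterpreted as a yield statement, code shaped like this would either stop compiling or silently change meaning.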

    A more formal way to look at it: if we changed the grammar of C# 2 to replace statement: 'yield' 'return' expression ';' with statement: 'yield' expression ';', there would be an ambiguity between that rule and the rule for function calls, because expression can derive '(' expression ')', and 'yield' '(' expression ')' ';' could also be a function call in an expression statement.

    Possible Solution 1

    You could of course say that only yield i; (or any other expression that does not start with an opening parenthesis) should be interpreted as a use of the contextual keyword while yield (i); would still be seen as a method call. However that'd be quite inconsistent and surprising behavior - adding parentheses around an expression shouldn't change the semantics like that.

    Also, this would mean changing the above grammar rule to something like statement: 'yield' expressionNoStartingParen ';' and then defining expressionNoStartingParen, which would duplicate most of the actual definition of expression. That would make the grammar pretty complicated. You could work around that by describing the no-starting-parenthesis requirement in words instead of in the grammar and then using a flag to track it in actual implementations, though that would probably not be an option with most parser generators.

    Possible Solution 2

    Another way to resolve this ambiguity, which you've mentioned in comments, would be to only interpret yield expression; as a yield statement when inside non-void methods that do not have a return statement. This would maintain backwards-compatibility because such methods would be invalid in C# 1 anyway. However this would be somewhat inconsistent because now you could define a method named yield and call it in methods that don't use yield-statements, but not methods that do.

    More importantly, this isn't what contextual keywords are usually like. Normally a contextual keyword acts as an identifier whenever it's used in any place where identifiers are valid and can only be used as a keyword in places where identifiers could not occur. This would not be the case here. That's not only inconsistent with how contextual keywords usually work and would make it more difficult for readers to distinguish yield-as-a-keyword from yield-as-an-identifier, it would also make it much more difficult to implement:

    Not only would you be unable to tell whether yield(x); is a yield statement just by looking at that line (you'd need to look at the whole method), the parser couldn't tell either - it would have to know whether the method contains a return statement. This would require two distinct definitions for bodies with and without return in the grammar - and a separate definition of what's allowed as an identifier in each one. That would be a horrible grammar to look at and also to implement.

    In practice one would most likely create an ambiguous grammar and then parse yield (x); into a placeholder AST node that records both possibilities: a yield statement or a function call. Then you'd try to typecheck both and throw away the one that doesn't typecheck. This would work, but it's pretty uncommon to do and would have required extensive changes to how parsing works in the compiler and how it then works with the AST. Any other implementations of the language (Mono, Roslyn) would then also have had to deal with this complexity, making it more difficult to create new implementations.
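The "keep both readings, decide later" idea can be sketched as a toy model. This is not how any real C# compiler is structured; every type and member below (Interpretation, MethodContext, Disambiguator.Resolve) is hypothetical. It assumes the rule from Possible Solution 2, simplified to "the method returns an enumerable and has no return statement", and it assumes the yield statement wins when both readings are viable.

```csharp
using System;

// Toy model only: all names here are invented for illustration.
public enum Interpretation { YieldStatement, MethodCall }

public sealed class MethodContext
{
    public bool ReturnsEnumerable;   // assumption: stands in for "non-void iterator-like method"
    public bool HasReturnStatement;  // body contains a plain "return"
    public bool YieldMethodInScope;  // a method named "yield" is callable here
}

public static class Disambiguator
{
    // The parser would record "yield (x);" as ambiguous; this later pass
    // needs facts about the whole method before it can pick a reading.
    public static Interpretation? Resolve(MethodContext ctx)
    {
        bool asYieldStatement = ctx.ReturnsEnumerable && !ctx.HasReturnStatement;
        bool asMethodCall = ctx.YieldMethodInScope;

        if (asYieldStatement) return Interpretation.YieldStatement;
        if (asMethodCall) return Interpretation.MethodCall;
        return null; // neither reading typechecks: a compile error
    }
}

public static class Program
{
    public static void Main()
    {
        var iteratorLike = new MethodContext { ReturnsEnumerable = true, HasReturnStatement = false, YieldMethodInScope = true };
        var ordinary = new MethodContext { ReturnsEnumerable = false, HasReturnStatement = false, YieldMethodInScope = true };

        Console.WriteLine(Disambiguator.Resolve(iteratorLike)); // YieldStatement
        Console.WriteLine(Disambiguator.Resolve(ordinary));     // MethodCall
    }
}
```

Even in this toy form, the decision depends on information from the whole method body and the surrounding scope, which is the extra coupling between parsing and type checking described above.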

    Conclusion

    So in conclusion, both ways to work around this issue lead to some inconsistencies, and the latter is also significantly more difficult to implement. Only treating yield as special when used together with return avoids the ambiguity without creating any inconsistencies and is easy to implement.
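As a small illustration of the design that shipped, the sketch below uses yield both as a local variable name and as part of yield return in the same method. The class and method names (ContextualYield, Doubled) are made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class ContextualYield
{
    public static IEnumerable<int> Doubled(IEnumerable<int> source)
    {
        foreach (int item in source)
        {
            int yield = item * 2;  // "yield" is an ordinary local variable name here
            yield return yield;    // keyword only because "return" follows directly
        }
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Doubled(new[] { 1, 2, 3 }))); // prints "2, 4, 6"
    }
}
```

Because yield is only special directly before return (or break), each occurrence can be classified by looking at the next token, with no knowledge of the rest of the method.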