Tags: c#, variable-assignment, language-specifications

Is it ever possible for this loop to fail to run?

A question came up recently that was a learning experience for me. Something like the following was giving a "use of unassigned local variable" error:

int a;
for(int i = 0; i < 1; i++)
  a = 2;
a /= 2; //error

It's a contrived example that doesn't make sense, but it produces the error in question. I was aware that it's perfectly OK to set a variable's value inside inner scopes, so long as the compiler can work out that every path results in a definite assignment:

int a;
if(someboolean)
  a=2;
else
  a=4;

But I hadn't previously realised that inner-scoped blocks that are contingent on a variable's value will produce the error, even when there is no perceptible way the variable could be "wrong":

int a;
bool alwaysTrue = true;
if(alwaysTrue)
  a = 2;
a /= 2; //error

Replacing the condition with a compile-time constant is fine:

int a;
if(true)
  a = 2;
a /= 2; //fine

I wondered whether this was because the compiler removes the if entirely, but a more involved statement is also fine:

int a;
for(int i = 0; true; i++){
  a = 2;
  if(i >= 10)
    break;
}
a /= 2; //fine

Perhaps this is being inlined/optimised too, but the essence of my question is this: for that first simple loop, for(int i = 0; i < 1; i++), is there actually any conceivable way that the loop will NOT run, making "variable a may be unassigned" a valid assertion? Or is the static flow analysis just applying a simple rule along the lines of "any conditionally controlled block that sets a might not run, so short-cut straight to an error on the subsequent use"?

Solution

is there actually any conceivable way that the loop will NOT run and hence the "variable a may be unassigned" is a valid assertion

In your example, assuming a is a local variable, the loop must run. A local variable that isn't captured by a lambda or anonymous method can't be modified by any thread other than the one executing the method. It's just that the compiler isn't required to determine that, and it doesn't.

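I will point out that your final example isn't a case of optimization. Its condition is the constant true, so it works just like a while (true) loop: the only way out is the break, which runs after a = 2, so the compiler can see a as definitely assigned. That's the same constant-expression handling that lets your if(true) example compile.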

In terms of "why", there are two ways to interpret that question. The easy way is "why does the compiler do this?" and the answer is "because the language specification says so".

Language specifications aren't always the easiest thing to read, and the rules of definite assignment are a particularly stark example of that statement, but you can find the answers to this first interpretation of "why" here: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/variables#precise-rules-for-determining-definite-assignment

You'll note that, in general, the only way a loop statement leads to definite assignment is if the expression that controls the loop itself participates in definite assignment (for example, by assigning the variable or passing it as an out argument). That's the scenario covered by the "definitely assigned after true expression" and "definitely assigned after false expression" sub-states. You'll also note that this part of the specification doesn't apply to your examples, because none of their loop conditions touch a.
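For contrast, here is a minimal sketch (my own illustration, not taken from the spec) of a loop whose controlling expression does participate in definite assignment. The out argument is assigned every time the condition is evaluated, so value is definitely assigned when the loop exits through its false branch. The class and method names are hypothetical, and the method assumes at least one input parses.

```csharp
using System;

static class ControllingExpressionDemo
{
    // Returns the first element of 'inputs' that parses as an int.
    // Assumption: at least one element parses; otherwise idx runs past
    // the end of the array and an IndexOutOfRangeException is thrown.
    public static int FirstParsable(string[] inputs)
    {
        int value;
        int idx = 0;
        // The condition assigns 'value' (via 'out') on every evaluation,
        // so 'value' is definitely assigned after the false expression,
        // and therefore at the loop's end point.
        while (!int.TryParse(inputs[idx], out value))
            idx++;
        return value; // compiles: no "use of unassigned local variable" here
    }

    static void Main()
    {
        Console.WriteLine(FirstParsable(new[] { "x", "42", "7" })); // prints 42
    }
}
```

Moving the assignment out of the condition and into the loop body, as in the question, brings the error back.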

So you're left with the main point of the definite assignment rules for loops (there are other qualifications, but none apply to these simple cases):

v has the same definite assignment state at the beginning of expr as at the beginning of stmt.

I.e. whatever v's definite assignment state was before the loop, it's the same after. Assignments inside the loop body are ignored.
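The rule quoted above is the one for while loops (and the spec checks a for loop as if it were rewritten into a while loop). A do loop is the case where the body is known to run before the condition is ever evaluated, and the definite assignment rules reflect that. A minimal sketch, my own illustration with hypothetical names, reworking the question's loop as a do loop:

```csharp
using System;

static class DoWhileDemo
{
    public static int Halve()
    {
        int a;
        int i = 0;
        // The body runs before the condition is first evaluated, so 'a' is
        // definitely assigned at the condition and, since there is no
        // 'break', at the loop's end point too.
        do
        {
            a = 2;
            i++;
        } while (i < 1);
        a /= 2; // compiles, unlike the equivalent for loop in the question
        return a;
    }

    static void Main()
    {
        Console.WriteLine(Halve()); // prints 1
    }
}
```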

So, if loops don't generally create definite assignment, why do loops controlled by constant expressions (such as the literal true) allow for definite assignment? This is because of a different part of the specification, referenced by the rules for definite assignment: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/statements#end-points-and-reachability

The flow analysis takes into account the values of constant expressions (Constant expressions) that control the behavior of statements, but the possible values of non-constant expressions are not considered.
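A const local counts as a constant expression too, so a minimal sketch (my own illustration, hypothetical names) is to take the question's alwaysTrue example and change only the declaration:

```csharp
using System;

static class ConstantConditionDemo
{
    public static int Halve()
    {
        int a;
        // 'const' makes alwaysTrue a constant expression, so the flow
        // analysis knows the if's body always runs. With a plain
        // 'bool alwaysTrue = true;' the last line would be an error again.
        const bool alwaysTrue = true;
        if (alwaysTrue)
            a = 2;
        a /= 2; // compiles
        return a;
    }

    static void Main()
    {
        Console.WriteLine(Halve()); // prints 1
    }
}
```

By the same token, i < 1 in the question's loop is not a constant expression, because i is a variable, even though 1 is a constant.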

That flow analysis exists to determine whether a statement or a loop's end point is reachable, but it applies directly to definite assignment as well:

The definite assignment state of v at the end point of a block, checked, unchecked, if, while, do, for, foreach, lock, using, or switch statement is determined by checking the definite assignment state of v on all control flow transfers that target the end point of that statement. If v is definitely assigned on all such control flow transfers, then v is definitely assigned at the end point of the statement. Otherwise; v is not definitely assigned at the end point of the statement. The set of possible control flow transfers is determined in the same way as for checking statement reachability [emphasis mine]

In other words, the compiler applies the same analysis it uses for statement reachability when determining definite assignment. Hence, loops controlled by constant expressions get analyzed, while those that aren't, don't.
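To make the "all control flow transfers" wording concrete, here is a minimal sketch (my own illustration, hypothetical names). Because the condition is the constant true, the only transfers to the loop's end point are the two break statements, and the variable is assigned before each of them:

```csharp
using System;

static class BreakTransferDemo
{
    // Returns the index of the first negative element, or -1 if there is none.
    public static int FirstNegativeIndex(int[] values)
    {
        int index;
        int i = 0;
        // Constant condition: the end point is reachable only via 'break'.
        // 'index' is assigned before both breaks, so it is definitely
        // assigned after the loop.
        while (true)
        {
            if (i == values.Length) { index = -1; break; }
            if (values[i] < 0) { index = i; break; }
            i++;
        }
        return index; // compiles
    }

    static void Main()
    {
        Console.WriteLine(FirstNegativeIndex(new[] { 3, 1, -4, 1 })); // prints 2
        Console.WriteLine(FirstNegativeIndex(new[] { 5, 9 }));        // prints -1
    }
}
```

Deleting either assignment leaves one transfer where index is unassigned, and the error returns on the return statement.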

The harder way to interpret "why" is "why did the language authors write the specification this way?" That's where you start to get into opinion-based answers, unless you're actually talking to one of the language authors (who may in fact at some point post an answer, so…not remotely out of the realm of possibility :) ).

But it seems to me that there are a couple of ways to address that question:

They probably wrote the specification that way because, as complicated as the definite assignment rules are now, they would have been even more complicated if the compiler were required to do flow analysis on the values of variables, never mind how much more complicated actually writing the compiler would have been.
More theoretically, it comes down to the Halting Problem. I.e. as soon as you start asking the compiler to do non-trivial flow analysis, you open the door for someone to write C# code that effectively makes the compiler determine whether that code can halt or not. Since that's impossible to do in all cases, it's probably a bad idea to include that requirement in the specification.

Dealing with constant expressions, which not only can but must be computed at compile time, is one thing. Making the compiler essentially run your program just to compile it is a whole 'nother ball o' wax.