c# .net performance cil

Reuse of for loop iteration variable


I have seen a lot of questions about whether to declare variables inside or outside a for loop's scope, and the topic has been discussed at length in several other questions. The consensus is that there is absolutely no performance difference (same IL), but for clarity, declaring variables in the tightest scope is preferred.

I was curious about a slightly different situation:

int i;

for (i = 0; i < 10; i++) {
    Console.WriteLine(i);
}

for (i = 0; i < 10; i++) {
    Console.WriteLine(i);
}

versus

for (int i = 0; i < 10; i++) {
    Console.WriteLine(i);
}

for (int i = 0; i < 10; i++) {
    Console.WriteLine(i);
}

I expected both methods to compile to the same IL in Release mode. However, this is not the case. I'll spare you the full IL and just point out the difference. The first method has one local:

.locals init (
    [0] int32 i
)

while the second has two locals, one for each for loop counter:

.locals init (
    [0] int32 i,
    [1] int32 i
)

So there is a difference between the two that is not optimized away, which surprised me.

Why am I seeing this, and is there actually a performance difference between the two methods?


Solution

  • To answer your question: you declared one local variable in the first case and two in the second, and the IL reflects exactly that. The C# compiler apparently does not merge locals whose lifetimes do not overlap, even though I think it would be permitted to do so. My guess is that the gain is not worth the complex analysis it would require, and it might not even be useful if the JIT is smart enough to handle it anyway. The optimization you were expecting does happen, just not at the IL level: the JIT compiler performs it when it emits machine code.
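
If you would rather not read IL by hand, the local count can also be checked through reflection. The following is a minimal sketch, not part of the original measurements: the class and method names (LocalsDemo, SharedCounter, SeparateCounters, CountLocals) are hypothetical, and the counts it prints depend on the build configuration, since Debug builds typically add compiler-generated temporaries.

```csharp
using System;
using System.Reflection;

static class LocalsDemo
{
    // Hypothetical name: one counter declared up front and reused by both loops.
    static void SharedCounter()
    {
        int i;
        for (i = 0; i < 10; i++) { Console.WriteLine(i); }
        for (i = 0; i < 10; i++) { Console.WriteLine(i); }
    }

    // Hypothetical name: each loop declares its own counter.
    static void SeparateCounters()
    {
        for (int i = 0; i < 10; i++) { Console.WriteLine(i); }
        for (int i = 0; i < 10; i++) { Console.WriteLine(i); }
    }

    // Returns the number of entries in the method's IL local-variable signature.
    static int CountLocals(string methodName)
    {
        MethodInfo method = typeof(LocalsDemo).GetMethod(
            methodName, BindingFlags.NonPublic | BindingFlags.Static);
        return method.GetMethodBody().LocalVariables.Count;
    }

    static void Main()
    {
        // Only inspects metadata; the loops themselves are never run here.
        Console.WriteLine("SharedCounter locals:    " + CountLocals(nameof(SharedCounter)));
        Console.WriteLine("SeparateCounters locals: " + CountLocals(nameof(SeparateCounters)));
    }
}
```

Compile this in Release mode to compare against the .locals listings in the question.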

    This case is simple enough that inspecting the emitted machine code is actually informative. In summary, the two methods JIT-compile to the same machine code (x86 is shown below, and the same holds on x64), so there is no performance gain from using fewer local variables.

    A quick note on conditions: I put the two fragments into separate methods, then looked at the disassembly in Visual Studio 2015 with the .NET 4.6.1 runtime, using an x86 Release build (i.e. optimizations on) and attaching the debugger only after the JIT had compiled the methods (that is, after at least one invocation without the debugger attached). I disabled method inlining to keep things consistent between both methods. To view the disassembly, place a breakpoint in the desired method, attach, and go to Debug > Windows > Disassembly. Hit F5 to run to the breakpoint.
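
For anyone who wants to reproduce this, a harness along the following lines should work. It is a sketch under assumptions: the names (JitDemo, SharedCounter, SeparateCounters) are hypothetical, and MethodImplOptions.NoInlining is one common way to disable inlining, not necessarily the mechanism used for the listings below.

```csharp
using System;
using System.Runtime.CompilerServices;

static class JitDemo
{
    // NoInlining keeps each method as a separate body so it can be inspected on its own.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void SharedCounter()
    {
        int i;
        for (i = 0; i < 10; i++) { Console.WriteLine(i); }
        for (i = 0; i < 10; i++) { Console.WriteLine(i); }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void SeparateCounters()
    {
        for (int i = 0; i < 10; i++) { Console.WriteLine(i); }
        for (int i = 0; i < 10; i++) { Console.WriteLine(i); }
    }

    static void Main()
    {
        // First calls happen without a debugger, so the JIT compiles with optimizations.
        SharedCounter();
        SeparateCounters();

        Console.WriteLine("Attach the debugger, set breakpoints, then press Enter.");
        Console.ReadLine();

        // Second calls hit the breakpoints in the already-compiled code.
        SharedCounter();
        SeparateCounters();
    }
}
```

Start the Release build outside the debugger, attach when prompted, and open the Disassembly window once a breakpoint is hit.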

    Without further ado, the first method disassembles to

                for (i = 0; i < 10; i++)
    010204A2  in          al,dx  
    010204A3  push        esi  
    010204A4  xor         esi,esi  
                {
                    Console.WriteLine(i);
    010204A6  mov         ecx,esi  
    010204A8  call        71686C0C  
                for (i = 0; i < 10; i++)
    010204AD  inc         esi  
    010204AE  cmp         esi,0Ah  
    010204B1  jl          010204A6  
                }
    
                for (i = 0; i < 10; i++)
    010204B3  xor         esi,esi  
                {
                    Console.WriteLine(i);
    010204B5  mov         ecx,esi  
    010204B7  call        71686C0C  
                for (i = 0; i < 10; i++)
    010204BC  inc         esi  
    010204BD  cmp         esi,0Ah  
    010204C0  jl          010204B5  
    010204C2  pop         esi  
    010204C3  pop         ebp  
    010204C4  ret  
    

    The second method disassembles to

                for (int i = 0; i < 10; i++)
    010204DA  in          al,dx  
    010204DB  push        esi  
    010204DC  xor         esi,esi  
                {
                    Console.WriteLine(i);
    010204DE  mov         ecx,esi  
    010204E0  call        71686C0C  
                for (int i = 0; i < 10; i++)
    010204E5  inc         esi  
    010204E6  cmp         esi,0Ah  
    010204E9  jl          010204DE  
                }
    
                for (int i = 0; i < 10; i++)
    010204EB  xor         esi,esi  
                {
                    Console.WriteLine(i);
    010204ED  mov         ecx,esi  
    010204EF  call        71686C0C  
                for (int i = 0; i < 10; i++)
    010204F4  inc         esi  
    010204F5  cmp         esi,0Ah  
    010204F8  jl          010204ED  
    010204FA  pop         esi  
    010204FB  pop         ebp  
    010204FC  ret  
    

    As you can see, aside from the absolute addresses (and therefore the jump targets), the code is identical.

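    These methods are simple enough that the JIT keeps the loop counter in the esi register for both loops, so neither IL local ever needs its own stack slot. (The in al,dx at the top of each listing is a disassembly artifact: the view starts on the second byte of the mov ebp,esp prologue instruction, and that byte decodes as in al,dx on its own.)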

    It is left as an exercise for the reader to verify in x64.