c#, performance, .net-7.0

Performance issue with for loop on the initial run on .NET 7


I'm working on a performance sensitive application and considering moving from .NET 6 to .NET 7.

While comparing these two versions, I've found that .NET 7 is slower at executing a for loop on its initial run.

Testing is done with two separate console applications with identical code, one targeting .NET 6 and the other .NET 7, both built in Release mode for Any CPU.

Test code:

using System.Diagnostics;

int size = 1000000;
Stopwatch sw = new();

//create array
float[] arr = new float[size];
for (int i = 0; i < size; i++)
    arr[i] = i;

Console.WriteLine(AppDomain.CurrentDomain.SetupInformation.TargetFrameworkName);

Console.WriteLine($"\nForLoop1");
ForLoop1();
ForLoop1();
ForLoop1();
ForLoop1();
ForLoop1();

Console.WriteLine($"\nForLoopArray");
ForLoopArray();
ForLoopArray();
ForLoopArray();
ForLoopArray();
ForLoopArray();

Console.WriteLine($"\nForLoop2");
ForLoop2();
ForLoop2();
ForLoop2();
ForLoop2();
ForLoop2();

void ForLoop1()
{
    sw.Restart();

    int sum = 0;
    for (int i = 0; i < size; i++)
        sum++;

    sw.Stop();
    Console.WriteLine($"{sw.ElapsedTicks} ticks ({sum})");
}

void ForLoopArray()
{
    sw.Restart();

    float sum = 0f;
    for (int i = 0; i < size; i++)
        sum += arr[i];

    sw.Stop();
    Console.WriteLine($"{sw.ElapsedTicks} ticks ({sum})");
}

void ForLoop2()
{
    sw.Restart();

    int sum = 0;
    for (int i = 0; i < size; i++)
        sum++;

    sw.Stop();
    Console.WriteLine($"{sw.ElapsedTicks} ticks ({sum})");
}

The console output for the .NET 6 version:

.NETCoreApp,Version=v6.0

ForLoop1
2989 ticks (1000000)
2846 ticks (1000000)
2851 ticks (1000000)
3180 ticks (1000000)
2841 ticks (1000000)

ForLoopArray
8270 ticks (4.9994036E+11)
8443 ticks (4.9994036E+11)
8354 ticks (4.9994036E+11)
8952 ticks (4.9994036E+11)
8458 ticks (4.9994036E+11)

ForLoop2
2842 ticks (1000000)
2844 ticks (1000000)
3117 ticks (1000000)
2835 ticks (1000000)
2992 ticks (1000000)

And the .NET 7 version:

.NETCoreApp,Version=v7.0

ForLoop1
19658 ticks (1000000)
2921 ticks (1000000)
2967 ticks (1000000)
3190 ticks (1000000)
3722 ticks (1000000)

ForLoopArray
20041 ticks (4.9994036E+11)
8342 ticks (4.9994036E+11)
9212 ticks (4.9994036E+11)
8501 ticks (4.9994036E+11)
9726 ticks (4.9994036E+11)

ForLoop2
14016 ticks (1000000)
3008 ticks (1000000)
2885 ticks (1000000)
2882 ticks (1000000)
2888 ticks (1000000)

As you can see, the .NET 6 timings are consistent across calls, whereas in .NET 7 the first call of each method is much slower (19658, 20041 and 14016 ticks).

Changing the environment variables DOTNET_ReadyToRun and DOTNET_TieredPGO only makes things worse.

Why is this and how can it be rectified?


Solution

  • My guess is that this is connected to On-Stack Replacement (OSR), which is enabled by default starting with .NET 7. Enabling DOTNET_JitDisasmSummary "on my machine" (Windows PowerShell: $env:DOTNET_JitDisasmSummary=1) produces the following output:

    ForLoop1
       9: JIT compiled Program:<<Main>$>g__ForLoop1|0_0(byref) [Tier0, IL size=118, code size=291]
      10: JIT compiled Program:<<Main>$>g__ForLoop1|0_0(byref) [Tier1-OSR @0x19, IL size=118, code size=571]
    13420 ticks (1000000)
    2431 ticks (1000000)
    ...
    
    ForLoopArray
      11: JIT compiled Program:<<Main>$>g__ForLoopArray|0_1(byref) [Tier0, IL size=129, code size=339]
      12: JIT compiled Program:<<Main>$>g__ForLoopArray|0_1(byref) [Tier1-OSR @0x24, IL size=129, code size=609]
      13: JIT compiled System.SpanHelpers:SequenceCompareTo(byref,int,byref,int) [Tier1, IL size=632, code size=329]
    19380 ticks (4.9994036E+11)
    10694 ticks (4.9994036E+11)
    ...
    
    ForLoop2
      14: JIT compiled Program:<<Main>$>g__ForLoop2|0_2(byref) [Tier0, IL size=118, code size=291]
      15: JIT compiled Program:<<Main>$>g__ForLoop2|0_2(byref) [Tier1-OSR @0x19, IL size=118, code size=549]
    11720 ticks (1000000)
    2549 ticks (1000000)
    ...
    

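To check from inside the process that the first call includes extra JIT work, without setting DOTNET_JitDisasmSummary, a minimal sketch could read the counters on System.Runtime.JitInfo (available since .NET 6) around each call. CountUp is a hypothetical stand-in for ForLoop1. With .NET 7 defaults I would expect the first run to report new compilations (the Tier0 and OSR versions) and later runs to report few or none, but the exact numbers are not guaranteed: the counters are process-wide, so background tier-up of other methods can show up too.

```csharp
using System;
using System.Diagnostics;
using System.Runtime;

// Sketch: measure how much JIT work happens during each call of a
// loop-heavy method, using the process-wide JitInfo counters (.NET 6+).
const int size = 1_000_000;

for (int run = 1; run <= 3; run++)
{
    long methodsBefore = JitInfo.GetCompiledMethodCount();
    TimeSpan jitBefore = JitInfo.GetCompilationTime();
    var sw = Stopwatch.StartNew();

    int sum = CountUp(size);

    sw.Stop();
    // Read the counters again before printing, so Console/formatting
    // JIT work is not attributed to the measured call.
    long newMethods = JitInfo.GetCompiledMethodCount() - methodsBefore;
    TimeSpan jitTime = JitInfo.GetCompilationTime() - jitBefore;

    Console.WriteLine(
        $"run {run}: {sw.ElapsedTicks} ticks, {newMethods} method(s) JIT-compiled, " +
        $"{jitTime.TotalMilliseconds:F3} ms spent in the JIT (sum = {sum})");
}

// Hypothetical helper with the same shape as ForLoop1: one long-running loop.
static int CountUp(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum++;
    return sum;
}
```

Running it once with default settings and once with DOTNET_TC_QuickJitForLoops=0 should make the difference in first-run JIT activity visible.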
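    In other words, each loop method is first compiled at Tier0 (unoptimized) and then, while the loop is still running, replaced by an optimized OSR version, so the first call pays for both the slow Tier0 iterations and the extra compilation.

    Setting DOTNET_TC_QuickJitForLoops to 0 ($env:DOTNET_TC_QuickJitForLoops=0) "reverts" this behaviour: methods containing loops are compiled with full optimizations straight away, as the "Tier-0 switched to FullOpts" lines show. I'm not sure why this is needed, because the docs state that the default is false; maybe something changed in .NET 7: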

    ForLoop1
       8: JIT compiled Program:<<Main>$>g__ForLoop1|0_0(byref) [Tier-0 switched to FullOpts, IL size=118, code size=577]
    2590 ticks (1000000)
    2535 ticks (1000000)
    ...
    
    ForLoopArray
       9: JIT compiled Program:<<Main>$>g__ForLoopArray|0_1(byref) [Tier-0 switched to FullOpts, IL size=129, code size=618]
      10: JIT compiled System.SpanHelpers:SequenceCompareTo(byref,int,byref,int) [Tier1, IL size=632, code size=329]
    10759 ticks (4.9994036E+11)
    10816 ticks (4.9994036E+11)
    ...
    
    ForLoop2
      11: JIT compiled Program:<<Main>$>g__ForLoop2|0_2(byref) [Tier-0 switched to FullOpts, IL size=118, code size=555]
    2446 ticks (1000000)
    2509 ticks (1000000)
    ...
    

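If changing DOTNET_TC_QuickJitForLoops for the whole process is too broad, a per-method alternative (not something measured above) is to mark only the hot loop method with MethodImplOptions.AggressiveOptimization, which asks the JIT to skip Tier0 for that method, so there is no OSR transition. A sketch, assuming a local function in a top-level program like the question's; ForLoopOptimized is a hypothetical name:

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

// Sketch: opt one hot method out of tiered compilation instead of
// changing DOTNET_TC_QuickJitForLoops for the whole process.
// ForLoopOptimized is a hypothetical name; its body mirrors ForLoop1.
const int size = 1_000_000;

for (int run = 0; run < 5; run++)
    ForLoopOptimized(size);

// AggressiveOptimization: compile with full optimizations on the first call,
// so there is no Tier0 version of this method and no OSR transition.
// Trade-off: the method bypasses tiering, so it presumably gets no Dynamic PGO.
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
static int ForLoopOptimized(int n)
{
    // As in the question, timing starts inside the method, so the one-time
    // JIT compilation of the method itself is not part of the measurement.
    var sw = Stopwatch.StartNew();

    int sum = 0;
    for (int i = 0; i < n; i++)
        sum++;

    sw.Stop();
    Console.WriteLine($"{sw.ElapsedTicks} ticks ({sum})");
    return sum;
}
```

Because the attribute opts the method out of tiering for good, it is probably best reserved for a few genuinely hot methods rather than applied broadly.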
    Possibly related discussion on GitHub

    P.S.

    If your code is performance-sensitive, and especially if startup performance matters, it may be worth looking into Native AOT.