Search code examples
c#performanceforeachsum

Why does List.Sum() perform poorly as compared to a foreach?


Question: Why does the execution time of Sum() take much longer than a foreach() in the following scenario?

public void TestMethod4()
{
    List<int> numbers = new List<int>();
    for (int i = 0; i < 1000000000; i++)
    {
        numbers.Add(i);
    }

    Stopwatch sw = Stopwatch.StartNew();
    long totalCount = numbers.Sum(num => true ? 1 : 0); // simulating a dummy true condition
    sw.Stop();
    Console.WriteLine("Time taken Sum() : {0}ms", sw.Elapsed.TotalMilliseconds);

    sw = Stopwatch.StartNew();
    totalCount = 0;
    foreach (var num in numbers)
    {
        totalCount += true ? 1 : 0; // simulating a dummy true condition
    }
    sw.Stop();
    Console.WriteLine("Time taken foreach() : {0}ms", sw.Elapsed.TotalMilliseconds);
}

Sample run1

Time taken Sum()     : 21443.8093ms
Time taken foreach() : 4251.9795ms

Solution

  • TL;DR: The difference in times is caused by the CLR applying two separate optimizations in the second case, but not the first case:

    • Linq's Sum operates on IEnumerable<T>, not List<T>.
      • The CLR/JIT does have a special-case optimization for foreach with List<T>, but not if a List<T> is passed as IEnumerable<T>.
        • Which means it's using IEnumerator<T> and incurring the cost of all of virtual-calls associated with that.
        • Whereas using List<T> directly uses static calls (instance method calls are still "static" calls, provided they're not virtual).
    • Linq's Sum accepts a delegate Func<T,Int64>.
      • Functions passed as a delegate Func<T,Int64> are not inlined, even with MethodImplOptions.AggressiveInline.
      • The cost of a delegate invocation are slightly more expensive than virtual calls.

    I've reimplemented your SUM program using a variety of different approaches which you access here: https://gist.github.com/daiplusplus/1a4fcd2e70374d3694c3a105061a6d1c

    My benchmark results (Release build, x64, .NET Core 5, i7-7700HQ):

    Approach Time (ms)
    Test_Sum_Delegate 118ms
    Test_MySum_DirectFunc_IEnum 112ms
    Test_MySum_IndirectFunc_IEnum 114ms
    Test_MySum_DirectCall_IEnum 89ms
    Test_MySum_DirectFunc_List 58ms
    Test_MySum_IndirectFunc_List 58ms
    Test_MySum_DirectCall_List 37ms
    Test_Sum_DelegateLambda 109ms
    Test_For_Inline 4ms
    Test_For_Delegate 3ms
    Test_ForUnrolled_Inline 4ms
    Test_ForUnrolled_Delegate 4ms
    Test_ForEach_Inline 38ms
    Test_ForEach_Delegate 37ms

    We can isolate the different behaviour by changing one thing at a time (e.g. foreach vs for, IEnumerable<T> vs List<T>, Func<T> vs direct function calls).

    The System.Linq.Enumerable.Sum approach (Test_Sum_Delegate) is identical to Test_MySum_IndirectFunc_IEnum (disregard the 4ms difference between them). Both of these approaches iterate over the List<T> using IEnumerable<T>.

    Changing the method to pass the List<T> as a List<T> instead of IEnumerable<T> (in Test_MySum_IndirectFunc_List) eliminates the virtual-calls from foreach using IEnumerator<T> which causes a reduction from ~114ms to 58ms, a 50% reduction in time already.

    Then changing the Func<Int64,Int64> (a delegate) call to the GetValue func to a "static" call to GetValue (as in Test_MySum_DirectCall_List) brings the time down to 37ms - which is the same as Test_ForEach_Delegate. This approach is the same as your hand-written foreach loop.

    The only way to get faster performance is with a for loop without any virtual calls. (In debug builds the Unrolled loop is even faster than the normal for loop, but in Release builds there's no observed difference).