Question: Why does the execution time of Sum()
take much longer than a foreach()
in the following scenario?
public void TestMethod4()
{
List<int> numbers = new List<int>();
for (int i = 0; i < 1000000000; i++)
{
numbers.Add(i);
}
Stopwatch sw = Stopwatch.StartNew();
long totalCount = numbers.Sum(num => true ? 1 : 0); // simulating a dummy true condition
sw.Stop();
Console.WriteLine("Time taken Sum() : {0}ms", sw.Elapsed.TotalMilliseconds);
sw = Stopwatch.StartNew();
totalCount = 0;
foreach (var num in numbers)
{
totalCount += true ? 1 : 0; // simulating a dummy true condition
}
sw.Stop();
Console.WriteLine("Time taken foreach() : {0}ms", sw.Elapsed.TotalMilliseconds);
}
Sample run1
Time taken Sum() : 21443.8093ms
Time taken foreach() : 4251.9795ms
TL;DR: The difference in times is caused by the CLR applying two separate optimizations in the second case, but not the first case:
Sum
operates on IEnumerable<T>
, not List<T>
.
foreach
with List<T>
, but not if a List<T>
is passed as IEnumerable<T>
.
IEnumerator<T>
and incurring the cost of all of virtual-calls associated with that.List<T>
directly uses static calls (instance method calls are still "static" calls, provided they're not virtual).Sum
accepts a delegate Func<T,Int64>
.
Func<T,Int64>
are not inlined, even with MethodImplOptions.AggressiveInline
.I've reimplemented your SUM
program using a variety of different approaches which you access here: https://gist.github.com/daiplusplus/1a4fcd2e70374d3694c3a105061a6d1c
My benchmark results (Release build, x64, .NET Core 5, i7-7700HQ):
Approach | Time (ms) |
---|---|
Test_Sum_Delegate | 118ms |
Test_MySum_DirectFunc_IEnum | 112ms |
Test_MySum_IndirectFunc_IEnum | 114ms |
Test_MySum_DirectCall_IEnum | 89ms |
Test_MySum_DirectFunc_List | 58ms |
Test_MySum_IndirectFunc_List | 58ms |
Test_MySum_DirectCall_List | 37ms |
Test_Sum_DelegateLambda | 109ms |
Test_For_Inline | 4ms |
Test_For_Delegate | 3ms |
Test_ForUnrolled_Inline | 4ms |
Test_ForUnrolled_Delegate | 4ms |
Test_ForEach_Inline | 38ms |
Test_ForEach_Delegate | 37ms |
We can isolate the different behaviour by changing one thing at a time (e.g. foreach
vs for
, IEnumerable<T>
vs List<T>
, Func<T>
vs direct function calls).
The System.Linq.Enumerable.Sum
approach (Test_Sum_Delegate
) is identical to Test_MySum_IndirectFunc_IEnum
(disregard the 4ms
difference between them). Both of these approaches iterate over the List<T>
using IEnumerable<T>
.
Changing the method to pass the List<T>
as a List<T>
instead of IEnumerable<T>
(in Test_MySum_IndirectFunc_List
) eliminates the virtual-calls from foreach
using IEnumerator<T>
which causes a reduction from ~114ms
to 58ms
, a 50% reduction in time already.
Then changing the Func<Int64,Int64>
(a delegate) call to the GetValue
func to a "static" call to GetValue
(as in Test_MySum_DirectCall_List
) brings the time down to 37ms
- which is the same as Test_ForEach_Delegate
. This approach is the same as your hand-written foreach
loop.
The only way to get faster performance is with a for
loop without any virtual calls. (In debug builds the Unrolled loop is even faster than the normal for
loop, but in Release builds there's no observed difference).