When optimizing a site, I tried to benchmark the code with Benchmark.Net. But I was surprised to find that some benchmarked code used 40,000 times more memory. After, too much, benchmarking I found that the memory allocation was because of a foreach over a SortedList<int, int>.
using BenchmarkDotNet.Attributes;
namespace NetCollectionsBenchmarks
{
[MemoryDiagnoser]
public class CollectionsBenchmarks
{
private Dictionary<int, int> DictionaryData = new();
private SortedList<int, int> SortedListData = new();
private Dictionary<int, int> DictionaryCheck = new();
private SortedList<int, int> SortedListCheck = new();
[GlobalSetup]
public void Setup()
{
for (int x = 0; x < 15; x++)
this.DictionaryData.Add(x, x);
this.SortedListData = new SortedList<int, int>(this.DictionaryData);
this.DictionaryCheck = new Dictionary<int, int>(this.DictionaryData);
this.SortedListCheck = new SortedList<int, int>(this.DictionaryData);
}
[Benchmark(Baseline = true)]
public long ForLoopDictionaryBenchmark()
{
var count = 0L;
var res = 0L;
for (int x = 0; x < 1_000_000; x++)
{
for (int i = 0; i < 15; i++)
{
if (this.DictionaryCheck.TryGetValue(x, out var value) || value < x)
res += value;
count++;
}
}
return res;
}
[Benchmark]
public long ForLoopSortedListBenchmark()
{
var res = 0L;
for (int x = 0; x < 1_000_000; x++)
{
for (int i = 0; i < 15; i++)
{
if (this.SortedListCheck.TryGetValue(x, out var value) || value < x)
res += value;
}
}
return res;
}
[Benchmark]
public long ForeachDictionaryBenchmark()
{
var res = 0L;
for (int x = 0; x < 1_000_000; x++)
{
foreach (var needle in this.DictionaryData)
{
if (this.DictionaryCheck.TryGetValue(needle.Key, out var value) || value < needle.Value)
res += value;
}
}
return res;
}
[Benchmark]
public long ForeachSortedListBenchmark()
{
var res = 0L;
for (int x = 0; x < 1_000_000; x++)
{
foreach (var needle in this.SortedListData)
{
if (this.SortedListCheck.TryGetValue(needle.Key, out var value) || value < needle.Value)
res += value;
}
}
return res;
}
[Benchmark]
public long ForeachNoTryGetValueDictionaryBenchmark()
{
var res = 0L;
for (int x = 0; x < 1_000_000; x++)
{
foreach (var needle in this.DictionaryData)
{
}
}
return res;
}
[Benchmark]
public long ForeachNoTryGetValueSortedListBenchmark()
{
var res = 0L;
for (int x = 0; x < 1_000_000; x++)
{
foreach (var needle in this.SortedListData)
{
}
}
return res;
}
}
}
The benchmark methods with foreach() over SortedList uses 40,000 times more memory, than the other methods, even when there is no TryGetValue() in the loop.
Why are SortedList so memory expensive when looping it's enumerator?
The benchmarks has been tested in .NET 6.0 and .NET 7.0, with the same result.
Since no one seemed to know the answer to this, I continued the investigation to try to find the reason.
As Hans Passant pointed out, the Enumerator in SortedList<> in 4.8 was a class. But I've had already looked at that and the Enumerator in .NET 6 has changed to struct, and has been a struct since at least 2016 according to git. But it is still allocated on the heap.
Since the SortedList<>
Enumerator
has been a struct since 2016, there could be no way that that code is not included in .NET 6 (or .NET 7). But to be sure I created my own copies of Dictionary<>
and SortedList<>
from the git repository. Still the same result.
When I was about to change the code in the classes to find what generated the difference, I found the diverging code. It was in GetEnumerator()
in the two classes.
GetEnumerator()
in SortedList<>
:
public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator()
GetEnumerator()
in Dictionary<>
:
public Enumerator GetEnumerator()
The return type IEnumerator<KeyValuePair<TKey, TValue>>
makes the enumerator to be allocated on the heap, because of interface boxing. Changing the return type to Enumerator
removed all the extra allocation reported by Benchmark.NET.
And for the second note from Hans Passant, about it being cheap gen#0 memory. That might be so, but in the benchmarking I have done shows the current implementation of GetEnumerator()
takes twice as long time as the one with return type to Enumerator
.
And the benchmarking I've done is quite close to the production code I'm currently running.