Tags: c#, performance, dictionary, .net-8.0, concurrentdictionary

Why is ConcurrentDictionary always faster than normal Dictionary?


I was wondering whether I should use a List or a Dictionary, knowing that my structure would hold very few items. So I did a simple benchmark 📊 to test, for 1, 3, 10, 20 and 50 items, which data structure was fastest at finding an element.

To my surprise, the ConcurrentDictionary was FASTER than the normal Dictionary. I had the feeling it would be slower because of the locks that allow multi-threaded access.

  • Is there a problem with my benchmark?
  • Is there a logical reason?
  • Should I always use the ConcurrentDictionary?
BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19045.3803/22H2/2022Update)
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK=8.0.100
  [Host]     : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
| Method                     | N  |       Mean |     Error |    StdDev | Allocated |
|----------------------------|---:|-----------:|----------:|----------:|----------:|
| ListLoopSearch             |  1 |   1.416 ns | 0.0153 ns | 0.0143 ns |         - |
| DictionarySearch           |  1 |   4.253 ns | 0.1054 ns | 0.1128 ns |         - |
| ConcurrentDictionarySearch |  1 |   2.891 ns | 0.0144 ns | 0.0127 ns |         - |
| ListLoopSearch             |  3 |   6.237 ns | 0.0281 ns | 0.0234 ns |         - |
| DictionarySearch           |  3 |  10.849 ns | 0.1968 ns | 0.1745 ns |         - |
| ConcurrentDictionarySearch |  3 |   6.225 ns | 0.0760 ns | 0.0711 ns |         - |
| ListLoopSearch             | 10 |  36.174 ns | 0.5519 ns | 0.5162 ns |         - |
| DictionarySearch           | 10 |  33.303 ns | 0.6607 ns | 0.7608 ns |         - |
| ConcurrentDictionarySearch | 10 |  18.787 ns | 0.1348 ns | 0.1261 ns |         - |
| ListLoopSearch             | 20 | 118.554 ns | 1.5678 ns | 1.4665 ns |         - |
| DictionarySearch           | 20 |  67.028 ns | 0.4722 ns | 0.3943 ns |         - |
| ConcurrentDictionarySearch | 20 |  36.108 ns | 0.6998 ns | 0.6546 ns |         - |
| ListLoopSearch             | 50 | 979.931 ns | 6.9250 ns | 5.4066 ns |         - |
| DictionarySearch           | 50 | 161.658 ns | 0.5295 ns | 0.4953 ns |         - |
| ConcurrentDictionarySearch | 50 |  87.737 ns | 1.2800 ns | 1.0689 ns |         - |

Here is the full code for the benchmark.

using BenchmarkDotNet.Attributes;
using System.Collections.Concurrent;

public class Item {
    public int Id { get; set; }
    public int Value { get; set; }
}

public class SearchBenchmark {
    [Params(1, 3, 10, 20, 50)] // Number of elements in the structure
    public int NumberOfElements;
    private List<Item> _itemList;
    private Dictionary<int, Item> _itemDictionary;
    private ConcurrentDictionary<int, Item> _itemConcurrentDictionary;
    private List<int> _searchIds; // List of IDs to search

    [GlobalSetup]
    public void Setup() {
        _itemList = Enumerable.Range(1, NumberOfElements).Select(i => new Item { Id = i, Value = i }).ToList();
        _itemDictionary = _itemList.ToDictionary(item => item.Id, item => item);
        _itemConcurrentDictionary = new ConcurrentDictionary<int, Item>(_itemList.Select(item => new KeyValuePair<int, Item>(item.Id, item)));

        // Prepare a list of IDs to search for, covering all elements
        _searchIds = Enumerable.Range(1, NumberOfElements).ToList();
    }

    [Benchmark]
    public void ListLoopSearch() {
        foreach (var id in _searchIds) {
            foreach (var item in _itemList) {
                if (item.Id == id)
                    break; // Found the item, move to the next ID
            }
        }
    }

    [Benchmark]
    public void DictionarySearch() {
        foreach (var id in _searchIds) {
            _itemDictionary.TryGetValue(id, out var result);
        }
    }

    [Benchmark]
    public void ConcurrentDictionarySearch() {
        foreach (var id in _searchIds) {
            _itemConcurrentDictionary.TryGetValue(id, out var result);
        }
    }
}

@jonasH asked whether the fact that the elements were in consecutive order in the structures and in the search could be related. So I modified the code to randomize the elements in the structures and in the search, and also added a case with 1000 items (the Params list gains 1000). I get pretty much the same results.
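The modified Setup looks roughly like this (a sketch; the exact shuffling code may differ slightly):

// Sketch of the randomized setup: both the insertion order and the search order are
// shuffled, with a fixed seed so the run stays reproducible.
[GlobalSetup]
public void Setup() {
    var random = new Random(42);

    _itemList = Enumerable.Range(1, NumberOfElements)
        .OrderBy(_ => random.Next())
        .Select(i => new Item { Id = i, Value = i })
        .ToList();
    _itemDictionary = _itemList.ToDictionary(item => item.Id, item => item);
    _itemConcurrentDictionary = new ConcurrentDictionary<int, Item>(
        _itemList.Select(item => new KeyValuePair<int, Item>(item.Id, item)));

    // Shuffle the search order independently so it no longer matches insertion order.
    _searchIds = Enumerable.Range(1, NumberOfElements)
        .OrderBy(_ => random.Next())
        .ToList();
}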

Results with randomized order:

| Method                     | NumberOfElements |           Mean |         Error |      StdDev |
|----------------------------|-----------------:|---------------:|--------------:|------------:|
| ListLoopSearch             |                1 |       1.784 ns |     0.0163 ns |   0.0153 ns |
| DictionarySearch           |                1 |       4.178 ns |     0.0557 ns |   0.0465 ns |
| ConcurrentDictionarySearch |                1 |       3.361 ns |     0.0110 ns |   0.0102 ns |
| ListLoopSearch             |               10 |      42.138 ns |     0.1538 ns |   0.1363 ns |
| DictionarySearch           |               10 |      33.073 ns |     0.0994 ns |   0.0929 ns |
| ConcurrentDictionarySearch |               10 |      18.996 ns |     0.1257 ns |   0.1176 ns |
| ListLoopSearch             |               50 |   1,007.447 ns |     4.6698 ns |   4.1397 ns |
| DictionarySearch           |               50 |     161.110 ns |     0.5328 ns |   0.4984 ns |
| ConcurrentDictionarySearch |               50 |      87.332 ns |     1.4004 ns |   1.3100 ns |
| ListLoopSearch             |             1000 | 324,126.481 ns | 1,032.4962 ns | 965.7976 ns |
| DictionarySearch           |             1000 |   3,233.511 ns |    11.0090 ns |  10.2978 ns |
| ConcurrentDictionarySearch |             1000 |   1,825.452 ns |    19.3568 ns |  18.1064 ns |

Solution

  • I had the feeling it would be slower because of the locks that allow multi-threaded access.

    As written in the ConcurrentDictionary<TKey,TValue> documentation:

    For modifications and write operations to the dictionary, ConcurrentDictionary<TKey,TValue> uses fine-grained locking to ensure thread safety. (Read operations on the dictionary are performed in a lock-free manner)

    So since you are testing a read-only operation, locking should not have an effect (though note that there are so-called lock-free approaches to synchronization which can still incur some performance cost).
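    For illustration, here is a minimal, hypothetical sketch (not part of the benchmark) of what that guarantee means in practice: reads via TryGetValue stay safe while another thread keeps writing, with no external lock.

    // Hypothetical illustration: lock-free reads while a writer mutates the dictionary.
    using System.Collections.Concurrent;

    var dict = new ConcurrentDictionary<int, Item>();
    var writer = Task.Run(() => {
        for (var i = 0; i < 1_000_000; i++)
            dict[i % 100] = new Item { Id = i % 100, Value = i }; // writes take fine-grained locks
    });
    var readers = Enumerable.Range(0, 4).Select(_ => Task.Run(() => {
        for (var i = 0; i < 1_000_000; i++)
            dict.TryGetValue(i % 100, out _); // reads take no lock at all
    }));
    Task.WaitAll(readers.Append(writer).ToArray());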

    As for performance - after quickly scanning through the current implementation I was not able to find something which could clearly explain the difference, at least at first glance. Dictionary.TryGetValue has a shared method call which may not be inlined, which can have some minor effect. It also uses a slightly different approach to handling collision buckets - it uses two arrays (one containing all dictionary entries and another storing the index of the first entry of each bucket, if I got everything right, see more here), while the concurrent version uses a linked list for every bucket (see more here).
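    To make that concrete, here is a very simplified sketch of the two layouts (illustrative field names, not the actual runtime source):

    // Simplified, illustrative sketch - not the actual BCL code.
    // Dictionary<TKey,TValue>: entries live in one flat array; a second int array maps a
    // hash bucket to the index of its first entry, and entries chain via "next" indices.
    struct DictionaryEntry {
        public uint HashCode;
        public int Next;          // index of the next entry in the same bucket, or -1
        public int Key;
        public Item Value;
    }
    // int[] buckets;             // bucket -> index of the first entry in "entries"
    // DictionaryEntry[] entries; // all entries, stored contiguously

    // ConcurrentDictionary<TKey,TValue>: each bucket holds the head of a linked list of
    // separately allocated nodes, which readers can walk without taking any lock.
    class ConcurrentNode {
        public int Key;
        public Item Value;
        public int HashCode;
        public volatile ConcurrentNode Next;
    }
    // ConcurrentNode[] buckets;  // bucket -> head node of its linked list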

    Is there a problem with my benchmark?

    Probably. It is quite synthetic and tests only one concrete key type with one fill scenario. Different key types and/or a different collision rate can (in theory) produce different results.
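    For example, a hypothetical variant with string keys (the class and key format below are made up for illustration) exercises a much more expensive hash and comparer path and may shift the results:

    using BenchmarkDotNet.Attributes;
    using System.Collections.Concurrent;

    // Hypothetical variant of the benchmark with string keys; hashing and equality cost
    // dominate differently than with int keys, so the gap may change.
    public class StringKeySearchBenchmark {
        [Params(50)]
        public int NumberOfElements;

        private Dictionary<string, Item> _dictionary;
        private ConcurrentDictionary<string, Item> _concurrentDictionary;
        private List<string> _searchKeys;

        [GlobalSetup]
        public void Setup() {
            _dictionary = Enumerable.Range(1, NumberOfElements)
                .Select(i => new Item { Id = i, Value = i })
                .ToDictionary(item => $"key-{item.Id}", item => item);
            _concurrentDictionary = new ConcurrentDictionary<string, Item>(_dictionary);
            _searchKeys = _dictionary.Keys.ToList();
        }

        [Benchmark]
        public void DictionarySearch() {
            foreach (var key in _searchKeys)
                _dictionary.TryGetValue(key, out _);
        }

        [Benchmark]
        public void ConcurrentDictionarySearch() {
            foreach (var key in _searchKeys)
                _concurrentDictionary.TryGetValue(key, out _);
        }
    }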

    Should I always use the ConcurrentDictionary?

    By default, no. I would argue that following the principle of least surprise when developing code outweighs the potential performance benefits (if any) in most scenarios. If you don't expect the dictionary usage to be concurrent, do not use a concurrent dictionary. And do not forget about the dangers of premature optimization.

    If you consider a path involving the dictionary usage to be a "critical" one, then first of all you need to verify that Dictionary.TryGetValue is actually that influential in it, then thoroughly test your actual scenario (with appropriate data and usage patterns, especially including modification operations if any, and on the actual hardware), and set up infrastructure to continually benchmark the corresponding path. This can be unjustified from a resources/cost point of view.

    Also, if you need a read-only dictionary, you should consider FrozenDictionary<TKey,TValue>:

    Provides an immutable, read-only dictionary optimized for fast lookup and enumeration.
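    Creating one is straightforward; a minimal sketch (item count and variable names are illustrative):

    using System.Collections.Frozen;

    // Build the FrozenDictionary once (construction is comparatively expensive, lookups
    // are fast) and then query it exactly like the other dictionaries in the benchmark.
    var items = Enumerable.Range(1, 50).Select(i => new Item { Id = i, Value = i });
    FrozenDictionary<int, Item> frozen = items.ToFrozenDictionary(item => item.Id, item => item);

    foreach (var id in Enumerable.Range(1, 50))
        frozen.TryGetValue(id, out _);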

    I used @ThatGuy's benchmark as a base to compare with FrozenDictionary. "On my machine" it produces the following results (for a 50-item dictionary):

    | Method                     |     Mean |   Error |  StdDev |
    |----------------------------|---------:|--------:|--------:|
    | DictionarySearch           | 355.7 ns | 2.54 ns | 2.12 ns |
    | FrozenDictionarySearch     | 272.3 ns | 1.82 ns | 1.42 ns |
    | ConcurrentDictionarySearch | 206.9 ns | 2.60 ns | 2.43 ns |

    Note that with string keys in my benchmarks FrozenDictionary showed better performance than both alternatives.

    P.S.

    Submitted an issue on GitHub for more knowledgeable people to reply.