I need a data structure like Dictionary<string,T>
where I could do both case sensitive and insensitive searchs.
I am looking to improve the O(n) time that I can get with a List<Tuple<string,T>>
by iterating with foreach with a case sensitive or insensitive StringComparer
.
This is for a library where I want the end user to select case sensitivity on the Search method call. (otherwise I could create a different Dictionary with sensitivity on/off in the class constructor)
Any ideas?
I recently ran across this question since I needed to solve for the same scenario, with the following guidelines:
O(n)
for case-sensitive path)..ToLower()
) when performing lookups.I didn't feel that any of the current answers met the above and, as a result, I came up with the following dictionary definition:
Dictionary<string, (string Key, T Value)> _dict =
new Dictionary<string, (string Key, T Value)>(StringComparer.OrdinalIgnoreCase);
This allows for case-insensitive hashing of the key, while using a ValueTuple to store the actual key in its raw string form for additional case-sensitive comparisons, if need be, alongside the value. The full implementation looks like the following:
public class MultiCaseDictionary<T>
{
public MultiCaseDictionary() : this(0) { }
public MultiCaseDictionary(int capacity)
{
_dict = new(capacity, StringComparer.OrdinalIgnoreCase);
}
private readonly Dictionary<string, (string Key, T Value)> _dict;
public bool TryGetValue(string key, out T value, bool isCaseInsensitive = true)
{
if (_dict.TryGetValue(key, out (string Key, T Value) match))
{
if (isCaseInsensitive)
{
value = match.Value;
return true;
}
if (key.Equals(match.Key))
{
value = match.Value;
return true;
}
}
value = default;
return false;
}
public void Add(string key, T value)
{
_dict[key] = (key, value);
}
}
This structure results in excellent performance for both the case sensitive and insensitive paths, and the case-sensitive path is slightly slower than the case-sensitive (optimal) path in the accepted answer due to needing to perform extra work.
However, it is orders of magnitude faster for the case-insensitive path, due to the accepted answer having a runtime complexity of O(n)
. In addition, the case-insensitive search of MultiCaseDictionary
is actually faster than the case-sensitive search of CaseSensitiveDictionary
.
Finally, the Add path is slower in the MultiCaseDictionary
due to needing to construct a ValueTuple
. The following benchmark demonstrates both implementations, running on .NET 6:
Implementations
public class MultiCaseDictionary<T>
{
public MultiCaseDictionary() : this(0) { }
public MultiCaseDictionary(int capacity)
{
_dict = new(capacity, StringComparer.OrdinalIgnoreCase);
}
private readonly Dictionary<string, (string Key, T Value)> _dict;
public bool TryGetValue(string key, out T value, bool isCaseInsensitive = true)
{
if (_dict.TryGetValue(key, out (string Key, T Value) match))
{
if (isCaseInsensitive)
{
value = match.Value;
return true;
}
if (key.Equals(match.Key))
{
value = match.Value;
return true;
}
}
value = default;
return false;
}
public void Add(string key, T value)
{
_dict[key] = (key, value);
}
}
public class CaseSensitiveDictionary<T>
{
public CaseSensitiveDictionary() : this(0) { }
public CaseSensitiveDictionary(int capacity)
{
_dict = new(capacity);
}
private readonly Dictionary<string, T> _dict;
public bool TryGetValue(string key, out T value, bool isCaseInsensitive = true)
{
if (!isCaseInsensitive)
{
if (_dict.TryGetValue(key, out value))
{
return true;
}
}
else
{
key = _dict.Keys.FirstOrDefault(k =>
string.Compare(key, k, StringComparison.OrdinalIgnoreCase) == 0);
if (key == null)
{
value = default;
return false;
}
value = _dict[key];
return true;
}
value = default;
return false;
}
public void Add(string key, T value)
{
_dict[key] = value;
}
}
Benchmarks
The benchmarks preload 1000 entries for TryGetValue tests to demonstrate O(n) behavior. In addition, they insert 10 entries for Add tests to check space requirements and add performance degradation.
[SimpleJob(RuntimeMoniker.Net60, invocationCount: 50000)]
[MemoryDiagnoser]
[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory)]
[CategoriesColumn]
public class CaseSensitiveDictionaryBenchmarks
{
// use longer key to simulate expected hash costs
private const string Key = "100000500";
private const int Capacity = 1000;
private const int Iterations = 9;
private MultiCaseDictionary<int> _multiCaseDictionary;
private CaseSensitiveDictionary<int> _caseSensitiveDictionary;
[IterationSetup(Targets = new string[] {
nameof(MultiCaseDictionary_Search),
nameof(MultiCaseDictionary_InsensitiveSearch)
})]
public void SetupMultiCase()
{
_multiCaseDictionary = new MultiCaseDictionary<int>(Capacity);
for (int i = 100000000; i < 100001000; i++)
{
_multiCaseDictionary.Add(i.ToString(), i);
}
}
[IterationSetup(Targets = new string[] {
nameof(CaseSensitiveDictionary_Search),
nameof(CaseSensitiveDictionary_InsensitiveSearch),
})]
public void SetupCaseSensitive()
{
_caseSensitiveDictionary = new CaseSensitiveDictionary<int>(Capacity);
for (int i = 100000000; i < 100001000; i++)
{
_caseSensitiveDictionary.Add(i.ToString(), i);
}
}
[Benchmark(Baseline = true), BenchmarkCategory("TryGetValue")]
public int CaseSensitiveDictionary_Search()
{
int result;
for (int i = 0; i < Iterations; i++)
{
_caseSensitiveDictionary.TryGetValue(Key, out result, false);
}
_caseSensitiveDictionary.TryGetValue(Key, out result, false);
return result;
}
[Benchmark, BenchmarkCategory("TryGetValue")]
public int CaseSensitiveDictionary_InsensitiveSearch()
{
int result;
for (int i = 0; i < Iterations; i++)
{
_caseSensitiveDictionary.TryGetValue(Key, out result, true);
}
_caseSensitiveDictionary.TryGetValue(Key, out result, true);
return result;
}
[Benchmark, BenchmarkCategory("TryGetValue")]
public int MultiCaseDictionary_Search()
{
int result;
for (int i = 0; i < Iterations; i++)
{
_multiCaseDictionary.TryGetValue(Key, out result, false);
}
_multiCaseDictionary.TryGetValue(Key, out result, false);
return result;
}
[Benchmark, BenchmarkCategory("TryGetValue")]
public int MultiCaseDictionary_InsensitiveSearch()
{
int result;
for (int i = 0; i < Iterations; i++)
{
_multiCaseDictionary.TryGetValue(Key, out result, true);
}
_multiCaseDictionary.TryGetValue(Key, out result, true);
return result;
}
[Benchmark(Baseline = true), BenchmarkCategory("Add")]
public CaseSensitiveDictionary<int> CaseSensitiveDictonary_Add()
{
_caseSensitiveDictionary = new CaseSensitiveDictionary<int>(Capacity);
for (int i = 100000000; i < 100000010; i++)
{
_caseSensitiveDictionary.Add(i.ToString(), i);
}
return _caseSensitiveDictionary;
}
[Benchmark, BenchmarkCategory("Add")]
public MultiCaseDictionary<int> MultiCaseDictionary_Add()
{
_multiCaseDictionary = new MultiCaseDictionary<int>(Capacity);
for (int i = 100000000; i < 100000010; i++)
{
_multiCaseDictionary.Add(i.ToString(), i);
}
return _multiCaseDictionary;
}
}
BenchmarkDotNet=v0.13.2, OS=macOS Catalina 10.15.7 (19H1417) [Darwin 19.6.0]
Intel Core i5-1038NG7 CPU 2.00GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.400
[Host] : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2
Job-NQOSHT : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2
Runtime=.NET 6.0 InvocationCount=50000
Method | Categories | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|---|
CaseSensitiveDictonary_Add | Add | 2,423.0 ns | 24.67 ns | 20.60 ns | 2,418.3 ns | 1.00 | 0.00 | 10.0000 | 31440 B | 1.00 |
MultiCaseDictionary_Add | Add | 3,100.2 ns | 61.03 ns | 59.94 ns | 3,109.2 ns | 1.27 | 0.02 | 12.8200 | 40264 B | 1.28 |
CaseSensitiveDictionary_Search | TryGetValue | 251.8 ns | 6.47 ns | 18.67 ns | 246.1 ns | 1.00 | 0.00 | 0.0600 | 240 B | 1.00 |
CaseSensitiveDictionary_InsensitiveSearch | TryGetValue | 95,573.4 ns | 1,888.58 ns | 2,768.26 ns | 95,401.3 ns | 378.77 | 29.36 | 0.4000 | 1280 B | 5.33 |
MultiCaseDictionary_Search | TryGetValue | 265.5 ns | 5.11 ns | 10.43 ns | 262.2 ns | 1.05 | 0.07 | - | - | 0.00 |
MultiCaseDictionary_InsensitiveSearch | TryGetValue | 233.1 ns | 4.83 ns | 14.16 ns | 226.2 ns | 0.93 | 0.09 | - | - | 0.00 |
Memory consumption will be slightly higher with MultiCaseDictionary
, depending on the size of your string key, since both the hash of key and the raw key itself will need to be stored. The larger your key, the more memory will be used. For the current benchmark (smaller keys), the 28% additional memory is reasonable for my use case.