Search code examples
c#data-structurescase-sensitivecase-insensitive

O(1) Dictionary both case sensitive and insensitive


I need a data structure like Dictionary<string,T> where I could do both case sensitive and insensitive searchs.

I am looking to improve the O(n) time that I can get with a List<Tuple<string,T>> by iterating with foreach with a case sensitive or insensitive StringComparer.

This is for a library where I want the end user to select case sensitivity on the Search method call. (otherwise I could create a different Dictionary with sensitivity on/off in the class constructor)

Any ideas?


Solution

  • I recently ran across this question since I needed to solve for the same scenario, with the following guidelines:

    • Both types of lookups should be fast (avoid O(n) for case-sensitive path).
    • Use a single dictionary rather than two separate dictionaries (avoid double the space requirement).
    • Avoid needless allocations via string conversion (.ToLower()) when performing lookups.
    • Trade a reasonable amount of additional needed memory for satisfying the above.

    I didn't feel that any of the current answers met the above and, as a result, I came up with the following dictionary definition:

    Dictionary<string, (string Key, T Value)> _dict = 
        new Dictionary<string, (string Key, T Value)>(StringComparer.OrdinalIgnoreCase);
    

    This allows for case-insensitive hashing of the key, while using a ValueTuple to store the actual key in its raw string form for additional case-sensitive comparisons, if need be, alongside the value. The full implementation looks like the following:

    public class MultiCaseDictionary<T>
    {
        public MultiCaseDictionary() : this(0) { }
    
        public MultiCaseDictionary(int capacity)
        {
            _dict = new(capacity, StringComparer.OrdinalIgnoreCase);
        }
    
        private readonly Dictionary<string, (string Key, T Value)> _dict;
    
        public bool TryGetValue(string key, out T value, bool isCaseInsensitive = true)
        {
            if (_dict.TryGetValue(key, out (string Key, T Value) match))
            {
                if (isCaseInsensitive)
                {
                    value = match.Value;
                    return true;
                }
    
                if (key.Equals(match.Key))
                {
                    value = match.Value;
                    return true;
                }
            }
    
            value = default;
            return false;
        }
    
        public void Add(string key, T value)
        {
            _dict[key] = (key, value);
        }
    }
    

    This structure results in excellent performance for both the case sensitive and insensitive paths, and the case-sensitive path is slightly slower than the case-sensitive (optimal) path in the accepted answer due to needing to perform extra work.

    However, it is orders of magnitude faster for the case-insensitive path, due to the accepted answer having a runtime complexity of O(n). In addition, the case-insensitive search of MultiCaseDictionary is actually faster than the case-sensitive search of CaseSensitiveDictionary.

    Finally, the Add path is slower in the MultiCaseDictionary due to needing to construct a ValueTuple. The following benchmark demonstrates both implementations, running on .NET 6:

    Implementations

    public class MultiCaseDictionary<T>
    {
        public MultiCaseDictionary() : this(0) { }
    
        public MultiCaseDictionary(int capacity)
        {
            _dict = new(capacity, StringComparer.OrdinalIgnoreCase);
        }
    
        private readonly Dictionary<string, (string Key, T Value)> _dict;
    
        public bool TryGetValue(string key, out T value, bool isCaseInsensitive = true)
        {
            if (_dict.TryGetValue(key, out (string Key, T Value) match))
            {
                if (isCaseInsensitive)
                {
                    value = match.Value;
                    return true;
                }
    
                if (key.Equals(match.Key))
                {
                    value = match.Value;
                    return true;
                }
            }
    
            value = default;
            return false;
        }
    
        public void Add(string key, T value)
        {
            _dict[key] = (key, value);
        }
    }
    
    public class CaseSensitiveDictionary<T>
    {
        public CaseSensitiveDictionary() : this(0) { }
    
        public CaseSensitiveDictionary(int capacity)
        {
            _dict = new(capacity);
        }
    
        private readonly Dictionary<string, T> _dict;
    
        public bool TryGetValue(string key, out T value, bool isCaseInsensitive = true)
        {
            if (!isCaseInsensitive)
            {
                if (_dict.TryGetValue(key, out value))
                {
                    return true;
                }
            }
            else
            {
                key = _dict.Keys.FirstOrDefault(k =>
                    string.Compare(key, k, StringComparison.OrdinalIgnoreCase) == 0);
                if (key == null)
                {
                    value = default;
                    return false;
                }
    
                value = _dict[key];
                return true;
            }
    
            value = default;
            return false;
        }
    
        public void Add(string key, T value)
        {
            _dict[key] = value;
        }
    } 
    

    Benchmarks

    The benchmarks preload 1000 entries for TryGetValue tests to demonstrate O(n) behavior. In addition, they insert 10 entries for Add tests to check space requirements and add performance degradation.

    [SimpleJob(RuntimeMoniker.Net60, invocationCount: 50000)]
    [MemoryDiagnoser]
    [GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory)]
    [CategoriesColumn]
    public class CaseSensitiveDictionaryBenchmarks
    {
        // use longer key to simulate expected hash costs
        private const string Key = "100000500";
        private const int Capacity = 1000;
        private const int Iterations = 9;
    
        private MultiCaseDictionary<int> _multiCaseDictionary;
        private CaseSensitiveDictionary<int> _caseSensitiveDictionary;
    
        [IterationSetup(Targets = new string[] {
            nameof(MultiCaseDictionary_Search),
            nameof(MultiCaseDictionary_InsensitiveSearch)
        })]
        public void SetupMultiCase()
        {
            _multiCaseDictionary = new MultiCaseDictionary<int>(Capacity);
            for (int i = 100000000; i < 100001000; i++)
            {
                _multiCaseDictionary.Add(i.ToString(), i);
            }
        }
    
        [IterationSetup(Targets = new string[] {
            nameof(CaseSensitiveDictionary_Search),
            nameof(CaseSensitiveDictionary_InsensitiveSearch),
        })]
        public void SetupCaseSensitive()
        {
            _caseSensitiveDictionary = new CaseSensitiveDictionary<int>(Capacity);
            for (int i = 100000000; i < 100001000; i++)
            {
                _caseSensitiveDictionary.Add(i.ToString(), i);
            }
        }
    
        [Benchmark(Baseline = true), BenchmarkCategory("TryGetValue")]
        public int CaseSensitiveDictionary_Search()
        {
            int result;
            for (int i = 0; i < Iterations; i++)
            {
                _caseSensitiveDictionary.TryGetValue(Key, out result, false);
            }
            _caseSensitiveDictionary.TryGetValue(Key, out result, false);
            return result;
        }
    
        [Benchmark, BenchmarkCategory("TryGetValue")]
        public int CaseSensitiveDictionary_InsensitiveSearch()
        {
            int result;
            for (int i = 0; i < Iterations; i++)
            {
                _caseSensitiveDictionary.TryGetValue(Key, out result, true);
            }
            _caseSensitiveDictionary.TryGetValue(Key, out result, true);
            return result;
        }
    
        [Benchmark, BenchmarkCategory("TryGetValue")]
        public int MultiCaseDictionary_Search()
        {
            int result;
            for (int i = 0; i < Iterations; i++)
            {
                _multiCaseDictionary.TryGetValue(Key, out result, false);
            }
            _multiCaseDictionary.TryGetValue(Key, out result, false);
            return result;
        }
    
        [Benchmark, BenchmarkCategory("TryGetValue")]
        public int MultiCaseDictionary_InsensitiveSearch()
        {
            int result;
            for (int i = 0; i < Iterations; i++)
            {
                _multiCaseDictionary.TryGetValue(Key, out result, true);
            }
            _multiCaseDictionary.TryGetValue(Key, out result, true);
            return result;
        }
    
        [Benchmark(Baseline = true), BenchmarkCategory("Add")]
        public CaseSensitiveDictionary<int> CaseSensitiveDictonary_Add()
        {
            _caseSensitiveDictionary = new CaseSensitiveDictionary<int>(Capacity);
            for (int i = 100000000; i < 100000010; i++)
            {
                _caseSensitiveDictionary.Add(i.ToString(), i);
            }
    
            return _caseSensitiveDictionary;
        }
    
        [Benchmark, BenchmarkCategory("Add")]
        public MultiCaseDictionary<int> MultiCaseDictionary_Add()
        {
            _multiCaseDictionary = new MultiCaseDictionary<int>(Capacity);
            for (int i = 100000000; i < 100000010; i++)
            {
                _multiCaseDictionary.Add(i.ToString(), i);
            }
    
            return _multiCaseDictionary;
        }
    } 
    

    Results benchmarkresults

    
    BenchmarkDotNet=v0.13.2, OS=macOS Catalina 10.15.7 (19H1417) [Darwin 19.6.0]
    Intel Core i5-1038NG7 CPU 2.00GHz, 1 CPU, 8 logical and 4 physical cores
    .NET SDK=6.0.400
      [Host]     : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2
      Job-NQOSHT : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2
    
    Runtime=.NET 6.0  InvocationCount=50000  
    
    
    Method Categories Mean Error StdDev Median Ratio RatioSD Gen0 Allocated Alloc Ratio
    CaseSensitiveDictonary_Add Add 2,423.0 ns 24.67 ns 20.60 ns 2,418.3 ns 1.00 0.00 10.0000 31440 B 1.00
    MultiCaseDictionary_Add Add 3,100.2 ns 61.03 ns 59.94 ns 3,109.2 ns 1.27 0.02 12.8200 40264 B 1.28
    CaseSensitiveDictionary_Search TryGetValue 251.8 ns 6.47 ns 18.67 ns 246.1 ns 1.00 0.00 0.0600 240 B 1.00
    CaseSensitiveDictionary_InsensitiveSearch TryGetValue 95,573.4 ns 1,888.58 ns 2,768.26 ns 95,401.3 ns 378.77 29.36 0.4000 1280 B 5.33
    MultiCaseDictionary_Search TryGetValue 265.5 ns 5.11 ns 10.43 ns 262.2 ns 1.05 0.07 - - 0.00
    MultiCaseDictionary_InsensitiveSearch TryGetValue 233.1 ns 4.83 ns 14.16 ns 226.2 ns 0.93 0.09 - - 0.00

    Memory consumption will be slightly higher with MultiCaseDictionary, depending on the size of your string key, since both the hash of key and the raw key itself will need to be stored. The larger your key, the more memory will be used. For the current benchmark (smaller keys), the 28% additional memory is reasonable for my use case.