Search code examples
.netgenericsdictionarybig-oiequalitycomparer

What is the AccessTime for Dictionary<Dictionary<char,int>,List<string>> , is it still O(1)?


I wanted to implement an algorithm with Dictionary<Dictionary<char,int>, List<string>> to find the anagram words in a dictionary.

As i need to implement my custom EqualityComparer for this Dictionary, is the access time still O(1) i.e big O (1) ?

Second question, As part of the EqualityComparer I also need to implement the GetHashCode(). What is the efficient way of determining GetHashCode() for Dictionary<Dictionary<char,int>, List<string>>?

i just came up with this method, is there any better alternative?

public int GetHashCode(Dictionary<char, int> obj)
    {
        unchecked
        {
            int hashCode = 17;
            foreach (var item in obj)
            {
                hashCode += 23 * item.Key.GetHashCode();
            }
            return hashCode;
        }
    }

Any word of advice is appreciated. Thanks!


Solution

  • How about converting the word "need" into the string "d1e2n1" instead of using a Dictionary as key? In order to build this string, you could use a binary tree. A char would be used as key and the character count as value. The binary tree is automatically sorted by the key, which is not the case for a dictionary.

    You can calculate a combined hash value from single hash values by combining their binary representation with the XOR operation. With C#, you would do something like this:

    public override int GetHashCode()
    {
        // Combine hashcode of a and b
        return a.GetHashCode() ^ b.GetHashCode();
    }
    

    Finding an entry in an unsorted list is an O(n) operation. Finding an entry in a sorted list is an O(log(n)) operation, if a binary search is used.

    Finding a word in a list in a dictionary is an O(1 + n) operation, which is the same as an O(n) operation, or an O(1 + log(n)) operation, which is the same as an O(log(n)) operation.


    EDIT:

    Here is a possible implementation:

    var anagrams = new Dictionary<string, List<string>>();
    foreach (string word in words) {
        string key = GetFrequency(word);
        List<string> list;
        if (anagrams.TryGetValue(key, out list)) {
            list.Add(word);
        } else {
            list = new List<string> { word };
            anagrams.Add(key, list);
        }
    }
    

    It uses this method to get the key:

    private string GetFrequency(string word)
    {
        var dict = new SortedDictionary<char, int>(); // Implemented as binary tree
        foreach (char c in word.ToLower()) {
            int count;
            if (dict.TryGetValue(c, out count)) {
                dict[c] += 1;
            } else {
                dict[c] = 1;
            }
        }
        return dict.Aggregate(new StringBuilder(), (sb, item) => sb.Append(item.Key).Append(item.Value), sb => sb.ToString());
    }
    

    Using this definition for the words ...

    var words = new List<string> { "need", "eden", "team", "meat", "meta", "Nat", "tan" };
    

    This test ...

    foreach (var item in anagrams.OrderBy(x => x.Key)) {
        Console.WriteLine();
        Console.WriteLine(item.Key + ":");
        foreach (string word in item.Value.OrderBy(w => w)) {
            Console.WriteLine("    " + word);
        }
    }
    

    ... produces this output

    a1e1m1t1:
        meat
        meta
        team
    
    a1n1t1:
        Nat
        tan
    
    d1e2n1:
        eden
        need
    

    EDIT #2:

    Here is the frequency calculation as suggest by Ben Voigt

    private string GetFrequencyByBenVoigt(string word)
    {
        char[] chars = word.ToLower().ToCharArray();
        Array.Sort(chars);
        return new string(chars);
    }
    

    The test result would be

    aemt:
        meat
        meta
        team
    
    ant:
        Nat
        tan
    
    deen:
        eden
        need