I wanted to implement an algorithm with Dictionary<Dictionary<char,int>, List<string>>
to find the anagram words in a dictionary.
As i need to implement my custom EqualityComparer
for this Dictionary, is the access time still O(1) i.e big O (1) ?
Second question, As part of the EqualityComparer
I also need to implement the GetHashCode()
. What is the efficient way of determining GetHashCode()
for Dictionary<Dictionary<char,int>, List<string>>
?
i just came up with this method, is there any better alternative?
public int GetHashCode(Dictionary<char, int> obj)
{
unchecked
{
int hashCode = 17;
foreach (var item in obj)
{
hashCode += 23 * item.Key.GetHashCode();
}
return hashCode;
}
}
Any word of advice is appreciated. Thanks!
How about converting the word "need" into the string "d1e2n1" instead of using a Dictionary as key? In order to build this string, you could use a binary tree. A char would be used as key and the character count as value. The binary tree is automatically sorted by the key, which is not the case for a dictionary.
You can calculate a combined hash value from single hash values by combining their binary representation with the XOR operation. With C#, you would do something like this:
public override int GetHashCode()
{
// Combine hashcode of a and b
return a.GetHashCode() ^ b.GetHashCode();
}
Finding an entry in an unsorted list is an O(n) operation. Finding an entry in a sorted list is an O(log(n)) operation, if a binary search is used.
Finding a word in a list in a dictionary is an O(1 + n) operation, which is the same as an O(n) operation, or an O(1 + log(n)) operation, which is the same as an O(log(n)) operation.
EDIT:
Here is a possible implementation:
var anagrams = new Dictionary<string, List<string>>();
foreach (string word in words) {
string key = GetFrequency(word);
List<string> list;
if (anagrams.TryGetValue(key, out list)) {
list.Add(word);
} else {
list = new List<string> { word };
anagrams.Add(key, list);
}
}
It uses this method to get the key:
private string GetFrequency(string word)
{
var dict = new SortedDictionary<char, int>(); // Implemented as binary tree
foreach (char c in word.ToLower()) {
int count;
if (dict.TryGetValue(c, out count)) {
dict[c] += 1;
} else {
dict[c] = 1;
}
}
return dict.Aggregate(new StringBuilder(), (sb, item) => sb.Append(item.Key).Append(item.Value), sb => sb.ToString());
}
Using this definition for the words ...
var words = new List<string> { "need", "eden", "team", "meat", "meta", "Nat", "tan" };
This test ...
foreach (var item in anagrams.OrderBy(x => x.Key)) {
Console.WriteLine();
Console.WriteLine(item.Key + ":");
foreach (string word in item.Value.OrderBy(w => w)) {
Console.WriteLine(" " + word);
}
}
... produces this output
a1e1m1t1:
meat
meta
team
a1n1t1:
Nat
tan
d1e2n1:
eden
need
EDIT #2:
Here is the frequency calculation as suggest by Ben Voigt
private string GetFrequencyByBenVoigt(string word)
{
char[] chars = word.ToLower().ToCharArray();
Array.Sort(chars);
return new string(chars);
}
The test result would be
aemt:
meat
meta
team
ant:
Nat
tan
deen:
eden
need