
Get a list of anagrams from a dictionary


Basically, anagrams are permutations of a string. E.g. stack, sackt and stakc are all anagrams of stack (though these words aren't meaningful). Anyway, you get the idea.

Now, I want a list of anagrams given a million words, or simply put, from a dictionary.

My basic question is: how do I find the total number of unique anagrams in a dictionary?

Sorting and comparing won't work, as its time complexity is pretty bad.

I thought of using a hash table, with the string as the key.

But the problem is: what should the hash function be? It would be helpful if some pseudocode were provided. Other approaches better than the ones mentioned would also be helpful.

Thanks.


Solution

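  • The obvious solution is to map each character to a prime number and multiply the prime numbers. By unique factorization, two words get the same product exactly when they contain the same letters the same number of times. So if 'a' -> 2 and 'b' -> 3, then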

    • 'ab' -> 6
    • 'ba' -> 6
    • 'bab' -> 18
    • 'abba' -> 36
    • 'baba' -> 36

    To minimise the chance of overflow, the smallest primes could be assigned to the more frequent letters (e, t, i, a, n). Note: the 26th prime is 101.

    UPDATE: an implementation can be found here
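
The prime-product idea above can be sketched roughly as follows. This is only a minimal illustration, not the linked implementation: it assumes lowercase ASCII words, and the names `LETTERS_BY_FREQUENCY`, `anagram_key` and `group_anagrams` are made up for this example. The letter ordering is an approximate English frequency ranking chosen for illustration; any ordering gives correct keys, it only affects how large the products get.

```python
from collections import defaultdict

# Assumption: words contain only the letters a-z.
# Approximate English letter frequency, most frequent first (illustrative only).
LETTERS_BY_FREQUENCY = "etaoinshrdlcumwfgypbvkjxqz"

# The first 26 primes; the 26th is 101.
FIRST_26_PRIMES = [
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
    43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101,
]

# Smallest primes go to the most frequent letters to keep products small.
PRIME_FOR = dict(zip(LETTERS_BY_FREQUENCY, FIRST_26_PRIMES))


def anagram_key(word):
    """Return the product of the primes assigned to each letter of word."""
    key = 1
    for ch in word.lower():
        key *= PRIME_FOR[ch]
    return key


def group_anagrams(words):
    """Group words that share the same prime-product key."""
    groups = defaultdict(list)
    for word in words:
        groups[anagram_key(word)].append(word)
    return list(groups.values())


if __name__ == "__main__":
    words = ["stack", "tacks", "listen", "silent", "enlist", "apple"]
    groups = group_anagrams(words)
    print(groups)
    # Number of distinct anagram classes in the word list.
    print(len(groups))
```

Python integers have arbitrary precision, so overflow is not a concern in this sketch; in a language with fixed-width integers, long words could overflow the product, which is why the frequency-based assignment of small primes matters there.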