Search code examples
algorithmanagram

All anagrams in a File


Source : Microsoft Interview Question

We are given a File containing words.We need to determine all the Anagrams Present in it .

Can someone suggest most optimal algorithm to do this.

Only way i know is Sorting all the words,then checking .


Solution

  • It would be good to know more about data before suggesting an algorithm, but lets just assume that the words are in English in the single case.

    Lets assign each letter a prime number from 2 to 101. For each word we can count it's "anagram number" by multiplying its letter corresponding numbers.

    Lets declare a dictionary of {number, list} pairs. And one list to collect resulting anagrams into.

    Then we can collect anagrams in two steps: simply traverse through the file and put each word to a dictionary's list according to its "anagram number"; traverce the map and for every pairs list with length more then 1 store it's contents in a single big anagram list.

    UPDATE:

    import operator
    
    words = ["thore", "ganamar", "notanagram", "anagram", "other"]
    
    letter_code = {'a':2, 'b':3, 'c':5, 'd':7, 'e':11, 'f':13, 'g':17, 'h':19, 'i':23, 'j':29, 'k':31, 'l':37, 'm':41, 'n':43, 
                'o':47, 'p':53, 'q':59, 'r':61, 's':67, 't':71, 'u':73, 'v':79, 'w':83, 'x':89, 'y':97, 'z':101}
    
    def evaluate(word):
        return reduce( operator.mul, [letter_code[letter] for letter in word] )
    
    anagram_map = {}
    anagram_list = []
    for word in words:
        anagram_number = evaluate(word)
        if anagram_number in anagram_map:
            anagram_map[ anagram_number ] += [word]
        else:
            anagram_map[ anagram_number ] = [word]
    
        if len(anagram_map[ anagram_number ]) == 2:
            anagram_list += anagram_map[ anagram_number ] 
        elif len(anagram_map[ anagram_number ]) > 2:
            anagram_list += [ word ]
    
    print anagram_list
    

    Of course the implementation can be optimized further. For instance, you don't really need a map of anagrams, just a counters would do fine. But I guess the code illustrates the idea best as it is.