Search code examples
performancealgorithmhashmapprime-factoring

Using primes to determine anagrams faster than looping through?


I had a telephone recently for a SE role and was asked how I'd determine if two words were anagrams or not, I gave a reply that involved something along the lines of getting the character, iterating over the word, if it exists exit loop and so on. I think it was a N^2 solution as one loop per word with an inner loop for the comparing.

After the call I did some digging and wrote a new solution; one that I plan on handing over tomorrow at the next stage interview, it uses a hash map with a unique prime number representing each character of the alphabet. I'm then looping through the list of words, calculating the value of the word and checking to see if it compares with the word I'm checking. If the values match we have a winner (the whole mathematical theorem business).

It means one loop instead of two which is much better but I've started to doubt myself and am wondering if the additional operations of the hashmap and multiplication are more expensive than the original suggestion.

I'm 99% certain the hash map is going to be faster but...

Can anyone confirm or deny my suspicions? Thank you.

Edit: I forgot to mention that I check the size of the words first before even considering doing anything.


Solution

  • An anagram contains all the letters of the original word, in a different order. You are on the right track to use a HashMap to process a word in linear time, but your prime number idea is an unnecessary complication.

    Your data structure is a HashMap that maintains the counts of various letters. You can add letters from the first word in O(n) time. The key is the character, and the value is the frequency. If the letter isn't in the HashMap yet, put it with a value of 1. If it is, replace it with value + 1.

    When iterating over the letters of the second word, subtract one from your count instead, removing a letter when it reaches 0. If you attempt to remove a letter that doesn't exist, then you can immediately state that it's not an anagram. If you reach the end and the HashMap isn't empty, it's not an anagram. Else, it's an anagram.

    Alternatively, you can replace the HashMap with an array. The index of the array corresponds to the character, and the value is the same as before. It's not an anagram if a value drops to -1, and it's not an anagram at the end if any of the values aren't 0.

    You can always compare the lengths of the original strings, and if they aren't the same, then they can't possibly be anagrams. Including this check at the beginning means that you don't have to check if all the values are 0 at the end. If the strings are the same length, then either something will produce a -1 or there will be all 0s at the end.