Search code examples
pythonarraysstringanagram

find number of anagrams for one array of string in respect to another array of string


Suppose there are two arrays of strings. One array is named query, and the other one is named dictionary. For each string element of the query, you need to find how many anagrams of it exist among the elements of dictionary and push that number to another array. Your code has to return that array, and its size should be equal to that of the query (as expected).

My approach to solving this problem was:

  1. loop through each element of query and dictionary(in a nested loop);
  2. check if the length of an element of the query was equal to the nested element of the dictionary. If yes, then I checked they had the same characters using set(word)==set(st) (st refers to the dictionary).

My code was like this:

anagrams = list()
for word in query:
   ana = 0
   for st in dictionary:
      if(len(word)==len(st)):
          if(set(word)==set(st)):
             ana = ana + 1
   anagrams.append(ana)

This logic gives me the correct result, but it is not optimized. As a result, it exceeds the 10-sec time limit. The length of both query and dictionary can be up to 10^15.

My logic runs in O(n^2) time. Is there any way to optimize the code a bit more?


Solution

  • You could use Python dictionaries to speed things up:

    dict_sorted = {}
    
    for s in dictionary:  #  linear in terms of the size of `dictionary`
        sorted_s = sorted(s.lower())
        dict_sorted[sorted_s] = dict_sorted.get(sorted_s, 0) + 1
    
    anagrams = []
    
    for s in query:  #  linear in terms of the size of `query`
        sorted_s = sorted(s.lower())
        anagrams.append(dict_sorted.get(sorted_s, 0))
    

    Using collections.Counter to shorten things:

    from collections import Counter
    
    dict_sorted = Counter([sorted(s.lower()) for s in dictionary])
    
    anagrams = [ dict_sorted.get(sorted(s.lower()), 0) for s in query ]