Given an input string how to search for all the anagrams in O(k logN + W) time where W is the output size and k is the max characters in the string?

I am trying to write a program that given an input string by the user finds all anagrams available in a list? The O(klogN + W ) time complexity does not include time complexity for sorting.

My approach is to first sort each word alphabetically then sort the list alphabetically. For example, a list like this:

['act',bad','cat','tac']...

will become

['act','act','act','bad']

To meet the O(klogN) time complexity I decided to use a binary search. But I'm not sure how to really go about it? This is my current code so far but it only appends the first anagram of the word to anagramList?

def binarySearch(arr, lower, upper, target):
anagramList=[]
if upper >= lower:
    mid = lower + ((upper - lower) // 2)
    if areAnagrams(arr[mid],target):
        anagramList.append(arr[mid])
    elif arr[mid] > target:
        return binarySearch(arr, lower, mid - 1, target)
    else:
        return binarySearch(arr, mid + 1, upper, target)
return anagramList

areAnagrams check if 2 strings are anagrams of each other.

Solution

Sorting the characters in each word is likely the correct approach, but you will need to store the original word(s) and map each sorted character sequence to a list of one or more words, so that you can show all valid results. You will need a mapping like this (to the left is a sorted sequence of characters, to the right are all valid words that are anagrams of those characters ):

"art" -> [ "art", "rat" ]
"acr" -> [ "car" ]

...

Once you have this mapping, you can search through it either with a binary search or use Python's hash mechanism directly, by employing a Python dict object (which, for a dictionary of size N is no less efficient than log2(N) for binary search and is coded in the interpreter, so it is very fast).

Once you've built the dictionary, finding anagrams requires sorting the input sequence (at worst, O(k)) and then locating a matching string (O(log(N)), for binary search). It doesn't depend on the output size at all (the output is ready-made in each dictionary entry already).

If you decide not to use a dict and stick with binary search, most likely the best data structure would be a list of lists, with each element containing ["sorted-characters", "word1", "word2", ...etc ]. The outer list is sorted by the first item (the sorted characters) in each of inner lists, e.g., with the example anagrams above:

["art", "art", "rat" ]
["acr", "car" ]