Search code examples
pythondictionaryhashmap

Python dictionary, constant complexity way to return all keys in dict contain certain string


I have a dictionary like: mydict = {'A B C':0, 'A B E':1, 'E F':0}

Then I have a search key search_string = 'A B'

where I would like to find all keys and values that the search_string is part of the mydict.keys(). So in this can 'A B C' and 'A B E' will satisfy.

Since mydict can be very large. Is there a constant time complexity to search this rather than:

result = [search_string in key for key, val in mydict.items()]

I am also open to restructure the dictionary if needed.


Solution

  • If the search is always for prefixes of string then you can use a prefix tree or Trie which is an existing Python module.

    Trie allows finding matches in O(M) time, where M is the maximum string length reference (i.e. depends upon max key length rather than number of keys).

    Code

    from pytrie import StringTrie 
    
    def create_prefix(dict):
    " Creates a prefix tree based upon a dictionary "
      # create empty trie 
      trie = StringTrie() 
    
      for k in dict:
        trie[k] = k
    
      return trie
    

    Test 1

    # Preprocess to create prefix tree
    mydict = {'A B C':0, 'A B E':1, 'E F':0}
    prefix_tree = create_prefix(mydict)
    
    # Now you can use search tree multile times to speed individual searches
    for search_string in ['A B', 'A B C', 'E', 'B']:
      results = prefix_tree.values(search_string) # # .values resturn list that has this as a prefix
      if results:
        print(f'Search String {search_string} found in keys {results}')
      else:
        print(f'Search String {search_string} not found')
    

    Output

    Search String A B found in keys ['A B C', 'A B E']
    Search String A B C found in keys ['A B C']
    Search String E found in keys ['E F']
    Search String B not found
    

    Test 2 (added to answer question from OP)

    mydict = {'A B C':0, 'A B C D':0, 'A B C D E':0}
    prefix_tree = create_prefix(mydict)
    # Now you can use search tree multile times to speed individual searches
    for search_string in ['A B', 'A B C', 'A B C D', 'A B C D E', 'B C']:
      results = prefix_tree.values(search_string) # # .values resturn list that has this as a prefix
      if results:
        print(f'Search String {search_string} found in keys {results}')
      else:
        print(f'Search String {search_string} not found')
    

    Output

    Search String A B found in keys ['A B C', 'A B C D', 'A B C D E']
    Search String A B C found in keys ['A B C', 'A B C D', 'A B C D E']
    Search String A B C D found in keys ['A B C D', 'A B C D E']
    Search String A B C D E found in keys ['A B C D E']
    Search String B C not found