I have a dictionary like: mydict = {'A B C':0, 'A B E':1, 'E F':0}
Then I have a search key search_string = 'A B'
where I would like to find all keys and values that the search_string is part of the mydict.keys()
. So in this can 'A B C' and 'A B E' will satisfy.
Since mydict can be very large. Is there a constant time complexity to search this rather than:
result = [search_string in key for key, val in mydict.items()]
I am also open to restructure the dictionary if needed.
If the search is always for prefixes of string then you can use a prefix tree or Trie which is an existing Python module.
Trie allows finding matches in O(M) time, where M is the maximum string length reference (i.e. depends upon max key length rather than number of keys).
Code
from pytrie import StringTrie
def create_prefix(dict):
" Creates a prefix tree based upon a dictionary "
# create empty trie
trie = StringTrie()
for k in dict:
trie[k] = k
return trie
Test 1
# Preprocess to create prefix tree
mydict = {'A B C':0, 'A B E':1, 'E F':0}
prefix_tree = create_prefix(mydict)
# Now you can use search tree multile times to speed individual searches
for search_string in ['A B', 'A B C', 'E', 'B']:
results = prefix_tree.values(search_string) # # .values resturn list that has this as a prefix
if results:
print(f'Search String {search_string} found in keys {results}')
else:
print(f'Search String {search_string} not found')
Output
Search String A B found in keys ['A B C', 'A B E']
Search String A B C found in keys ['A B C']
Search String E found in keys ['E F']
Search String B not found
Test 2 (added to answer question from OP)
mydict = {'A B C':0, 'A B C D':0, 'A B C D E':0}
prefix_tree = create_prefix(mydict)
# Now you can use search tree multile times to speed individual searches
for search_string in ['A B', 'A B C', 'A B C D', 'A B C D E', 'B C']:
results = prefix_tree.values(search_string) # # .values resturn list that has this as a prefix
if results:
print(f'Search String {search_string} found in keys {results}')
else:
print(f'Search String {search_string} not found')
Output
Search String A B found in keys ['A B C', 'A B C D', 'A B C D E']
Search String A B C found in keys ['A B C', 'A B C D', 'A B C D E']
Search String A B C D found in keys ['A B C D', 'A B C D E']
Search String A B C D E found in keys ['A B C D E']
Search String B C not found