Search code examples
pythonalgorithmdata-structurestrie

Time complexity of Search operation in TRIE data structure


I am trying to implement a dictionary based trie data structure. Please find the python code below.

class Trie:
  def __init__(self):
     self.word_end = '_end_'
     self.root = {}

  def add_word(self, word):
     self.add_words([word])

  def add_words(self, words):
     for word in words:
         current_dict = self.root
         for letter in word:
             current_dict = current_dict.setdefault(letter, {})
         current_dict[self.word_end] = self.word_end


  def check_word(self, word):
     current_dict = self.root
     for letter in word:
         if letter in current_dict:
             current_dict = current_dict[letter]
         else:
             return False
     else:
         if self.word_end in current_dict:
             return True
         else:
             return False



tree = Trie()
tree.add_words(['foo', 'bar', 'baz', 'barz'])
print tree
"""
{'b': {'a': {'r': {'_end_': '_end_', 'z': {'_end_': '_end_'}}, 
             'z': {'_end_': '_end_'}}}, 
 'f': {'o': {'o': {'_end_': '_end_'}}}}
 """

print check_word('baz')
# True
print check_word('barz')
# True
print check_worf('barzz')
# False

I see the complexity of searching a word is O(m) where m is the length of the word being searched. Also, adding a word has very similar complexity - O(m) where m is the length of the word being added.

Question: These complexities are too good. Can someone confirm these complexities? Is there something wrong in my implementation of Trie?


Solution

  • The time complexity of searching in a TRIE is indeed O(k) where k is the length of the string to be searched. This is correct and proven fact. However, the storage requirements is where the penalty is seen. Every node of Trie consists of multiple branches. Each branch represents a possible character of keys. We need to mark the last node of every key as end of word node. A Trie node field isEndOfWord is used to distinguish the node as end of word node.

    In general, insert and search costs O(length_of_string), however the memory requirements of Trie is O(ALPHABET_SIZE * length_of_string* N) where N is number of keys in Trie. There are efficient representation of trie nodes (e.g. compressed trie, ternary search tree, etc.) to minimize memory requirements of trie.