Search code examples
pythontrie

Return strings that matches the prefix in a Trie


I managed to construct a Trie and now i want to return strings that matches a prefix from the Trie but I'm having issues writing the search function.

For example, if i have a prefix "aa", i want to have the strings "aa" and "aac" as an output.

class Node:
    def __init__(self):
        self.children = [None] * 26
        self.end = False
        self.value = ""

class Trie:
    def __init__(self):
        self.root = Node()

    def add_word(self, key):
        word_length = len(key)
        current = self.root
        for i in range(word_length):
            position = self.ord_char(key[i])

            if current.children[position] is None:
                current.children[position] = Node()
            current = current.children[position]
            current.value = key[i]
        current.end = True

    def ord_char(self,key):
        ord_rep = ord(key) - ord('a')
        return ord_rep

    def prefix_search(self, prefix):
        lst = []
        current = self.root
        prefix_length = len(prefix)
        for c in range(prefix_length):
            c_position = self.ord_char(prefix[c])
            current = current.children[c_position]
            lst.append(current.value)
        #doesnt seem like I'm doing it right

if __name__ == "__main__":
    trie = Trie()
    trie.add_word("aa")
    trie.add_word("aac")
    trie.add_word("b")
    trie.prefix_search("aa")

I was thinking of just joining up the alphabets together to form the final string through the search function but i just couldn't think of any better way to do so.


Solution

  • The lst value so far is just the prefix, split up into separate letters, but now you need to process each node you find in the children attribute that is not None, to find all nodes that have end set to True. Each time you find such a node, you have a complete word. Any node can have multiple child nodes again, branching out into still more words to output.

    You could use a stack to keep track of all the nodes you need to process to build the list, together with their prefix so far. Add child nodes to the stack with the prefix for that node, and process these nodes one by one (adding more child nodes to the stack as you do so).

    Note that to start with, you don't need to build a list of prefix characters, you already have that prefix as a variable. To get to your starting point it is simpler to just iterate over the prefix itself:

    def prefix_search(self, prefix):
        current = self.root
        # get to the starting point
        for c in prefix:
            current = current.children[self.ord_char(c)]
            if current is None:
                # prefix doesn't exist, abort with an empty list
                return []
    
        found = []
        stack = [(current, prefix)]
        while stack:
            current, prefix = stack.pop()
    
            if current.end:
                # this is a complete word, named by prefix
                found.append(prefix)
    
            # add the children to the stack, each with their letter added to the
            # prefix value.
            for child in current.children:
                if child is None:
                    continue
                stack.append((child, prefix + child.value))
    
        return found
    

    For your given sample trie and prefix, the stack starts with the node at 'aa'. The first while stack: iteration removes that node from the stack and because this node has end set to true, 'aa' is added to found. The node has just one non-None child node, for c, and so the node is added to the stack with 'aac'.

    Then the while loop repeats, finds that one element on the stack, sees end is set so 'aac' is added to found, and no more child nodes are located. The stack stays empty and the while loop ends.

    Demo:

    >>> trie = Trie()
    >>> trie.add_word("aa")
    >>> trie.add_word("aac")
    >>> trie.add_word("b")
    >>> trie.prefix_search("aa")
    ['aa', 'aac']
    >>> trie.prefix_search("b")
    ['b']
    >>> trie.add_word('abracadabra')
    >>> trie.add_word('abbreviation')
    >>> trie.add_word('abbreviated')
    >>> trie.add_word('abbrasive')
    >>> trie.prefix_search("ab")
    ['abracadabra', 'abbreviation', 'abbreviated', 'abbrasive']
    >>> trie.prefix_search("abr")
    ['abracadabra']
    >>> trie.prefix_search("abb")
    ['abbreviation', 'abbreviated', 'abbrasive']
    >>> trie.prefix_search("abbra")
    ['abbrasive']