Search code examples
pythontrie

Delete nodes of a word if they are not being used by another word in the Trie structure


When deleting a word from a trie, I'm trying to delete the nodes of that word if they are not being used for another word.

So I don't want to just mark a node when a word is deleted. Unused nodes should really be removed.

If the word can't be found in the trie, I want the delete method to return False and if the deletion works it should return True.

This is my Trie class:

class Trie(object):
    def __init__(self):
        self.children = {}
        self.end = "#"

    def append_word(self, word: str):
        node = self.children
        for c in word:
            node = node.setdefault(c, {})
        node[self.end] = self.end

Here's the delete method I tried to implement based on research:

    def delete(self, word):
        node = self
        parent = self
        for char in word:
            if char in node.children:
                parent = node
                node = node.children[char]
            else:
                return False
        if not node.children:
            del parent.children[char]
            del node
            return True
        else:
            node.end = "#"
            return True

What am I missing here?

I am calling the function like this, from an instance of the trie from another class:

self.trie.delete(user_input)

Solution

  • The issues with your attempt are related with these two points:

    • Your append_word method shows that nodes do not have a children property. They are dictionaries. The only object that has a children property is the Trie instance, and you only have one such instance. The rest of the structure is a nested dictionary that starts with that children property

    • With parent you only retain the last parent, not all ancestors. To do it right you would need to backtrack over potentially multiple ancestors until you bump into an ancestor that is still in use for another word. So actually you need a list of ancestors instead of just one parent reference.

    Here is the corrected implementation:

    def delete(self, word):
        node = self.children
        stack = []
        for char in word:
            if char not in node:  # Word is not in the Trie
                return False
            stack.append(node)  # Collect as ancestor
            node = node[char]
        if self.end not in node:  # End-of-word marker is missing, so word is not in Trie
            return False
        del node[self.end]   # Remove end-of-word marker
        for char in reversed(word):  # Backtrack in reversed order
            if len(node):  # Still in use for another word?
                break
            node = stack.pop()
            del node[char]
        return True