Search code examples
pythonalgorithmtrie

Find Compound Words in List of Words using Trie


Given a list of words, I am trying to figure out how to find words in that list that are made up of other words in the list. For example, if the list were ["race", "racecar", "car"], I would want to return ["racecar"].

Here is my general thought process. I understand that using a trie would be good for this sort of problem. For each word, I can find all of its prefixes (that are also words in the list) using the trie. Then for each prefix, I can check to see if the word's suffix is made up of one or more words in the trie. However, I am having a hard time implementing this. I have been able to implement the trie and and the function to get all prefixes of a word. I am just stuck on implementing the compound word detection.


Solution

  • You could present Trie nodes as defaultdict objects which have been extended to contain a boolean flag marking if the prefix is a word. Then you could have two pass processing where on the first round you add all the words to Trie and on second round check for each word if it's a combination or not:

    from collections import defaultdict
    
    class Node(defaultdict):
        def __init__(self):
            super().__init__(Node)
            self.terminal = False
    
    class Trie():
        def __init__(self, it):
            self.root = Node()
            for word in it:
                self.add_word(word)
    
        def __contains__(self, word):
            node = self.root
            for c in word:
                node = node.get(c)
                if node is None:
                    return False
    
            return node.terminal
    
        def add_word(self, word):
            node = self.root
            for c in word:
                node = node[c]
    
            node.terminal = True
    
        def is_combination(self, word):
            node = self.root
            for i, c in enumerate(word):
                node = node.get(c)
                if not node:
                    break
                # If prefix is a word check if suffix can be found
                if node.terminal and word[i+1:] in self:
                    return True
    
            return False
    
    lst = ["race", "racecar", "car"]
    t = Trie(lst)
    
    print([w for w in lst if t.is_combination(w)])
    

    Output:

    ['racecar']