Search code examples
javastringalgorithmdata-structurestrie

Why does this iterative trie-traversal not produce the correct word upon a query?


So, here’s my predicament: I’m trying to traverse through a trie data structure to find the nth word.

For those of you unfamiliar, a trie is a compression-based data structure that allows you to insert a series of words, and sort them lexicographically, but have each node be it’s own individual letter, and thus branching off and spelling into the respective words (if it’s unclear, someone who has a more concrete definition, please fix!).

Each node in the tree has an array of 26 nodes to represent the 26 letters of the alphabet. Once a word is spelled out, a Boolean value in the array (isWord) for the last character in the word is flagged as true. This is also the case for words within words such as {a, and, are, art}; “a” is a word, therefore, isWord for this letter is set to true. However, the letters within “and” are tacked onto “a”, and the “d” is flagged as a word.

Now that the introduction is set, here’s my problem: it’s very hard for me to do this recursively, so I tried to do it iteratively. I’m very, very close to the solution, but for some reason, some words are being skipped when I call nthWord(int n). In essence, the method is supposed to traverse through the tree (which is in alphabetical order by the property of the trie) and find the nth word as the name implies. But, as aforesaid, sometimes the method skips over words in the trie, even though it’s guaranteed they’re being added to the trie (and the isWord Boolean is also always correct). I’ve been at this problem for about 3 days now, and I’m so lost.

I expect the output to be the nth word in the sequence (from a very large .txt file of words), but sometimes, it skips certain words. If j is assigned to -1, words like "aardvark" that start with 2 of the same letter are accounted for, but others are skipped. Conversely, if it's assigned to 0, other words are accounted for, but words that start with two of the same letter are skipped.

EDIT: I should also state that the nthWord(...) method doesn’t process duplicate words. Trie’s store frequencies of each word in the last character of said word. Therefore, duplicate words aren’t a problem in this instance.


Solution

  • Here is a recursive solution to this question (which is more intuitive). Just treat this like a tree problem where you have to traverse the tree left to right and try find the Nth word.

    You can DFS from the root node. Keep a variable to store the number of words you have visited so far (the number of nodes with isWord you have visited). And return the word when you reach the Nth word.

    Code would be something like this. I have just written a template code -

    def findWord(TrieNode,word):
        global N
        if TrieNode.isWord:
            if N == 0:
                return word
            else:
                N -= 1
    
        for each in TrieNode.children:
            if each is not None:
                word += each.character
                res = findWord(N,each,word)
                if len(res) > 0:
                    return res
                word = word[:-1]
        return ''
    N = input()
    findWord(root,'')