dictionary data-structures nlp trie text-segmentation

Structure of a trie for a word with subwords

What will be the structure of a trie for words which have subwords like "icecream" (contains 'i', 'ice', 'cream', 'icecream'); "businessman" (contains 'bus', 'is', 'business', 'man', 'businessman').

I know how will it be for those which do not have subwords like "inn" but I am confused for the above words.

Thanks in advance.

Solution

You can just have a boolean 'isTerminal' in your trie node to indicate whether a word terminates at that node. So all the words 'bus', 'business' and 'businessman' will start with the node 'b' and be along the same path. The nodes at 's' for 'bus', 's' for 'business' and 'n' for 'businessman' will have isTerminal = true.

Although 'man' is contained in 'businessman', it should be treated as word starting from the 'm' child node from root and on a separate path.

Therefore all words start strictly from the top letter nodes (children of root) and terminate at different levels indicated by the boolean isTerminal=true.