string algorithm language-agnostic data-structures

How to break down a given text into words from the dictionary?

This is an interview question. Suppose you have a string text and a dictionary (a set of strings). How do you break down text into substrings such that each substring is found in the dictionary.

For example you can break down "thisisatext" into ["this", "is", "a", "text"] using /usr/share/dict/words.

I believe backtracking can solve this problem (in pseudo-Java):

void solve(String s, Set<String> dict, List<String> solution) {
   if (s.length == 0)
      return
   for each prefix of s found in dict
      solve(s without prefix, dict, solution + prefix)
}

List<String> solution = new List<String>()
solve(text, dict, solution)

Does it make sense? Would you optimize the step of searching the prefixes in the dictionary? What data structures would you recommend?

Solution

This solution assumes the existence of Trie data structure for the dictionary. Further, for each node in Trie, assumes the following functions:

node.IsWord() : Returns true if the path to that node is a word
node.IsChild(char x): Returns true if there exists a child with label x
node.GetChild(char x): Returns the child node with label x

Function annotate( String str, int start, int end, int root[], TrieNode node):
i = start
while i<=end:
    if node.IsChild ( str[i]):
        node = node.GetChild( str[i] )
        if node.IsWord():
            root[i+1] = start
        i+=1
    else:
        break;

end = len(str)-1
root = [-1 for i in range(len(str)+1)]
for start= 0:end:
    if start = 0 or root[start]>=0:
        annotate(str, start, end, root, trieRoot)

index  0  1  2  3  4  5  6  7  8  9  10  11
str:   t  h  i  s  i  s  a  t  e  x  t
root: -1 -1 -1 -1  0 -1  4  6 -1  6 -1   7

I will leave the part for you to list the words that make up the string by reverse traversing the root.

Time complexity is O(nk) where n is the length of the string and k is the length of the longest word in the dictionary.

PS: I am assuming following words in the dictionary: this, is, a, text, ate.