string algorithm time-complexity trie substring

Using a trie for string segmentation - time complexity?

Problem to be solved:

Given a non-empty string s and a string array wordArr containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words. You may assume the dictionary does not contain duplicate words.

For example, given s = "leetcode", wordArr = ["leet", "code"].

Return true because "leetcode" can be segmented as "leet code".

In the above problem, would it work to build a trie that has each string in wordArr. Then, for each char in given string s, work down the trie. If a trie branch terminates, then this substring is complete so pass the remaining string up to the root and do the exact same thing recursively.

This should be O(N) time and O(N) space correct? I ask because the problem I'm working on says this will be O(N^2) time in the most optimal way and I'm not sure what's wrong with my approach.

For example, if s = "hello" and wordArr = ["he", "ll", "ee", "zz", "o"], then "he" will be completed in the first branch of the trie, "llo" will be passed up to the root recursively. Then, "ll" will be completed, so "o" gets passed up to root of trie. Then "o" is completed, which is the end of s, so return true. If the end of s isn't completed, return false.

Is this correct?

Solution

I suspect off-hand that the issue is backtracking. What if the word is not segmentable based on a particular dictionary, or what if there are multiple possible substrings with a common prefix? E.g., suppose the dictionary contains he, llenic, and llo. Failure down one branch of the trie would require backtracking, with some corresponding increase in time complexity.

This is similar to a regex-match problem: the example you give is like testing an input word against

^(he|ll|ee|zz|o)+$

(any number of dictionary members, in any order, and nothing else). I don't know the time complexity of regex matchers offhand, but I know backtracking can get you into serious time trouble.

I did find this answer which says:

Running a DFA-compiled regular expression against a string is indeed O(n), but can require up to O(2^m) construction time/space (where m = regular expression size).

So maybe it is O(n^2) with reduced construction effort.