During a recent job interview, I was asked to give a solution to the following problem:
Given a string s (without spaces) and a dictionary, return the words in the dictionary that compose the string.
For example, s = peachpie, dic = {peach, pie}, result = {peach, pie}.
I will ask the decision variation of this problem: if s can be composed of words in the dictionary, return yes; otherwise return no.
My solution uses backtracking (written in Java):
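import java.util.HashSet;
import java.util.Set;

public class WordBreak {
    public static boolean words(String s, Set<String> dictionary)
    {
        if ("".equals(s))
            return true;
        // start at 1: an empty prefix would recurse on the same string
        for (int i = 1; i <= s.length(); i++)
        {
            String pre = prefix(s, i); // returns s[0..i-1]
            String suf = suffix(s, i); // returns s[i..s.len-1]
            if (dictionary.contains(pre) && words(suf, dictionary))
                return true;
        }
        return false;
    }

    private static String prefix(String s, int i) { return s.substring(0, i); }

    private static String suffix(String s, int i) { return s.substring(i); }

    public static void main(String[] args) {
        Set<String> dic = new HashSet<String>();
        dic.add("peach");
        dic.add("pie");
        dic.add("1");
        System.out.println(words("peachpie1", dic)); // true
        System.out.println(words("peachpie2", dic)); // false
    }
}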
What is the time complexity of this solution? I'm calling recursively in the for loop, but only for the prefixes that are in the dictionary.
Any ideas?
You can easily construct a case where the program takes at least exponential time to complete. Take the word aaa...aaab, where a is repeated n times, and a dictionary containing only two words, a and aa. The b at the end ensures that the function never finds a match and thus never exits early.
On each execution of words, two recursive calls are spawned: one with suffix(s, 1) and one with suffix(s, 2). The execution time therefore grows like the Fibonacci numbers: t(n) = t(n - 1) + t(n - 2). (You can verify this by inserting a counter.) So the complexity is certainly not polynomial, and this is not even the worst possible input.
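The counter mentioned above could look roughly like the sketch below. It is only an illustration: the class name CallCounter, the calls field and the countCalls helper are made up for this example, and the recursion uses substring directly instead of the prefix/suffix helpers. On the input a...ab with dictionary {a, aa}, the number of calls c(n) should satisfy c(n) = c(n - 1) + c(n - 2) + 1, i.e. Fibonacci-like growth.

```java
import java.util.HashSet;
import java.util.Set;

public class CallCounter {
    // Hypothetical instrumentation: number of times words() was entered.
    static long calls = 0;

    static boolean words(String s, Set<String> dictionary) {
        calls++;
        if (s.isEmpty()) {
            return true;
        }
        for (int i = 1; i <= s.length(); i++) {
            // Recurse only when the prefix s[0..i-1] is a dictionary word.
            if (dictionary.contains(s.substring(0, i)) && words(s.substring(i), dictionary)) {
                return true;
            }
        }
        return false;
    }

    // Counts the calls made for 'a' repeated n times followed by 'b',
    // with the dictionary {a, aa}.
    static long countCalls(int n) {
        Set<String> dictionary = new HashSet<String>();
        dictionary.add("a");
        dictionary.add("aa");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        sb.append('b');
        calls = 0;
        words(sb.toString(), dictionary);
        return calls;
    }

    public static void main(String[] args) {
        for (int n = 5; n <= 30; n += 5) {
            System.out.println("n = " + n + ", calls = " + countCalls(n));
        }
    }
}
```

Running main and comparing consecutive values of n should make the exponential growth visible without waiting long, since n = 30 still finishes quickly.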
But you can easily improve your solution with memoization. Notice that the output of the function words depends on one thing only: the position in the original string at which we're starting. I.e., if we have the string abcdefg and words(5) is called, it doesn't matter how exactly abcde was composed (as ab+c+de, a+b+c+d+e, or something else). Thus, we don't have to recalculate words("fg") each time.
In its most primitive form, this can be done like this:
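// strings already known NOT to be composable (needs java.util.HashSet and java.util.Set)
private static final Set<String> processed = new HashSet<String>();

public static boolean words(String s, Set<String> dictionary) {
    if (processed.contains(s)) {
        // we've already processed string 's' with no luck
        return false;
    }
    // your normal computations
    // ...
    // if no match found, add 's' to the set of checked inputs
    processed.add(s);
    return false;
}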
PS Still, I do encourage you to change words(String) to words(int). This way you'll be able to store the results in an array and even transform the whole algorithm into DP (which would make it much simpler).
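A minimal sketch of what the words(int) variant might look like, assuming a Boolean[] array is used as the cache. The class name MemoizedWords, the memo parameter and the two-method layout are choices made for this example, not something taken from the question.

```java
import java.util.HashSet;
import java.util.Set;

public class MemoizedWords {
    // memo[start] == null means "not computed yet"; otherwise it holds the
    // cached answer for the suffix s[start..n-1].
    static boolean words(String s, int start, Set<String> dictionary, Boolean[] memo) {
        if (start == s.length()) {
            return true; // the empty suffix can always be composed
        }
        if (memo[start] != null) {
            return memo[start];
        }
        boolean result = false;
        for (int end = start + 1; end <= s.length(); end++) {
            if (dictionary.contains(s.substring(start, end)) && words(s, end, dictionary, memo)) {
                result = true;
                break;
            }
        }
        memo[start] = result;
        return result;
    }

    // Convenience entry point with the same signature as the original method.
    static boolean words(String s, Set<String> dictionary) {
        return words(s, 0, dictionary, new Boolean[s.length()]);
    }

    public static void main(String[] args) {
        Set<String> dic = new HashSet<String>();
        dic.add("peach");
        dic.add("pie");
        dic.add("1");
        System.out.println(words("peachpie1", dic));
        System.out.println(words("peachpie2", dic));
    }
}
```

Each start position is now evaluated at most once, so the number of recursive evaluations is bounded by the length of the string rather than growing exponentially.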
edit 2
Since I have not much to do besides work, here's the DP (dynamic programming) solution. Same idea as above.
// 'dictionary' is the Set<String> from the question
String s = "peachpie1";
int n = s.length();
boolean[] a = new boolean[n + 1];
// a[i] tells whether s[i..n-1] can be composed from words in the dictionary
a[n] = true; // always can compose empty string
for (int start = n - 1; start >= 0; --start) {
    for (String word : dictionary) {
        if (start + word.length() <= n && a[start + word.length()]) {
            // check if 'word' is a prefix of s[start..n-1]
            String test = s.substring(start, start + word.length());
            if (test.equals(word)) {
                a[start] = true;
                break;
            }
        }
    }
}
System.out.println(a[0]);
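The DP above only answers the yes/no variation. To get back to the original interview question (returning the words themselves), one possible extension is to remember, for each position, which word made it composable. The sketch below is just one way to do that; the class name Decompose, the decompose method and the choice array are hypothetical names introduced for this example. When several decompositions exist, which one is returned depends on the iteration order of the set.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Decompose {
    // Returns one decomposition of s into dictionary words,
    // or null if no decomposition exists.
    static List<String> decompose(String s, Set<String> dictionary) {
        int n = s.length();
        boolean[] a = new boolean[n + 1];
        // choice[i]: a word that starts a valid decomposition of s[i..n-1]
        String[] choice = new String[n];
        a[n] = true; // the empty suffix is always composable
        for (int start = n - 1; start >= 0; --start) {
            for (String word : dictionary) {
                int end = start + word.length();
                if (!word.isEmpty() && end <= n && a[end] && s.startsWith(word, start)) {
                    a[start] = true;
                    choice[start] = word;
                    break;
                }
            }
        }
        if (!a[0]) {
            return null;
        }
        // Walk forward through the recorded choices to rebuild the word list.
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < n; i += choice[i].length()) {
            result.add(choice[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> dic = new HashSet<String>();
        dic.add("peach");
        dic.add("pie");
        System.out.println(decompose("peachpie", dic));
    }
}
```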