Tags: algorithm, complexity-theory, backtracking

Create given string from dictionary entries


During a recent job interview, I was asked to give a solution to the following problem:

Given a string s (without spaces) and a dictionary, return the words in the dictionary that compose the string.

For example, s = peachpie, dic = {peach, pie}, result = {peach, pie}.

I will ask about the decision version of this problem:

If s can be composed of words in the dictionary, return yes; otherwise return no.

My solution used backtracking (written in Java):

public static boolean words(String s, Set<String> dictionary)
{
    if ("".equals(s))
        return true;

    for (int i=0; i <= s.length(); i++)
    {
        String pre = prefix(s,i); // returns s[0..i-1], i.e. s.substring(0, i)
        String suf = suffix(s,i); // returns s[i..s.length()-1], i.e. s.substring(i)
        if (dictionary.contains(pre) && words(suf, dictionary))
            return true;
    }
    return false;
}

public static void main(String[] args) {
    Set<String> dic = new HashSet<String>();
    dic.add("peach");
    dic.add("pie");
    dic.add("1");

    System.out.println(words("peachpie1", dic)); // true
    System.out.println(words("peachpie2", dic)); // false
}

What is the time complexity of this solution? I'm making a recursive call inside the for loop, but only for the prefixes that are in the dictionary.

Any ideas?


Solution

  • You can easily construct a case where the program takes at least exponential time to complete. Take the string aaa...aaab, where a is repeated n times, and a dictionary that contains only two words, a and aa.

    The b at the end ensures that the function never finds a match and thus never exits early.

    On each execution of words, two recursive calls are spawned: one with suffix(s, 1) and one with suffix(s, 2). The execution time therefore grows like the Fibonacci numbers: t(n) = t(n - 1) + t(n - 2). (You can verify this by inserting a counter.) So the complexity is certainly not polynomial, and this is not even the worst possible input.
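The counter check mentioned above could look roughly like this. It is only a sketch: the class name CallCounter and the helper countCalls are made up for the illustration, and String.substring stands in for the prefix/suffix helpers from the question (the i = 0 iteration is skipped because the empty prefix is not in the dictionary, so it never triggers a call).

```java
import java.util.HashSet;
import java.util.Set;

class CallCounter {
    // Number of times words() has been entered (instrumentation added for this sketch).
    static long calls = 0;

    // Same backtracking as in the question, with substring() replacing prefix()/suffix().
    static boolean words(String s, Set<String> dictionary) {
        calls++;
        if (s.isEmpty())
            return true;
        for (int i = 1; i <= s.length(); i++) {
            if (dictionary.contains(s.substring(0, i)) && words(s.substring(i), dictionary))
                return true;
        }
        return false;
    }

    // Counts the calls made for the input "a...ab" with n copies of 'a'
    // and the dictionary {a, aa}.
    static long countCalls(int n) {
        Set<String> dictionary = new HashSet<String>();
        dictionary.add("a");
        dictionary.add("aa");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++)
            sb.append('a');
        sb.append('b');
        calls = 0;
        words(sb.toString(), dictionary);
        return calls;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 30; n += 5)
            System.out.println(n + " -> " + countCalls(n));
    }
}
```

With this instrumentation the count c(n) satisfies c(n) = c(n - 1) + c(n - 2) + 1 for n >= 2 (for instance, n = 5 gives 20 calls), which is the Fibonacci-like growth described above.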

    But you can easily improve your solution with memoization. Notice that the output of the function words depends on one thing only: the position in the original string at which we start. E.g., if we have the string abcdefg and words(5) is called, it doesn't matter how exactly abcde was composed (as ab+c+de, a+b+c+d+e, or something else). Thus, we don't have to recalculate words("fg") each time.
    In its most primitive version, this can be done like this:

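    // strings already shown to be impossible to compose
    // (only valid as long as the dictionary stays the same)
    private static Set<String> processed = new HashSet<String>();

    public static boolean words(String s, Set<String> dictionary) {
        if (processed.contains(s)) {
            // we've already processed string 's' with no luck
            return false;
        }
    
        // your normal computations
        // ...
    
        // if no match found, add 's' to the set of checked inputs
        processed.add(s);
        return false;
    }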
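For reference, here is one way the skeleton above might be filled in completely. Treat it as a sketch under a few assumptions: the class name MemoWords is invented, the "normal computations" are taken to be the loop from the question with substring in place of prefix/suffix, and the processed set is kept per instance so that it is never reused with a different dictionary.

```java
import java.util.HashSet;
import java.util.Set;

class MemoWords {
    // Suffixes already shown to be impossible to compose with this dictionary.
    private final Set<String> processed = new HashSet<String>();
    private final Set<String> dictionary;

    MemoWords(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    boolean words(String s) {
        if (s.isEmpty())
            return true;
        if (processed.contains(s))
            return false; // known dead end, skip the recursion
        for (int i = 1; i <= s.length(); i++) {
            if (dictionary.contains(s.substring(0, i)) && words(s.substring(i)))
                return true;
        }
        processed.add(s); // remember the failure
        return false;
    }

    public static void main(String[] args) {
        Set<String> dic = new HashSet<String>();
        dic.add("peach");
        dic.add("pie");
        dic.add("1");
        MemoWords solver = new MemoWords(dic);
        System.out.println(solver.words("peachpie1")); // true
        System.out.println(solver.words("peachpie2")); // false
    }
}
```

Each distinct suffix is now explored at most once, so the number of recursive explorations is bounded by the length of the string rather than growing exponentially.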
    

    PS: Still, I do encourage you to change words(String) to words(int). This way you'll be able to store the results in an array and even transform the whole algorithm into DP (which would make it much simpler).
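A possible shape for the words(int) variant is sketched below. The class name IndexedWords and the Boolean[] memo table are assumptions made for the example; the only idea taken from the answer is that the start position alone identifies a subproblem.

```java
import java.util.HashSet;
import java.util.Set;

class IndexedWords {
    private final String s;
    private final Set<String> dictionary;
    // memo[start]: null = not computed yet, otherwise whether s[start..] can be composed.
    private final Boolean[] memo;

    IndexedWords(String s, Set<String> dictionary) {
        this.s = s;
        this.dictionary = dictionary;
        this.memo = new Boolean[s.length() + 1];
    }

    boolean words(int start) {
        if (start == s.length())
            return true; // the empty suffix can always be composed
        if (memo[start] != null)
            return memo[start];
        boolean result = false;
        // try every dictionary word that begins at 'start'
        for (int end = start + 1; end <= s.length() && !result; end++) {
            result = dictionary.contains(s.substring(start, end)) && words(end);
        }
        memo[start] = result;
        return result;
    }
}
```

A call such as new IndexedWords("peachpie1", dic).words(0) answers the question for the whole string. Because the memo is an array indexed by position, both successful and failed positions are cached, and no suffix strings need to be hashed for the lookup.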

    edit 2
    Since I have not much to do besides work, here's the DP (dynamic programming) solution. Same idea as above.

        String s = "peachpie1";
        int n = s.length();
        boolean[] a = new boolean[n + 1];
        // a[i] tells whether s[i..n-1] can be composed from words in the dictionary
        a[n] = true; // always can compose empty string
    
        for (int start = n - 1; start >= 0; --start) {
            for (String word : dictionary) {
                if (start + word.length() <= n && a[start + word.length()]) {
                    // check if 'word' is a prefix of s[start..n-1]
                    String test = s.substring(start, start + word.length());
                    if (test.equals(word)) {
                        a[start] = true;
                        break;
                    }
                }
            }
        }
    
        System.out.println(a[0]);
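The original interview question asked for the words themselves, not just yes or no. One way to extend the DP table for that is sketched below; the class name WordSplitter, the method split, and the extra len array are assumptions of this example rather than part of the answer above. It returns a single decomposition (any one, if several exist) or null.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WordSplitter {
    // Returns one decomposition of s into dictionary words, or null if none exists.
    static List<String> split(String s, Set<String> dictionary) {
        int n = s.length();
        // a[i] tells whether s[i..n-1] can be composed from words in the dictionary
        boolean[] a = new boolean[n + 1];
        // len[i] is the length of a word that starts such a composition at i
        int[] len = new int[n + 1];
        a[n] = true; // the empty string can always be composed

        for (int start = n - 1; start >= 0; --start) {
            for (String word : dictionary) {
                int end = start + word.length();
                // skip empty words so that every recorded step advances
                if (!word.isEmpty() && end <= n && a[end] && s.startsWith(word, start)) {
                    a[start] = true;
                    len[start] = word.length();
                    break;
                }
            }
        }

        if (!a[0])
            return null;
        // follow the recorded word lengths from the beginning of the string
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < n; i += len[i])
            result.add(s.substring(i, i + len[i]));
        return result;
    }
}
```

For s = peachpie and the dictionary {peach, pie}, split returns the list [peach, pie]. The table is filled in O(n * d) prefix checks for a dictionary of d words, the same as the decision version.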