algorithm recursion time-complexity memoization

Memoization algorithm time complexity

I read this article Retiring a Great Interview Problem, the author came up with a "word break" problem and gave three solutions. The efficient one uses memoization algorithm and the author said its worst case time complexity is O(n^2) since "the key insight is that SegmentString is only called on suffixes of the original input string, and that there are only O(n) suffixes".

However, I find it difficult for me to understand why it is O(n^2). Could someone please give me a hint or proof?

Word Break Problem:

    Given an input string and a dictionary of words,
    segment the input string into a space-separated
    sequence of dictionary words if possible. For
    example, if the input string is "applepie" and
    dictionary contains a standard set of English words,
    then we would return the string "apple pie" as output.

The memoization algorithm from Retiring a Great Interview Problem

Map<String, String> memoized;

String SegmentString(String input, Set<String> dict) {
      if (dict.contains(input)) 
          return input;
      if (memoized.containsKey(input) {
          return memoized.get(input);
      }
      int len = input.length();
      for (int i = 1; i < len; i++) {
          String prefix = input.substring(0, i);
          if (dict.contains(prefix)) {
              String suffix = input.substring(i, len);
              String segSuffix = SegmentString(suffix, dict);
              if (segSuffix != null) {
                  return prefix + " " + segSuffix;
              }
          }
      }
      memoized.put(input, null);
      return null;
}

Solution

Intuition

(it is clear that the worst case is the one where there are no solutions, so I focus on that)

Since the recursive call is made before putting values in the memoization-cache, the last (shortest) suffixes will get cached first. This is because the function is first called with a string of length N, which then calls itself with a string of length N-1, which then .... , with a string of len 0, which is cached and returns, then the string of length 1 is cached and returns, ..., length N is cached and returns.

As the hint suggests, only suffixes get cached, and there are only N of them. This means that by the time the top-level function gets the result of its first recursive call (i.e. on a suffix of length N-1), the cache is already populated with the N-1 suffixes.

Now, assuming the N-1 last suffixes are already cached, the for-loop needs to make N-1 recursive calls, each taking O(1) (since the answer is already cached), totalling O(N). However, (pre-) building the cache of the last N-1 takes O(N^2) (explained below), totalling with O(N)+O(N^2) = O(N^2).

Proof by mathematical induction

This explanation can easily be translated to a formal proof using induction. Here is the gist of it:

(f(N) being the number of operations required for the function to complete on an input of length N)

Induction hypothesis -- there exists a constant c s.t. f(N) < c*N^2.

The base case is trivial -- for a string of length 1, you can find a constant c such that f(1) < c*N^2 = c

Induction step

Recap of the order things happen:

Step 1: the function first calls itself on the suffix of length N-1, building a cache containing the answer for the last N-1 suffixes

Step 2: the function then calls itself O(N) more times, each taking O(1) (thanks to this test: if (memoized.containsKey(input)), and the fact that the cache has already been populated in Step 1).

So we get:

f(N+1) = f(N) + a*N <   (by the hypothesis)
       < c*N^2 + a*N <  (for c large enough)
       < c*N^2 + 2*c*N + c =
       = c*(N+1)^2

Thus we got f(N+1) < c*(N+1)^2, which completes the proof.