Search code examples
stringalgorithmdata-structurespattern-matchingknuth-morris-pratt

KMP prefix table intuition


The main function that builds the failure/prefix table in KMP as I have seen (in all online resources even in this answer_ is as follows:

int j = 0;  
    for (int i = 1; i < pattern.length(); i++) {  
      while (j > 0 && pattern.charAt(j) != pattern.charAt(i)) {   
        j = failure[j - 1];  
      }  
      if (pattern.charAt(j) == pattern.charAt(i)) {   
        j++;   
      }  
      failure[i] = j;  
    }

I can't understand this part:
j = failure[j - 1];

Why don't we do j-- instead to go back the string? How do we know that it is correct to update j using the failed table?


Solution

  • KMP string matching is correct if the i'th entry of the failure table is the length of the longest suffix-prefix match of the prefix of the pattern of length i.

    Notice that if A[0..k] is a suffix-prefix match of A[0..i], then A[0..k] is either the longest suffix-prefix match of A[0..i] or it's a suffix-prefix match of said longest suffix-prefix match.

    When you put these two things together, it turns out that we want failure[i] to be the length of the longest suffix-prefix match of pattern[0..i]. KMP preprocessing builds failure inductively, using the following fact:

    If the longest suffix-prefix match of A[0..i] is nonempty, chopping off the last character will give some suffix-prefix match of A[0..i-1].

    Consequently, the longest suffix-prefix match of A[0..i] is either empty or it's formed by extending the longest suffix-prefix match of A[0..i-1] by one character or by extending a suffix-prefix match of that longest suffix-prefix match by one character. The failure function for the preceding characters gives you a straightforward way to iterate over all suffix-prefix matches of pattern[0..i-1].