The main function that builds the failure/prefix table in KMP as I have seen (in all online resources even in this answer_ is as follows:
int j = 0;
for (int i = 1; i < pattern.length(); i++) {
while (j > 0 && pattern.charAt(j) != pattern.charAt(i)) {
j = failure[j - 1];
}
if (pattern.charAt(j) == pattern.charAt(i)) {
j++;
}
failure[i] = j;
}
I can't understand this part:
j = failure[j - 1];
Why don't we do j--
instead to go back the string? How do we know that it is correct to update j
using the failed table?
KMP string matching is correct if the i'th entry of the failure table is the length of the longest suffix-prefix match of the prefix of the pattern of length i.
Notice that if A[0..k]
is a suffix-prefix match of A[0..i]
, then A[0..k]
is either the longest suffix-prefix match of A[0..i]
or it's a suffix-prefix match of said longest suffix-prefix match.
When you put these two things together, it turns out that we want failure[i]
to be the length of the longest suffix-prefix match of pattern[0..i]
. KMP preprocessing builds failure
inductively, using the following fact:
If the longest suffix-prefix match of A[0..i]
is nonempty, chopping off the last character will give some suffix-prefix match of A[0..i-1]
.
Consequently, the longest suffix-prefix match of A[0..i]
is either empty or it's formed by extending the longest suffix-prefix match of A[0..i-1]
by one character or by extending a suffix-prefix match of that longest suffix-prefix match by one character. The failure function for the preceding characters gives you a straightforward way to iterate over all suffix-prefix matches of pattern[0..i-1]
.