Search code examples
stringalgorithmstring-matchingknuth-morris-pratt

Understanding the Knuth Morris Pratt(KMP) Failure Function


I've been reading the Wikipedia article about the Knuth-Morris-Pratt algorithm and I'm confused about how the values are found in the jump/partial match table.

  i  |  0  1  2  3  4  5  6
W[i] |  A  B  C  D  A  B  D
T[i] | -1  0  0  0  0  1  2

If someone can more clearly explain the shortcut rule because the sentence

"let us say that we discovered a proper suffix which is a proper prefix and ending at W[2] with length 2 (the maximum possible)"

is confusing. If the proper suffix ends at W[2] wouldn't it be size of 3?

Also I'm wondering why T[4] isn't 1 when there is a prefix and suffix of size 1: The A.

Thanks for any help that can be offered.


Solution

  • Notice that the failure function T[i] does not use i as an index, but rather as a length. Therefore, T[2] represents the length of the longest proper border (a string that is both a prefix and suffix) of the string formed from the first two characters of W, rather than the longest proper border formed by the string ending at character 2. This is why the maximum possible value of T[2] is 2 rather than 3 - the substring formed from the first two characters of W can't have length any greater than 2.

    Using this interpretation, it's also easier to see why T[4] is 0 rather than 1. The substring of W formed from the first four characters of W is ABCD, which has no proper prefix that is also a proper suffix.

    Hope this helps!