Search code examples
algorithmknuth-morris-pratt

Simpler KMP prefix table building. What would be wrong with this implementation?


The KMP algorithm needs a prefix table to then know after a failure how many chars it can safely skip. The general idea of the prefix table is that it will tell you for a given pattern P, at a given position i with a char C, how many chars in common the suffix up to C has with the prefix of P:

int[] T = new int[P.length()];
int i = 0;
for (int j = 1; j < P.length(); ++j) {
  if (P.charAt(j) == P.charAt(i)) {
    i++;
  } else {
    i = 0;
  }
  T[j] = i;
}

This is what I came up with. I've looked around and the implementations always seem to be distinct. I've tried toying with a couple of examples (such as ABABACA) but both mine implementation and for instance this KMP prefix table seem to yield the same result.

Could anyone please tell me what is the logic mistake in my implementation and with what kind of inputs would this fail in producing a correct prefix table for the KMP algorithm?

Thanks


Solution

  • It is a characteristic of your algorithm that each entry in the table is either 0, or 1 more than the previous entry. So the challenge is to find a string where an entry in the table is less than the previous entry, but not 0.

    One such string is "ABACABABC" (which comes from this wikipedia article).

    The prefix tables are

    {0,0,1,0,1,2,3,2,0}  from the linked answer
    {0,0,1,0,1,2,3,0,0}  your proposed code
                   ^------different here
    

    The entries of interest are the 3 followed by a 2.

    Consider what happens when 7 characters match. The input string looks like

    ABACABA?    
    

    where the ? is the character that didn't match, so the ? is not a B. The ABA? could be a match for ABAC, so the prefix length is 3.

    Now consider what happens when 8 characters match:

    ABACABAB?
    

    where the ? is not a C. In this case AB? could match ABA, so the prefix length is 2.

    This demonstrates that the prefix table could have an entry that is less than the previous entry, but not 0.