Search code examples
algorithmpattern-matchingknuth-morris-pratt

How to use KMP failure function to determine minimum length repeated substring?


I want to solve UVA 10298 -"Power Strings" problem using KMP algorithm. In this blog a technique is shown how failure function can be used to calculate minimum length repeated substring. The technique is as follows:

  1. Compute prefix-suffix table pi[ ] for the given string.
  2. Let len be the string length, and last_in_pi be the value stored at the last index of pi table.
  3. Check whether len % (len - last_in_pi) == 0 is true or not. If it is true then the length of the minimum length repeated substring is (len - last_in_pi), otherwise it is the length of the given string.

I understand what is failure function and how it is used to find pattern in a text but I am struggling to understand proof of correctness of this technique.


Solution

  • Remember that Pi[i] is defined as the (length of the) longest prefix of your_string that is a proper suffix (so not the whole string) of the substring your_string[0 ... i].

    There is an example on the blog post you linked to:

        0 1 2 3 4 5
    S : a b a b a b
    Pi: 0 0 1 2 3 4
    

    Where we have:

    a b a

    a b a b

    Etc. I hope this makes it clear what Pi (the prefix function / table) does.

    Now, the blog says:

    The last value of prefix table = 4.. Now If it is a repeated string than , It’s minimal length would be 2. (6(string length) – 4) , Now

    So you have to check if len % (len - last_in_pi) == 0. If yes, then len - last_in_pi is the length of the shortest repeated string (the period string).

    This works because, if you rotate a string with len(period) positions either way, it will match itself. len - last_in_pi tells you how much you'd need to rotate.