Search code examples
algorithmstring-matching

Prefix table for KMP algorithm


I am going through KMP algorithm. Even though this algorithm is clear to understand, I have one doubt here.

prefix table algorithm:

void prefixTable(char p[], int m){
     int i=1, j=0, F[0] = 0;
     while(i<m){
        if(p[i]==p[j]){
            F[i]=j+1;
            i++;
            j++;
        }else if(j>0){
            j=F[j-1];
        }else{
            F[i]=0;
            i++;
        }
     }
}

enter image description here

As shown in above image step 5, i=5, j=3, j=F[j-1] is executed as j>0 .

why should we take F[j-1]? why can't we use directly F[0]? How does it make sure correctness of algorithm?


Solution

  • j is the position in the pattern.

    if the pattern is processed up to a certain position > 0, then we cannot shift the pattern to the first (0) position if the pattern contains a prefix of itself.

    Applied to your example: The pattern is ababaca. Try to find it in a text abababaca:

    • the algorithm will process the text until ababa|baca where the pattern is ababa|c
    • setting j to F[0] = 0, means setting the pattern to |ababac which will never match baca (note that i will not be changed)
    • setting j to F[4] = 3, means setting the pattern to aba|bac which will match baca
    • after the match the pattern is in state ababac| and the text in state abababac|a and it gets clear that the found pattern is ab[ababac]a