Search code examples
stringalgorithmcomputer-sciencestring-matchingknuth-morris-pratt

What is the purpose of the second while loop in KMP when building the partial match table?


When building the partial match table for KMP:

void buildBackTable() { 
    int i = 0, j = -1; b[0] = -1; 
    while (i < m) { 
        while (j >= 0 && P[i] != P[j]) j = b[j]; //Why is this a while loop!?
        i++; j++;
        b[i] = j;
    }
}   

How come that second while loop is not a conditional instead? Such as:

void buildBackTable() { 
    int i = 0, j = -1; b[0] = -1; 
    while (i < m) { 
        if (j >= 0 && P[i] != P[j]) j = b[j]; //Why not?
        i++; j++;
        b[i] = j;
    }
}   

In most of the examples I've tried, this statement is used to reset j to -1 when there is no match, so when the next iteration comes, we compare the first character of the string (P[j]) with the character at position i (P[i]).


Solution

  • Just to clarify first, in OP's notations m stands for length of string P. Note that b is of length m+1. (this is one of the implementations, though by far not the most natural in terms of notations, which probably caused the confusion)

    Let's see what is buildBackTable() is purposed to result in. Essentially, for produced array b, b[i] will store the maximum value j, such that j < i, and P[0..j-1] == P[i-j..i-1], b[0]=-1.

    It is not so difficult to understand the following properties:

    1. b[i] <= b[i-1] + 1
    2. if b[i] = j, then for any k < j, s.t P[0..k-1] == P[i-k..i-1], we know that b[j] >= k. At the same time, obviously b[i] > b[j] or b[i] = 0.

    What this means is that we can easily enumerate the k values from largest to smallest, by going through b[i] -> b[b[i]] -> b[b[b[i]]]... and so on.

    This enumeration allows us to jump through the prefixes of P and find maximum value j, s.t. P[0..j-1] == P[i-j..i-1] and P[i] == P[j].

    As for ur question, when conditional check (without loop fails), just consider the string aaaac. Then b[4] = 3. But b[5] = 0. With only conditional check, you would get that b[5]=3.