The LPS (Longest Proper Prefix which is also a Suffix) algorithm goes as follows:
public static int[] constructLPSArray(String s) {
int n = s.length();
int[] arr = new int[n];
int j = 0;
for (int i = 1; i < n; ) {
if (s.charAt(i) == s.charAt(j)) {
arr[i] = j + 1;
i++;
j++;
} else {
if (j != 0) {
j = arr[j - 1];
} else {
i++;
}
}
}
return arr;
}
The if (s.charAt(i) == s.charAt(j))
part looks clear, but the else
part is unclear.
Why do we do:
if (j != 0) {
j = arr[j - 1];
} else {
i++;
}
More specifically, why does j = arr[j - 1]
work ? Or why do we even do it? How do we validate the correctness of this step?
Let's say we are parsing an array of characters with i
and j
positioned like this:
a b a b x x a b a b ...
^ ^
j i
with arr
holding:
0 0 1 2 0 0 1 2 3 4
i. e., the length of the longest prefix/suffix of each substring of s of that length until i
. You can probably guess how that was generated from the rest of the algorithm. Now, if the next character after i
does not match the next character after j
,
a b a b x x a b a b a ...
^ ^
j i
we don't have to retry the matching, because we know the longest prefix/suffix of our previous prefix/suffix! Looking up arr[j - 1]
yields 2 – so we essentially cached the information that the parts highlighted here
A B a b x x a b A B a ...
=== ^ === ^
j i
are identical, and don't need to be compared again!