I'm looking at the pseudo-code given in figure 3 of the original paper introducing suffix arrays "SUFFIX ARRAYS: A NEW METHOD FOR ON-LINE STRING SEARCHES".
I can't figure out the logic for lines 4 and 5 (indexing from 0). The lines reads:
else if r < P or wr ≤ aPos[N-1]+r then
LW ← N
W
is the pattern of length 'P' we're looking for and r
is lcp(A[pos[N-1]:], W)
. The problem is that in almost all cases, this lcp
will be less than length of 'W'. This conditional is meant to handle the case (I think) that the pattern is lexicographically larger than the lexicographically greatest suffix in the array, but it doesn't test this at all. Lines 2 & 3, on the other hand, which test if W
is less than the lexicographically smallest suffix seem to make perfect sense
if l = P or wl ≤ aPos[0]+l then
LW ← 0
I believe that the original lines should read something like:
else if r < P and wr > aPos[N-1]+r then
LW ← N
The only way that W
can be greater than A[pos[N-1]:]
is if it has an lcp
shorter than the length of the pattern (otherwise, all of W
matches and so W
cannot be greater, only lesser or equal to the thing with which we're sharing the lcp
) AND if the character after the lcp
is greater in W
than in A[pos[N-1]]
. Does this seem to make sense? Is this an error in the original paper? If not, can someone please explain to me how I'm misinterpreting the original code?
I think you understand the paper correct and in fact it has an error.
Consider the following example: let A = banana
, W = nana
. Then A[pos[N-1]:] = nana
. Algorithm sets LW = N
or even fails while in fact it's N-1
.