I am playing around with the KMP algorithm in f sharp. While it works for patterns like "ATAT" (result will be [|0; 0; 1; 2;|]) , the first while loop enters a deadlock when the first 2 characters of a string are the same and the 3rd is another, for example "AAT".
I understand why: first, i gets incremented to 1. now the first condition for the while loop is true, while the second is also true, because "A" <> "T". Now it sets i to prefixtable.[!i], which is 1 again, and here we go.
Can you guys give me a hint on how to solve this?
let kMPrefix (pattern : string) =
let (m : int) = pattern.Length - 1
let prefixTable = Array.create pattern.Length 0
// i : longest proper prefix that is also a suffix
let i = ref 0
// j: the index of the pattern for which the prefix value will be calculated
// starts with 1 because the first prefix value is always 0
for j in 1 .. m do
while !i > 0 && pattern.[!i] <> pattern.[j] do
i := prefixTable.[!i]
if pattern.[!i] = pattern.[j] then
i := !i+1
Array.set prefixTable j !i
prefixTable
I'm not sure how to repair the code with a small modification, since it doesn't match the KMP algorithm's lookup table contents (at least the ones I've found on Wikipedia), which are:
Therefore, I'd expect output for "ATAT"
to be [|-1; 0; 0; 1|]
, not [|0; 0; 1; 2;|]
.
This type of problem might be better to reason about in functional style. To create the KMP table, you could use a recursive function that fills the table one by one, keeping track of how many recent characters match the beginning, and start running it at the second character's index.
A possible implementation:
let buildKmpPrefixTable (pattern : string) =
let prefixTable = Array.zeroCreate pattern.Length
let rec run startIndex matchCount =
let writeIndex = startIndex + matchCount
if writeIndex < pattern.Length then
if pattern.[writeIndex] = pattern.[matchCount] then
prefixTable.[writeIndex] <- matchCount
run startIndex (matchCount + 1)
else
prefixTable.[writeIndex] <- matchCount
run (writeIndex + 1) 0
run 1 0
if pattern.Length > 0 then prefixTable.[0] <- -1
prefixTable
This approach isn't in danger of any endless loops/recursion, because all code paths of run
either increase writeIndex
in the next iteration or finish iterating.
Note on terminology: the error you are describing in the question is an endless loop or, more generally, non-terminating iteration. Deadlock refers specifically to a situation in which a thread waits for a lock that will never be released because the thread holding it is itself waiting for a lock that will never be released for the same reason.