Search code examples
f#knuth-morris-pratt

F sharp KMP Algorithm is stuck in the first while loop if i use a pattern with the same characters at the first two indces


I am playing around with the KMP algorithm in f sharp. While it works for patterns like "ATAT" (result will be [|0; 0; 1; 2;|]) , the first while loop enters a deadlock when the first 2 characters of a string are the same and the 3rd is another, for example "AAT".

I understand why: first, i gets incremented to 1. now the first condition for the while loop is true, while the second is also true, because "A" <> "T". Now it sets i to prefixtable.[!i], which is 1 again, and here we go.

Can you guys give me a hint on how to solve this?

let kMPrefix (pattern : string) = 

    let (m : int) = pattern.Length - 1
    let prefixTable = Array.create pattern.Length 0
    // i : longest proper prefix that is also a suffix 
    let i = ref 0

    // j: the index of the pattern for which the prefix value will be calculated
    // starts with 1 because the first prefix value is always 0

    for j in 1 .. m do

        while !i > 0 && pattern.[!i] <> pattern.[j] do

            i := prefixTable.[!i]

        if pattern.[!i] = pattern.[j] then

            i := !i+1           


        Array.set prefixTable j !i  

    prefixTable

Solution

  • I'm not sure how to repair the code with a small modification, since it doesn't match the KMP algorithm's lookup table contents (at least the ones I've found on Wikipedia), which are:

    • -1 for index 0
    • Otherwise, the count of consecutive elements before the current position that match the beginning (excluding the beginning itself)

    Therefore, I'd expect output for "ATAT" to be [|-1; 0; 0; 1|], not [|0; 0; 1; 2;|].

    This type of problem might be better to reason about in functional style. To create the KMP table, you could use a recursive function that fills the table one by one, keeping track of how many recent characters match the beginning, and start running it at the second character's index.

    A possible implementation:

    let buildKmpPrefixTable (pattern : string) =
        let prefixTable = Array.zeroCreate pattern.Length
    
        let rec run startIndex matchCount =
            let writeIndex = startIndex + matchCount
            if writeIndex < pattern.Length then
                if pattern.[writeIndex] = pattern.[matchCount] then
                    prefixTable.[writeIndex] <- matchCount
                    run startIndex (matchCount + 1)
                else
                    prefixTable.[writeIndex] <- matchCount
                    run (writeIndex + 1) 0
        run 1 0
        if pattern.Length > 0 then prefixTable.[0] <- -1
        prefixTable
    

    This approach isn't in danger of any endless loops/recursion, because all code paths of run either increase writeIndex in the next iteration or finish iterating.

    Note on terminology: the error you are describing in the question is an endless loop or, more generally, non-terminating iteration. Deadlock refers specifically to a situation in which a thread waits for a lock that will never be released because the thread holding it is itself waiting for a lock that will never be released for the same reason.