Search code examples
algorithmpattern-matchingknuth-morris-pratt

KMP Table building algorithm


I checked the KMP table-building algorithm from Wikipedia but I don't understand the logic behind the second case of the while loop

(second case: it doesn't, but we can fall back)
        else if cnd > 0 then
            let cnd ← T[cnd]

I tried to build a table using this algorithm and it works perfectly. I understand that cnd ← T[cnd] helps to find the proper suffix length. What I don't understand is "how" it does it?

An explanation with an example would be nice.

Thanks!

Edit: I just found out that my question is a duplicate of the question here: "Partial match" table (aka "failure function") in KMP (on wikipedia)

I think I get the answer now. Still, one more explanation would be helpful. Thanks!


Solution

  • Suppose you have a string Hello World!!! and you want to search for Head Up.

    Hello World!!!
    Head Up
      ^
    

    When you are in the first and second character the first condition apply (first case: the substring continues), in the case of the marked position, the character don't match but you are already inside the a sub-string match (2 character matched until there), this case correspond to the second conditional (second case: it doesn't, but we can fall back). The third case would be when you miss match the first character of the pattern.

    The second condition is necessary because you could use the information of the matched character until the miss match, to avoid unnecessary comparison that you could already known the result (skip characters of the string that you already know that the beginning part of the pattern would don't match).

    Example: With string HeHello World!!! and searching for Hello

    HeHello World!!!
    Hello
      ^ when you miss match this character using the table of KMP you known that 
        could skip 2 characters because
    
    HeHello World!!!
     Hello
     ^ this would miss match
    

    In the case of building the pattern table for pattern HeHello. Suppose ^ is cnd and * is pos. The start point is pos = 2 and cnd = 0 (but when checking in the pattern is with pos - 1 = 1).

    HeHeHello     T [-1,0,0,0,0,0,0,0,0]
    ^* comparing 0 with 1 go to condition 3        cnd = 0, pos = 2
                            _
    HeHeHello     T [-1,0,0,1,0,0,0,0,0]
    ^ * comparing 0 with 2 go to condition 1       cnd = 0, pos = 3
                              _
    HeHeHello     T [-1,0,0,1,2,0,0,0,0]
     ^ * comparing 1 with 3 go to condition 1      cnd = 1, pos = 4
                                _
    HeHeHello     T [-1,0,0,1,2,3,0,0,0]
      ^ * comparing 2 with 4 go to condition 1     cnd = 2, pos = 5
                                  _
    HeHeHello     T [-1,0,0,1,2,3,4,0,0]
       ^ * comparing 3 with 5 go to condition 1    cnd = 3, pos = 6
    
    HeHeHello     T [-1,0,0,1,2,3,4,0,0]
        ^ * comparing 4 with 6 go to condition 2 (cnd = T[cnd], cnd = T[4] = 2)
    
    HeHeHello     T [-1,0,0,1,2,3,4,0,0]
      ^   * comparing 2 with 6 go to condition 2 (cnd = T[cnd], cnd = T[2] = 0)
    
    ...