Tags: string, algorithm, substring, big-o, knuth-morris-pratt

When is it good to use the KMP algorithm?


I understand that the KMP algorithm relies on a helper array that records which prefixes of the pattern are also suffixes. It won't be efficient when that condition isn't fulfilled, i.e. when the helper array contains all zeroes. Would the runtime still be O(m + n)? If I am right, what is a better substring-search algorithm in this case?
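For reference, the helper array in question (usually called the failure function or prefix function) can be computed as in the minimal Python sketch below. The name `failure_table` is hypothetical, chosen only for illustration. A pattern with no prefix that is also a suffix, such as `"abcd"`, produces an all-zero table, while a self-overlapping pattern such as `"abab"` does not.

```python
def failure_table(pattern: str) -> list[int]:
    """Hypothetical helper: fail[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0  # length of the prefix currently matched against the suffix
    for i in range(1, len(pattern)):
        # Fall back through shorter borders until pattern[i] can extend one.
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


print(failure_table("abcd"))  # no prefix is also a suffix
print(failure_table("abab"))  # overlapping prefixes and suffixes
```

Even when the table comes out all zeroes, building it is a single pass over the pattern, so it still takes O(n) time for a pattern of length n.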


Solution

  • To understand when KMP is a good algorithm to use, it's often helpful to ask the question "what's the alternative?"

    KMP has the nice advantage that it is guaranteed worst-case efficient. With n the length of the pattern and m the length of the text being searched, the preprocessing time is always O(n) and the searching time is always O(m), so the total is O(n + m) regardless of what the helper array looks like, including when it is all zeroes. There are no worst-case inputs, no probability of getting unlucky, etc. In cases where you are searching for very long strings (large n) inside of really huge strings (large m), this may be highly desirable compared to other algorithms like the naive one (which can take time Θ(mn) in bad cases), Rabin-Karp (pathological inputs can take time Θ(mn)), or Boyer-Moore (worst-case can be Θ(mn)). You're right that KMP might not be all that necessary when the pattern doesn't have many overlapping prefixes and suffixes, but the fact that you never need to worry about whether there's a bad case is definitely a nice thing to have!

    KMP also has the nice property that the preprocessing only needs to be done once. If you know you're going to search for the same pattern lots and lots of times, you can do the O(n) preprocessing work once and then search in any length-m string you'd like in time O(m).
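Both points from the answer (the worst-case guarantee and the one-time preprocessing) can be illustrated with the minimal Python sketch below. `KMPMatcher` and `naive_search` are hypothetical names, and both functions also count character comparisons purely for demonstration. The pattern `"aaaaaaaaab"` against a text of all `a`s is a bad case for the naive search, which needs about m·n comparisons, while KMP stays within 2m.

```python
def naive_search(text: str, pattern: str) -> tuple[list[int], int]:
    """Hypothetical baseline: try every alignment; return (match starts, comparisons)."""
    hits, comparisons = [], 0
    n, m = len(pattern), len(text)
    for start in range(m - n + 1):
        j = 0
        while j < n:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j += 1
        if j == n:
            hits.append(start)
    return hits, comparisons


class KMPMatcher:
    """Hypothetical helper: build the failure table once, reuse it for many texts."""

    def __init__(self, pattern: str) -> None:
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        # O(n) preprocessing, done a single time per pattern.
        self.fail = [0] * len(pattern)
        k = 0
        for i in range(1, len(pattern)):
            while k > 0 and pattern[i] != pattern[k]:
                k = self.fail[k - 1]
            if pattern[i] == pattern[k]:
                k += 1
            self.fail[i] = k

    def find_all(self, text: str) -> tuple[list[int], int]:
        """Return (match starts, comparisons); O(m) per text, at most 2m comparisons."""
        p, fail = self.pattern, self.fail
        hits, comparisons, k = [], 0, 0
        for i, ch in enumerate(text):
            while True:
                comparisons += 1
                if ch == p[k]:
                    k += 1
                    break
                if k == 0:
                    break
                k = fail[k - 1]  # shift the pattern; the text index never moves back
            if k == len(p):
                hits.append(i - k + 1)
                k = fail[k - 1]
        return hits, comparisons


matcher = KMPMatcher("a" * 9 + "b")  # preprocessing happens once, here
for text in ["a" * 1000, "a" * 500 + "b", "xyz"]:
    kmp_hits, kmp_cmp = matcher.find_all(text)
    naive_hits, naive_cmp = naive_search(text, matcher.pattern)
    assert kmp_hits == naive_hits
    print(len(text), kmp_cmp, naive_cmp)
```

The comparison bound follows from the fact that each text character contributes at most one successful comparison, and every failed comparison that triggers a fallback undoes an earlier success, so the total never exceeds 2m.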