Search code examples
javastringknuth-morris-pratt

Why does String.indexOf() not use KMP?


I read the source code of java.lang.String and I was surprised to find that String.indexof() does not use the Knuth–Morris–Pratt algorithm? As we know, KMP is more effective. So why isn't it used in String.indexOf()?

Someone around me told me that for short string KMP is good enough, but if you need performance and you intend to use with large strings then is not a good choice. However he didn't tell me the details.

So, here are my questions:

  1. why don't we use KMP in String.indexOf()?
  2. why is KMP not a good choice with large Strings?

Solution

  • KMP has better worst-case performance, but actually requires a little bit of up-front computation (to generate the table of offsets). It also requires an initial memory allocation, which could also impact performance.

    For (presumably) common use-cases of searching in relatively short strings, this might actually end up slower than the primitive implementation.

    This, bundled with the fact that for really huge data sets you will probably be using more specialized data structures than a simple String means that the increased implementation (and possibly runtime) cost is not worth investing.

    Note that this might change in future Java versions, as the actual algorithm is not specified.