Search code examples
algorithmhashrabin-karp

Is rabin-karp string search algorithm still correct if we neglect the modulo part and let hash int/long overflow?


I have a question: if we let rolling hash overflow, does it affect the correctness of Rabin-Karp algorithm? Could you give a solid example that the overflow indeed will affect correctness?

That is something like same string e.g. "abcd" will give different hash values when you directly compute from "abcd" or from "eabcd" (hash("eabc") - hash("e") * R^3) * R + hash("d")

hash("abcd") != (hash("eabc") - hash("e") * R^3) * R + hash("d") if we allow int/long overflow


Solution

  • In the case of using unsigned integers for rolling hash, unsigned overflow is equivalent to modding by 2^32 or 2^64, depending on the size of the unsigned type. So the answer to your question is yes, the algorithm will still be correct. (As an exercise, think about why will unsigned overflow be equivalent to modding?)

    In fact, you will see in many speedy implementations, they don't explicitly use modulo operations and use unsigned overflow as an implicit modulo operation for speed; as an example, see the sample implementation in C by Charras and Lecroq: https://www-igm.univ-mlv.fr/~lecroq/string/node5.html

    Still, the modulo operation is retained in pseudocode presentation simply because it is best to make such an operation explicit when presenting the algorithm for both ease of understanding and attention to detail.