Search code examples
javaalgorithmhashrabin-karp

Need help in understanding Rolling Hash computation in constant time for Rabin-Karp Implementation


I've been trying to implement Rabin-Karp algorithm in Java. I have hard time computing the rolling hash value in constant time. I've found one implementation at http://algs4.cs.princeton.edu/53substring/RabinKarp.java.html. Still I could not get how these two lines work.

txtHash = (txtHash + Q - RM*txt.charAt(i-M) % Q) % Q;
txtHash = (txtHash*R + txt.charAt(i)) % Q;  

I looked at couple of articles on modular arithmetic but no article could able to penetrate my thick skull. Please give some pointers to understand this.


Solution

  • This is the "rolling" aspect of the hash. It's eliminating the contribution of the oldest character (txt.charAt(i-M)), and incorporating the contribution of the newest character(txt.charAt(i)).

    The hash function is defined as:

                M-1
    hash[i] = ( SUM { input[i-j] * R^j } ) % Q
                j=0
    

    (where I'm using ^ to denote "to the power of".)

    But this can be written as an efficient recursive implementation as:

    hash[i] = (txtHash*R - input[i-M]*(R^M) + input[i]) % Q
    

    Your reference code is doing this, but it's using various techniques to ensure that the result is always computed correctly (and efficiently).

    So, for instance, the + Q in the first expression has no mathematical effect, but it ensures that the result of the sum is always positive (if it goes negative, % Q doesn't have the desired effect). It's also breaking the calculation into stages, presumably to prevent numerical overflow.