I've been trying to implement Rabin-Karp algorithm in Java. I have hard time computing the rolling hash value in constant time. I've found one implementation at http://algs4.cs.princeton.edu/53substring/RabinKarp.java.html. Still I could not get how these two lines work.
txtHash = (txtHash + Q - RM*txt.charAt(i-M) % Q) % Q;
txtHash = (txtHash*R + txt.charAt(i)) % Q;
I looked at couple of articles on modular arithmetic but no article could able to penetrate my thick skull. Please give some pointers to understand this.
This is the "rolling" aspect of the hash. It's eliminating the contribution of the oldest character (txt.charAt(i-M)
), and incorporating the contribution of the newest character(txt.charAt(i)
).
The hash function is defined as:
M-1
hash[i] = ( SUM { input[i-j] * R^j } ) % Q
j=0
(where I'm using ^
to denote "to the power of".)
But this can be written as an efficient recursive implementation as:
hash[i] = (txtHash*R - input[i-M]*(R^M) + input[i]) % Q
Your reference code is doing this, but it's using various techniques to ensure that the result is always computed correctly (and efficiently).
So, for instance, the + Q
in the first expression has no mathematical effect, but it ensures that the result of the sum is always positive (if it goes negative, % Q
doesn't have the desired effect). It's also breaking the calculation into stages, presumably to prevent numerical overflow.