Search code examples
javahashrabin-karp

Confusion Regarding Rolling Hash in Rabin-Karp Algorithm Java


I was trying to understand the Rabin-Karp algorithm here: http://algs4.cs.princeton.edu/53substring/RabinKarp.java.html.

I have looked through various articles and I now know that the general form of a polynomial hash is C1*A^k-1+C2*A^k-2+C3*A^k-3. Looking at the code, I understand how they add and subtract the digits in the string.

txtHash = (txtHash + Q - RM*txt.charAt(i-M) % Q) % Q; txtHash = (txtHash*R + txt.charAt(i)) % Q;

Here the program is subtracting the leading digit, multiplying the entire hash and then adding the new digit. However, when I was looking through the function that calculates the hash, it didn't follow the general form of a polynomial hash. It looked like this:

 private long hash(String key, int M) { 
    long h = 0; 
    for (int j = 0; j < M; j++) 
        h = (R * h + key.charAt(j)) % Q; 
    return h; 
} 

In this function they are multiplying the hash and the radix and then adding the key.charAt(). I would figure the function would be multiplying the key.charAt() with a base that starts out at R^k-1. Then as the for loop continues, the base would divided by R to provide for the decreasing power in the polynomial. Can someone please explain how this function works and how does it generate a hash in the form that I mentioned above? Thanks!


Solution

  • Suppose hash function need to transfer 3 digits. It would look like :

    {digits[0]*R^2+digits[1]*R^1+digits[2]}%Q  
    = {(digit[0]*R^1+digits[1])*R+digits[2]}%Q  
    

    This would make hash function mush more easier to calculate.

    Then apply to Rabin-Karp Algorithm,
    you can see

    RM = R^2 %Q;(M=2) 
    

    when you want to move next digit to validate,
    you need to delete left most digit and add next digit.

    txtHash = {[txtHash - R^2*most_left_digit(equal charAt(i-M))]*R+next_digit(equal charAt(i))}%Q  
    

    It's the same as

    txtHash = (txtHash + Q - RM*txt.charAt(i-M) % Q) % Q; 
    txtHash = (txtHash*R + txt.charAt(i)) % Q;
    

    Mod Q every steps prevent overflow.