Search code examples
javastringjava-8hashcode

String hashCode() documentation vs implementation


Below is the source code snippet of String.hashCode() method from Java 8 (1.8.0_131 to be precise)

/**
 * Returns a hash code for this string. The hash code for a
 * {@code String} object is computed as
 * <blockquote><pre>
 * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
 * </pre></blockquote>
 * using {@code int} arithmetic, where {@code s[i]} is the
 * <i>i</i>th character of the string, {@code n} is the length of
 * the string, and {@code ^} indicates exponentiation.
 * (The hash value of the empty string is zero.)
 *
 * @return  a hash code value for this object.
 */
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

You can see that, the documentation says, that hashCode() is computed using below formula

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

while the actual implementation is different

for (int i = 0; i < value.length; i++) {
    h = 31 * h + val[i];
}

Am I missing any obvious thing? Please help me.


Solution

  • The implementation is correct, with the caveat that integer overflow may occur (which is ok here, it doesn't harm anything). It's using Horner's method for polynomial evaluation.

    Here's the steps on a sample string "CAT".

    h = 0
    

    First loop:

    i = 0
    h = 31 * 0 + 'C' (67) = 67
    

    Second loop:

    i = 1
    h = 31 * 67 + 'A' (65) = 2142
    

    Third loop:

    i = 2
    h = 31 * 2142 + 'T' (84) = 66486
    

    Let's derive the formula from the code. Here, n is the index of i into the string s. Each iteration of the for loop performs this formula.

    hn = 31hn-1 + sn

    h0 /* after loop i = 0 */ = s[0]
    h1 /* after loop i = 1 */ = 31*h0 + s[1] = 31*s[0] + s[1]
    h2 /* after loop i = 2 */ = 31*h1 + s[2] = 31*(31*s[0] + s[1]) + s[2]
    h = 31*31*s[0] + 31*s[1] + s[2]
    

    The exponents you see for the powers of 31 arise because each loop multiplies in another factor of 31 before adding the value of the next character.