Search code examples
javamathintegerinteger-overflow

Reverting an hash code "sum"


So, rather than calculating the hash code every time, I thought I could keep an integer updated with all the changes. Imagine a setter for a property that contributes to the hash code:

public void setFoo(Foo newFoo){
    this.hashCalculator.remove(this.foo.hashCode()); // remove old hash code
    this.hashCalculator.add(newFoo.hashCode()); // add new hash code
    this.foo = newFoo; // set new foo
}

(I hope I'm not doing something stupid) I thought it would be simple math, but I am failing at implementing it. I think it has something to do with overflowing integers but I'm not sure. Clearly I'm missing something. The remainder from the division should probably be added back to the value, right?

This is the code I have. At the end the result should be 1, but it's not what I get.

class HashCode
{
    public int value = 1;

    public void add(int val){
        // as suggested by effective java
        value = value * 37 + val;
    }

    public void remove(int val){
        value = (value - val) / 37;
    }
}

HashCode o = new HashCode();

for(int a = 0; a < 1000; a++){
    o.add(a);
}

for(int r = 0; r < 1000; r++){
    o.remove(r);
}

System.out.println(o.value); // should be 1

Solution

  • First: there is no way to properly reverse an operation that results in an integer overflow. Basically the problem is that you cannot know if the integer overflow happened once, twice or even more often in the previous operation. Therefore you cannot get the original value present before the last hash calculation.

    Second: the hash calculation depends on the order of applied hash values.

    ((1 * 37 + 0) * 37 + 1) * 37 + 2 = 50692
    ((1 * 37 + 2) * 37 + 1) * 37 + 0 = 53428
               ^         ^         ^ the hash values.
    

    Since the hash value change of appending the last value depends on all previous hash values you cannot just change one intermediary hash value, there is no (well performing) way to remove the impact the 1 in my previous example had on all future computations.

    This should have been obvious if you tested your loops with 1, 2, 3, etc. instead of 1000 to begin with. Without integer overflow your loops would have only worked if you remove the hash values in the opposite order you added them. That is a restriction that does not make sense to apply in "real-life". What @JB Nizet wrote in his comment is correct in my opinion.