Search code examples
c#javaalgorithmhashgethashcode

Hash code non-zero initial value - note: I am not asking about primes


This is kind of an academic point, but I feel I don't fully understand hash codes if I don't understand why this is recommended by books such as Effective Java and many SO questions.

Suppose:

public sealed class Point
{
    private readonly int x;
    private readonly int y;

    //constructor ommited

    //equals ommited
    
    public override int GetHashcode()
    {
       int hash = 17; //why should the initial value be non-zero?
       unchecked
       {
         hash = hash * 31 + x; //do not tell me why I should use primes - that is not the question
         hash = hash * 31 + y;
         return hash;
       }
    }
}

Now, supposedly, the reason for the initial value is that it reduces collisions where one of the components is zero.

I'm struggling to find any example where this helps.

Here is one example of a collision, but having an initial value makes no odds.

x   y   Hash Without initial value     Hash With initial value  
0   31  31                             16368                
1   0   31                             16368                

Ideally, I'm looking for a concrete example where the initial value prevents a collision.

My theory on why an initial value can never make a difference

//Given a prime p, initial value i, fields a,b,c, calculate hash h
h = i;
h = h*p + a;
h = h*p + b;
h = h*p + c;

Therefore:

h = ((i*p + a)*p + b)*p + c
  = (ipp + ap + b   )*p + c
  = ippp + app + bp + c

Therefore the inital value i will effect all hash codes in the same way by producing a constant value, in this case i*p3.


Solution

  • The choice of initial value can never make a difference to the hash.

    Example:

    //Given a prime p, initial value i, fields a,b,c, calculate hash h
    h = i;
    h = h*p + a;
    h = h*p + b;
    h = h*p + c;
    h = h % 2^32;
    

    Therefore:

    h = (((ip  + a) * p + b) * p + c) % 2^32
      = (( ip² + ap     + b) * p + c) % 2^32
      = (  ip³ + ap²    + bp     + c) % 2^32
      = ip³ % 2^32 + (ap² + bp + c) % 2^32
    

    Therefore the initial value i will effect all hash codes in the same way by adding a constant value to the hash, in this case i*p³ % 2^32.