I'm studying Item 9 of Effective Java [Always override hashCode when you override equals].
I have a few questions about points the author makes:
A nonzero initial value is used in step 1 so the hash value will be affected by initial fields whose hash value, as computed in step 2.a, is zero. If zero were used as the initial value in step 1, the overall hash value would be unaffected by any such initial fields, which could increase collisions. The value 17 is arbitrary.
Step 2.a is:
For each significant field f in your object (each field taken into account by the equals method, that is), do the following:
a. Compute an int hash code c for the field:
i. If the field is a boolean, compute (f ? 1 : 0).
ii. If the field is a byte, char, short, or int, compute (int) f.
iii. If the field is a long, compute (int) (f ^ (f >>> 32)).
iv. If the field is a float, compute Float.floatToIntBits(f).
v. If the field is a double, compute Double.doubleToLongBits(f), and then hash the resulting long as in step 2.a.iii.
vi. If the field is an object reference and this class’s equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If a more complex comparison is required, compute a “canonical representation” for this field and invoke hashCode on the canonical representation. If the value of the field is null, return 0 (or some other constant, but 0 is traditional).
vii. If the field is an array, treat it as if each element were a separate field. That is, compute a hash code for each significant element by applying these rules recursively, and combine these values per step 2.b. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
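For reference, the per-field rules quoted above can be sketched in Java roughly as follows. This is only an illustration: the class FieldHashes and its method names are hypothetical and do not come from the book, and the array case assumes an int[] in which every element is significant.

```java
import java.util.Arrays;

// Hypothetical helper (not from the book): one method per rule in step 2.a.
final class FieldHashes {
    private FieldHashes() {}

    // Rule i: boolean
    static int ofBoolean(boolean f) { return f ? 1 : 0; }

    // Rule ii: byte, char and short arguments widen to int automatically
    static int ofInt(int f) { return f; }

    // Rule iii: fold the high 32 bits of the long into the low 32 bits
    static int ofLong(long f) { return (int) (f ^ (f >>> 32)); }

    // Rule iv: float
    static int ofFloat(float f) { return Float.floatToIntBits(f); }

    // Rule v: convert the double to a long, then apply rule iii
    static int ofDouble(double f) { return ofLong(Double.doubleToLongBits(f)); }

    // Rule vi (simple case only): null hashes to 0, otherwise delegate to hashCode
    static int ofObject(Object f) { return f == null ? 0 : f.hashCode(); }

    // Rule vii, assuming every element is significant
    static int ofIntArray(int[] f) { return Arrays.hashCode(f); }

    public static void main(String[] args) {
        System.out.println(ofLong(1L << 32)); // prints 1: the high half is folded in
        System.out.println(ofObject(null));   // prints 0
    }
}
```
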
Suppose result is calculated as:
result = 31 * result + areaCode;
result = 31 * result + prefix;
result = 31 * result + lineNumber;
If the initial value of result is 0 and all the fields above are 0, result stays 0. But even if result isn't 0 initially, it comes out to the same constant whenever the initial fields are 0, namely 31*(31*(31*17)). How does this value help reduce collisions?
Many classes in the Java platform libraries, such as String , Integer , and Date , include in their specifications the exact value returned by their hashCode method as a function of the instance value. This is generally not a good idea, as it severely limits your ability to improve the hash function in future releases. If you leave the details of a hash function unspecified and a flaw is found or a better hash function discovered, you can change the hash function in a subsequent release, confident that no clients depend on the exact values returned by the hash function.
What does he mean by saying that the exact value returned by hashCode is a function of the instance value?
Thanks in advance for any help.
Hash collisions are primarily avoided by a good distribution across the whole hash range (here, the range of int).
With 0 as the initial value, any initial fields whose hash is 0 contribute nothing: result stays 0 until the first non-zero field is mixed in. Sequences that differ only in such leading zero-hash fields, for example in how many of them there are, then end up with the same hash code. This makes hash collisions more likely.
With a non-zero initial value, every multiplication by 31 changes result even when the field being added hashes to 0, so leading zero-hash fields still leave a trace in the final value. You make better use of the hash range and hash collisions become less likely.
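A small sketch of the effect the book describes for initial fields whose hash is zero. The class SeedDemo and the method combine are hypothetical names used only for illustration; combine repeats the "result = 31 * result + c" step from the question for an arbitrary number of field hashes.

```java
// Hypothetical demo class; combine() mirrors "result = 31 * result + c" from the question.
final class SeedDemo {
    private SeedDemo() {}

    static int combine(int initial, int... fieldHashes) {
        int result = initial;
        for (int c : fieldHashes) {
            result = 31 * result + c;
        }
        return result;
    }

    public static void main(String[] args) {
        // Initial value 0: leading zero hashes vanish, so all three collide.
        System.out.println(combine(0, 5));        // 5
        System.out.println(combine(0, 0, 5));     // 5
        System.out.println(combine(0, 0, 0, 5));  // 5

        // Initial value 17: the number of leading zeros changes the result.
        System.out.println(combine(17, 5));       // 532
        System.out.println(combine(17, 0, 5));    // 16342
        System.out.println(combine(17, 0, 0, 5)); // 506452
    }
}
```

Note that for a class with a fixed set of fields, such as the three-field example in the question, the initial value adds the same constant (17 * 31^3 = 506447 here) to every hash code, so it does not by itself separate two instances of that class. The difference shows up when the number of hashed elements varies, as with arrays or lists of different lengths.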
"A function of the instance value" simply means that the hash code is calculated from the object's value, i.e. the values of its fields. You already did this in your example, so I think you already understood it implicitly.
But Joshua Bloch intended to say something else with this paragraph: he wanted to warn you against documenting the exact function by which the hash code is calculated. If you document it, you can no longer change the implementation in future releases, because some users might rely on the specific values and you would break code that depends on yours.
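To illustrate what a "specified" hash code looks like in practice, here is a minimal sketch using the three library classes named in the quoted paragraph. The demo class name is hypothetical; the values follow from the formulas documented in the JDK Javadoc for String, Integer and Date, which is exactly why client code could come to depend on them.

```java
import java.util.Date;

// Hypothetical demo class; the library behaviour shown is documented in the JDK Javadoc.
final class SpecifiedHashCodes {
    private SpecifiedHashCodes() {}

    public static void main(String[] args) {
        // String: s[0]*31^(n-1) + ... + s[n-1], so "ab" is 97*31 + 98
        System.out.println("ab".hashCode());                // 3105

        // Integer: the hash code is the int value itself
        System.out.println(Integer.valueOf(42).hashCode()); // 42

        // Date: exclusive OR of the two halves of getTime()
        System.out.println(new Date(5L).hashCode());        // 5
    }
}
```

Because these formulas are part of the published specification, the JDK cannot change them without risking breakage in code that relies on the exact numbers. Leaving the formula of your own hashCode undocumented avoids that constraint.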