Search code examples
javarandomhashhashcode

Random integer as hashCode


Is it a good idea to generate a random number in the constructor and return this value from hashCode method? There is a chance for collisions, but this applies when writing your own hashCode method. So what are the drawbacks? When using this object in a HashMap it will be stored with the random number as a hash and then retrieved by the same. If there are collisions they will be resolved by equals.


Solution

  • Basic answer

    In the typical use case, you want to retrieve not with the same (in terms of identity) key object that was used to insert it in the hashing collection, but with an object which is logically equivalent (in terms of equal).

    Thus, you want the following code to find the element in the map

    Key k1 = new Key("param1", "param2");
    Key k2 = new Key("param1", "param2");
    Value val1 = new Value();
    
    map.put(k1, value);
    
    Value val2 = map.get(k2);
    

    If hashCode is based solely on random number, than k1 and k2 have different hashes, and thus may point you to different buckets of the HashMap.

    We showed that for objects implementing equals, random hashCode is a poor idea. This is the reasoning behind the part of hash code contract:

    If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

    A bit more advanced part:

    Let's take a look at Object.hashCode implementation. Note that the identity hashCode() implementation is dependant on the JVM.

    Looking at JDK sources, get_next_hash offers six methods to calculate hashCode:

    0. A randomly generated number.
    1. A function of memory address of the object.
    2. A hardcoded 1 (used for sensitivity testing.)
    3. A sequence.
    4. The memory address of the object, cast to int.
    5. Thread state combined with xorshift (https://en.wikipedia.org/wiki/Xorshift)
    

    Code for reference:

    static inline intptr_t get_next_hash(Thread * Self, oop obj) {
      intptr_t value = 0 ;
      if (hashCode == 0) {
         // This form uses an unguarded global Park-Miller RNG,
         // so it's possible for two threads to race and generate the same RNG.
         // On MP system we'll have lots of RW access to a global, so the
         // mechanism induces lots of coherency traffic.
         value = os::random() ;
      } else
      if (hashCode == 1) {
         // This variation has the property of being stable (idempotent)
         // between STW operations.  This can be useful in some of the 1-0
         // synchronization schemes.
         intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3 ;
         value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
      } else
      if (hashCode == 2) {
         value = 1 ;            // for sensitivity testing
      } else
      if (hashCode == 3) {
         value = ++GVars.hcSequence ;
      } else
      if (hashCode == 4) {
         value = cast_from_oop<intptr_t>(obj) ;
      } else {
         // Marsaglia's xor-shift scheme with thread-specific state
         // This is probably the best overall implementation -- we'll
         // likely make this the default in future releases.
         unsigned t = Self->_hashStateX ;
         t ^= (t << 11) ;
         Self->_hashStateX = Self->_hashStateY ;
         Self->_hashStateY = Self->_hashStateZ ;
         Self->_hashStateZ = Self->_hashStateW ;
         unsigned v = Self->_hashStateW ;
         v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
         Self->_hashStateW = v ;
         value = v ;
      }
    
      value &= markOopDesc::hash_mask;
      if (value == 0) value = 0xBAD ;
      assert (value != markOopDesc::no_hash, "invariant") ;
      TEVENT (hashCode: GENERATE) ;
      return value;
    }
    

    Both OpenJDK 7 and OpenJDK 6 use the first method, a random number generator.

    OpenJDK 8 changed the default to 5 - Thread state combined with xorshift.

    Note that all of these implementations of Object.hashCode are consistent with Object.equals (in terms of hash code contract)

    To sum up, you would be implementing one of the strategies behind Object.hashCode, but taking a risk of breaking the contract if you implement equals.