Search code examples
javacollectionshashmaphashcodeequality

How does overriding equals and hashCode impacts storing data in a Map in Java with a bad hashing function?


I have a class, Employee, let's say, and my hashCode function for this class is really bad (let's say it always return a constant). My code looks like the following.

public class Employee {
 private String name;

 public Employee(String name) {
  this.name = name;
 }

 @Override
 public int hashCode() { return 1; }

 @Override
 public boolean equals(Object object) {
  if(null == object || !(object instanceof Employee)) {
   return false;
  }
  Employee other = (Employee)object;
  return this.name.equals(other.name);
 }
}

Let's say I want to use Employee as the key in a Map, and so I can do something like the following.

public static void main(String[] args) {
 Map<Employee, Long> map = new HashMap<>();
 for(int i=0; i < 1000; i++) {
  map.put(new Employee("john"+i, 1L));
 }
 System.out.println(map.size());
}

How come when I run this code, I always get 1,000 as the size?

Using Employee as a key seems to be "good" in the following sense.

  • It is immutable
  • Two employees that are equals always generate the same hash code

What I expected was that since the output of hashCode is always 1, then map.size() should always be 1. But it is not. Why? If I have a Map<Integer,Integer>, and I do map.put(1, 1) followed by map.put(1, 2), I would only expect the size to be 1.

The equals method must somehow be coming into play here, but I'm not sure how.

Any pointers are appreciated.


Solution

  • Your loop

    for(int i=0; i < 1000; i++) {
        map.put(new Employee("john"+System.currentTimeMillis(), 1L));
    }
    

    executes within a couple of milliseconds, so System.currentTimeMillis() will be returning the same value for the vast majority of the iterations of your loop. So, several hundred of your johns will have the exact same name + number.

    Then, we have java's retarded Map which does not have an add() method, (which one would reasonably expect to throw an exception if the item already exists,) but instead it only has a put() method which will either add or replace items, without failing. So, most of your johns get overwritten by subsequent johns, without any increase in the map size, and without any exception being thrown to give you a hint about what you are doing wrong.

    Furthermore, you seem to be a bit confused as to exactly what the effect of a bad hashCode() function is on a map. A bad hashCode() simply results in collisions. Collisions in a hashmap do not cause items to be lost; they only cause the internal structure of the map to not be very efficient. Essentially, a constant hashCode() will result in a degenerate map which internally looks like a linked list. It will be inefficient both for insertions and for deletions, but no items will be lost due to that.

    Items will be lost due to a bad equals() method, or due to overwriting them with newer items. (Which is the case in your code.)