I was reading the implementation of equals() and hashCode() in Java and came to a conclusion that there is possibility of collision (I never faced the collision in my case) but I was thinking to find a better way to get some unique identifier like GUID for an object that I have -
I have [CustomClass1 cc1, CustomClass2 cc2 and a Set scc3] coming in from a method and I am wrapping them in an object1 and doing object1.hashCode() and the hashCode() is working fine (handling the order of objects in the SET as well, so I am getting unique hash code even when the order of object in set is different say 1,2,3 or say 2,3,1 as long as they are "1", "2" and "3" but irrespective of order)
But I was thinking to find a better way to this - convert object1 to a GUID instead of object1.hashCode().
Is there a better way? If yes, share your thoughts.
There isn't a pragmatic better way.
Given a hashing algorithm that just isn't designed well, clashes are common, and it doesn't matter if the hash value is 32-bit (as java's hashCode()
) or 128-bit (UUIds).
Java's built in types, such as String or List or HashSet or whatnot, do as well a job as is reasonable, so you shouldn't worry about them being badly designed. String's hashCode could be better, but changing it would be backwards incompatible. That isn't the say the current algorithm baked into java.lang.String
is particularly bad either, though. It's fine.
Once we've moved on from that, a clash in 32-bit space is highly unlikely. It happens once in, well, 4 billion entries. A clash is no problem - java doesn't flat out do the wrong thing. Try it!
class StupidHashCodeImpl {
private final String name;
public StupidHashCodeImpl(String name) {
if (name == null) throw new NullPointerException("name");
this.name = name;
}
@Override public String toString() {
return this.name;
}
@Override public int hashCode() {
return 1; // all StupidHashCodeImpl hashes collide!
}
@Override public boolean equals(Object other) {
if (other == null || other.getClass() != StupidHashCodeImpl.class) return false;
return this.name.equals(((StupidHashCodeImpl) other).name);
}
}
and then:
Set<StupidHashCodeImpl> set = new HashSet<>();
set.add(new StupidHashCodeImpl("foo"));
set.add(new StupidHashCodeImpl("bar"));
System.out.println(set.size()); // prints 2 as expected.
set.contains(new StupidHashCodeImpl("foo")); //true
set.contains(new StupidHashCodeImpl("bar")); //true
set.contains(new StupidHashCodeImpl("baz")); //false
See? Works great. No problems here, at all. The rules are clear:
a.equals(b)
, then it must hold that a.hashCode() == b.hashCode()
- failure to adhere to this rule means hashset starts handing out wrong answers.a.hashCode() == b.hashCode()
, then it's totally fine if !a.equals(b)
. HashSet deals with this just fine.The only downside to a 'bad hashing' algorithm as above, is that the map/set becomes inefficient. Normally, if you shove a million keys into a hashset and then do some set.contains()
operations, those are significantly faster versus shoving a million entries in a list and running .contains()
on that. However, if you shove a million entries in a set and they all have colliding hashes, then HashSet is no better (in fact, somewhat worse) in performance than list, for obvious reasons.
But, UUIDs don't solve this problem. At all:
You'd have to write it and do some performance testing with JMH to be sure, but I'd bet quite a lot on the idea that a UUID-based HashSet impl is significantly slower than java's own HashSet.
To write your own 'good' hash algorithm, it's simple:
Use Project Lombok's @Value
or @Data
or @EqualsAndHashCode
and call it a day. And if you don't want to do that, or you just want to know how it works, click the link and check the example code which shows precisely what's going on. Alternatively, ask your IDE - most of them (including the 2 most popular IDEs, IntelliJ and eclipse) have 'generate equals and hashCode' somewhere in their menu structures. They do something similar to lombok.
The general gist is:
someArray.hashCode()
which doesn't do what you think it does, instad call Arrays.deepHashCode(someArray)
. hash up a long
by XORing the higher 32 bits and the lower 32 bits together. Hash booleans by picking 2 arbitrary numbers (one for true
and one for false
).That's all there is to it.
This will for example ensure that 1,2,3 and 2,3,1 have different hashCodes.