I am looking for ways to compute a unique hash for a given String in Java. Looks like I cannot use MD5 or SHA1 because folks claim that they are broken and do not always guarantee uniqueness.
I should get the same hash (preferably a 32 character string like the MD5 Sum) for two String objects which are equal by the equals()
method. And no other String should generate this hash - that's the tricky part.
Is there a way to achieve this in Java?
If guaranteed unique hash code is required then it is not possible (possible theoretically but not practically). Hashes and hash codes are non-unique.
A Java String of length N has 65536 ^ N possible states, and requires an integer with 16 * N bits to represent all possible values. If you write a hash function that produces integer with a smaller range (e.g. less than 16 * N bits), you will eventually find cases where more than one String hashes to the same integer; i.e. the hash codes cannot be unique. This is called the Pigeonhole Principle, and there is a straight forward mathematical proof. (You can't fight math and win!)
But if "probably unique" with a very small chance of non-uniqueness is acceptable, then crypto hashes are a good answer. The math will tell you how big (i.e. how many bits) the hash has to be to achieve a given (low enough) probability of non-uniqueness.
Updated : check this another good answer : What is a good 64bit hash function in Java for textual strings?