I'm working on a problem where I want to store objects consisting of two integer arrays of equal length (e.g. int a[] ={1,2,3,4}
and int b[] ={1,2,2,6}
) in a data structure (e.g. hashmap) . However, for different objects, the length of the two arrays can vary. Both arrays are consisting of integers from a given interval (e.g. between 0-200).
For storing the objects with the two arrays, I want to assign a single hashkey that is fast to compute, preserves both sequences, and will lead to minimal collisions.
I have first tried using Arrays.deepHashCode(int[][])
but quickly found collisions. Second, I tried to distribute the values in the arrays more equally, by changing a[i] and b[i] to new values so that a_new[i] = Math.pow(31,a[i]) % Math.pow(2,16)
(actually using BigInteger to avoid overflow: BigInteger.valueOf(31).modPow(BigInteger.valueOf(a[i]), BigInteger.valueOf(Math.pow(2,16)))
; using BigInteger. Since the interval of values is limited I can pre-calculate it for each possible value. As a result I came up with following solution:
int result = 31;
for (int i = 0; i < a.length; i++) {
result = result * 31 * a_new[i];
result = result * 31 * b_new[i];
}
This solution seems to work when there are only smaller arrays, but once a[] and b[] can contain up to 10 values it also leads to collisions. Now I wonder, if there is a better way to achieve what i want with less collisions.
Edit: I fixed it to use a proper java code for the powers
Thanks for all the responses and ideas. Eventually, I came up with a different solution that seems to work for me and has so far not led to any collisions.
Instead of creating a single hash for both arrays, I decided to create two different hashes. One hash is based on the set of integers in the arrays and the other one is based on the sequences (same as in my question). Both hashes are then used as keys in a MultiKeyMap (Apache Commons Collections 4.4 API), where I store the data connected to the two arrays.
Sequence-based hash:
int result = 31;
for (int i = 0; i < a.length; i++) {
result = result * 31 * a_new[i];
result = result * 31 * b_new[i];
}
return result;
Set-based hash:
int resultA = 31;
int resultB = 179;
for (int i = 0; i < a.length; i++) {
resultA += a_new[i];
resultB += b_new[i];
}
return resultA *31 * resultB;