I am working on an assignment where I am asked to implement a shift fold function for hashing String type keys for records in a database and returning the position of that record in the database. To my understanding, this means that the sfold function has to produce a hash that matches the position of that record in the database.
The code for the method is as follows:
public static long sfold(String s, int M) {
int intLength = s.length() / 4;
long sum = 0;
for (int j = 0; j < intLength; j++) {
char c[] = s.substring(j * 4, (j * 4) + 4).toCharArray();
long mult = 1;
for (int k = 0; k < c.length; k++) {
sum += c[k] * mult;
mult *= 256;
}
}
char c[] = s.substring(intLength * 4).toCharArray();
long mult = 1;
for (int k = 0; k < c.length; k++) {
sum += c[k] * mult;
mult *= 256;
}
return (Math.abs(sum) % M);
}
This generates a random digit between 0 and the number of records-1 (M-1). The problem is the digits will not be unique and all possible digits might not be generated. So how can this be used to return the position of records in the database?
My idea was to generate the hashes from the String keys, get the hash digit, sort records by that digit and then insert them in the database but like I said the method does not produce unique digits and there is no guarantee of getting all the digits.
"Lack of uniqueness" is called a collision, and there are different ways to solve it. The details are explained well in Wikipedia: https://en.wikipedia.org/wiki/Hash_table#Collision_resolution
There are two main approaches:
Separate chaining uses extra storage space: if a string hashes to a number already in the table, it spills over to secondary storage space. In an in-memory data structure, the extra storage is typically a linked list.
Open addressing looks for unused space: if a string hashes to a number already in the table, it's stored somewhere else in the same table, according to a strategy you decide beforehand.
You have a pretty serious problem here:
mult *= 256;
Multiplying by a power of 2 means you are throwing away information: after only 8 characters mult = 0 and you're ignoring the rest of the string. Changing the multiplier to a prime number, such as 127, solves this.
You have another problem here:
return (Math.abs(sum) % M);
Math.abs
has a special case where it does not return a positive number: Long.MIN_VALUE. One way to fix it is taking the absolute value after the remainder:
return Math.abs(sum % M);