c++, hash-function

Fast hash function for long string key


I am using extendible hashing and I want to use strings as keys. My current hash function iterates over the whole key, and I suspect this hurts the program's performance because the hash function is called many times, especially when buckets are split.

Current hash function

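#include <cstddef>
#include <string>

// Polynomial hash (base 131) over every character of the key.
std::size_t hash(const std::string& key)
{
    const std::size_t seed = 131;
    std::size_t h = 0;
    for (std::size_t i = 0; i < key.length(); i++)
    {
        // Go through unsigned char so negative char values do not sign-extend.
        h = (h * seed) + static_cast<unsigned char>(key[i]);
    }
    // Returning std::size_t avoids truncating the value to int.
    return h;
}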

The keys could be as long as 40 characters.

Example of string/key

string key = "from-to condition";

I have searched the internet for a better one but found nothing that fits my case. Any suggestions?


Solution

  • You should prefer std::hash unless measurement shows that you can do better. To limit the number of characters your own hash function reads, use something like:

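        // std::min needs both arguments to have the same type (requires <algorithm>).
        const auto limit = std::min<std::size_t>(key.length(), 16);
        for (std::size_t i = 0; i < limit; i++)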
    

    You will want to experiment to find the best value to use in place of 16.

    With keys of at most 40 characters, I would actually expect the performance to get worse, because you will have more collisions. If your strings were several kilobytes long, then limiting the hash to the first 64 bytes might well be worthwhile.

    Depending on your strings, it might be worth not starting at the beginning. For example, when hashing filenames you would probably do better using the characters between 20 and 5 positions from the end, which skips the often-constant pathname prefix and the file extension. But you still have to measure.
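
The advice above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tuned implementation: the helper names (full_hash, range_hash, prefix_hash, tail_window_hash, distinct_hashes) are hypothetical, the base-131 polynomial is the one from the question, and the values 16, 20 and 5 are the placeholders mentioned in the answer. The distinct_hashes helper is one simple way to check on your own keys whether reading fewer characters costs you collisions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Baseline: the standard library's string hash over the whole key.
std::size_t full_hash(const std::string& key)
{
    return std::hash<std::string>{}(key);
}

// Hypothetical helper: the question's base-131 polynomial hash, restricted to
// at most `count` characters starting at offset `first`.
std::size_t range_hash(const std::string& key, std::size_t first, std::size_t count)
{
    const std::size_t begin = std::min(first, key.size());
    const std::size_t end = begin + std::min(count, key.size() - begin);
    std::size_t h = 0;
    for (std::size_t i = begin; i < end; ++i)
    {
        // Go through unsigned char so negative char values do not sign-extend.
        h = h * 131 + static_cast<unsigned char>(key[i]);
    }
    return h;
}

// Hash only the first `limit` characters (16 is a placeholder to be tuned).
std::size_t prefix_hash(const std::string& key, std::size_t limit = 16)
{
    return range_hash(key, 0, limit);
}

// Hash the characters lying between `start_from_end` and `stop_from_end`
// positions from the end of the key. Keys shorter than `stop_from_end`
// contribute no characters and hash to 0.
std::size_t tail_window_hash(const std::string& key,
                             std::size_t start_from_end = 20,
                             std::size_t stop_from_end = 5)
{
    assert(start_from_end >= stop_from_end);
    const std::size_t n = key.size();
    const std::size_t begin = n > start_from_end ? n - start_from_end : 0;
    const std::size_t end = n > stop_from_end ? n - stop_from_end : 0;
    return range_hash(key, begin, end - begin);
}

// Number of distinct hash values `fn` produces over `keys`; a result smaller
// than the number of distinct keys means some keys collided.
template <typename HashFn>
std::size_t distinct_hashes(const std::vector<std::string>& keys, HashFn fn)
{
    std::unordered_set<std::size_t> seen;
    for (const auto& k : keys)
    {
        seen.insert(fn(k));
    }
    return seen.size();
}
```

To compare candidates, pass each one to distinct_hashes as a lambda, for example [](const std::string& k) { return prefix_hash(k); }, because default arguments are not available through a plain function pointer. Keys that share their first 16 characters, such as "from-to condition A" and "from-to condition B", collide under prefix_hash, which is the effect the answer warns about; timing the real workload is still needed to decide which variant wins.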