Tags: string, hash, swift3

Should Swift's String hash be used to index persistent data?


I stumbled upon a bug in my (relatively) old code and found out that the String hash property is far from unique: many different strings have the same hash value.

In the documentation, I only found "An unsigned integer that can be used as a hash table address", which says nothing about how unique the values are.

My code is as simple as this:

import Foundation  // String's `hash` property is provided through Foundation (NSString bridging)

func getCacheIndex(sUrl: String) -> Int {
    return sUrl.hash
}

It produces the following output for two different strings (only the heading parameter differs; XXXXXXX stands for the redacted API key):

FileCache hash is -4052854053573130360 for url
 https://maps.googleapis.com/maps/api/streetview?size=675x900&location=46.414382,10.013988&heading=135&pitch=-0.76&key=XXXXXXXXXXXXXXXXXXX 

FileCache hash is -4052854053573130360 for url
 https://maps.googleapis.com/maps/api/streetview?size=675x900&location=46.414382,10.013988&heading=180&pitch=-0.76&key=XXXXXXXXXXXXXXXXXXX

There is also hashValue for String, but the documentation clearly states that it must not be used to persist anything between two runs of the program.

How would you solve this with Swift? Should I provide my own hash code?
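Whichever hash function is used, a 64-bit value cannot be unique for every possible URL, so indexing persistent data by the hash alone can still return the wrong entry. One way to guard against this is to store the original URL with each entry and compare it on lookup. The sketch below assumes an in-memory dictionary standing in for hash-named cache files; the `HashedCache` and `CacheEntry` types are hypothetical, and the default hash is the djb2 function from the solution.

```swift
// Hypothetical sketch: a cache keyed by a stable hash that also stores the
// full URL, so a hash collision is reported as a miss instead of wrong data.
struct CacheEntry {
    let url: String      // the exact key this entry was created for
    let data: [UInt8]    // cached payload (stands in for file contents)
}

struct HashedCache {
    // In-memory stand-in for files named after the hash value.
    private var slots: [Int: CacheEntry] = [:]
    private let hash: (String) -> Int

    // djb2 over the UTF-8 bytes: deterministic across runs, unlike hashValue.
    static func djb2(_ s: String) -> Int {
        return s.utf8.reduce(5381) { ($0 << 5) &+ $0 &+ Int($1) }
    }

    // The hash function is injectable so collisions can be simulated.
    init(hash: @escaping (String) -> Int = HashedCache.djb2) {
        self.hash = hash
    }

    mutating func store(_ data: [UInt8], for url: String) {
        slots[hash(url)] = CacheEntry(url: url, data: data)
    }

    // Returns data only if the slot was written for this exact URL.
    func lookup(_ url: String) -> [UInt8]? {
        guard let entry = slots[hash(url)], entry.url == url else { return nil }
        return entry.data
    }
}

// Usage: force every URL into the same slot to see the collision check work.
var colliding = HashedCache(hash: { _ in 0 })
colliding.store([9], for: "heading=135")
print(colliding.lookup("heading=180") as Any)  // a miss, not heading=135's data
```

Passing a constant hash makes every key collide, which shows that the URL comparison, not the hash, is what keeps the lookup correct.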


Solution

  • For now, I have replaced the native String.hash with a custom function in my app. This solves the issue, and the values appear to be better distributed:

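    public func hash(_ string: String) -> Int {
        // djb2: start at 5381 and, for each UTF-8 byte c, compute h = h * 33 + c.
        // (h << 5) &+ h equals h * 33; << and &+ wrap on overflow instead of trapping.
        // The result depends only on the string's bytes, so it is stable across runs.
        return string.utf8.reduce(5381) { ($0 << 5) &+ $0 &+ Int($1) }
    }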
    

    Note: the djb hash function can be swapped out at any time, once I have time to work on the distribution.
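If the djb function is swapped out later, one common non-cryptographic candidate is 64-bit FNV-1a. This is a swapped-in technique, not the author's code: the name `fnv1a64` is hypothetical, it returns `UInt64` rather than `Int`, and the constants are the standard FNV-1a 64-bit offset basis and prime.

```swift
// FNV-1a, 64-bit: for each UTF-8 byte, XOR it into the hash, then multiply
// by the FNV prime. &* wraps on overflow, so the result is deterministic.
func fnv1a64(_ string: String) -> UInt64 {
    var hash: UInt64 = 0xcbf29ce484222325   // FNV-1a 64-bit offset basis
    for byte in string.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x100000001b3        // FNV 64-bit prime
    }
    return hash
}

// Usage with the two URLs from the question (key redacted as in the original).
let url135 = "https://maps.googleapis.com/maps/api/streetview?size=675x900&location=46.414382,10.013988&heading=135&pitch=-0.76&key=XXXXXXXXXXXXXXXXXXX"
let url180 = "https://maps.googleapis.com/maps/api/streetview?size=675x900&location=46.414382,10.013988&heading=180&pitch=-0.76&key=XXXXXXXXXXXXXXXXXXX"
print(fnv1a64(url135), fnv1a64(url180))
```

Like djb2, FNV-1a depends only on the string's UTF-8 bytes, so its values can be persisted; the two URLs above, which differ only in heading, get different values. It is still a fixed-width hash, so collisions remain possible in general.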