Search code examples
swiftdictionaryhash-collisionhashableequatable

How does Dictionary use the Equatable protocol in Swift?


In order to solve this question, I have been playing around with a custom struct that implements the Hashable Protocol. I'm trying to see how many times the equivalency operator overload (==) gets called depending on if there is a hash collision or not when populating a Dictionary.

Update

@matt wrote a much cleaner example of a custom struct that implements the Hashable protocol and shows how often hashValue and == get called. I am copying his code below. To see my original example, check out the edit history.

struct S : Hashable {
    static func ==(lhs:S,rhs:S) -> Bool {
        print("called == for", lhs.id, rhs.id)
        return lhs.id == rhs.id
    }
    let id : Int
    var hashValue : Int {
        print("called hashValue for", self.id)
        return self.id
    }
    init(_ id:Int) {self.id = id}
}
var s = Set<S>()
for i in 1...5 {
    print("inserting", i)
    s.insert(S(i))
}

This produces the results:

/*
inserting 1
called hashValue for 1
inserting 2
called hashValue for 2
called == for 1 2
called hashValue for 1
called hashValue for 2
inserting 3
called hashValue for 3
inserting 4
called hashValue for 4
called == for 3 4
called == for 1 4
called hashValue for 2
called hashValue for 3
called hashValue for 1
called hashValue for 4
called == for 3 4
called == for 1 4
inserting 5
called hashValue for 5
*/

Since Hashable uses Equatable to differentiate hash collisions (I assume anyway), I would expect func ==() only to be called when there are hash collisions. However, there are no hash collisions at all in @matt's example above and yet == is still being called. In my other experiments forcing hash collisions (see this question's edit history), == seemed to be called a random number of times.

What is going on here?


Solution

  • Well, there's your answer:

    https://bugs.swift.org/browse/SR-3330?focusedCommentId=19980&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-19980

    What's actually happening:

    • We hash a value only once on insertion.
    • We don't use hashes for comparison of elements, only ==. Using hashes for comparison is only reasonable if you store the hashes, but that means more memory usage for every Dictionary. A compromise that needs evaluation.
    • We try to insert the element before evaluating if the Dictionary can fit that element. This is because the element might already be in the Dictionary, in which case we don't need any more capacity.
    • When we resize the Dictionary, we have to rehash everything, because we didn't store the hashes.

    So what you're seeing is:

    • one hash of the search key
    • some =='s (searching for a space)
    • hashes of every element in the collection (resize)
    • one hash of the search key (actually totally wasteful, but not a big deal considering it only happens after an O👎 reallocation)
    • some =='s (searching for a space in the new buffer)

    We all had it totally wrong. They don't use the hashes at all — only == — to decide whether this is a distinct key. And then there is a second round of calls in the situation where the collection is grown.