algorithm hash hashmap lookup binary-search

Which is faster, Hash lookup or Binary search?


When given a static set of objects (static in the sense that once loaded it seldom if ever changes) into which repeated concurrent lookups are needed with optimal performance, which is better, a HashMap or an array with a binary search using some custom comparator?

Is the answer a function of object or struct type? Hash and/or Equal function performance? Hash uniqueness? List size? Hashset size/set size?

The size of the set that I'm looking at can be anywhere from 500k to 10m, in case that information is useful.

While I'm looking for a C# answer, I think the true mathematical answer lies not in the language, so I'm not including that tag. However, if there are C# specific things to be aware of, that information is desired.


Solution

  • Ok, I'll try to be short.

    C# short answer:

    Test the two different approaches.

    .NET gives you the tools to change your approach with a line of code. Otherwise use System.Collections.Generic.Dictionary, and be sure to initialize it with a large initial capacity, or you'll spend the rest of your life inserting items because of the work the GC has to do to collect old bucket arrays.
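    As a minimal sketch of the two approaches side by side (the class name LookupDemo, the int keys and the string values are illustrative assumptions, not something from the question), the dictionary is pre-sized with the number of keys and the array is sorted once before any search:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class LookupDemo
{
    // Hash lookup: pass the expected item count as the initial capacity
    // so the dictionary does not have to grow and rehash while loading.
    public static Dictionary<int, string> BuildDictionary(int[] keys)
    {
        var map = new Dictionary<int, string>(keys.Length);
        foreach (int key in keys)
        {
            map[key] = "item" + key;
        }
        return map;
    }

    // Binary search needs sorted data: copy the keys and sort them once up front.
    public static int[] BuildSortedArray(int[] keys)
    {
        int[] sorted = (int[])keys.Clone();
        Array.Sort(sorted);
        return sorted;
    }

    // Array.BinarySearch returns a negative number when the key is absent.
    public static bool ContainsSorted(int[] sortedKeys, int key)
    {
        return Array.BinarySearch(sortedKeys, key) >= 0;
    }
}
```

    With keys { 42, 7, 19, 3, 88 }, both BuildDictionary(keys).ContainsKey(19) and ContainsSorted(BuildSortedArray(keys), 19) report a hit, and both report a miss for 20. Switching between the two is mostly a matter of which builder and which lookup call you use.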

    Longer answer:

    A hashtable has ALMOST constant lookup times, and getting to an item in a hash table in the real world requires more than just computing a hash.

    To get to an item, your hashtable will do something like this:

    • Get the hash of the key
    • Get the bucket number for that hash (usually the map function looks like this: bucket = hash % bucketsCount)
    • Traverse the item chain that starts at that bucket (basically a list of items that share the same bucket; most hashtables handle bucket/hash collisions this way) and compare each key with the key of the item you are trying to add/delete/update/look up.
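    The three steps above can be sketched as a tiny chained hashtable. This is a teaching sketch under assumptions: ChainedTable is a hypothetical class, not how Dictionary is implemented internally, and Add does not check for duplicate keys.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical teaching class: separate chaining with one list per bucket.
public sealed class ChainedTable<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>>[] buckets;
    private readonly IEqualityComparer<TKey> comparer = EqualityComparer<TKey>.Default;

    public ChainedTable(int bucketsCount)
    {
        if (bucketsCount <= 0) throw new ArgumentOutOfRangeException("bucketsCount");
        buckets = new List<KeyValuePair<TKey, TValue>>[bucketsCount];
    }

    private int BucketFor(TKey key)
    {
        // Step 1: get the hash of the key (clear the sign bit so it is non-negative).
        int hash = comparer.GetHashCode(key) & 0x7FFFFFFF;
        // Step 2: map the hash to a bucket number.
        return hash % buckets.Length;
    }

    // Appends to the bucket's chain; duplicate keys are not detected in this sketch.
    public void Add(TKey key, TValue value)
    {
        int bucket = BucketFor(key);
        if (buckets[bucket] == null)
        {
            buckets[bucket] = new List<KeyValuePair<TKey, TValue>>();
        }
        buckets[bucket].Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        List<KeyValuePair<TKey, TValue>> chain = buckets[BucketFor(key)];
        if (chain != null)
        {
            // Step 3: walk the chain and compare each stored key with the requested one.
            foreach (KeyValuePair<TKey, TValue> entry in chain)
            {
                if (comparer.Equals(entry.Key, key))
                {
                    value = entry.Value;
                    return true;
                }
            }
        }
        value = default(TValue);
        return false;
    }
}
```

    Constructing the table with a single bucket forces every key into the same chain, so each lookup degrades to a linear scan with one key comparison per stored item. That is the worst case the steps above describe.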

    Lookup times depend on how "good" (how well spread out its output is) and how fast your hash function is, on the number of buckets you are using, and on how fast the key comparer is. A hashtable is not always the best solution.
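    Because so many factors are involved, measuring is the practical way to decide. Below is a rough timing sketch, assuming int keys and random probes; LookupBenchmark, CountHits and Run are hypothetical names, and a serious measurement would also need warm-up runs, your real key type and your real comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical benchmark helper, for illustration only.
public static class LookupBenchmark
{
    // Runs every query through `contains`, returns the hit count and reports elapsed time.
    public static int CountHits(Func<int, bool> contains, int[] queries, out long elapsedMs)
    {
        Stopwatch watch = Stopwatch.StartNew();
        int hits = 0;
        foreach (int query in queries)
        {
            if (contains(query)) hits++;
        }
        watch.Stop();
        elapsedMs = watch.ElapsedMilliseconds;
        return hits;
    }

    public static void Run(int size, int lookups)
    {
        // Even numbers only, already in ascending order, so odd probes are misses.
        int[] keys = new int[size];
        var map = new Dictionary<int, int>(size);
        for (int i = 0; i < size; i++)
        {
            keys[i] = i * 2;
            map[keys[i]] = i;
        }

        // Fixed seed so both approaches see the same probe sequence on every run.
        var random = new Random(12345);
        int[] queries = new int[lookups];
        for (int i = 0; i < lookups; i++)
        {
            queries[i] = random.Next(0, size * 2);
        }

        long hashMs, searchMs;
        int hashHits = CountHits(q => map.ContainsKey(q), queries, out hashMs);
        int searchHits = CountHits(q => Array.BinarySearch(keys, q) >= 0, queries, out searchMs);

        Console.WriteLine("Dictionary:    {0} hits in {1} ms", hashHits, hashMs);
        Console.WriteLine("Binary search: {0} hits in {1} ms", searchHits, searchMs);
    }
}
```

    Calling LookupBenchmark.Run(10000000, 1000000) would exercise the upper end of the sizes mentioned in the question. The delegate call adds the same overhead to both measurements, so the relative difference is more meaningful than the absolute numbers.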

    A better and deeper explanation: http://en.wikipedia.org/wiki/Hash_table