performance dictionary hash hashtable hashset

Can Set be faster than Hashtable?

Sets are typically implemented as hashtables where keys are set members and values are some constant (True or something). Since hashtable is a generalization of set, it could be possible to have a faster implementation of sets, but is there a faster implementation? Maybe such implementation is not based on hashtable at all? Or maybe it is is, but optimized in some other way?

Solution

Can Set be faster than Hashtable?

A "set" is any container that tracks distinct keys, supporting operations "insert(Key)", "erase(Key)", and "has(Key)".

A hash table can be used to do this - e.g. C++'s std::unordered_set.

A balanced binary tree can be used to do this - e.g. C++'s std::set.

While not efficient for more than a few tens or hundreds of Keys, contiguous memory (i.e. an array or std::vector) can store sorted elements for fairly fast (but still O(log N)) binary search lookup, with much slower insertion and erasure. This can be optimal sometimes because it's CPU cache friendly. But, because it quickly gets inefficient, C++ doesn't provide a contiguous-storage container with those three set operations (insert(Key), erase(key), find(key) as above)... you'd have to download an extra library or write your own. C++ does have std::binary_search, std::lower_bound and std::upper_bound which will make it easy to implement.

Returning to your question:

Can Set be faster than Hashtable?

They're not necessarily distinct things. A set may be implemented using a hashtable, and sometimes that may be the best choice (particularly when there are a lot of keys, it's not expensive to hash the keys well enough to avoid excessive collisions, and it's cheap to compare keys. Even then, there are different types of hash tables, and sometimes unordered_set won't be the fastest - you can google "fastest C++ hash table" and do some reading, e.g. https://engineering.fb.com/2019/04/25/developer-tools/f14/

Sometimes those other implementation choices - balanced binary trees or contiguous memory - will be faster than using a hash table.

Balanced binary trees tend to work well when there's a middling (few thousand to a million say) elements and it's cheap to work out whether one key is "less than" another key.

Turning to your other questions / assertions:

Sets are typically implemented as hashtables where keys are set members and values are some constant (True or something).

Wrong: sets only store keys, they don't have "values" (whether boolean or otherwise). If you want to keys and values, then that's called a map. Maps can be implemented using all the same data structures as sets (hash tables, binary trees, contiguous storage), with the same performance compromises. Maps are easily implemented akin to set<std::pair<Key, Value>>, where the needed lookup operations (hash, equals and or less-than) only consider the key.

I think I've adequately addressed your other speculation / questions....