While messing around with type-punning iterators, I came across the ability to do
std::vector<int> vec{ 3, 7, 1, 8, 4 };
int* begin_i = (int*)(void*)&*vec.begin();
std::cout << "1st: " << begin_i << " = " << *begin_i << std::endl;
begin_i++;
std::cout << "2nd: " << begin_i << " = " << *begin_i << std::endl;
Then I tried to do the same kind of thing with an std::unordered_set
:
std::unordered_set<int> set{ 3, 7, 1, 8, 4 };
for (auto& el : set)
{ // Display the order the set is currently in
std::cout << el << ", ";
}
std::cout << '\n' <<std::endl;
int* begin_i = (int*)(void*)&*set.begin();
std::cout << "1st: " << begin_i << " = " << *begin_i << std::endl;
begin_i++;
std::cout << "2nd: " << begin_i << " = " << *begin_i << std::endl;
But the output I got was:
4, 8, 1, 7, 3,
1st: [address] = 4
2nd: [address] = 0
I'm supposing this is because and an unordered set's elements are located in different parts of memory? I was confused here considering that I also printed the order the elements were stored in using a range-based loop.
My question is how does an std::unordered_set
store its elements in memory? What happens when an element is added to the set? Where does it go in memory and how is that kept track of if it's not stored in an array-like container where the elements are one-right-after-the-other?
An unordered_set
is implemented as a hash table using external chaining.
That basically means you have an array of linked lists (which are usually called "buckets"). So, to add an item to an unordered_set
you start by hashing the new item you're doing to insert. You then take that hash and reduce it to the range of the current size of the array (which can/will expand as you add more items). You then add the new item at the tail of that linked list.
So, depending on the value produced by the hash, two consecutively inserted items may (and often will) be inserted in the linked lists at completely different parts of the table. Then the node in the linked list will typically be dynamically allocated, so even two consecutive items in the same linked list may be at completely unrelated addresses.
As I noted in an earlier answer, however, quite a bit more about this is actually specified in the standard than most people seem to realize. As I outlined there, it might be (barely) possible to violate the expectation and still (sort of) meet the requirements in the standard, but even at best, doing so would be quite difficult. For most practical purposes, you can assume it's something quite a bit like a vector of linked lists.
Most of the same things apply to an unordered_multiset
--the only fundamental difference is that you can have multiple items with the same key instead of only one item with a particular key.
Likewise, there are also unordered_map
and unordered_multimap
, which are pretty similar again, except that they separate the things being stored into a key and a value associated with that key, and when they do hashing, the only look at the key part, not the value part).