Search code examples
c++g++standardsclang++unordered-map

Is the order of an unordered map always the same for a given compiler?


I just found a bug in my code that depends on the order the elements are stored in an unordered_map. Ok not a big deal, I'm going to fix it. My question is only out of curiosity to understand different implementations in different compilers.

When compiled on my computer (linux g++) my unit tests always works because the unordered_map is ordered always the same and the order does not trigger my bug.

When compiled on another computer and architecture (mac M1, clang++) my bug is always triggered because the elements always come in the same order, which is different from with g++.

When compiler on my computer with clang++: no bug.

In my specific case the key of the unordered_map is an int and the hash is the default one for an int I did not used a custom hash function.

My questions are thus: how the order of an unorderd_map depend on the compiler implementation? I've always assumed it was somewhat random, but I was wrong, it is stable for each compiler I tested. What the standard says about that? What are the typical implementations to order an unordered_map?


Solution

  • Is the order of an unordered map alway the same for a given compiler?

    No.

    There are a few things at play here:

    • the bucket to which an element hashes depends on:
      • the hash function used, and while std::hash<int> is implementation defined, to the best of my knowledge all versions of GCC, Clang and Visual C++ over the last couple decades have used identity hashes
      • the bucket_count, as the size_t value returned by std::hash is generally %-ed or &-ed into the current bucket_count(); GCC uses prime numbers for bucket counts, while Visual C++ uses powers of two (more collision prone, but bitwise-AND is faster than a mod operation); while the default max_load_factor() at which rehashing occurs is specified in the Standard (at 1.0), the initial size of the table and amount by which it grows during rehashing are unspecified
    • once a bucket is identified, it's up to the implementation whether it inserts elements at the start, the end, or somewhere in the middle of the chain of elements associated with that bucket
    • it's up to the implementation how iteration is orchestrated; in the case of GCC, there's actually one singly-linked list of all the key/value nodes, and that's iterated without reference to the buckets; all nodes in that list with keys hashing to the same bucket are grouped together in the singly-linked list (in unspecified order), with buckets effectively holding iterators into the singly-linked list, but other choices could be made (though achieving O(1) iterator advancement in a large table in which most elements have been erased can't be done by checking every bucket, so some cleverness is needed)
    • what relative order elements with the same hash have after rehashing/resizing of the table is unspecified (i.e. how they're linked into the new larger table)

    So, even with the hash values the same, a lot of things could cause iteration to vary. There's no particular reason to expect iteration order to vary for the same Standard Library implementation (more relevant here than the compiler per se) after the same insert/erase operations are done with the same keys, but no guarantees either.