Search code examples
c++memorystlunordered-mapunordered-set

C++: Clarification about "valid pointers and references"


I'm having an hard time understanding this description of stl unordered map's implementation:

The standard template library (STL) for C++ provides hash tables via std::unordered_map and std::unordered_set. The standard guarantees reference stability: References and pointers to the keys and values in the hash table must remain valid until the corresponding key is removed. In practice, this means the entries must be indirect and individually allocated, which adds a substantial CPU overhead.

Two questions:

  1. When the author says "references and pointers to the keys and values ... must remain valid" does that mean that after I insert an item, the pointer and reference must live for the entirety of the program? When I use "new" to allocate the object on the heap and the pointer to it goes out of scope and is no longer valid. I'm not sure why the reference/pointer needs to stay valid given that the hash table has reference to the object on the heap.

  2. How do "indirect and individually allocated" entries "add substantial CPU overhead"?


Solution

  • When the author says "references and pointers to the keys and values ... must remain valid" does that mean that after I insert an item, the pointer and reference must live for the entirety of the program?

    No. It means that if you take the address of an object in the map, or bind a reference to such an object, that pointer, or that reference, must not become dangling because of unrelated elements in the map being added or removed. It is a requirement on the implementation of std::unordered_map.

    This is different to how e.g. std::vector behaves. Adding elements to a vector can cause it to reallocate, and all previously acquired pointers and references to elements are left dangling.

    When I use new to allocate the object on the heap and the pointer to it goes out of scope and is no longer valid.

    No, the pointer ceases to exist, other pointers with the same value are still pointing to the object you newed.

    How do "indirect and individually allocated" entries "add substantial CPU overhead"?

    Modern CPUs are much faster at accessing things that are adjacent in memory. E.g. they start loading later elements before they are asked to. Such prediction is harder if the target addresses don't follow a simple pattern.

    The article that you linked is highlighting design choices that the standards committee made that make std::unordered_map unsuitable for the author's purposes. They have written a pair of classes that have the same public members (with the same meanings) as those of std::unordered_map, as a "drop in" replacement.