Search code examples
c++pointersstlc++17stdset

Why would dereferencing a pointer to an extracted node of std::set be undefined behaviour?


I did a lightning talk in my company on the new (C++17) splicing interface of the associative containers. I demonstrated std::set::extract and was then asked what would happen to iterators and pointers to the extracted element. They caught me on the wrong foot and I could not answer the question but looked it up right after the talk.

[associative.reqmts] 21.2.6.10 in the current draft of the standard reads as follows:

The extract members invalidate only iterators to the removed element; pointers and references to the removed element remain valid. However, accessing the element through such pointers and references while the element is owned by a node_­type is undefined behavior. References and pointers to an element obtained while it is owned by a node_­type are invalidated if the element is successfully inserted.

(The proposal P0083R3 already contains this wording)

Now the emphasized part really disconcerts me. I understand the concept of a valid but not dereferenciable pointer (nullptr) or iterator (end iterator). I found a "definition" of valid pointers by David Vandevoorde and learned that there are also valid but not dereferenciable pointers that are not nullptr. (namely a pointer one past an existing object)

With all this my mental model of what happens is the following:

  1. One retrieves a pointer to an element in a set, the pointer is therefore valid. Dereferencing that pointer is defined and yields access to the element in the set.
  2. Now one extracts that same element from the set. Conceptually the element is not copied, moved or otherwise changed, its associated node is merely removed from the internal tree with which set manages its data. The remaining tree might need to be rebalanced. The returned node_handle takes ownership of the orphaned tree node.

As by the standard the pointer retrieved in 1) remains valid and cannot be change by extract, so this also supports this mental model. However with this model there is no reason why dereferencing the pointer would suddenly be undefined. Consequently with g++ on coliru it seems to work as I would have expected. (this is not intended as proof of any kind)

The leeway that the standard gives library implementers seems unnecessarily big. What am I missing? I only see that constness of set values is dropped when extracting them, but don't see how that would have any effect.

The same reasoning applies to the insertion case mentioned in the last quoted sentence.


Solution

  • Your mental model is missing the fact that removing the const-ness, strictly speaking, must be implementation-defined.

    The node_handle must take ownership of a different object, but by some implementation-defined magic, that mutable object springs into existence without being constructed, having the same value and storage as the original, const object.

    Similarly, when it is inserted into a set with a compatible allocator, that set transmutes the mutable object owned by the node_handle back into the original const object.

    It's undefined behaviour because the const object has ceased to exist while "it" is owned by the node_handle, but then it starts existing again when it is re-inserted.

    It's the same sort of reasoning that makes it undefined behaviour to use node_handle from a map if you have a user-defined specialisation of std::pair<const K, V> or std::pair<K, V>. You don't want to constrain the implementation on what manner of "magic" it does to achieve all this, so you make anything that would observe the "magic" undefined behaviour.