Search code examples
c++flatmapc++23

Are the elements in a C++23 flat_set sorted or a heap?


I read the proposam document p1222r4.pdf but as you know, they are sometimes a bit difficult to read. I also found the community page cppreference and the boost docs.

I know that I can iterate over the elements in the set and they come out on order:

#include <iostream>
#include <random>
//#include <flat_set>    // no compiler support yet
#include <boost/container/flat_set.hpp>

using namespace std;
default_random_engine engine{};
mt19937 gen{engine()};

int main() {
    uniform_int_distribution<> rand_a(0, 100);
    auto roll_a = [&] { return rand_a(gen); };
    boost::container::flat_set<int> demo{};
    for (int i = 0; i < 30; ++i) {
        demo.insert(roll_a());
    }

    // iterate -- sorted, thats clear.
    for(const auto& elem : demo) {
        cout << elem << " ";
    }
    cout << "\n";
    //= 6 8 16 22 26 27 30 34 36 40 41 45 47 50 52 53 56 61 66 67 ...

    // get to the internals. still sorted? or a heap?
    auto seq = demo.extract_sequence(); // or extract() in the std.
    for(const auto& elem : seq) {
        cout << elem << " ";
    }
    cout << "\n";
    //= 6 8 16 22 26 27 30 34 36 40 41 45 47 50 52 53 56 61 66 67 ...
}

But there is also the amazing extract method (or extract_sequence in boost) to get to the internals of the set.

So, I seem to remember that some implementations of a flat_set do not keep the keys strictly sorted but instead as a heap, which is a bit more efficient.

But I do not find any mentioning of this in the Std Doc. Not finding it is no proof, so, am I overlooking something?

Question: Is the extract method guaranteed to return a sorted container?


Solution

  • Short answer: Yes, extract will return a sorted container, because the underlying container is always in sorted order.

    Long answer:

    There are a couple elements of the spec that make heaps impossible:

    • 24.6.5.1 note 5 is pretty clear "A flat_set maintains the invariant that the keys are sorted with respect to the comparison object". Even if you interpret this to mean "ordering is done in that manner", as opposed to "the container itself is sorted" (pretty sure that's an invalid assumption),
    • The fact that its iterators are random access and will iterate in sorted order (a feature heaps can't provide without being sorted prior to handing out iterators)

    The only reasonable implementation that meets the requirements of the spec is a sorted vector (or whatever alternate KeyContainer you template it with, likely deque) under the hood. As such, extract should return a sorted container (doing anything else would require more work, not less).

    The other problem with using a heap here is that it would make the whole class pointless. It's already specifying that insertion and removal are linear operations, and that the flat_set is unique. Without the uniqueness guarantee (flat_multiset), insertion and removal (given an iterator to the item to remove) could be O(log n), but they're not. If insertion and removal are O(n), then the only possible benefits left are:

    1. Cheap lookup (O(log n)), which heaps don't give you (they give you the ability to cheaply determine the minimum or maximum, depending on the type of heap, but membership tests are O(n) in general)
    2. Iteration in sorted order (which heaps can only give you if it's destructive iteration, and it would still be O(n log n), you just wouldn't have to wait for a complete sort to finish before you began iterating)

    Being a sorted random access container gives you both of those benefits, O(log n) membership testing and O(n) non-destructive iteration in sorted order. Heaps give you neither.

    The benefits heaps can give you are already provided by std::priority_queue, std::flat_set serves a different niche.