Search code examples
c++stdvector

Why is it faster to add to a vector with reserved capacity than to construct a vector from a pair of iterators?


I don't know case 2 faster then case 1 std::vector<int> vv(us.begin(), us.end()) don't call reserve? What made the difference? Thank you

this is code

#include <iostream>
#include <list>
#include <unordered_map>
#include <vector>
#include <unordered_set>
#include <chrono>
using namespace std;


class Timer
{
public:
    Timer()
    {
        start = std::chrono::high_resolution_clock::now();
    }
    chrono::time_point<chrono::steady_clock> start;

    ~Timer()
    {
        cout << chrono::duration_cast<chrono::milliseconds>(std::chrono::high_resolution_clock::now() - start).count() << "ms\n";
    }
};

int main()
{
    std::unordered_set<int> us;
    constexpr int loopCount = 10'000'000;
    for (int i = 0; i < loopCount; ++i)
    {
        us.emplace(i);
    }

    cout << "us creat!" << endl;
    {
        //**case 1**
        Timer t;
        std::vector<int> vv(us.begin(), us.end());
    }

    {
        //**case 2**
        Timer t;
        std::vector<int> vv;
        vv.reserve(us.size());
        for (const auto& item : us)
        {
            vv.emplace_back(std::move(item));
        }
    }

    cout<<"end!";
}

Output:

us creat!
1028ms
497ms
end!

C++17, Release mode, x64

OS : window10 pro

compiler : vc


Solution

  • std::unordered_set::iterator is std::forward_iterator, but it's not std::random_access_iterator.

    Only iterators which qualify as std::forward_iterator could have std::distance called for them at all, but only iterators which are additionally std::random_access_iterator have a constant time (O(1)) std::distance associated with them. Without the later, only O(n) is required.

    The constructor of std::vector doesn't mandate specializations stronger than std::input_iterator - so iterators are required to be compatible with this constructor even if they are not std::forward_iterator.

    What this means, is the the simplest default implementation must assume that std::distance is not even legal to call.

    The C++ library implementation may still choose to include std::distance as an optimization - but that requires std::random_access_iterator to be efficient. Only std::forward_iterator would often not actually give a performance benefit, but may be expensive.


    vv.reserve(us.size());
    

    This is bypassing the whole iterator trait system, so you can optimize in ways the compiler can't do with pure iterators.

    Coincidentally, this whole subject was addressed in ranges in C++23, where a range can now consist of iterators not fulfilling the std::random_access_iterator trait, but the range as a whole still having a constant time size() method, making it qualify for optimized reservation regardless.

    E.g.

    std::vector<int> vv(std::from_range, us);
    

    now always hits the best possible optimizations implemented in the corresponding C++ library.