c++ constructor declaration stdvector default-constructor

Initializing the size of a C++ vector


What are the advantages (if any) of initializing the size of a C++ vector as well as other containers? Is there any reason to not just use the default no-arg constructor?

Basically, are there any significant performance differences between

vector<Entry> phone_book;

and

vector<Entry> phone_book(1000);

These examples come from The C++ Programming Language, Third Edition, by Bjarne Stroustrup. If these containers should always be initialized with a size, is there a good way to determine a suitable starting size?


Solution

  • There are a few ways to create a vector with n elements. I will also show some ways to populate a vector when you don't know the number of elements in advance.

    But first

    what NOT to do

    std::vector<Entry> phone_book;
    for (std::size_t i = 0; i < n; ++i)
    {
        phone_book[i] = entry; // <-- !! Undefined Behaviour !!
    }
    

    A default-constructed vector, as in the example above, is empty. Accessing elements outside the range of the vector is Undefined Behavior, and you should not expect a nice exception, because operator[] performs no bounds checking. Undefined Behavior means anything can happen: the program might crash, might seem to work, or might work in a wonky way. Note that reserve changes only the capacity of the vector, not its size, so you still can't access elements beyond size(), even if you reserved room for them.
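
    To make the size versus capacity distinction concrete, here is a minimal sketch. The function name at_throws_after_reserve is made up for illustration. It uses at(), the bounds-checked counterpart of operator[], which throws std::out_of_range instead of invoking Undefined Behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reserves room, then tries a bounds-checked access.
// Returns true if at(0) throws because the vector still has no elements.
bool at_throws_after_reserve(std::size_t reserved)
{
    std::vector<int> v;
    v.reserve(reserved);
    assert(v.size() == 0);             // reserve did not create any elements
    assert(v.capacity() >= reserved);  // it only set aside storage

    try
    {
        v.at(0) = 42;                  // checked access; v[0] here would be UB
    }
    catch (const std::out_of_range&)
    {
        return true;
    }
    return false;
}
```

    Calling at_throws_after_reserve(100) returns true: the storage exists, but index 0 is still outside the valid range [0, size()).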

    And now some options analyzed

    default ctor + push_back
    (suboptimal)

    std::vector<Entry> phone_book;
    for (std::size_t i = 0; i < n; ++i)
    {
        phone_book.push_back(entry);
    }
    

    This has the disadvantage that reallocations will occur as you push back elements. Each reallocation means a memory allocation, moving the elements (or copying them if they cannot be moved safely, for example when the move constructor may throw, or before C++11) and a memory deallocation (with destruction of the old objects). For a reasonably large n this will most likely happen more than once. It is worth noting that push_back is guaranteed to be "amortized constant", which means a reallocation does not happen on every push_back: each reallocation grows the capacity geometrically. Further reading: std::vector and std::string reallocation strategy

    Use this when you don't know the size in advance and you don't even have an estimate for the size.
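
    If you want to see how often your implementation reallocates, a rough sketch like the following can help. The helper name count_reallocations is an assumption for illustration, and the exact count depends on the growth factor your standard library uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: pushes n ints into a default-constructed vector and
// counts how many times capacity() changes. Each change is one reallocation.
std::size_t count_reallocations(std::size_t n)
{
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();

    for (std::size_t i = 0; i < n; ++i)
    {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity)
        {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }

    assert(v.size() == n);
    return reallocations;
}
```

    Because the capacity grows geometrically, the result grows roughly with the logarithm of n rather than with n itself.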

    "count default-inserted instances of T" ctor with later assignments
    (not recommended)

    std::vector<Entry> phone_book(n);
    for (auto& elem : phone_book)
    {
        elem = entry;
    }
    

    This does not incur any reallocation, but all n elements are first default-constructed and then overwritten by copy assignment in the loop. This double work is a big disadvantage and its effect on performance will most likely be measurable (it is less noticeable for basic types).

    Don't use this as there are better alternatives for pretty much every scenario.

    "count copies of elements" ctor
    (recommended)

    std::vector<Entry> phone_book(n, entry);
    

    This is the best method to use when every element starts as a copy of the same value. Because the constructor receives all the information it needs, it can allocate once and copy-construct each element directly in place. This has the potential to result in branchless code with vectorized instructions if Entry has a trivial copy constructor.
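
    The difference between this constructor and the "construct, then assign" approach can be made visible with an instrumented element type. The following is only a sketch: Counted, Tally and both function names are invented for this illustration, and it assumes C++11 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counters for the special member functions we care about.
struct Tally
{
    int default_ctor = 0;
    int copy_ctor = 0;
    int copy_assign = 0;
};

Tally g_tally; // global counters, reset before each experiment

// Hypothetical element type that records how it gets created and assigned.
struct Counted
{
    Counted() { ++g_tally.default_ctor; }
    Counted(const Counted&) { ++g_tally.copy_ctor; }
    Counted& operator=(const Counted&) { ++g_tally.copy_assign; return *this; }
};

// "count default-inserted instances" ctor followed by assignments.
Tally construct_then_assign(std::size_t n)
{
    const Counted entry;
    g_tally = Tally{};                 // ignore the construction of `entry`
    std::vector<Counted> v(n);         // n default constructions
    for (auto& elem : v)
    {
        elem = entry;                  // n copy assignments on top
    }
    assert(v.size() == n);
    return g_tally;
}

// "count copies of elements" ctor.
Tally construct_from_copies(std::size_t n)
{
    const Counted entry;
    g_tally = Tally{};                 // ignore the construction of `entry`
    std::vector<Counted> v(n, entry);  // n copy constructions, nothing else
    assert(v.size() == n);
    return g_tally;
}
```

    For n elements, the first function touches every element twice (one default construction plus one copy assignment), while the second touches each element once.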

    default ctor + reserve + push_back
    (situationally recommended)

    std::vector<Entry> phone_book;
    phone_book.reserve(m);
    
    while (some_condition)
    {
        phone_book.push_back(entry);
    }
    
    // optional
    phone_book.shrink_to_fit();
    

    No reallocation will occur until you exceed the reserved capacity, and each object is constructed only once. emplace_back can be a better choice than push_back when the element can be built in place from its constructor arguments.

    Use this if you have a rough approximation of the size.

    There is no magical formula for the reserve value. Test with different values for your particular scenarios to get the best performance for your application. At the end you can use shrink_to_fit to ask the vector to release unused capacity (the request is non-binding).
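
    As a sketch of reserve combined with emplace_back: the Entry type below is a stand-in with a made-up constructor (the real Entry is not shown in the question), and fill_without_reallocation is a hypothetical helper that checks the buffer address stays the same while filling up to the reserved capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical Entry type, assumed for illustration only.
struct Entry
{
    Entry(std::string n, int num) : name(std::move(n)), number(num) {}
    std::string name;
    int number;
};

// Returns true if the vector's buffer never moved while m entries were added.
bool fill_without_reallocation(std::size_t m)
{
    std::vector<Entry> phone_book;
    phone_book.reserve(m);
    const Entry* const buffer = phone_book.data();

    for (std::size_t i = 0; i < m; ++i)
    {
        // emplace_back forwards its arguments to Entry's constructor and
        // builds the element directly in the vector's storage.
        phone_book.emplace_back("name" + std::to_string(i), static_cast<int>(i));
    }

    assert(phone_book.size() == m);
    return phone_book.data() == buffer;
}
```

    The standard guarantees that no reallocation happens after reserve until an insertion would push size() past capacity(), so the comparison holds for any m.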

    default ctor + std::fill_n and std::back_inserter
    (situationally recommended)

    #include <algorithm>
    #include <iterator>
    
    std::vector<Entry> phone_book;
    
    // at a later time
    // phone_book could be non-empty at this time
    std::fill_n(std::back_inserter(phone_book), n, entry);
    

    Use this if you need to fill or add elements to the vector after its creation.
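
    std::back_inserter calls push_back once per element, so the reallocation caveats of push_back apply here too. std::vector::insert (listed in the resources) has a count overload that receives the total up front and typically needs at most one reallocation. A small sketch comparing the two, with hypothetical function names and int elements for brevity:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Appends n copies of value through std::back_inserter (one push_back each).
void append_with_fill_n(std::vector<int>& v, std::size_t n, int value)
{
    std::fill_n(std::back_inserter(v), n, value);
}

// Appends n copies of value with a single call to the count overload of insert.
void append_with_insert(std::vector<int>& v, std::size_t n, int value)
{
    const std::size_t old_size = v.size();
    v.insert(v.end(), n, value);
    assert(v.size() == old_size + n);
}
```

    Both functions leave the vector with the same contents; which one is faster for your element type and sizes is something to measure rather than assume.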

    default ctor + std::generate_n and std::back_inserter (for different entry objects)

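    #include <algorithm>
    #include <iterator>
    
    // declared here, defined elsewhere: returns a new Entry on each call
    Entry entry_generator();
    
    std::vector<Entry> phone_book;
    std::generate_n(std::back_inserter(phone_book), n, [] { return entry_generator(); });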
    

    You can use this if every entry is different and obtained from a generator.
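
    For a self-contained variant, here is a sketch in which a stateful lambda plays the role of the generator. make_sequence is a made-up name and int stands in for Entry. Since n is known in this case, the sketch also calls reserve first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: builds {0, 1, ..., n-1} with std::generate_n.
std::vector<int> make_sequence(std::size_t n)
{
    std::vector<int> v;
    v.reserve(n);  // n is known here, so allocate once up front

    int next = 0;
    // The lambda is called n times, in order; each result is pushed back.
    std::generate_n(std::back_inserter(v), n, [&next] { return next++; });

    assert(v.size() == n);
    return v;
}
```

    For example, make_sequence(4) yields a vector holding 0, 1, 2, 3.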

    Initializer list (Bonus)

    Since this has become such a big answer, beyond what the question asked, I would be remiss if I didn't mention the initializer list constructor:

    std::vector<Entry> phone_book{entry0, entry1, entry2, entry3};
    

    In most scenarios this should be your go-to constructor when you have a small list of initial values for populating the vector.
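
    One thing to watch for when mixing this constructor with the "count copies" constructor: for element types such as int, braces and parentheses select different constructors. A minimal sketch (the function names are invented for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Parentheses pick the "count copies of elements" ctor.
std::size_t size_with_parentheses()
{
    std::vector<int> v(10, 0);  // ten elements, each equal to 0
    assert(v.front() == 0);
    return v.size();
}

// Braces prefer the initializer list ctor.
std::size_t size_with_braces()
{
    std::vector<int> v{10, 0};  // two elements: 10 and 0
    assert(v.front() == 10);
    return v.size();
}
```

    The first function returns 10 and the second returns 2, so it pays to check which form you actually wrote when sizing a vector of numbers.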


    Some resources:

    std::vector::vector (constructor)

    std::vector::insert

    standard algorithm library (with std::generate std::generate_n std::fill std::fill_n etc.)

    std::back_inserter