Search code examples
c++c++11vectoriterator

How to know `std::vector::begin()` become invalid?


The code snippet below is seen at cppreference.

Why the std::vector::begin() is invalid when the vector is inserted the third time. I think the std::vector::begin() become invalid when the vector has been reallocated.

Does the reallocation is strictly defined in the C++ stardard? If not, it seems the std::vector::begin() should be called everytime when insert() is called in the demo code.

#include <iostream>
#include <iterator>
#include <vector>
 
void print(int id, const std::vector<int>& container)
{
    std::cout << id << ". ";
    for (const int x : container)
        std::cout << x << ' ';
    std::cout << '\n';
}
 
int main ()
{
    std::vector<int> c1(3, 100);
    print(1, c1);
 
    auto it = c1.begin();
    it = c1.insert(it, 200);
    print(2, c1);
 
    c1.insert(it, 2, 300);  //ME: why begin() still valid?
    print(3, c1);
 
    // `it` no longer valid, get a new one:
    it = c1.begin();
 
    std::vector<int> c2(2, 400);
    c1.insert(std::next(it, 2), c2.begin(), c2.end());
    print(4, c1);
 
    int arr[] = {501, 502, 503};
    c1.insert(c1.begin(), arr, arr + std::size(arr));
    print(5, c1);
 
    c1.insert(c1.end(), {601, 602, 603});
    print(6, c1);
}

Solution

  • The C++ standard gives std library latitude on when it invalidates vectors.

    It places requirements, but does not mandate implementation. These requirements mean that a family of implementations are possible. The actual implementations have guided the standard text, and the standard text has in turn guided the implementations.

    Most languages that are not C++ have one implementation. So the standard for that language describes what the implementation does. In C++ land this doesn't happen, so things are a layer more abstract.

    I will now give a description of how, in practice, C++ std implementers do. It will contain a few "lies told to children" to keep it simple.

    The std::vector<T> manages a buffer of memory, some of which contains objects of type T, plus a count of how many objects are in that buffer.

    When you do an operation whose resulting vector fits within that buffer, the std::vector<T> just uses that buffer.

    So, for example, a given vector might have a buffer with room for 7 Ts, and have only 3 in use. If you add an element, it would shuffle the elements around and wouldn't reallocate anything.

    However, if you added 5 elements, that would no longer fit in your buffer (as 3+5=8, which is greater than 7). At this point the std::vector<T> triggers a reallocation.

    When a reallocation occurs -- when the buffer gets replaced with a larger one - all iterators are invalidated.

    If you do a .reserve() to a larger size, std::vector implementations tend to grow the vector to exactly the size you ask. But if you do .push_back or .insert or similar operations, the std::vector implementations are required to pull off something called "amortized constant cost". In effect, this means they have to grow in size exponentially.

    If you start with a vector with a buffer with room for 1 element, and you keep pushing back, the buffer doesn't grow like "1, 2, 3, 4, 5, 6, ...". Instead, it grows like "1, 2, 4, 7, 11, 17, 26, ..." or "1, 2, 4, 8, 16, 32, ...". Some function that behaves asymptotically like "k^n".

    Two typical growth rates are "twice the old buffer size" and "1.5 times the old buffer size"; different implementations pick different growth rates, and the C++ standard is flexible enough to allow either. (there are advantages to faster or slower growth rates).

    Because of that flexibility in how fast the implementation chooses to grow, the standard doesn't say when iterators are invalidated for a given program. It just says under what conditions they can be invalidated.

    A std::vector iterator can be invalidated whenever .capacity() changes. .reserve() guarantees a minimum .capacity(). .size() is always less than or equal to .capacity(). .insert() and .push_* methods can all grow .size().

    So a first tier safety thing is to not assume iterators remain valid if you are adding elements to the vector. This rule is strong enough.

    Sometimes you want to get fancy. If you want to get fancy, you have to at run time get .capacity() when you get your iterators, and ensure your set of operations adding elements never passes that valid. If so, you can guarantee your iterators won't be invalidated by adding elements.

    (The other common way vector iterators are invalidated is swap, move-assignment and move-construction from or two a vector. In the case of move-construction and swap, the iterators are defined to migrate to the other container; in the case of move-assignment, no such guarantee is provided, as it leaves it up to the implementation to decide on a runtime basis if the vector should move elements or move buffers)