c++ performance stdvector

Will the std::vector copy assignment operator avoid releasing and reallocating memory if possible?


I have a function copying one vector to another (among many other things). A simplified version of the code is below.

void Fun(std::vector<double> &in, std::vector<double> &out) {
    out = in;
}

I care about maximizing efficiency because the function will be run many times. So, I would like to avoid reallocating memory as much as possible. The vector 'in' may already have a significant amount of memory reserved to it. So, if I made a manual implementation, I am sure that I could accomplish this. For example:

void Fun(std::vector<double> &in, std::vector<double> &out) {
    out.resize(in.size());//note - if the capacity of out is greater than in.size(), this will involve no memory release or allocation
    for (unsigned int i = 0;i<in.size();i++) {
        out[i] = in[i];
    }
}

If in.size() is less than the existing capacity of out, then this latter implementation will not release or allocate any heap memory.

The question is - would my original implementation do the same? I.e., would std::vector know to automatically do this in a memory-efficient way if I simply did "out = in;"?

In particular, I am concerned that perhaps the "out = in;" line might do something like release all the heap memory currently allocated to out, and then reallocate it. Something effectively equivalent to this:

void Fun(std::vector<double> &in, std::vector<double> &out) {
    out.clear();
    out.shrink_to_fit();//releasing memory from heap
    out.resize(in.size());//reallocating memory from heap
    for (unsigned int i = 0;i<in.size();i++) {
        out[i] = in[i];
    }
}

Solution

  • The vector 'in' may already have a significant amount of memory reserved to it.

    [...] would my original implementation do the same?

    I don't know of any implementation that reserves excess memory just because the vector being copied from has a large .capacity(). Also, your second snippet doesn't reserve more memory than is needed to store the elements of in, and when out is shorter than in it first default constructs the extra elements and only then assigns new values to them. For doubles, it probably doesn't matter much, but for non-trivial types, it can make a noticeable difference. The original out = in; will never be slower than the code in your second snippet.
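To make the default-construction point concrete, here is a minimal sketch. Counted, g_default_ctors and the two helper functions are hypothetical names invented for this illustration; they are not part of the question's code. It counts default constructions when filling an empty vector with each approach.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counter, incremented by every default construction of Counted.
std::size_t g_default_ctors = 0;

// Hypothetical element type used only to observe default constructions.
struct Counted {
    double value;
    Counted() : value(0.0) { ++g_default_ctors; }
    explicit Counted(double v) : value(v) {}
};

// The manual approach from the question: resize, then assign one by one.
std::size_t default_ctors_manual(const std::vector<Counted>& in) {
    std::vector<Counted> out;
    g_default_ctors = 0;
    out.resize(in.size());  // default constructs in.size() elements
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i];     // then overwrites each of them
    }
    assert(out.size() == in.size());
    return g_default_ctors;
}

// Plain copy assignment: elements are copy constructed or copy assigned.
std::size_t default_ctors_assign(const std::vector<Counted>& in) {
    std::vector<Counted> out;
    g_default_ctors = 0;
    out = in;
    assert(out.size() == in.size());
    return g_default_ctors;
}
```

Starting from an empty out, the manual version default constructs one element per element of in before overwriting it, while copy assignment needs no default constructions at all, since vector copy assignment does not require the element type to be default constructible.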

    If you want to reserve as much memory as in has reserved, you can do it manually with .reserve():

    void Fun(const std::vector<double>& in, std::vector<double>& out) {
        out.clear();
        out.reserve(in.capacity());
        out = in;
    }
    

    Regarding the additions to your question:

    In particular, I am concerned that perhaps the out = in; line might do something like release all the heap memory currently allocated to out, and then reallocate it. Something effectively equivalent to this:

    void Fun(std::vector<double> &in, std::vector<double> &out) {
        out.clear();
        out.shrink_to_fit();//releasing memory from heap
        out.resize(in.size());//reallocating memory from heap
        for (unsigned int i = 0;i<in.size();i++) {
           out[i] = in[i];
        }
    }
    

    No, that is highly unlikely. From cppreference:

    [...] Otherwise, the memory owned by *this may be reused when possible. In any case, the elements originally belonging to *this may be either destroyed or replaced by element-wise copy-assignment.

    A sane implementation would do what it can to reuse the memory it already has allocated. The implementation will likely use different approaches for different types to make the copy assignment as efficient as possible. For trivially copyable types (like double), it may do the equivalent of out.resize(in.size()) followed by std::copy. For more complex types, it may do the equivalent of out.reserve(in.size()), copy assign over the existing elements in out and then copy construct the rest (if out.size() < in.size()), or it may out.clear(); out.reserve(in.size()); and then copy construct them all. It'll depend on the type in the container.
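As a rough illustration of the "assign over the existing elements, then construct the rest" strategy, here is a sketch written against the public std::vector interface. The name assign_reusing_storage is made up for this example, and real library implementations work on the raw storage directly, so treat this as a model of the idea rather than what any particular standard library does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: copies `in` into `out` while reusing out's storage
// whenever out.capacity() is large enough.
template <typename T>
void assign_reusing_storage(const std::vector<T>& in, std::vector<T>& out) {
    if (&in == &out) return;  // self-assignment: nothing to do

    // Number of elements both vectors already have.
    const std::ptrdiff_t common =
        static_cast<std::ptrdiff_t>(std::min(in.size(), out.size()));

    // Copy assign over the elements that already exist in out.
    std::copy(in.begin(), in.begin() + common, out.begin());

    if (out.size() > in.size()) {
        // out was longer: destroy the surplus. erase never reallocates.
        out.erase(out.begin() + common, out.end());
    } else {
        // out was shorter: copy construct the remainder at the end.
        // This reallocates only if in.size() exceeds out.capacity().
        out.insert(out.end(), in.begin() + common, in.end());
    }
    assert(out.size() == in.size());
}
```

With this shape, no allocation happens as long as in.size() <= out.capacity(), which is the property the question is asking about.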

    If out has more capacity than in, it will (almost certainly) not shrink_to_fit (which in theory could cause an unnecessary reallocation). Instead, it will keep the over-allocation for later use, as demonstrated in this example, where all major implementations keep the capacity even after copy assignment of a much smaller vector.
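If you want to check this on your own toolchain, a small helper along these lines could record what happens to out's storage across the assignment. AssignReport and copy_assign_and_report are hypothetical names for this sketch, and what it reports is implementation behaviour, not something the standard guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of what happened to out's storage.
struct AssignReport {
    std::size_t capacity_before;
    std::size_t capacity_after;
    bool same_buffer;  // true if data() is unchanged after the assignment
};

// Copy assigns `in` to `out` and reports whether out's buffer was kept.
AssignReport copy_assign_and_report(const std::vector<double>& in,
                                    std::vector<double>& out) {
    AssignReport r;
    r.capacity_before = out.capacity();
    // Store the address as an integer so that the later comparison never
    // reads a pointer that may have been invalidated by a reallocation.
    const std::uintptr_t before =
        reinterpret_cast<std::uintptr_t>(out.data());

    out = in;

    r.capacity_after = out.capacity();
    r.same_buffer = (reinterpret_cast<std::uintptr_t>(out.data()) == before);
    assert(out == in);  // the contents are guaranteed to match
    return r;
}
```

Calling it with a large out and a much smaller in, then inspecting capacity_after and same_buffer, shows whether your standard library kept the existing allocation.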