move semantics slower then copy c++

I wrote a small test comparing the move and test semantics:

#include <vector>
#include <iostream>
#include <iterator>
#include <chrono>
#include <iomanip>

using namespace std;

int main()
{

    int lenLeft = 3;
    int lenMid = 3;
    vector<int> lenVec{10,100,1000,static_cast<int>(1e4),static_cast<int>(1e5),static_cast<int>(1e6),static_cast<int>(1e7)};
    int reps = 100;
    vector<double>delta_t_move;
    vector<double>delta_t_copy;


    //move
    cout<<"move"<<endl;
    {

        for(int len : lenVec)
        {
            auto startTime = std::chrono::high_resolution_clock::now();

            for(int i = 0; i<reps;i++)
            {
                vector<int> leftVec(len,0);

                vector<int> rightVec;
                move(leftVec.begin()+lenLeft+lenMid,leftVec.end(),std::back_inserter(rightVec));
                leftVec.erase(leftVec.begin()+lenLeft+lenMid,leftVec.end());

                vector<int> midVec;
                std::move(leftVec.begin()+lenLeft,leftVec.begin()+lenLeft+lenMid,std::back_inserter(midVec));
                leftVec.erase(leftVec.begin()+lenLeft,leftVec.begin()+lenLeft+lenMid);
            }

            auto endTime = std::chrono::high_resolution_clock::now();
            std::chrono::duration<double> elapsed = endTime - startTime;
            delta_t_move.push_back(elapsed.count());
        }
    }

    //copy
    cout<<"copy"<<endl;
    {

        for(int len : lenVec)
        {

            auto startTime = std::chrono::high_resolution_clock::now();

            for(int i = 0; i<reps;i++)
            {
                vector<int> leftVec(len,0);

                vector<int> rightVec = vector<int>(leftVec.begin()+lenLeft+lenMid,leftVec.end());
                leftVec.erase(leftVec.begin()+lenLeft+lenMid,leftVec.end());

                vector<int> midVec = vector<int>(leftVec.begin()+lenLeft,leftVec.begin()+lenLeft+lenMid);
                leftVec.erase(leftVec.begin()+lenLeft,leftVec.begin()+lenLeft+lenMid);
            }

            auto endTime = std::chrono::high_resolution_clock::now();
            std::chrono::duration<double> elapsed = endTime - startTime;
            delta_t_copy.push_back(elapsed.count());
        }
    }

    for(int i = 0; i<lenVec.size();i++)
    {
        cout<<"lenVec = "<<setw(40)<<lenVec.at(i)<<"\t\t : delta_t_copy/delta_t_move = "<< delta_t_copy.at(i)/delta_t_move.at(i)<<endl;
    }


    return 0;
}

The output I get for this program is:

move
copy
lenVec = 10 : delta_t_copy/delta_t_move = 0.431172
lenVec = 100 : delta_t_copy/delta_t_move = 0.257102
lenVec = 1000 : delta_t_copy/delta_t_move = 0.166006
lenVec = 10000 : delta_t_copy/delta_t_move = 0.108573
lenVec = 100000 : delta_t_copy/delta_t_move = 0.113769
lenVec = 1000000 : delta_t_copy/delta_t_move = 0.134912
lenVec = 10000000 : delta_t_copy/delta_t_move = 0.133874

I was splitting an initial vector of size len, into 3 pieces. The first piece of length 3, the middle one of length 3 and the rest of the size len-6. My results show that the copy semantics are much faster then the move semantics.

I am using the MSVC2015.

Any idea how that can be true? Under what circumstances is the move semantics faster?

Solution

In addition to Vittorio's answer, let me point out what specifically causes the performance drop in your "move" codepath.

What your benchmark boils down to is the difference between this way of filling a vector:

vector<int> rightVec = vector<int>(startIt, endIt);

compared to this way of filling:

vector<int> rightVec;
move(startIt, endIt, std::back_inserter(rightVec));

As this is the only part in which your two code paths differ considerably.

The second version is expected to be slower for two reasons:

In the second case, the target vector does not know how many elements it is supposed to store, so it has to keep growing as you perform the insert. Growing vectors is expensive, as it involves a reallocation and copying/moving of the previously inserted elements. You can remove this disadvantage by inserting an appropriate reseve() call before the move. This will drastically reduce the performance penalty on this code path.
The small performance difference that remains will be due to the fact that you move to a back_inserter which results in an element-wise insertion to the target vector, as supposed to the bulk insertion performed in the first case.

If you take care to mitigate the effects of these two points, you will observe the runtimes to be roughly equivalent, because, as was already pointed out, move and copy are equivalent operations for int elements.