Search code examples
c++performancevectorstderase

Is remove_if and then erase efficient on vector?


Although there are tens of questions about remove_if + erase for vector. I couldn't find what is the performance of such action. When I write:

    myVector.erase(remove_if(myVector.begin(),
                         myVector.end(),
                         some_predicate), myVector.end());

The remove if will return an iterator to the last relevant item + 1 (let's call it X). I believe that this will happen in O(n).

But how the erase will work ?

  • If the erase will try to delete from the X to myVector.end() it will be O(n^2) because it will cause copying the vector to a new location, and there will be O(n) new allocations from the heap.
  • But if it will delete from myVector.end() to X it can do it in O(n) and more importantly will not allocate new memory (something I'm trying to avoid).

Thanks.


Solution

  • Consider this vector:

    |0|1|2|3|4|5|6|7|8|9|
    

    We use remove_if to remove all elements that are multiples of 4:

    std::remove_if(v.begin(), v.end(), [](auto i){ return i != 0 && !(i%4); });
    

    This starts iterating through the vector with an iterator X until it finds an element where the predicate returns true:

    |0|1|2|3|4|5|6|7|8|9|
             X
    

    This is the first element we want to remove.

    Next it creates another iterator pointing to the next element, Y = X+1, and checks the predicate for *Y:

    |0|1|2|3|4|5|6|7|8|9|
             X Y
    

    The predicate is false, so we want to keep that element, so it assigns the next element to the element we want to remove, by doing *X = std::move(*Y):

    |0|1|2|3|5|5|6|7|8|9|
             X Y            *X = std::move(*Y)
    

    So we have two iterators, X and Y, where X points to the next element in the "output" (i.e. the elements we're not removing) and Y is the next element to consider removing.

    We move both iterators to the next position, check the predicate for Y (which is false again), and do another assignment:

    |0|1|2|3|5|6|6|7|8|9|
               X Y          *X = std::move(*Y)
    

    Then it does the same again at the next position:

    |0|1|2|3|5|6|7|7|8|9|
                 X Y       *X = std::move(*Y)
    

    And then it moves on, but finds that the predicate is true for Y

    |0|1|2|3|5|6|7|7|8|9|
                   X Y
    

    So it just increments Y, which skips that element and so doesn't copy it into the "output" position at X:

    |0|1|2|3|5|6|7|7|8|9|
                   X   Y 
    

    The predicate is not true for Y, so it assigns it to X:

    |0|1|2|3|5|6|7|9|8|9|
                   X   Y     *X = std::move(*Y)
    

    Then it increments X and Y again

    |0|1|2|3|5|6|7|9|8|9|
                     X   Y
    

    Now Y is past-the-end so we return X (which points past-the-end of the output sequence, i.e. the elements we want to keep).

    After the remove_if returns X we call v.erase(X, v.end()), so the vector invokes the destructors for each element from X to the end:

    |0|1|2|3|5|6|7|9|~|~|
                     X   end
    

    And then sets the size so the vector ends at X:

    |0|1|2|3|5|6|7|9|
                     end
    

    After this v.capacity() >= v.size()+2 because the memory that was used by the two final elements is still present, but is not in use.