Search code examples
c++stdvectorc++20

Is there an even faster approach than swap-and-pop for erasing from std::vector?


I am asking this as the other relevant questions on SO seem to be either for older versions of the C++ standard, do not mention any form of parallelization, or are focused on keeping the ordering/indexing the same as elements are removed.

I have a vector of potentially hundreds of thousands or millions of elements (which are fairly light structures, around ~20 bytes assuming they're compacted down).

Due to other restrictions, it must be a std::vector and other containers would not work (like std::forward_list), or be even less optimal in other uses.

I recently swapped from simple it = std::erase(it) approach to using pop-and-swap using something like this:

for(int i = 0; i < myVec.size();) {
  // Do calculations to determine if element must be removed
  // ...

  // Remove if needed
  if(elementMustBeRemoved) {
    myVec[i] = myVec.back();
    myVec.pop_back();
  } else {
    i++;
  }
}

This works, and was a significant improvement. It cut the runtime of the method down to ~61% of what it was previously. But I would like to improve this further.

Does C++ have a method to remove many non-consecutive elements from a std::vector efficiently? Like passing a vector of indices to erase() and have C++ do some magic under the hood to minimize movement of data?

If so, I could have threads individually gather indices that must be removed in parallel, and then combine them and pass them to erase().


Solution

  • Take a look at std::remove_if algorithm. You could use it like this:

    auto firstToErase = std::remove_if(myVec.begin(), myVec.end(),
                                       [](const & T x){
                                       // Do calculations to determine if element must be removed
                                       // ...
                                       return elementMustBeRemoved;});
    myVec.erase(firstToErase, myVec.end());
    

    cppreference says that following code is a possible implementation for remove_if:

    template<class ForwardIt, class UnaryPredicate>
    ForwardIt remove_if(ForwardIt first, ForwardIt last, UnaryPredicate p)
    {
       first = std::find_if(first, last, p);
       if (first != last)
           for(ForwardIt i = first; ++i != last; )
               if (!p(*i))
                   *first++ = std::move(*i);
       return first;
    }
    

    Instead of swapping with the last element it continuously moves through a container building up a range of elements which should be erased, until this range is at the very end of vector. This looks like a more cache-friendly solution and you might notice some performance improvement on a very big vector.

    If you want to experiment with a parallel version, there is a version (4) which allows to specify execution policy.