Is there an even faster approach than swap-and-pop for erasing from std::vector?

I am asking this as the other relevant questions on SO seem to be either for older versions of the C++ standard, do not mention any form of parallelization, or are focused on keeping the ordering/indexing the same as elements are removed.

I have a vector of potentially hundreds of thousands or millions of elements (which are fairly light structures, around ~20 bytes assuming they're compacted down).

Due to other restrictions, it must be a std::vector and other containers would not work (like std::forward_list), or be even less optimal in other uses.

I recently swapped from simple it = std::erase(it) approach to using pop-and-swap using something like this:

for(int i = 0; i < myVec.size();) {
  // Do calculations to determine if element must be removed
  // ...

  // Remove if needed
  if(elementMustBeRemoved) {
    myVec[i] = myVec.back();
    myVec.pop_back();
  } else {
    i++;
  }
}

This works, and was a significant improvement. It cut the runtime of the method down to ~61% of what it was previously. But I would like to improve this further.

Does C++ have a method to remove many non-consecutive elements from a std::vector efficiently? Like passing a vector of indices to erase() and have C++ do some magic under the hood to minimize movement of data?

If so, I could have threads individually gather indices that must be removed in parallel, and then combine them and pass them to erase().

Solution

Take a look at std::remove_if algorithm. You could use it like this:

auto firstToErase = std::remove_if(myVec.begin(), myVec.end(),
                                   [](const & T x){
                                   // Do calculations to determine if element must be removed
                                   // ...
                                   return elementMustBeRemoved;});
myVec.erase(firstToErase, myVec.end());

cppreference says that following code is a possible implementation for remove_if:

template<class ForwardIt, class UnaryPredicate>
ForwardIt remove_if(ForwardIt first, ForwardIt last, UnaryPredicate p)
{
   first = std::find_if(first, last, p);
   if (first != last)
       for(ForwardIt i = first; ++i != last; )
           if (!p(*i))
               *first++ = std::move(*i);
   return first;
}

Instead of swapping with the last element it continuously moves through a container building up a range of elements which should be erased, until this range is at the very end of vector. This looks like a more cache-friendly solution and you might notice some performance improvement on a very big vector.

If you want to experiment with a parallel version, there is a version (4) which allows to specify execution policy.