Search code examples
c++vectorerase

erase duplicate elements keeping order


I would like to eliminate duplicate elements from a vector while keeping the vector in the current order.

Below I have a proposed implementation. First, is this safe?

Second, is there a better way to do it, either more efficiently or better from a "using C++ algorithms and not reinventing the wheel" perspective.

#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>

int main()
{
    using namespace std;

    std::vector<int> v= {1, 7, 2, 3, 8, 4, 5, 3, 2, 3, 2, 6, 2, 3, 2, 9, 10, 1, 2, 2, 1};
    std::vector<int>::iterator finalEnd = v.end();
    for (auto vIter = v.begin(); vIter != v.end(); ++vIter) {
        for (auto nextvIter = vIter + 1; nextvIter != v.end(); ++nextProjIter) {
            if (*vIter == *nextvIter)
                finalEnd = std::remove(vIter, finalEnd, *nextvIter);
        }
    }
    v.erase(finalEnd, v.end());

    for(auto p : v)
        cout << p << "  ";

    //Should return:  1  7  2  3  8  4  5  6  9  10

    return EXIT_SUCCESS;
}

Solution

  • One of many ways this can be accomplished it to use std::unordered_set to keep track of duplicates and std::stable_partition to partition the duplicates from the lone values while preserving the order of the items:

    #include <iostream>
    #include <vector>
    #include <algorithm>
    #include <unordered_set>
    
    int main()
    {
        std::unordered_set<int> numSet;
        std::vector<int> v= {1, 7, 2, 3, 8, 4, 5, 3, 2, 3, 2, 6, 2, 3, 2, 9, 10, 1, 2, 2, 1};
        auto iter = std::stable_partition(v.begin(), v.end(), [&](int n) 
               { bool ret = !numSet.count(n); numSet.insert(n); return ret; }); // returns true if the item has not been "seen"
        v.erase(iter, v.end());           
        for(auto p : v)
            std::cout << p << "  ";
    }
    

    Output:

    1  7  2  3  8  4  5  6  9  10 
    

    The std::stable_partition will return true if the item has not been seen, thus place it to the left of the partition point. Once done, an iterator to the partition point is returned, and we use this iterator to do one single erasure from that point to the end of the vector. Note that the lambda function updates the unordered_set for each item processed.

    The reason why std::stable_partition was used instead of std::remove_if is that std::remove_if is not guaranteed to process the items in order. For example, it could have been possible for an implementation to process the second 1 in that data first, instead of the first 1. So to be safe stable_partition will not erase elements, but simply place the elements in the correct position, ready for the erasure at the end.