Search code examples
c++arrayssortingstl-algorithm

How do I remove duplicates from a C++ array?


I have an array of structs; the array is of size N.

I want to remove duplicates from the array; that is, do an in-place change, converting the array to have a single appearance of each struct. Additionally, I want to know the new size M (highest index in the reduced array).

The structs include primitives so it's trivial to compare them.

How can I do that efficiently in C++?

I have implemented the following operators:

bool operator==(const A &rhs1, const A &rhs2) 
{       
    return ( ( rhs1.x== rhs2.x )  &&
             ( rhs1.y == rhs2.y ) );
}

bool operator<(const A &rhs1, const A &rhs2) 
{       
    if ( rhs1.x == rhs2.x )  
             return ( rhs1.y < rhs2.y );

    return ( rhs1.x < rhs2.x );
}

However, I get an error when running:

std::sort(array, array+ numTotalAvailable);

 * array will have all elements here valid.

std::unique_copy(
        array, 
        array+ numTotalAvailable, 
        back_inserter(uniqueElements)); 

 * uniqueElements will have non-valid elements.

What is wrong here?


Solution

  • You could use a combination of the std::sort and std::unique algorithms to accomplish this:

    std::sort(elems.begin(), elems.end());                  // Now in sorted order.
    iterator itr = std::unique(elems.begin(), elems.end()); // Duplicates overwritten
    elems.erase(itr, elems.end());                          // Space reclaimed
    

    If you are working with a raw array (not, say, a std::vector), then you can't actually reclaim the space without copying the elements over to a new range. However, if you're okay starting off with a raw array and ending up with something like a std::vector or std::deque, you can use unique_copy and an iterator adapter to copy over just the unique elements:

    std::sort(array, array + size); // Now in sorted order
    
    std::vector<T> uniqueElements;
    std::unique_copy(array, array + size,
                     back_inserter(uniqueElements)); // Append unique elements
    

    At this point, uniqueElements now holds all the unique elements.

    Finally, to more directly address your initial question: if you want to do this in-place, you can get the answer by using the return value from unique to determine how many elements remain:

    std::sort(elems, elems + N);                // Now in sorted order.
    T* endpoint = std::unique(elems, elems + N);// Duplicates overwritten
    ptrdiff_t M = endpoint - elems;             // Find number of elements left
    

    Hope this helps!