Search code examples
c++stringstlsetremove-if

What is wrong with `std::set`?


In the other topic I was trying to solve this problem. The problem was to remove duplicate characters from a std::string.

std::string s= "saaangeetha";

Since the order was not important, so I sorted s first, and then used std::unique and finally resized it to get the desired result:

aeghnst

That is correct!


Now I want to do the same, but at the same time I want the order of characters intact. Means, I want this output:

sangeth

So I wrote this:

template<typename T>
struct is_repeated
{
    std::set<T>  unique;
    bool operator()(T c) { return !unique.insert(c).second; }
}; 
int main() {
    std::string s= "saaangeetha";
    s.erase(std::remove_if(s.begin(), s.end(), is_repeated<char>()), s.end()); 
    std::cout << s ;
}

Which gives this output:

saangeth

That is, a is repeated, though other repetitions gone. What is wrong with the code?

Anyway I change my code a bit: (see the comment)

template<typename T>
struct is_repeated
{
    std::set<T> & unique;  //made reference!
    is_repeated(std::set<T> &s) : unique(s) {} //added line!
    bool operator()(T c) { return !unique.insert(c).second; }
}; 
int main() {
    std::string s= "saaangeetha";
    std::set<char> set; //added line!
    s.erase(std::remove_if(s.begin(),s.end(),is_repeated<char>(set)),s.end()); 
    std::cout << s ;
}

Output:

sangeth

Problem gone!

So what is wrong with the first solution?

Also, if I don't make the member variable unique reference type, then the problem doesn't go.

What is wrong with std::set or is_repeated functor? Where exactly is the problem?

I also note that if the is_repeated functor is copied somewhere, then every member of it is also copied. I don't see the problem here!


Solution

  • In GCC (libstdc++), remove_if is implemented essentially as

        template<typename It, typename Pred>
        It remove_if(It first, It last, Pred predicate) {
          first = std::find_if(first, last, predicate);
        //                                  ^^^^^^^^^
          if (first == last)
             return first;
          else {
             It result = first;
             ++ result;
             for (; first != last; ++ first) {
               if (!predicate(*first)) {
        //          ^^^^^^^^^
                  *result = std::move(*first);
                  ++ result;
               }
             }
          }
        }
    

    Note that your predicate is passed by-value to find_if, so the struct, and therefore the set, modified inside find_if will not be propagated back to caller.

    Since the first duplicate appears at:

      saaangeetha
    //  ^
    

    The initial "sa" will be kept after the find_if call. Meanwhile, the predicate's set is empty (the insertions within find_if are local). Therefore the loop afterwards will keep the 3rd a.

       sa | angeth
    // ^^   ^^^^^^
    // ||   kept by the loop in remove_if
    // ||
    // kept by find_if