In the other topic I was trying to solve this problem. The problem was to remove duplicate characters from a std::string
.
std::string s= "saaangeetha";
Since the order was not important, so I sorted s
first, and then used std::unique
and finally resized it to get the desired result:
aeghnst
That is correct!
Now I want to do the same, but at the same time I want the order of characters intact. Means, I want this output:
sangeth
So I wrote this:
template<typename T>
struct is_repeated
{
std::set<T> unique;
bool operator()(T c) { return !unique.insert(c).second; }
};
int main() {
std::string s= "saaangeetha";
s.erase(std::remove_if(s.begin(), s.end(), is_repeated<char>()), s.end());
std::cout << s ;
}
Which gives this output:
saangeth
That is, a
is repeated, though other repetitions gone. What is wrong with the code?
Anyway I change my code a bit: (see the comment)
template<typename T>
struct is_repeated
{
std::set<T> & unique; //made reference!
is_repeated(std::set<T> &s) : unique(s) {} //added line!
bool operator()(T c) { return !unique.insert(c).second; }
};
int main() {
std::string s= "saaangeetha";
std::set<char> set; //added line!
s.erase(std::remove_if(s.begin(),s.end(),is_repeated<char>(set)),s.end());
std::cout << s ;
}
Output:
sangeth
Problem gone!
So what is wrong with the first solution?
Also, if I don't make the member variable unique
reference type, then the problem doesn't go.
What is wrong with std::set
or is_repeated
functor? Where exactly is the problem?
I also note that if the is_repeated
functor is copied somewhere, then every member of it is also copied. I don't see the problem here!
In GCC (libstdc++), remove_if
is implemented essentially as
template<typename It, typename Pred>
It remove_if(It first, It last, Pred predicate) {
first = std::find_if(first, last, predicate);
// ^^^^^^^^^
if (first == last)
return first;
else {
It result = first;
++ result;
for (; first != last; ++ first) {
if (!predicate(*first)) {
// ^^^^^^^^^
*result = std::move(*first);
++ result;
}
}
}
}
Note that your predicate is passed by-value to find_if
, so the struct, and therefore the set, modified inside find_if
will not be propagated back to caller.
Since the first duplicate appears at:
saaangeetha
// ^
The initial "sa"
will be kept after the find_if
call. Meanwhile, the predicate
's set is empty (the insertions within find_if
are local). Therefore the loop afterwards will keep the 3rd a
.
sa | angeth
// ^^ ^^^^^^
// || kept by the loop in remove_if
// ||
// kept by find_if