I am trying to split a comma-separated string and then perform some action on each token while ignoring duplicates, so something along the following lines:
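#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/tokenizer.hpp>
using namespace std;
using namespace boost;
int main(int, char**)
{
string text = "token, test string";
char_separator<char> sep(", ");
tokenizer< char_separator<char> > tokens(text, sep);
// remove duplicates from tokens?
BOOST_FOREACH (const string& t, tokens) {
cout << t << "." << endl;
}
}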
Is there a way to do this on the boost::tokenizer?
I know that I can solve this problem using boost::split and std::unique, but was wondering whether there is a way to achieve this with the tokenizer as well.
Boost.Tokenizer can do many cool things, but it cannot do this; the answer is indeed "no".
If you're only looking to drop adjacent duplicates, Boost.Range can help make it seamless:
#include <iostream>
#include <string>
#include <boost/range/adaptor/uniqued.hpp>
#include <boost/foreach.hpp>
#include <boost/tokenizer.hpp>
using namespace boost;
using namespace boost::adaptors;
int main()
{
std::string text = "token, test string test, test test";
char_separator<char> sep(", ");
tokenizer< char_separator<char> > tokens(text, sep);
BOOST_FOREACH (const std::string& t, tokens | uniqued ) {
std::cout << t << "." << '\n';
}
}
This prints:
token.
test.
string.
test.
To perform an action only on globally unique tokens, you will need to store state, one way or another. The simplest solution is probably an intermediate set (note that std::set iterates in sorted order, so the original token order is not preserved):
// needs #include <set>; text is the same string as in the example above
char_separator<char> sep(", ");
tokenizer< char_separator<char> > tokens(text, sep);
std::set<std::string> unique_tokens(tokens.begin(), tokens.end());
BOOST_FOREACH (const std::string& t, unique_tokens) {
std::cout << t << "." << '\n';
}
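If the tokens should be handled in order of first appearance rather than in the sorted order of std::set, one option is to remember what has already been seen in a hash set while walking the range once. The following is only a minimal sketch, assuming C++11 is available; unique_in_order is a hypothetical helper name, not something Boost provides:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not part of Boost): returns each distinct token once,
// in the order of its first appearance in [first, last).
template <typename InputIt>
std::vector<std::string> unique_in_order(InputIt first, InputIt last)
{
    std::unordered_set<std::string> seen;  // tokens encountered so far
    std::vector<std::string> result;
    for (; first != last; ++first) {
        // insert() returns a pair; .second is true only if the key was new
        if (seen.insert(*first).second) {
            result.push_back(*first);
        }
    }
    return result;
}
```

With the tokenizer from the examples above, it could be called as unique_in_order(tokens.begin(), tokens.end()), and the resulting vector can then be iterated with BOOST_FOREACH as before. For the example text this yields token, test, string, in that order.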