
Boost::tokenizer: splitting on '.' while also keeping empty fields


I have seen this question, and mine is very similar to it, but it is different, so please do not mark it as a duplicate.

My question is: How do I get the empty fields from a string?

I have a string like std::string s = "This.is..a.test"; and I want to get the fields <This> <is> <> <a> <test>.

I have also tried:

typedef boost::char_separator<char> ChSep;
typedef boost::tokenizer<ChSep> TknChSep;
ChSep sep(".", ".", boost::keep_empty_tokens);
TknChSep tok(s, sep);
for (TknChSep::iterator beg = tok.begin(); beg != tok.end(); ++beg)
{
  std::cout << "<" << *beg << "> ";
}

but I get <This> <.> <is> <.> <> <.> <a> <test>.


Solution

  • The second argument to the constructor of Boost.Tokenizer's char_separator is the kept_delims parameter. It specifies delimiters that will show up as tokens in the output. The original code specifies that "." should be kept as a token. To resolve this, change:

    ChSep sep(".", ".", boost::keep_empty_tokens);
    

    to:

    ChSep sep(".", "", boost::keep_empty_tokens);
                // ^-- no delimiters will show up as tokens.
    

    Here is a complete example:

    #include <iostream>
    #include <string>
    #include <boost/foreach.hpp>
    #include <boost/tokenizer.hpp>
    
    int main()
    {
      std::string str = "This.is..a.test";
      typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
      boost::char_separator<char> sep(
          ".", // dropped delimiters
          "",  // kept delimiters
          boost::keep_empty_tokens); // empty token policy
    
      BOOST_FOREACH(std::string token, tokenizer(str, sep))
      {
        std::cout << "<" << token << "> ";
      }
      std::cout << std::endl;
    }
    

    This produces the desired output:

    <This> <is> <> <a> <test>