
Vector of comma-separated tokens to const char**


I am trying to convert a comma-separated string to a vector of const char*. With the following code, my expected output is

ABC_
DEF
HIJ

but I get

HIJ
DEF
HIJ

Where am I going wrong?

Code:

#include <iostream>
#include <boost/tokenizer.hpp>
#include <vector>
#include <string>
using namespace std;

int main()
{
   string s("ABC_,DEF,HIJ");
   typedef boost::char_separator<char> char_separator;
   typedef boost::tokenizer<char_separator> tokenizer;

   char_separator comma(",");
   tokenizer token(s, comma);
   tokenizer::iterator it;

   vector<const char*> cStrings;

   for(it = token.begin(); it != token.end(); it++)
   {
      //cout << (*it).c_str() << endl;
      cStrings.push_back((*it).c_str());
   }

   std::vector<const char*>::iterator iv;
   for(iv = cStrings.begin(); iv != cStrings.end(); iv++)
   {
      cout << *iv << endl;
   }
   return 0;
}

http://ideone.com/3tvnUs

EDIT: Solution with the help of the answers below (PaulMcKenzie offers a neater solution using lists):

#include <cstring>   // std::memcpy
#include <iostream>
#include <boost/tokenizer.hpp>
#include <vector>
#include <string>
using namespace std;

// Returns a heap-allocated, NUL-terminated copy of s; the caller must delete[] it.
char* createCopy(const std::string& s, std::size_t bufferSize)
{
   char* value = new char[bufferSize];
   std::memcpy(value, s.c_str(), (bufferSize - 1));
   value[bufferSize - 1] = 0;
   return value;
}

int main()
{
   string s("ABC_,DEF,HIJ");
   typedef boost::char_separator<char> char_separator;
   typedef boost::tokenizer<char_separator> tokenizer;

   char_separator comma(",");
   tokenizer token(s, comma);
   tokenizer::iterator it;

   vector<const char*> cStrings;

   for(it = token.begin(); it != token.end(); it++)
   {
      //cout << it->c_str() << endl;
      cStrings.push_back(createCopy(it->c_str(),
                                      (it->length() + 1)));
   }

   std::vector<const char*>::iterator iv;
   for(iv = cStrings.begin(); iv != cStrings.end(); iv++)
   {
      cout << *iv << endl;
   }

   //delete allocations by new
   //...
   return 0;
}

Solution

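  • Here's the thing: dereferencing a boost::tokenizer iterator does not give you ownership of a copy of the token; it returns a reference to a std::string stored inside the iterator, which is overwritten each time the iterator is incremented. All the c_str() pointers you saved therefore point into that same internal buffer (or dangle once it is reallocated or the iterator goes away), so what gets printed is undefined behavior.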

    For example, after running your code I get:

    HIJ
    HIJ
    HIJ
    

    The fix is to store a copy of each token instead, i.e. replace cStrings.push_back((*it).c_str()) with something like:

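        // room for the token plus the terminating '\0' (needs <cstring>)
        char* c = new char[it->length() + 1];
        std::strncpy(c, it->c_str(), it->length());
        c[it->length()] = 0;
        cStrings.push_back(c);   // each entry must be delete[]'d later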
    

    It doesn't look pretty, but you probably won't get much faster than that (at least if you want to use boost::tokenizer).

    Another option is to replace boost::tokenizer entirely with, e.g., strtok; an example can be found here: C split a char array into different variables

    You can also use boost::algorithm::split (from boost/algorithm/string.hpp), but then you still need to map the resulting std::strings to const char* later on.