c++ boost boost-tokenizer

tokenizing string with boost fails when casting tokens to char* const*


I'm using boost::tokenizer to tokenize a string in C++, and then I want to pass the tokens to execv.

Consider the following code snippet (compilable):

#include <iostream>
#include <cstdlib>
#include <string>
#include <vector>
#include <unistd.h>  // execv, _exit
#include <boost/tokenizer.hpp>

// I will put every token into this vector
std::vector<const char*> argc;
// this is the command I want to parse
std::string command = "/bin/ls -la -R";


void test_tokenizer() {
  // tokenizer is needed because arguments can be in quotes
  boost::tokenizer<boost::escaped_list_separator<char> > scriptArguments(
              command,
              boost::escaped_list_separator<char>("\\", " ", "\""));
  boost::tokenizer<boost::escaped_list_separator<char> >::iterator argument;
  for(argument = scriptArguments.begin(); 
    argument!=scriptArguments.end(); 
    ++argument) {

    argc.push_back(argument->c_str());
    std::cout << argument->c_str() << std::endl;
  }

  argc.push_back(NULL);
}

void test_raw() {
  argc.push_back("/bin/ls");
  argc.push_back("-l");
  argc.push_back("-R");

  argc.push_back(NULL);
}

int main() {
  // this works OK
  /*test_raw();
  execv(argc[0], (char* const*)&argc[0]);
  std::cerr << "execv failed";
  _exit(1);
  */

  // this is not working
  test_tokenizer();
  execv(argc[0], (char* const*)&argc[0]);
  std::cerr << "execv failed";
  _exit(2);
}

When I run this program with test_tokenizer(), it prints 'execv failed' (although it prints the arguments correctly).

However, if I switch from test_tokenizer to test_raw, it runs fine.

There must be an easy solution, but I haven't found it.

PS: I also dropped this into an online compiler with Boost support here.


Solution

  • boost::tokenizer saves the token by value (and by default as std::string) in the token iterator.

    Therefore, the character array that argument->c_str() points to may be modified or invalidated when the iterator is incremented, and its lifetime ends with that of argument at the latest.

    Consequently, the pointers stored in argc are dangling by the time execv reads them, and your program has undefined behavior.

    If you want to keep using boost::tokenizer, I would suggest keeping the tokens in a std::vector<std::string> and transforming them into a pointer array afterwards.
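The suggestion above could look roughly like the following sketch. The helper name make_argv is hypothetical (it is not part of Boost or of the original code), and the sketch assumes C++11 or later. The tokens are assumed to have been copied out of the tokenizer first, for example with the vector's range constructor, so that the strings outlive the iterator.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of Boost or the original post.
// Builds an execv-style argument array whose pointers refer into `tokens`.
// The returned pointers stay valid only while `tokens` is alive and unmodified.
std::vector<char*> make_argv(std::vector<std::string>& tokens) {
  std::vector<char*> argv;
  argv.reserve(tokens.size() + 1);
  for (std::string& token : tokens) {
    argv.push_back(&token[0]);  // points into storage owned by `tokens`
  }
  argv.push_back(nullptr);  // execv expects a null-terminated array
  return argv;
}

// Possible use with the tokenizer from the question (assumed, untested here):
//   std::vector<std::string> tokens(scriptArguments.begin(),
//                                   scriptArguments.end());
//   std::vector<char*> argv = make_argv(tokens);
//   execv(argv[0], argv.data());
```

Because the strings are owned by the vector rather than by the token iterator, the pointers remain valid until execv is called. Storing char* instead of const char* also removes the need for the (char* const*) cast.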