
String tokenization in C++ throws a seg fault


I would like to write a function that splits a string into tokens at a delimiter. This is what I have so far:

#include <cstring>
#include <iostream>
#include <string>
#include <vector>
#define MAXLEN 20

void mytoken(std::string input, std::vector<std::string> & out);

int main() 
{
    std::vector<std::string> out;
    std::string txt = "XXXXXX-CA";
    mytoken(txt, out);
    std::cout << "0: " << out[0] <<std::endl;
    std::cout << "1: " << out[1] <<std::endl;
}

void mytoken(std::string instr, std::vector<std::string> & out) {
    std::vector<std::string> vec;
    char input[MAXLEN] = {0};
    std::strcpy(input, instr.c_str());
    char *token = std::strtok(input, "-");
    while (token != NULL) {
        std::cout << token << '\n';
        token = std::strtok(NULL, "-");
        out.push_back(token);
    }    
}

which produces the following output:

terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid
XXXXXX
CA
bash: line 7: 21987 Aborted                 (core dumped) ./a.out

and I wonder why that is.


Solution

  • It is simpler and more readable to use C++ facilities such as std::istringstream and std::getline instead of the C function strtok:

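    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>
    
    void mytoken(std::string instr, std::vector<std::string> & out)
    {
        std::istringstream ss(instr);   // read the input string as a stream
        std::string token;
        // getline stops at each '-'; the loop ends once nothing is left to read,
        // so there is no null pointer to check for
        while(std::getline(ss, token, '-'))
        {
            std::cout << token << '\n';
            out.push_back(token);
        }
    }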
    

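    To make your strtok version work, store each token before fetching the next one. Your loop calls strtok again and pushes the result before checking it, so "XXXXXX" is never stored, and on the last iteration strtok returns NULL and out.push_back(token) tries to build a std::string from a null pointer. That is undefined behavior; libstdc++ reports it by throwing the std::logic_error in your output ("basic_string::_M_construct null not valid"), and because nothing catches it the program aborts. It is not a segmentation fault. Also note that strcpy into char input[MAXLEN] overflows the buffer for inputs of 20 or more characters. Reordering the loop fixes the crash: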

    //...
    while(token != NULL)
    {
        out.push_back(token);            // store the current token first
        std::cout << token << '\n';
        token = std::strtok(NULL, "-");  // then advance; NULL means no tokens are left
    }
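For reference, here is a self-contained sketch of the std::getline approach. The `const std::string&` parameter and the `delim` argument (defaulting to '-') are assumptions added for this sketch, not part of the answer above, and the console output is dropped so the function only collects tokens.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Sketch: split `instr` at every `delim` and append each piece to `out`.
// `delim` is a hypothetical extra parameter; '-' matches the question.
// Unlike strtok, getline keeps empty fields: "a--b" yields "a", "", "b".
void mytoken(const std::string& instr, std::vector<std::string>& out, char delim = '-')
{
    std::istringstream ss(instr);
    std::string token;
    // getline returns the stream, which tests false once no more
    // characters can be extracted, so the loop never sees a null token.
    while (std::getline(ss, token, delim))
    {
        out.push_back(token);
    }
}
```

Calling `mytoken("XXXXXX-CA", out)` on an empty vector leaves `out` holding "XXXXXX" and "CA". Checking `out.size()` before indexing, which `main` in the question does not do, avoids reading past the end when the input contains fewer tokens than expected.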
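If you prefer to keep strtok, the following sketch combines the reordered loop with a buffer sized from the input. `mytoken_strtok` is a hypothetical name, and replacing the fixed `char input[MAXLEN]` with a `std::vector<char>` is an assumption of this sketch. That change removes the overflow that strcpy causes for inputs of 20 or more characters.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Sketch (hypothetical name): strtok-based split with the loop order fixed.
// Assumption: the fixed char[MAXLEN] is replaced by a buffer sized to the input.
void mytoken_strtok(const std::string& instr, std::vector<std::string>& out)
{
    // strtok writes '\0' into its argument, so it needs a mutable,
    // NUL-terminated copy of the string.
    std::vector<char> input(instr.begin(), instr.end());
    input.push_back('\0');

    char* token = std::strtok(input.data(), "-");
    while (token != nullptr)
    {
        out.push_back(token);               // store the current token first
        token = std::strtok(nullptr, "-");  // then advance; nullptr means done
    }
}
```

Two behaviors differ from the getline version. strtok treats consecutive delimiters as one, so "a--b" yields only "a" and "b". It also keeps hidden static state between calls, so two tokenizations cannot be interleaved and it should not be called concurrently from several threads.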