c++ vector stringstream

Tokenize a string with stringstream where the last char is the delimiter


I am reading data from a file and putting it into string tokens like so:

std::vector<Mytype> mytypes;
std::ifstream file("file.csv");
std::string line;
while (std::getline(file, line)){
    std::stringstream lineSs(line);
    std::vector<std::string> tokens;
    std::string token;
    while (std::getline(lineSs, token, ',')){
        tokens.push_back(token);
    }
    Mytype mytype(tokens[0], tokens[1], tokens[2], tokens[3]);
    mytypes.push_back(mytype);
}

This is a pretty standard way of doing it. However, the data has no NULL values; a missing field is simply left empty. The data may look something like this:

id0,1,2,3
id1,,2,
id2,,,3

The middle line is causing me problems: nothing gets pushed into my tokens vector after the "2", though there should be an empty string. I then get out_of_range problems when I try to create an instance of Mytype.

Until now I have been checking to see if the last char of each line is a comma, and if so, appending a space to the end of the line. But I was wondering if there was a better way to do this.

Thanks.


Solution

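  • The difference is that for the second line ("id1,,2,") lineSs.eof() is still false before the last call to getline(): reading "2" stopped at the trailing comma, not at the end of the stream. That last call then extracts nothing and sets both eofbit and failbit, so your loop exits without pushing the empty token. For "id0,1,2,3", reading "3" already runs into the end of the stream and sets eofbit. So you should not stop the loop when getline() "returns false" (strictly, getline() returns the stream, which converts to false once failbit is set); instead, push every token and stop once lineSs.eof() returns true.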

    Here is a modification of your program that shows the idea:

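    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    int main() {
        std::string line;
        while (std::getline(std::cin, line)){
            std::stringstream lineSs(line);
            std::vector<std::string> tokens;
            do {
                std::string token;
                // After a trailing delimiter this extracts nothing and leaves token empty.
                std::getline(lineSs, token, ',');
                tokens.push_back(token);
                // Print the token followed by the stream's eof and fail flags.
                std::cout << "'" << token << "' " << lineSs.eof() << ' ' << lineSs.fail() << std::endl;
            } while(!lineSs.eof()); // stop only once the end of the line has been reached
            std::cout << tokens.size() << std::endl;
        }
    }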
    

    For the input "1,2,3" the last line printed (the token count) is "3"; for "1,2,3," it is "4".
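
If you want to reuse this in the original file-reading loop, the same do/while idea can be wrapped in a small function. This is only a sketch: the name splitKeepEmpty and its signature are made up for illustration and are not part of the question's code or of the standard library.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (name and signature are assumptions for illustration).
// Splits `line` on `delim` and keeps empty fields, including the empty
// field that follows a trailing delimiter.
std::vector<std::string> splitKeepEmpty(const std::string& line, char delim = ',') {
    std::vector<std::string> tokens;
    std::stringstream ss(line);
    do {
        std::string token;
        // If nothing is left to read, getline fails and token stays empty.
        std::getline(ss, token, delim);
        tokens.push_back(token);
    } while (!ss.eof());  // eofbit is set once getline has run into the end of the input
    return tokens;
}
```

With this, each of the three sample lines from the question yields four tokens, so tokens[3] exists. A line with fewer commas still yields fewer tokens (an empty line gives a single empty token), so it is probably still worth checking tokens.size() before constructing Mytype.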