Obtaining start position of istringstream token


Is there a way to find the start position of tokens extracted by istringstream::operator >>?

For example, my current failed attempt at checking tellg():

string test = "   first     \"  in \\\"quotes \"  last";
istringstream strm(test);

while (!strm.eof()) {

    string token;
    auto startpos = strm.tellg();
    strm >> quoted(token);
    auto endpos = strm.tellg();
    if (endpos == -1) endpos = test.length();

    cout << token << ": " << startpos << " " << endpos << endl;

}

So the output of the above program is:

first: 0 8
  in "quotes : 8 29
last: 29 35

The end positions are fine, but the start positions are the start of the whitespace leading up to the token. The output I want would be something like:

first: 3 8
  in "quotes : 13 29
last: 31 35

Here's the test string with positions for reference:

          1111111111222222222233333
01234567890123456789012345678901234  the end is -1

   first     "  in \"quotes "  last

        ^--------------------^-----^ the end positions i get and want
^-------^--------------------^------ the start positions i get
   ^---------^-----------------^---- the start positions i *want*

Is there any straightforward way to retrieve this information when using an istringstream?


Solution

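  • First, see "Why is iostream::eof inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?". Test the result of the extraction itself rather than eof().

    Second, use the std::ws stream manipulator to consume leading whitespace before reading each token. operator>> skips that whitespace only after your tellg() call has already run, which is why your start positions point at the whitespace. Once std::ws has consumed it, tellg() reports the position of the token's first character, e.g.: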

    #include <iostream>
    #include <string>
    #include <sstream>
    #include <iomanip>
    using namespace std;
    
    ...
    
    string test = "   first     \"  in \\\"quotes \"  last";
    istringstream strm(test);
    
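    // std::ws consumes leading whitespace, so tellg() then reports the token's start
    while (strm >> ws) {
    
        string token;
        auto startpos = strm.tellg();
        if (!(strm >> quoted(token))) break;
        auto endpos = strm.tellg();
        // tellg() returns -1 once the stream has reached the end of the input
        if (endpos == -1) endpos = test.length();
    
        cout << token << ": " << startpos << " " << endpos << endl;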
    }
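
If you need the positions as data rather than printed output, the same std::ws idea can be wrapped in a function. This is only a minimal sketch: `TokenSpan` and `tokenize_with_positions` are hypothetical names made up for illustration, not part of the standard library, and it assumes tokens are read with std::quoted as in the question.

```cpp
#include <cstddef>
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type: one token plus its half-open [start, end) span
// in the original input string.
struct TokenSpan {
    std::string text;
    std::size_t start;
    std::size_t end;
};

// Hypothetical helper: extract tokens with std::quoted and record where
// each one starts and ends in `input`.
std::vector<TokenSpan> tokenize_with_positions(const std::string& input) {
    std::vector<TokenSpan> result;
    std::istringstream strm(input);

    // std::ws consumes leading whitespace, so the tellg() call that follows
    // reports the position of the token's first character.
    while (strm >> std::ws) {
        const std::streamoff start = std::streamoff(strm.tellg());
        if (start < 0) break;  // nothing left to read

        std::string token;
        if (!(strm >> std::quoted(token))) break;

        // tellg() reports -1 once the stream has reached the end of the
        // input; treat that as "the token ends at the end of the string".
        const std::streamoff end = std::streamoff(strm.tellg());
        const std::size_t end_index =
            end < 0 ? input.size() : static_cast<std::size_t>(end);

        result.push_back({token, static_cast<std::size_t>(start), end_index});
    }
    return result;
}
```

For the test string from the question, this returns three spans: `first` at [3, 8), `  in "quotes ` at [13, 29), and `last` at [31, 35), matching the positions you asked for.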
    
