C++ string splitting segmentation error


I am splitting a string into a vector of strings:

    vector<string> tokens;

    stringstream strstm(str);
    string item;
    while (getline(strstm, item, ' ')) {
        tokens.push_back(item);
    }

    token_idx = 0;

    cout << "size = " << tokens.size() << endl;

    for (unsigned int i = 0; i < tokens.size(); i++)
    {
        cout << tokens[i] << "[" << i << "]" << endl;
    } 

The split succeeds: size() and the elements are what I expect. However, the last token seems to behave strangely when I try to read its value.

    string Lexer::consume() {
        if (hasValue()) {
            token_idx++;
            cout << "consumed " << tokens[token_idx-1] << " tokens = " << token_idx -1 << endl;
            return tokens[token_idx-1];
        }
        cout << "didn't consume, token_idx = " << token_idx << endl;
        return "null";
    }

hasValue looks like this:

    bool Lexer::hasValue() {
        if ( token_idx < tokens.size()) {
            return true;
        } else {
            return false;
        }
    }

If I have an input string such as 1 + 2 * 3, the expected output from my program is (+1(*23)); however, I get a segmentation fault:

    size = 5
    1[0]
    +[1]
    2[2]
    *[3]
    3[4]
    consumed 1 tokens = 0
    consumed + tokens = 1
    consumed 2 tokens = 2
    consumed * tokens = 3
    consumed 3 tokens = 4
    Segmentation fault (core dumped)

But if I change the hasValue check to ( token_idx < tokens.size() -1 ), the program returns (+1 (*2 null)):

    size = 5
    1[0]
    +[1]
    2[2]
    *[3]
    3[4]
    consumed 1 tokens = 0
    consumed + tokens = 1
    consumed 2 tokens = 2
    consumed * tokens = 3
    didn't consume, token_idx = 4
    (+1 (*2 null))

So I'm wondering whether there is an end-of-line character after the 3 when splitting the way I did, or whether some other factor is contributing to this behaviour. I am fairly certain I am not going out of bounds on the vector, though.
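
The end-of-line hypothesis can be tested directly. The sketch below assumes the same space-delimited getline split as in the question; `split` and `hasLineEnding` are hypothetical helper names, not part of the original program. getline with a ' ' delimiter only removes spaces, so a '\r' or '\n' left at the end of the input would stay attached to the last token, and two consecutive spaces would produce an empty token.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split on single spaces, mirroring the loop in the question.
std::vector<std::string> split(const std::string& str) {
    std::vector<std::string> tokens;
    std::stringstream strstm(str);
    std::string item;
    while (std::getline(strstm, item, ' ')) {
        tokens.push_back(item);
    }
    return tokens;
}

// Hypothetical helper: true if the token carries a '\r' or '\n'.
bool hasLineEnding(const std::string& token) {
    return token.find_first_of("\r\n") != std::string::npos;
}
```

Printing each token's size() next to its value, or calling hasLineEnding on tokens.back(), shows whether the last token is really just "3". If it is, the split is probably not the cause of the crash.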


Solution

  • I think the code that actually causes the error is not shown here. But given the way you manipulate the index, there is no mystery: somewhere you are accessing past the end of your token list, and the error-prone design makes that easy to do.

    if (hasValue()) { // hasValue() seems unnecessary to me
        token_idx++;  // why increment here?
    
        cout << "consumed " << tokens[token_idx-1] << " tokens = " << token_idx -1 << endl;
    
        return tokens[token_idx-1];
    }
    

    Change it to this, so that the bounds check and the increment sit right next to the access:

    if ( token_idx < tokens.size() ) { 
        cout << "consumed " << tokens[token_idx] << " tokens = " << token_idx << endl;
    
        return tokens [ token_idx++ ];
    }
    

    Also read up on recursive descent parsing. It is really simple, and it will teach you a lot about parsing and help you avoid common pitfalls.
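
To make the two suggestions above concrete, here is a minimal sketch of a lexer whose every token access is bounds-checked, driven by a tiny recursive descent parser. Several things in it are assumptions rather than the asker's code: the `peek` method, the `parseTerm`, `parseExpr` and `parse` functions, the two-level grammar, and the output format (modelled on the `(+1 (*2 null))` line in the question). It also swaps getline for `operator>>`, which skips any run of whitespace and therefore never yields empty tokens.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Minimal lexer: every access to `tokens` goes through hasValue().
class Lexer {
public:
    explicit Lexer(const std::string& str) {
        std::stringstream strstm(str);
        std::string item;
        while (strstm >> item) {  // whitespace-delimited, no empty tokens
            tokens.push_back(item);
        }
    }

    bool hasValue() const { return token_idx < tokens.size(); }

    // Look at the next token without consuming it; "" means end of input.
    std::string peek() const { return hasValue() ? tokens[token_idx] : ""; }

    // Return the next token and advance; "null" at end of input, as in the question.
    std::string consume() { return hasValue() ? tokens[token_idx++] : "null"; }

private:
    std::vector<std::string> tokens;
    std::size_t token_idx = 0;  // unsigned, same type as tokens.size()
};

// Assumed grammar (not taken from the question):
//   expr := term ('+' term)*
//   term := number ('*' number)*
std::string parseTerm(Lexer& lex) {
    std::string left = lex.consume();
    while (lex.peek() == "*") {
        lex.consume();                          // drop the '*'
        const std::string right = lex.consume();
        left = "(*" + left + " " + right + ")";
    }
    return left;
}

std::string parseExpr(Lexer& lex) {
    std::string left = parseTerm(lex);
    while (lex.peek() == "+") {
        lex.consume();                          // drop the '+'
        const std::string right = parseTerm(lex);
        left = "(+" + left + " " + right + ")";
    }
    return left;
}

// Convenience wrapper: tokenize and parse one input line.
std::string parse(const std::string& input) {
    Lexer lex(input);
    return parseExpr(lex);
}
```

With this sketch, `parse("1 + 2 * 3")` returns `(+1 (*2 3))`, and a truncated input such as `"1 + 2 *"` returns `(+1 (*2 null))` instead of reading past the end of the vector. The parser only ever reaches tokens through `peek` and `consume`, so no call site can index the vector directly.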