Search code examples
c++stringvectortokenerase

A specific test case will not pass the test somehow


The purpose of the code is to basically remove the words inside the useless array that are present in a text file. I'm having this very odd problem where the code will not remove the word 'the' in the phrase 'waiting on the shelf', but every other test cases (a lot) passed. Any ideas?

int main(){
    string useless[20] = { "an", "the" , "of", "to", "and", "but", "nor", "or", "some", "any", "very", "in", "on", "at", "before", "after", "into", "over", "through", "along"};

    ifstream fin("input.txt");
    if(fin.fail()){
        cout << "Input failed to open" << endl;
        exit(-1);
    }

    string line;
    getline(fin, line);
    getline(fin, line);
    getline(fin, line);
    getline(fin, line);

    ofstream fout("output.txt");

    while(getline(fin, line)){
        vector<string> vec;
        istringstream iss(line);
        while (iss) {
            string word;
            iss >> word;
            transform(word.begin(), word.end(), word.begin(), ::tolower);
            vec.push_back(word);
        }

        for(int i = 0; i < vec.size(); i++){
            for(int j = 0; j < 20; j++){
                if(vec[i] == useless[j]){
                    vec.erase(remove(vec.begin(), vec.end(), vec[i]), vec.end());
                }
            }
            fout << vec[i] << " ";
        }
        fout << endl;
    }
}

Solution

  • You are using incorrect iteration here

     for(int i = 0; i < vec.size(); i++){
            for(int j = 0; j < 20; j++){
                if(vec[i] == useless[j]){
                    vec.erase(remove(vec.begin(), vec.end(), vec[i]), vec.end());
                }
            }
            fout << vec[i] << " ";
        }
        fout << endl;
    }
    

    Before this iteration you have vector with the next values: [waiting][on][the][shelf]. When i == 1 you delete "on" from the vector, so that you have the next vector [waiting][the][shelf], but i index still equal to 1, on the next iteration you skip "the" word because the last erase operation reorganized your vector and shifted "the" word to the deleted "on" position.

    You can use remove_if. For example:

     vec.erase(remove_if(vec.begin(), vec.end(), [&]( const string& str )
     {
         return std::find(begin(useless), end(useless), str ) != end(useless);
     }), vec.end()); 
    

    After that you will get filtered vector, without words in the useless array.

    By the way we can optimize that. The algorithm above has the next complexity: O(vec_size*useless_size). We can optimize it to O(vec_size) only. Instead of array you can use hash collection (unordered_set) It gives you constant time for element access.

     unordered_set<string> useless = { "an", "the" , "of", "to", "and", "but", "nor", "or", "some", "any", "very", "in", "on", "at", "before", "after", "into", "over", "through", "along" };
     ...
     vec.erase(remove_if(vec.begin(), vec.end(), [&](const string& str)
     {
         return  useless.find(str) != useless.end();
     }), vec.end());