Search code examples
c++stringworkflow

Finding all occurrences using rfind, flow challenges?


Following a c++ tutorial and teaching about find() the following code was implemented to search for all the "cat" occurrences in a string:

std::string input;
std::size_t i = 0, x_appearances = 0;
std::getline(std::cin,input);   
for(i = input.find("cat",0); i != std::string::npos; i=input.find("cat", i))
{
    ++x_appearances;
    ++i; //Move past the last discovered instance to avoid finding the same string
}

Then the tutorial challenges the apprentice to change find() for rfind(), and that's where the problems came in, first I tried what seemed to be the obvious approach:

for(i = input.rfind("cat",input.length()); i != std::string::npos; i=input.rfind("cat", i))
{
    ++x_appearances;
    --i; //Move past the last discovered instance to avoid finding the same string
}

but with this solution I fell into an infinite loop. Then I discovered that it was happening because the increment is performed before the condition check, and that the increment rfind() was always finding a match even with i==std::string::npos (if the match is on the beginning of the string, for example "cats"). My final solution came to be:

int n=input.length();
for(i = input.rfind("cat",input.length()); n>0 && i!=std::string::npos; i=input.rfind("cat", i))
{
    ++x_appearances;
    n=i;
    --i; //Move past the last discovered instance to avoid finding the same string
}

With n I can keep the track of the position in the string, and with it exit the for loop when the entire string had been searched.

So my question is: Is my approach correct? Did I need an extra variable or is there any other simpler way of doing this?


Solution

  • for(i = input.rfind("cat",input.length()); i != std::string::npos; i=input.rfind("cat", i))
    {
        ++x_appearances;
        --i; //Move past the last discovered instance to avoid finding the same string
    }
    

    The problem with the above is the --i inside the loop. Suppose the input string starts with "cat". Your algorithm will eventually find that "cat" with i being 0. Since you've declared i as a std::size_t, subtracting 1 from 0 results in the largest possible std::size_t. There's no warning, no overflow, no undefined behavior. This is exactly how unsigned integers must work, per the standard.

    Somehow you need to handle this special case. You could use an auxiliary variable and a more convoluted test in your loop. An alternative is to keep your code simple and at the same time make it blatantly obvious you are explicitly handling this special case:

    for (i = input.rfind("cat"); i != std::string::npos; i=input.rfind("cat", i-1))
    {
        ++x_appearances;
        // Finding "cat" at the start means we're done.
        if (i == 0) {
           break;
        }
    }
    

    Note also that I've changed the loop statement a bit. The default value for pos is std::string::npos, which means search from the end of the string. There's no need for that second argument with the initializer. I also moved the --i into the update part of the for loop, changing input.rfind("cat",i) to input.rfind("cat",i-1). Since i is always positive at this point, there's no danger in subtracting one.