c++ regex greedy

C++ regex for overlapping matches


I have a string 'CCCC' and I want to match 'CCC' in it, with overlap.

My code:

...
std::string input_seq = "CCCC";
std::regex re("CCC");
std::sregex_iterator next(input_seq.begin(), input_seq.end(), re);
std::sregex_iterator end;
while (next != end) {
    std::smatch match = *next;
    std::cout << match.str() << "\t" << "\t" << match.position() << "\t" << "\n";
    next++;
}
...

However this only returns

CCC 0 

and skips the match at position 1 (CCC 1), which I also need.

I read about non-greedy '?' matching, but I could not make it work.


Solution

  • Wrap your regex in a capturing group and put that group inside a positive lookahead. A lookahead consumes no characters, so the next search can start inside the previous CCC, and the overlapping matches are found.

    To make it work on Mac, too, make sure each match consumes a single character by placing a . (or, to also match line break characters, [\s\S]) after the lookahead.

    Then amend the code to print the value of the first capturing group instead of the whole match, like this:

    #include <iostream>
    #include <regex>
    #include <string>
    
    int main() {
        std::string input_seq = "CCCC";
        // The lookahead captures CCC without consuming it; the trailing '.'
        // consumes one character so the next search starts one position later.
        std::regex re("(?=(CCC))."); // <-- PATTERN MODIFICATION
        std::sregex_iterator next(input_seq.begin(), input_seq.end(), re);
        std::sregex_iterator end;
        while (next != end) {
            std::smatch match = *next;
            // Group 1 holds the lookahead capture; match.str() would be a single character.
            std::cout << match.str(1) << "\t" << "\t" << match.position() << "\t" << "\n"; // <-- SEE HERE
            ++next;
        }
        return 0;
    }
    


    Output:

    CCC     0   
    CCC     1
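
If overlapping matches are needed in more than one place, the same lookahead trick can be wrapped in a small function. The following is only a sketch: find_overlapping is a hypothetical helper name that is not part of the original answer, and it assumes the pattern is valid ECMAScript regex syntax with no numbered backreferences (the extra capturing group would shift them by one). It uses [\s\S] rather than . so that line break characters are consumed as well.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original answer): returns every
// overlapping match of `pattern` in `text` as (position, matched text).
std::vector<std::pair<std::size_t, std::string>>
find_overlapping(const std::string& text, const std::string& pattern) {
    // Wrap the pattern in a capturing group inside a lookahead, then consume
    // exactly one character so the next search resumes one position later.
    // Assumption: `pattern` is valid ECMAScript regex syntax.
    const std::regex re("(?=(" + pattern + "))[\\s\\S]");

    std::vector<std::pair<std::size_t, std::string>> hits;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        // position() is where the lookahead succeeded; group 1 is the real match.
        hits.emplace_back(static_cast<std::size_t>(it->position()), it->str(1));
    }
    return hits;
}
```

Calling find_overlapping("CCCC", "CCC") should give two entries, CCC at position 0 and CCC at position 1, matching the output above.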