c++, regex, regex-group, regex-greedy

How to capture repeated group up to N times?


I would like to capture chains of digits in a string, but only up to 3 times.

Any chain of digits afterwards should be ignored. For instance:

    T441_S45/1 => 441 45 1
    007_S4 => 007 4
    41_445T02_74 => 41 445 02

I've tried (\d+){1,3} but that doesn't seem to work...

Any hint?


Solution

  • You may match and capture the first three chunks of digits, separated by any number of non-digits, together with the rest of the string, and then replace the whole match with backreferences to those groups:

    ^\D*(\d+)(?:\D+(\d+))?(?:\D+(\d+))?.*
    

    Or, if the string can contain line breaks (which . does not match in the default ECMAScript grammar of std::regex),

    ^\D*(\d+)(?:\D+(\d+))?(?:\D+(\d+))?[\s\S]*
    

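    The replacement string is $1 $2 $3. A group that did not take part in the match expands to an empty string, so the separating spaces around it remain in the result.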

    Details

    • ^ - start of string
    • \D* - 0+ non-digits
    • (\d+) - Group 1: one or more digits
    • (?:\D+(\d+))? - an optional non-capturing group matching:
      • \D+ - 1+ non-digits
      • (\d+) - Group 2: one or more digits
    • (?:\D+(\d+))? - another optional non-capturing group matching:
      • \D+ - 1+ non-digits
      • (\d+) - Group 3: one or more digits
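    • [\s\S]* - any 0+ characters, line breaks included: the rest of the string (the first variant uses .* here, which stops at a line break).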


    C++ demo:

    #include <iostream>
    #include <regex>
    #include <string>
    #include <vector>
    
    int main() {
        // Sample inputs from the question.
        const std::vector<std::string> strings = {
            "T441_S45/1",
            "007_S4",
            "41_445T02_74"
        };
    
        // Groups 1-3 capture the first three digit chunks; [\s\S]* consumes the rest.
        const std::regex reg(R"(^\D*(\d+)(?:\D+(\d+))?(?:\D+(\d+))?[\s\S]*)");
        for (const std::string& s : strings)
        {
            std::cout << "Input string: " << s << std::endl;
            // Replace the whole string with the captured groups, space-separated.
            std::cout << "Replace result: "
                      << std::regex_replace(s, reg, "$1 $2 $3") << std::endl;
        }
        return 0;
    }
    

    Output:

    Input string: T441_S45/1
    Replace result: 441 45 1
    Input string: 007_S4
    Replace result: 007 4 
    Input string: 41_445T02_74
    Replace result: 41 445 02