Capturing all occurrences of a repeated group in a string and referencing them for substitution


I have the following regex, which matches a number in the string and returns it in a group that I then replace with other text.

For the sample string:

/text_1/123456/text_2

With /^(.*[^0-9])+([0-9]{3,}+)+(.*)$/ and a substitution like $1captured_group$3, I get my desired result, i.e. /text_1/captured_group/text_2

However, for scenarios where the captured group appears more than once in the given string, such as:

/text_1/123456/text_2/789011
/text_1/123456/text_2/789011/abc/12345

The given regex captures only the last group, i.e. 789011 and 12345 respectively. What I want is to capture all of the groups and be able to reference them later in order to replace them.

An explanation given on regex101.com, I believe, addresses my scenario:

A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data.

However, I am not sure how to put a capturing group around the repeated group to capture all iterations, or how to reference all the matched values later.


Solution

  • As Hao Wu commented:

    "If you want to match multiple occurrences you need to get rid of the anchors (^, $) and add a global (g) modifier, such as /\b[0-9]{3,}\b/g"

    As for storing matches and referencing them later, you could build an array of objects in which each object holds the match and an array of two indices: the index of the first character of the match and the index of its last character:

    // string = `123`
    {match: 123, indices: [0, 2]}
    

    In the example below, the function tagMatches(str, rgx) uses the .matchAll() method, which requires a regex with the global (g) flag.

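    const tagMatches = (str, rgx) => {
      // matchAll() yields one match object per occurrence
      const matches = str.matchAll(rgx);
      const result = [];
      for (const match of matches) {
        // match[0] is the matched text; match.length is the size of the match array, not of the text
        const end = match.index + match[0].length - 1; // index of the last matched character
        result.push({"match": +match[0], "indices": [match.index, end]});
      }
      return result;
    }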
    
    const string = `utfuduyiutcv fvtycy 1sdtyveaf 678900 amsiofjsogifn979/125487/`;
    const regexp = /\b\d{3,}\b/g;
    
    const tagged = tagMatches(string, regexp)
    
    console.log(tagged);
    
    console.log("first match: "+tagged[0].match);
    console.log("second match start: "+tagged[1].indices[0]);
    console.log("first match end: "+tagged[0].indices[1]);
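
The answer above stores the matches but does not show the substitution the question asks for. What follows is a minimal sketch of that step, assuming JavaScript and the pattern from Hao Wu's comment. The helper name replaceNumbers is hypothetical and not part of the original answer. With the global flag, String.prototype.replace visits every match, and a callback receives each matched value so it can be referenced when building the replacement.

```javascript
// Hypothetical helper (not from the original answer): replaces every
// standalone run of three or more digits, passing each match and its
// start offset to a callback that returns the replacement text.
const replaceNumbers = (str, replacer) =>
  str.replace(/\b[0-9]{3,}\b/g, (match, offset) => replacer(match, offset));

// Sample strings taken from the question.
const samples = [
  "/text_1/123456/text_2",
  "/text_1/123456/text_2/789011",
  "/text_1/123456/text_2/789011/abc/12345",
];

// Same replacement text for every match.
const fixed = samples.map((s) => replaceNumbers(s, () => "captured_group"));
console.log(fixed);

// Replacement text that references the matched value itself.
const wrapped = replaceNumbers(samples[2], (match) => `<${match}>`);
console.log(wrapped); // /text_1/<123456>/text_2/<789011>/abc/<12345>
```

If the replacement is a fixed string, the callback is unnecessary and str.replace(/\b[0-9]{3,}\b/g, "captured_group") should be enough.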