Tags: regex, pcre, gedit

Regex to repeatedly capture group within a larger match?


Context: syntax highlighting in gedit.

Problem: I want to capture all occurrences within a specific area. Toy example:

other text here $5
keyword1 ->  (( ran$3dom$6t:,ext$9    ))
keyword1  -> ((    ran$2dom$4t:,ext$6 ))
other text here $7

I want to capture (highlight) every $ followed by a single digit ($0 to $9) inside the (( text )) that follows keyword1: here $3, $6, $9, $2, $4 and $6, but NOT $5 and $7. This boils down to: how can I repeatedly capture a group within a larger match?
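To make the difficulty concrete, here is a small illustration in Python (its re module stands in for gedit's engine, which is an assumption). When a capturing group sits inside a quantifier, each repetition overwrites the previous capture, so a single match only reports the last $n.

```python
import re

line = "keyword1 ->  (( ran$3dom$6t:,ext$9    ))"

# One match; the capturing group (\$[0-9]) is repeated by the outer "+".
m = re.search(r"\(\((?:[^$)]*(\$[0-9]))+", line)

print(m.group(0))  # (( ran$3dom$6t:,ext$9
# Each repetition overwrites the group, so only the last capture survives.
print(m.group(1))  # $9
```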

I can grab all the text where the groups can occur with (?<=keyword1)|\(\(.*\)\) (gedit applies the match globally, like /g, by default):

<context id="keyword1" style-ref="argument">
  <match>(?<=keyword1)|\(\(.*\)\)</match>
</context>

I have found a related question, "How can I write a regex to repeatedly capture group within a larger match?", but its answer uses unbounded repetition inside a look-behind, which, unfortunately, gedit does not support (as far as I know). Any suggestions?
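A quick way to see the restriction, assuming Python's re module has the same fixed-width look-behind rule as the PCRE-style engine gedit uses: a fixed-width look-behind such as (?<=keyword1) compiles, while an unbounded one such as (?<=keyword1.*) is rejected when the pattern is compiled.

```python
import re

# A fixed-width look-behind compiles and works.
fixed = re.compile(r"(?<=keyword1)\s*->")
print(fixed.search("keyword1 ->") is not None)  # True

# An unbounded look-behind is refused at compile time.
try:
    re.compile(r"(?<=keyword1.*)\$[0-9]")
    variable_ok = True
except re.error as exc:
    variable_ok = False
    print("rejected:", exc)
```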


Solution

  • Description

    To make sure you only work on lines that start with your keyword, I see this as a two-step operation:

    1. collect each of the lines you're interested in
    2. extract the $[0-9] substrings

    Step 1

    This regular expression matches the lines that look like keyword1 -> ((...))

    keyword1\s*->\s*\(\(.*\)\)
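A minimal Python sketch of step 1 on the sample text from the question (Python's re is used only for illustration; SAMPLE and STEP1 are hypothetical names). Because . does not match a newline, each match stops at the end of its own line.

```python
import re

SAMPLE = """other text here $5
keyword1 ->  (( ran$3dom$6t:,ext$9    ))
keyword1  -> ((    ran$2dom$4t:,ext$6 ))
other text here $7
"""

# Step 1: keep only the "keyword1 -> ((...))" spans.
STEP1 = re.compile(r"keyword1\s*->\s*\(\(.*\)\)")

regions = STEP1.findall(SAMPLE)
for region in regions:
    print(region)
```

It prints the two keyword1 lines and skips both "other text here" lines.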
    


    Step 2

    \$[0-9](?![0-9])(?=(?:(?!\(\().)*\)\))
    


    This regular expression will do the following:

    • find every dollar sign followed by exactly one digit that sits inside a ((...)), i.e. one that is followed later on the same line by )) with no (( in between
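The same pattern in Python, applied to the whole sample text without step 1 (SAMPLE and STEP2 are hypothetical names). The second call shows why step 1 is still useful: on its own, this pattern also accepts a $n inside a ((...)) on a line that has no keyword1.

```python
import re

SAMPLE = """other text here $5
keyword1 ->  (( ran$3dom$6t:,ext$9    ))
keyword1  -> ((    ran$2dom$4t:,ext$6 ))
other text here $7
"""

# Step 2: "$" plus one digit, not followed by another digit, and followed
# later on the same line by "))" with no "((" in between.
STEP2 = re.compile(r"\$[0-9](?![0-9])(?=(?:(?!\(\().)*\)\))")

print(STEP2.findall(SAMPLE))  # ['$3', '$6', '$9', '$2', '$4', '$6']
print(STEP2.findall("other text here (( $1 ))"))  # ['$1']
```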

    Example

    Live Demo

    https://regex101.com/r/wY3jM6/1

    Sample text

    other text here $5
    keyword1 ->  (( ran$3dom$6t:,ext$9    ))
    keyword1  -> ((    ran$2dom$4t:,ext$6 ))
    other text here $7
    

    Sample Matches

    $3
    $6
    $9
    $2
    $4
    $6
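Putting both steps together, a sketch of a hypothetical helper dollar_digits that runs step 1 and then applies step 2 inside each region it found. The last input line is added here only to show that a ((...)) without keyword1 is skipped.

```python
import re

STEP1 = re.compile(r"keyword1\s*->\s*\(\(.*\)\)")
STEP2 = re.compile(r"\$[0-9](?![0-9])(?=(?:(?!\(\().)*\)\))")


def dollar_digits(text):
    """Hypothetical helper: step 1 finds the regions, step 2 the $n inside."""
    found = []
    for region in STEP1.findall(text):
        found.extend(STEP2.findall(region))
    return found


SAMPLE = """other text here $5
keyword1 ->  (( ran$3dom$6t:,ext$9    ))
keyword1  -> ((    ran$2dom$4t:,ext$6 ))
other text here $7
other text here (( $1 ))
"""

print(dollar_digits(SAMPLE))  # ['$3', '$6', '$9', '$2', '$4', '$6']
```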
    

    Explanation

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
      \$                       '$'
    ----------------------------------------------------------------------
      [0-9]                    any character of: '0' to '9'
    ----------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
    ----------------------------------------------------------------------
        [0-9]                    any character of: '0' to '9'
    ----------------------------------------------------------------------
      )                        end of look-ahead
    ----------------------------------------------------------------------
      (?=                      look ahead to see if there is:
    ----------------------------------------------------------------------
        (?:                      group, but do not capture (0 or more
                                 times (matching the most amount
                                 possible)):
    ----------------------------------------------------------------------
          (?!                      look ahead to see if there is not:
    ----------------------------------------------------------------------
            \(                       '('
    ----------------------------------------------------------------------
            \(                       '('
    ----------------------------------------------------------------------
          )                        end of look-ahead
    ----------------------------------------------------------------------
          .                        any character except \n
    ----------------------------------------------------------------------
        )*                       end of grouping
    ----------------------------------------------------------------------
        \)                       ')'
    ----------------------------------------------------------------------
        \)                       ')'
    ----------------------------------------------------------------------
      )                        end of look-ahead
    ----------------------------------------------------------------------
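To see what the two negative look-aheads contribute, a small Python check that drops each one in turn (the test strings and pattern names are made up for illustration). Without (?![0-9]) the first digit of $12 is accepted, and without (?!\(\() a $n that appears before the (( is accepted.

```python
import re

STEP2 = r"\$[0-9](?![0-9])(?=(?:(?!\(\().)*\)\))"
NO_DIGIT_GUARD = r"\$[0-9](?=(?:(?!\(\().)*\)\))"  # (?![0-9]) removed
NO_OPEN_GUARD = r"\$[0-9](?![0-9])(?=.*\)\))"      # (?!\(\() removed

two_digits = "keyword1 -> (( a$12 ))"
before_parens = "other text $1 (( b ))"

print(re.findall(STEP2, two_digits))             # []
print(re.findall(NO_DIGIT_GUARD, two_digits))    # ['$1']
print(re.findall(STEP2, before_parens))          # []
print(re.findall(NO_OPEN_GUARD, before_parens))  # ['$1']
```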