
How to capture brackets with a variable amount of space in between as a single group?


Suppose I have the following text:

Yes: [x]
Yes: [  x]
Yes: [x  ]
Yes: [  x  ]
No: [
No: ]

I am interested in capturing the square brackets [ and ] that enclose an x, with a variable amount of horizontal space on either side of the x. The bit I am struggling with is that both brackets must be captured into a group with the same number (i.e., $1).

I started with a combination of positive lookahead and lookbehind assertions using the following regex:

\[(?=\h*x)|(?<=x)\h*\K\]

This produces the following matches (see the demo, with the extended flag enabled for clarity):

[Image: matches of the first attempt]

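Then I tried placing a capturing group around the whole expression, but for the closing bracket the group then also contains the horizontal space matched by \h* after the positive lookbehind (?<=x), because \K only resets the start of the overall match, not the start of an enclosing group. This is shown below (see also the demo).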

[Image: matches of the second attempt]
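To make the second attempt's problem concrete, here is a minimal sketch using Python's built-in `re` module. It is an emulation, not PCRE: `[ \t]` stands in for `\h`, and `\K` is dropped because `re` does not support it. Dropping `\K` does not change group 1, since in PCRE `\K` only moves the reported start of the overall match and leaves captures that opened earlier untouched. The names `SECOND_ATTEMPT` and `group1_values` are hypothetical.

```python
import re

# Emulation of the second attempt with Python's built-in re module.
# Assumptions: [ \t] stands in for PCRE's \h, and \K is omitted because re
# does not support it. In PCRE, \K does not trim a group that opened before
# it, so group 1 holds the same text either way.
SECOND_ATTEMPT = re.compile(r"(\[(?=[ \t]*x)|(?<=x)[ \t]*\])")

def group1_values(line: str) -> list[str]:
    """Return group 1 from every match of the second-attempt pattern."""
    return [m.group(1) for m in SECOND_ATTEMPT.finditer(line)]

for line in ["Yes: [x]", "Yes: [x  ]", "Yes: [  x  ]"]:
    print(f"{line!r:16} -> {group1_values(line)}")
```

Whenever there is space between the x and the closing bracket, group 1 for that match comes back as `"  ]"` rather than `"]"`, which is the leftover whitespace the question describes.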

I am using Oniguruma regular expressions and the PCRE flavor. Is there a way to do this, and if so, how?


Solution

  • You could make use of a branch reset group:

    (?|(\[)(?=\h*x\h*])|(?<=\[)\h*x\h*(]))
    
    • (?| Branch reset group
      • (\[)(?=\h*x\h*]) Capture [ in group 1, asserting that it is followed by an x surrounded by optional horizontal whitespace chars and then ]
      • | Or
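      • (?<=\[)\h*x\h*(]) Assert [ to the left, then match x between optional horizontal whitespace chars and capture ] in group 1 as well (inside the branch reset group, numbering restarts in each alternative, so this group is also group 1)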
    • ) Close branch reset group

    Regex demo
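The branch reset answer can be checked against the sample lines with a minimal Python sketch. Python's built-in `re` module supports neither `(?|...)` nor `\h`, so this is an emulation under stated assumptions: `[ \t]` stands in for `\h` (narrower, since `\h` also matches other horizontal spaces such as U+00A0), and the two alternatives get separate groups that are coalesced in code, which is what the branch reset group does for you by numbering both captures as group 1. The names `BRACKETS` and `bracket_captures` are hypothetical.

```python
import re

# Emulation of the branch reset pattern with Python's built-in re module.
# Assumptions: [ \t] replaces \h, and because re has no "(?|...)", each
# alternative uses its own group; the code merges them into one value.
BRACKETS = re.compile(
    r"(\[)(?=[ \t]*x[ \t]*\])"     # alt 1: capture "[" when blanks, x, blanks, "]" follow
    r"|(?<=\[)[ \t]*x[ \t]*(\])"   # alt 2: right after "[", match blanks, x, blanks; capture "]"
)

def bracket_captures(line: str) -> list[str]:
    """Return the single 'group 1' value for each match, in order."""
    # Exactly one of the two groups participates in any given match.
    return [m.group(1) or m.group(2) for m in BRACKETS.finditer(line)]

samples = ["Yes: [x]", "Yes: [  x]", "Yes: [x  ]", "Yes: [  x  ]", "No: [", "No: ]"]
for s in samples:
    print(f"{s!r:16} -> {bracket_captures(s)}")
```

Every "Yes" line yields `['[', ']']` and both "No" lines yield `[]`. Note that the overall match for the second alternative spans the whole `x  ]` run (whitespace included); only the captured group is reduced to `]`.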