Tags: regex, regex-group, regex-greedy

Repeated capture group and lazy issue


I want to capture "foo" and every consecutive occurrence of "bar" that follows it. Any text between them must be ignored, and "bar" is optional.

Example text:

foo ignoreme barbarbar
foo ignoreme bar
foo ignoreme 
foo something abcbar

Expected:

foo barbarbar
foo bar
foo
foo bar

I tried this regex:

(foo)(?:.*)((?:bar)*)

But the greedy .* consumes the rest of the string, so the second group is left empty and only foo is captured:

foo
foo
foo
foo

So I made the quantifier lazy to stop it from consuming everything:

(foo)(?:.*?)((?:bar)*)

I got the same result: only foo is captured.

This time the match seems to stop too early. However, this almost works:

(foo)(?:.*?)((?:bar)+)

foo barbarbar
foo bar
<miss third line>
foo bar

But it misses the third line because, with +, "bar" must appear at least once. Example here: https://regex101.com/r/NIUPew/1
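The three attempts above can be reproduced with a short Python sketch. It assumes Python's re module and applies each pattern to one line at a time with re.search. The names LINES, ATTEMPTS, and second_group are illustrative and not part of the original question.

```python
import re

# Sample lines from the question.
LINES = [
    "foo ignoreme barbarbar",
    "foo ignoreme bar",
    "foo ignoreme ",
    "foo something abcbar",
]

# The three patterns tried in the question (dictionary keys are illustrative labels).
ATTEMPTS = {
    "greedy": r"(foo)(?:.*)((?:bar)*)",
    "lazy_star": r"(foo)(?:.*?)((?:bar)*)",
    "lazy_plus": r"(foo)(?:.*?)((?:bar)+)",
}


def second_group(pattern, line):
    """Return the text captured by group 2, or None if the line does not match."""
    m = re.search(pattern, line)
    return m.group(2) if m else None


results = {
    name: [second_group(pattern, line) for line in LINES]
    for name, pattern in ATTEMPTS.items()
}

for name, captured in results.items():
    print(name, captured)
# greedy    ['', '', '', '']                   .* swallows every bar
# lazy_star ['', '', '', '']                   .*? and (?:bar)* both match empty
# lazy_plus ['barbarbar', 'bar', None, 'bar']  third line fails to match
```

In the lazy_star case, .*? matches nothing and (?:bar)* is satisfied with zero repetitions right after foo, so the engine never needs to look further for bar.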

Any idea from a regex guru? Thanks!


Solution

  • You can move the repeated capturing group into the non-capturing group while making that group optional:

    (foo)(?:.*?((?:bar)+))?
    

    See the regex demo.

    Details:

    • (foo) - Group 1: foo
    • (?:.*?((?:bar)+))? - an optional non-capturing group. Because ? is a greedy quantifier (one or zero times), the engine first tries to match the group once and falls back to zero times only if that fails. The group matches:
      • .*? - zero or more characters other than line break characters, as few as possible
      • ((?:bar)+) - Group 2: one or more consecutive occurrences of bar.
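As a rough check of the pattern above, here is a minimal Python sketch that runs it against the sample lines from the question. It assumes Python's re module and matches one line at a time. The helper name extract and the choice to join the groups with a space are illustrative, not part of the original answer.

```python
import re

# The pattern from the answer: group 2 lives inside an optional non-capturing group.
PATTERN = re.compile(r"(foo)(?:.*?((?:bar)+))?")


def extract(line):
    """Return 'foo' plus the captured bar run (if any), or None when foo is absent."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Group 2 is None when the optional group matched zero times.
    return " ".join(g for g in m.groups() if g is not None)


for line in [
    "foo ignoreme barbarbar",
    "foo ignoreme bar",
    "foo ignoreme ",
    "foo something abcbar",
]:
    print(extract(line))
# foo barbarbar
# foo bar
# foo
# foo bar
```

When the line contains no bar, the optional group is skipped and group 2 is None rather than an empty string, so the caller has to handle that case explicitly.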