Python Re - Named Capture Group Too Greedy

I would like to pick up "Bar" from the following strings:

FooFooFoo the FooFoo the Bar Foo
FooFooFoo the FooFoo my Bar Foo

But the regex I wrote (the|my) (?P<bar>.+?) Foo seems to be too greedy and collects more text than required (example at regex101.com)

edit: "Bar" is an example of the string to match. In my real scenario it could be made up of multiple words.

What am I doing wrong? Thanks!

I need to run this with the standard re python library.

Solution

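Your main issue is that the regex engine searches for matches from left to right. It starts the match at the first my or the it finds, and from there the .+? matches as few chars other than line break chars as possible, but as many as necessary to complete a valid match. A lazy quantifier only keeps the end of the match short; it does not move the start of the match further to the right.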
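To see this in practice, here is a minimal sketch that runs the original pattern against the two sample strings from the question with the standard re module. The variable names (samples, original, original_results) are only illustrative.

```python
import re

# The two sample strings from the question.
samples = [
    "FooFooFoo the FooFoo the Bar Foo",
    "FooFooFoo the FooFoo my Bar Foo",
]

# The pattern from the question: the match starts at the FIRST "the",
# so the lazy .+? has to grow until " Foo" finally follows.
original = re.compile(r"(the|my) (?P<bar>.+?) Foo")

original_results = [original.search(s).group("bar") for s in samples]
print(original_results)  # ['FooFoo the Bar', 'FooFoo my Bar']
```

In both cases the group starts right after the first "the", which is why it picks up the extra text.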

You need to match all text (using .*?) up to the last word (that can be matched with a \w+ pattern) before Foo:

(the|my) .*?(?P<bar>\w+) Foo
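A quick check of this pattern against the sample strings, as a sketch using re.search (names such as fixed and fixed_results are illustrative, and the third string is a made-up input added only to show the single-word behaviour):

```python
import re

samples = [
    "FooFooFoo the FooFoo the Bar Foo",
    "FooFooFoo the FooFoo my Bar Foo",
]

# .*? consumes the text after the first "the"/"my";
# the named group then captures only the last word before " Foo".
fixed = re.compile(r"(the|my) .*?(?P<bar>\w+) Foo")

fixed_results = [fixed.search(s).group("bar") for s in samples]
print(fixed_results)  # ['Bar', 'Bar']

# Hypothetical input: \w+ captures a single word only.
multi = fixed.search("FooFooFoo the FooFoo the Big Bar Foo").group("bar")
print(multi)  # Bar
```

Note that \w+ stops at whitespace, so with this pattern the group holds one word even when several words precede Foo.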

See the regex demo. Another variation is to match the or my as whole words and match any text up to the closest non-whitespace char chunk before Foo:

\b(the|my)\b.*?(?P<bar>\S+)\s+Foo

See this regex demo. Details:

\b(the|my)\b - "the" or "my" as a whole word
.*? - any zero or more chars other than line break chars, as few as possible
(?P<bar>\S+) - Group "bar": one or more non-whitespace chars
\s+ - one or more whitespace chars
Foo - a Foo string.
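The breakdown above can also be written directly into the pattern with re.VERBOSE, which ignores whitespace in the pattern and allows inline comments. This is only a sketch; the names pattern and results are illustrative.

```python
import re

samples = [
    "FooFooFoo the FooFoo the Bar Foo",
    "FooFooFoo the FooFoo my Bar Foo",
]

pattern = re.compile(
    r"""
    \b(the|my)\b    # "the" or "my" as a whole word
    .*?             # any chars other than line breaks, as few as possible
    (?P<bar>\S+)    # group "bar": one or more non-whitespace chars
    \s+             # one or more whitespace chars
    Foo             # the literal string Foo
    """,
    re.VERBOSE,
)

results = [pattern.search(s).group("bar") for s in samples]
print(results)  # ['Bar', 'Bar']
```

Because this pattern uses \s+ instead of a literal space, it works unchanged in verbose mode.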