I would like to pick up "Bar" from the following strings:
FooFooFoo the FooFoo the Bar Foo
FooFooFoo the FooFoo my Bar Foo
But the regex I wrote, (the|my) (?P<bar>.+?) Foo, seems to be too greedy and captures more text than required (example at regex101.com).
edit: "Bar" is just an example of the string to match. In my real case it could be made up of multiple words.
What am I doing wrong? Thanks!
I need to run this with the standard re python library.
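Your main issue is that the regex engine searches for matches from left to right, and once "my" or "the" is found, the .+? will match as few chars other than line break chars as possible, but as many as necessary to complete a valid match. Laziness only affects where the match ends, not where it starts, so the group begins right after the first "the" or "my" in the string.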
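To make this concrete, here is a minimal sketch that runs the pattern from the question against its two sample strings with Python's re module. The variable names (samples, original, captured) are illustrative and not part of the original post.

```python
import re

# Sample strings from the question.
samples = [
    "FooFooFoo the FooFoo the Bar Foo",
    "FooFooFoo the FooFoo my Bar Foo",
]

# The pattern from the question: the match starts at the FIRST "the"/"my",
# and the lazy .+? then has to stretch until " Foo" can follow it.
original = re.compile(r"(the|my) (?P<bar>.+?) Foo")

captured = [original.search(s).group("bar") for s in samples]
print(captured)  # ['FooFoo the Bar', 'FooFoo my Bar']
```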
You need to match all text (using .*?) up to the last word (which can be matched with a \w+ pattern) before Foo:
(the|my) .*?(?P<bar>\w+) Foo
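As a quick check, the pattern above can be tried on the question's sample strings. This is only a sketch with illustrative variable names; note that \w+ captures a single word, so it assumes the target right before Foo is one word.

```python
import re

samples = [
    "FooFooFoo the FooFoo the Bar Foo",
    "FooFooFoo the FooFoo my Bar Foo",
]

# .*? skips over the text after the first "the"/"my"; the named group
# then captures only the last word that is directly followed by " Foo".
last_word = re.compile(r"(the|my) .*?(?P<bar>\w+) Foo")

results = [last_word.search(s).group("bar") for s in samples]
print(results)  # ['Bar', 'Bar']
```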
See the regex demo. Another variation is to match "the" or "my" as whole words and match any text up to the closest chunk of non-whitespace chars before Foo:
\b(the|my)\b.*?(?P<bar>\S+)\s+Foo
See this regex demo. Details:
\b(the|my)\b - "the" or "my" as a whole word
.*? - any zero or more chars other than line break chars, as few as possible
(?P<bar>\S+) - Group "bar": one or more non-whitespace chars
\s+ - one or more whitespace chars
Foo - the literal string Foo.
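The breakdown above can be exercised with Python's standard re module, as the question requires. This is a minimal sketch: the third sample string (with several spaces before Foo) is an added assumption to show what \s+ allows, and the names are illustrative.

```python
import re

samples = [
    "FooFooFoo the FooFoo the Bar Foo",
    "FooFooFoo the FooFoo my Bar Foo",
    "FooFooFoo the FooFoo my Bar   Foo",  # hypothetical: extra spaces before Foo
]

# Whole-word "the"/"my", then as little text as possible, then the
# non-whitespace chunk that sits right before whitespace + "Foo".
whole_word = re.compile(r"\b(the|my)\b.*?(?P<bar>\S+)\s+Foo")

bars = [whole_word.search(s).group("bar") for s in samples]
print(bars)  # ['Bar', 'Bar', 'Bar']
```

Because \S+ accepts any non-whitespace chars, this variant would also capture chunks containing punctuation (for example a hyphenated word), which \w+ would not.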