Tags: regex, regex-group, re2, text-capture

Capture one suffix containing a known substring when multiple matching prefixes (without the known substring) are found


Given multiple input strings, some containing the prefix is:, I need to capture one instance of the substring "Foo" or "Bar" that follows the is: prefix, regardless of how many times is:Foo/is:Bar or is:Baz/is:Xyzzy appear.

I am using the following regex: .*is:\b([Foo|Bar]*)\b.*

with these test input lines and their captures:

"is:Baz is:Foo FooBar"          # Captures "Foo"
"is:Foo FooBar is:Bar"          # Captures "Bar"
"is:Bar FooBar FooBaz Baz"      # Captures "Bar"
"FooBar is:Bar FooBaz"          # Captures "Bar"
"FooBar is:Xyzzy is:Foo"        # Captures "Foo"
"is:Baz FooBar is:Foo"          # Captures "Foo"
"FooBar is:Foo is:Xyzzy"        # No capture

In the final line I also want to capture Foo (from is:Foo), but the capture is thrown off by is:Xyzzy. This isn't an exhaustive list of possible test cases, but it illustrates the problem I'm running into.
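To see why the final line fails, here is a minimal Go sketch. It assumes Go's standard `regexp` package (which implements RE2 syntax) behaves like the engine the question targets; `capture` is a hypothetical helper introduced only for this illustration. The greedy leading `.*` backtracks only as far as the last `is:`, and because `[Foo|Bar]*` is a character class that may match zero characters, the pattern still succeeds at `is:Xyzzy`, just with an empty group 1.

```go
package main

import (
	"fmt"
	"regexp"
)

// The question's original pattern. [Foo|Bar]* is a character class: it matches
// any run (possibly empty) of the characters F, o, |, B, a, r.
var original = regexp.MustCompile(`.*is:\b([Foo|Bar]*)\b.*`)

// capture (hypothetical helper) returns group 1 and whether the pattern matched.
func capture(re *regexp.Regexp, s string) (string, bool) {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	for _, s := range []string{
		"FooBar is:Xyzzy is:Foo", // last is: is followed by Foo, so group 1 is "Foo"
		"FooBar is:Foo is:Xyzzy", // last is: is is:Xyzzy, so group 1 is empty
		"is:aBBa",                // the class also accepts arbitrary mixes like "aBBa"
	} {
		g, ok := capture(original, s)
		fmt.Printf("%q matched=%v group1=%q\n", s, ok, g)
	}
}
```

Note that the second input still counts as a match; only the captured text is empty, which is why it looks like "no capture".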


Solution

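  • You can write the pattern using a group (Foo|Bar) instead of the [ and ], which denote a character class. [Foo|Bar]* matches any run of the characters F, o, |, B, a and r, including an empty run, so on the last line it happily matches nothing after is:Xyzzy.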

    You don't need the word boundary in :\b, as it is implied by the alternation (Foo|Bar) that follows: a : followed by F or B is always a word boundary.

    You can add a word boundary before is, written \bis, so that text such as this:Foo is not matched.

    .*\bis:(Foo|Bar)\b.*
    

    See a regex101 demo.
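As a rough check outside regex101, here is a Go sketch that runs the corrected pattern over all seven test lines, again assuming Go's RE2-based `regexp` package matches the target engine; `capture` is a hypothetical helper for illustration. Because the leading `.*` is greedy, the engine backtracks to the last `is:Foo` or `is:Bar`, skipping a trailing `is:Xyzzy`.

```go
package main

import (
	"fmt"
	"regexp"
)

// The answer's pattern: (Foo|Bar) is a real alternation, \b before "is"
// rejects e.g. "this:Foo", and \b after the group rejects e.g. "is:FooBar".
var fixed = regexp.MustCompile(`.*\bis:(Foo|Bar)\b.*`)

// capture (hypothetical helper) returns group 1 and whether the pattern matched.
func capture(re *regexp.Regexp, s string) (string, bool) {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	inputs := []string{
		"is:Baz is:Foo FooBar",
		"is:Foo FooBar is:Bar",
		"is:Bar FooBar FooBaz Baz",
		"FooBar is:Bar FooBaz",
		"FooBar is:Xyzzy is:Foo",
		"is:Baz FooBar is:Foo",
		"FooBar is:Foo is:Xyzzy", // now captures "Foo"
	}
	for _, s := range inputs {
		g, ok := capture(fixed, s)
		fmt.Printf("%q -> %q (matched=%v)\n", s, g, ok)
	}
}
```

If the leading and trailing .* were dropped and a find-style call used instead, the engine would capture the first is:Foo/is:Bar rather than the last, so "is:Foo FooBar is:Bar" would yield "Foo" instead of the expected "Bar".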