I have the following string:
one two three four five six seven eight nine
I am trying to construct a regular expression that splits the string into three groups:
one two three
four five six
seven eight nine
I have tried variations of
(.*\b(one|two|three)?)(.*\b(four|five|six)?)(.*\b(seven|eight|nine)?)
but this pattern yields a single group that contains the full string.
Trying
(.*\b(one|two|three))(.*\b(four|five|six))(.*\b(seven|eight|nine))
seems to get me closer to what I want, but the match information panel shows that the pattern identifies two matches, each containing six capture groups.
I am using alternation (|) because the groups can be of any length. For example, applying the pattern to the string
two three four
should identify two groups:
two three
four
A large regex that probably does it
(?=.*\b(?:one|two|three|four|five|six|seven|eight|nine)\b)(\b(?:one|two|three)(?:\s+(?:one|two|three))*\b)?.+?(\b(?:four|five|six)(?:\s+(?:four|five|six))*\b)?.+?(\b(?:seven|eight|nine)(?:\s+(?:seven|eight|nine))*\b)?
https://regex101.com/r/rUtkyU/1
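To check the pattern outside regex101, here is a minimal sketch that runs it through Python's `re` module. Python is my assumption (the question does not name an engine), and `capture_groups` is just a helper name made up for illustration.

```python
import re

# The one-line pattern from above, copied verbatim and split across
# adjacent string literals only for readability.
PATTERN = re.compile(
    r"(?=.*\b(?:one|two|three|four|five|six|seven|eight|nine)\b)"
    r"(\b(?:one|two|three)(?:\s+(?:one|two|three))*\b)?.+?"
    r"(\b(?:four|five|six)(?:\s+(?:four|five|six))*\b)?.+?"
    r"(\b(?:seven|eight|nine)(?:\s+(?:seven|eight|nine))*\b)?"
)


def capture_groups(text):
    """Return the three capture groups; a group that did not take part is None."""
    match = PATTERN.search(text)
    return match.groups() if match else None


full = capture_groups("one two three four five six seven eight nine")
partial = capture_groups("two three four")

print(full)     # ('one two three', 'four five six', 'seven eight nine')
print(partial)  # ('two three', None, None)
```

The full string comes back as the three expected groups. For the shorter string, the second group comes back empty in this engine: each `.+?` has to consume at least one character, so when nothing follows `four` the regex backtracks and skips the optional group instead. That is worth keeping in mind if partial inputs matter.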
Readable version
(?=
    .* \b
    (?:
        one
      | two
      | three
      | four
      | five
      | six
      | seven
      | eight
      | nine
    )
    \b
)
(                                   # (1 start)
    \b
    (?: one | two | three )
    (?:
        \s+
        (?: one | two | three )
    )*
    \b
)?                                  # (1 end)
.+?
(                                   # (2 start)
    \b
    (?: four | five | six )
    (?:
        \s+
        (?: four | five | six )
    )*
    \b
)?                                  # (2 end)
.+?
(                                   # (3 start)
    \b
    (?: seven | eight | nine )
    (?:
        \s+
        (?: seven | eight | nine )
    )*
    \b
)?                                  # (3 end)
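Since the question says the groups can be of any length, a different approach may be easier to live with: instead of one pattern with three optional groups, scan the string for runs of words from each set. This is only a sketch of that alternative, not the pattern above. It assumes Python's `re` module, and the group names `low`, `mid`, `high` and the helper `find_runs` are invented for illustration.

```python
import re

# One alternative per word set; each named group captures a run of
# whitespace-separated words drawn from a single set.
RUNS = re.compile(
    r"\b(?:"
    r"(?P<low>(?:one|two|three)(?:\s+(?:one|two|three))*)"
    r"|(?P<mid>(?:four|five|six)(?:\s+(?:four|five|six))*)"
    r"|(?P<high>(?:seven|eight|nine)(?:\s+(?:seven|eight|nine))*)"
    r")\b"
)


def find_runs(text):
    """Return (group_name, matched_text) pairs in order of appearance."""
    return [(m.lastgroup, m.group()) for m in RUNS.finditer(text)]


print(find_runs("one two three four five six seven eight nine"))
# [('low', 'one two three'), ('mid', 'four five six'), ('high', 'seven eight nine')]

print(find_runs("two three four"))
# [('low', 'two three'), ('mid', 'four')]
```

Because every run is its own match, a missing set simply produces no entry, and there is no filler like `.+?` between the groups that could force backtracking.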