
Regex: optional parentheses, forced into three groups


I have a string full of parentheses like this:

(this) (should) (be) (group) (one) (thisshouldbegrouptwo) (this) (should) (be) (group) (three)

I'd like to use a regex to split this into three groups, with the constant string thisshouldbegrouptwo, optionally in parentheses, as the delimiter:

1. Group: (this) (should) (be) (group) (one)
2. Group: (thisshouldbegrouptwo)
3. Group: (this) (should) (be) (group) (three)

The string (thisshouldbegrouptwo) is fixed but optional, and its parentheses are optional too. If it isn't present, I expect the following result:

1. Group: (this) (should) (be) (group) (one)
2. Group: 
3. Group: (this) (should) (be) (group) (three)

In this case, it would also be OK if the whole string was matched in a single group.

The number of parenthesized words in groups 1 and 3 is not significant. Only the parentheses around the middle string matter: if they are present, they should belong to the middle group, not to the outer groups.

This is my regular expression so far:

(\(.*\))?(?:\s(\(thisshouldbegrouptwo\)\s))?(\(.*\))
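To see where this attempt falls short, it can be run against the sample string and against a variant where the middle string has no parentheses. This is a minimal sketch assuming Python's `re` module; the variable names and the `try_attempt` helper are illustrative, and `fullmatch` is used only to make the anchoring explicit.

```python
import re

# The regular expression from the question, unchanged.
ATTEMPT = re.compile(r"(\(.*\))?(?:\s(\(thisshouldbegrouptwo\)\s))?(\(.*\))")

WITH_PARENS = ("(this) (should) (be) (group) (one) (thisshouldbegrouptwo) "
               "(this) (should) (be) (group) (three)")
# Same input, but the middle string without its (optional) parentheses.
WITHOUT_PARENS = ("(this) (should) (be) (group) (one) thisshouldbegrouptwo "
                  "(this) (should) (be) (group) (three)")


def try_attempt(text):
    """Hypothetical helper: return the three captures, or None on no match."""
    m = ATTEMPT.fullmatch(text)
    return m.groups() if m else None


for text in (WITH_PARENS, WITHOUT_PARENS):
    print(try_attempt(text))
```

In Python's `re`, the first input splits as intended, except that group 2 also captures a trailing space, because the second `\s` sits inside the capturing group. The second input is not split at all: the middle part requires literal parentheses, so groups 1 and 2 come back as `None` and group 3 holds the whole string.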

Solution

  • This regular expression will do what you want:

    (.*?)(\(?thisshouldbegrouptwo\)?)(.*)|(.*)
    

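    When thisshouldbegrouptwo is present in the string, groups 1 and 3 hold the text to its left and right, and group 2 holds the string together with any parentheses around it. The spaces on either side of the middle string remain in groups 1 and 3, so trim them if you don't want them.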

    When that text is not present in the string, group 4 will contain the entire string.

    Key elements of the solution:

    • Using a non-greedy expression first, .*? instead of .*, avoids having the opening parenthesis of group 2 lumped into group 1.
    • The |(.*) at the end is a catch-all, but since alternatives are tried from left to right, the first alternative is the one that matches whenever the group 2 string is present.

    I could not find a way to split the non-matching case into two groups, since there is nothing to divide them on in that case. But since you said it was OK to keep the whole string together, putting it in group 4, as this expression does, should work for you.
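The following is a minimal sketch of the answer's pattern in action, assuming Python's `re` module (the answer does not name an engine). It wraps the pattern in a hypothetical `three_groups` helper that always returns three values, trimming the spaces left next to the middle string and folding the group 4 case back into a 3-tuple, and it contrasts the pattern with a greedy first group to illustrate the point about `.*?`. The helper, the variable names, and the trimming are illustrative choices, not part of the answer.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r"(.*?)(\(?thisshouldbegrouptwo\)?)(.*)|(.*)")
# Same pattern with a greedy first group, for comparison only.
GREEDY = re.compile(r"(.*)(\(?thisshouldbegrouptwo\)?)(.*)|(.*)")

WITH_MIDDLE = ("(this) (should) (be) (group) (one) (thisshouldbegrouptwo) "
               "(this) (should) (be) (group) (three)")
NO_MIDDLE = "(this) (should) (be) (group) (one) (this) (should) (be) (group) (three)"


def three_groups(text):
    """Hypothetical helper: always return (left, middle, right).

    Assumes single-line input, since '.' does not match newlines here.
    """
    m = PATTERN.fullmatch(text)
    left, middle, right, whole = m.groups()
    if whole is not None:
        # Catch-all branch (group 4): nothing to split on, keep it in one piece.
        return (whole, "", "")
    # Groups 1 and 3 keep the spaces next to the middle string; trim them.
    return (left.strip(), middle, right.strip())


print(three_groups(WITH_MIDDLE))
print(three_groups(NO_MIDDLE))
# With a greedy group 1, the opening parenthesis ends up in group 1 instead.
print(GREEDY.fullmatch(WITH_MIDDLE).group(1, 2))
```

Because the helper puts the group 4 case into its first slot, the input without the middle string comes back as the whole string followed by two empty strings, which is the single-group fallback the question allows. Multi-line input would likely need the `re.DOTALL` flag, since `.` does not match a newline by default.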