regex go string-parsing

Regular expression requiring at least one of two groups


I need to parse a string with a regular expression in which at least one of two groups is required, and I cannot figure out how to express this.

To illustrate the problem, consider parsing this case:

String: aredhouse theball bluegreencar the
Match:  ✓         ✓       ✓            ✗
  1. Items are separated by spaces
  2. Each item is composed of an article, a colour and an object, defined by the groups in the following expression: (?P<article>the|a)?(?P<colour>(red|green|blue|yellow)*)(?P<object>car|ball|house)?\s*

  3. An item may have an 'article' but must have a 'colour', an 'object', or both.

Is there a way to make 'article' optional while requiring at least one of 'colour' or 'object' using regular expressions?

I coded this example in Go, although I guess this is a generic regexp question that applies to any language.
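
A minimal sketch of why the expression from item 2 falls short. It assumes each space-separated item is checked on its own with `^...$` anchors (the anchors, the split with `strings.Fields`, and the helper name `acceptsOriginal` are assumptions of this sketch, and the trailing `\s*` is dropped because the split already removes the spaces). Since every group is optional, a bare article such as `the` is accepted:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// original is the expression from the question, anchored so that it must
// cover a whole item. Every group in it is optional.
var original = regexp.MustCompile(
	`^(?P<article>the|a)?(?P<colour>(red|green|blue|yellow)*)(?P<object>car|ball|house)?$`)

// acceptsOriginal (hypothetical helper) reports whether the original
// expression accepts a single item.
func acceptsOriginal(item string) bool {
	return original.MatchString(item)
}

func main() {
	// Split the sample string on whitespace and test each item separately.
	for _, item := range strings.Fields("aredhouse theball bluegreencar the") {
		fmt.Printf("%-13s accepted=%v\n", item, acceptsOriginal(item))
	}
}
```

With this pattern the last item, `the`, is accepted even though it has neither a colour nor an object, which is the case that should be rejected.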


Solution

  • This works with your test cases:

    /
        (?P<article>the|a)?                         # optional article
        (?:                                         # non-capture group, mandatory
            (?P<colour>(?:red|green|blue|yellow)+)  # 1 or more colors  
            (?P<object>car|ball|house)              # followed by 1 object
            |                                       # OR
            (?P<colour>(?:red|green|blue|yellow)+)  # 1 or more colors
            |                                       # OR
            (?P<object>car|ball|house)              # 1 object
        )                                           # end group
    /x        
    

    It can be reduced to:

    /
        (?P<article>the|a)?                         # optional article
        (?:                                         # non-capture group, mandatory
            (?P<colour>(?:red|green|blue|yellow)+)  # 1 or more colors  
            (?P<object>car|ball|house)?             # followed by optional object
            |                                       # OR
            (?P<object>car|ball|house)              # 1 object
        )                                           # end group
    /x
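
    A minimal Go sketch of the reduced pattern, under a few assumptions. Go's regexp package has no free-spacing `x` flag, so the pattern is assembled from commented string pieces instead. Each item is matched on its own with `^...$` anchors after splitting on whitespace. Because some engines reject duplicate group names, the second `object` group is renamed `objectOnly` here; that name, the `Item` type and the `parseItem` helper are hypothetical and not part of the answer above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Pattern pieces, kept separate so each can carry its own comment.
const (
	article    = `(?P<article>the|a)?`                   // optional article
	colour     = `(?P<colour>(?:red|green|blue|yellow)+)` // 1 or more colours
	object     = `(?P<object>car|ball|house)`             // object after a colour
	objectOnly = `(?P<objectOnly>car|ball|house)`         // object without a colour
)

// itemRE requires a colour (optionally followed by an object) or a lone object.
var itemRE = regexp.MustCompile(
	`^` + article + `(?:` + colour + object + `?|` + objectOnly + `)$`)

// Item holds the parts of one parsed item; missing parts are empty strings.
type Item struct {
	Article, Colour, Object string
}

// parseItem returns the parts of s and whether s is a valid item.
func parseItem(s string) (Item, bool) {
	m := itemRE.FindStringSubmatch(s)
	if m == nil {
		return Item{}, false
	}
	it := Item{
		Article: m[itemRE.SubexpIndex("article")],
		Colour:  m[itemRE.SubexpIndex("colour")],
		Object:  m[itemRE.SubexpIndex("object")],
	}
	// The object may have been captured by the colourless alternative.
	if it.Object == "" {
		it.Object = m[itemRE.SubexpIndex("objectOnly")]
	}
	return it, true
}

func main() {
	for _, s := range strings.Fields("aredhouse theball bluegreencar the") {
		it, ok := parseItem(s)
		fmt.Printf("%-13s ok=%-5v %+v\n", s, ok, it)
	}
}
```

    In this sketch the first three items parse and the bare article `the` is rejected, because after the optional article the group must consume either a colour or an object.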