
Regex: Match optional string to group


I have a file describing objects in which some properties of the object are optional. For example (color is optional):

type=dog
sex=male
name=wolf
color=brown
type=dog
sex=male
name=bob
type=cat
sex=male
name=tom
color=black
type=dog
sex=female
name=simona
color=white

I'm looking for a regex that gives me the pair of properties "name" - "color" for each dog. I expect something like this:

wolf - brown
bob - 
simona - white

I started with

type=dog[\s\S]*?name=(\w+)[\s\S]*?color=(\w+)

which gives the wrong result (bob picks up the cat's color):

wolf - brown
bob - black
simona - white

Then I wrapped the color part in a group (which gives the same result) and added the "?" quantifier:

type=dog[\s\S]*?name=(\w+)[\s\S]*?(color=(\w+))?

But instead of the desired result, I lost the second group in all matches:

wolf - 
bob - 
simona - 

What's wrong with my expression, and how can I achieve my goal? Please do not use lookbehind, lookahead, or conditionals; VBScript does not implement them.

My example on regex101.com


Solution

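  • Set regex.Multiline = True and use the following regex (it relies on a negative lookahead, which the VBScript regex engine does support; only lookbehind is unavailable):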

    ^type=dog[\s\S]*?^name=(\w+)(?:(?:(?!^type=)[\s\S])*?^color=(\w+))?
    

    See the regex demo

    Details

    • ^ - start of a line
    • type=dog - a literal string
    • [\s\S]*? - 0 or more chars as few as possible
    • ^ - start of a line
    • name= - a literal string
    • (\w+) - Group 1: any one or more letters, digits or underscores
    • (?:(?:(?!^type=)[\s\S])*?^color=(\w+))? - an optional non-capturing group matching 1 or 0 occurrences of
      • (?:(?!^type=)[\s\S])*? - any char, 0 or more times, as few as possible, that does not start a type= substring at the start of a line
      • ^color= - a color= substring at the start of a line
      • (\w+) - Group 2: any one or more letters, digits or underscores
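
To check the pattern against the sample data from the question, here is a minimal sketch in Python rather than VBScript. It assumes that Python's re module treats this particular pattern the same way the VBScript engine does: re.MULTILINE plays the role of Multiline = True, and finditer plays the role of Global = True with Execute. The names SAMPLE, DOG_NAME_COLOR, SECOND_ATTEMPT, dog_pairs and second_attempt_pairs are made up for the illustration.

```python
import re

# Sample data copied from the question.
SAMPLE = """\
type=dog
sex=male
name=wolf
color=brown
type=dog
sex=male
name=bob
type=cat
sex=male
name=tom
color=black
type=dog
sex=female
name=simona
color=white
"""

# The pattern from the answer; MULTILINE makes ^ match at every line start.
DOG_NAME_COLOR = re.compile(
    r"^type=dog[\s\S]*?^name=(\w+)(?:(?:(?!^type=)[\s\S])*?^color=(\w+))?",
    re.MULTILINE,
)

# The second attempt from the question, kept for comparison.
# Here the color value is group 3, because group 2 wraps "color=...".
SECOND_ATTEMPT = re.compile(r"type=dog[\s\S]*?name=(\w+)[\s\S]*?(color=(\w+))?")


def dog_pairs(text):
    """Return (name, color) per dog; color is '' when the record has none."""
    return [(m.group(1), m.group(2) or "") for m in DOG_NAME_COLOR.finditer(text)]


def second_attempt_pairs(text):
    """Same extraction using the question's second attempt."""
    return [(m.group(1), m.group(3) or "") for m in SECOND_ATTEMPT.finditer(text)]


for name, color in dog_pairs(SAMPLE):
    print(f"{name} - {color}")
```

Running second_attempt_pairs(SAMPLE) on the same input returns an empty color for every dog. The reason is that the lazy [\s\S]*? is allowed to match zero characters and the optional group is allowed to match nothing, so the engine stops right after the name without ever looking for color=. In the answer's pattern the lazy part sits inside the optional group, so the engine has to try reaching ^color= (without crossing a ^type= line) before it may give up on the group.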