Regex Capturing Group with alternative doesn't match


I have the following string where I want to match the valid <key>:<value> pairs.

A valid <key> is one or more non-whitespace characters followed by :
A valid <value> is either enclosed in [] or a string without whitespace.

key1:value1 key#2:@value#2 nyet key3:[@value#3, value4] key4:[value5] :bar

Basically I want to match everything except nyet and :bar

I came up with the following regex \S+:(\S+|\[[^]]+\]) but it doesn't seem to match the expression in key3:[@value#3, value4]. In the capturing group, the second alternative \[[^]]+\] should match this expression, so I don't understand why it doesn't match.

The following regex works: \S+:([^([ )]+|\[[^\]]+\]) but it doesn't feel elegant.

Questions:

  1. Why does the first regex \S+:(\S+|\[[^]]+\]) not work?
  2. How would a more elegant solution look to match the key value pairs?

Solution

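  • The first regex fails because the alternatives of an alternation are tried from left to right. For key3:[@value#3, value4], \S+ already succeeds by matching [@value#3, (everything up to the first space), so the second alternative \[[^]]+\] is never tried. In the pattern you can switch the alternatives \S+:(\[[^]]+\]|\S+) but the group would then also capture the [ and ].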

    You could also exclude matching the : in the first part [^\s:]+:(\[[^]]+]|\S+) using a negated character class.
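
The effect of the alternative order can be checked with a short script. This is a minimal sketch that assumes Python's re module, since the question does not name a regex flavor; the helper full_matches and the variable names are made up for illustration. Other flavors may read [^]] differently (JavaScript, for example, treats [^] as a complete class).

```python
import re

text = "key1:value1 key#2:@value#2 nyet key3:[@value#3, value4] key4:[value5] :bar"


def full_matches(pattern):
    """Return the full text of every match of `pattern` in `text` (illustrative helper)."""
    return [m.group(0) for m in re.finditer(pattern, text)]


# Original order: \S+ is tried first and succeeds on "[@value#3,",
# so the bracket alternative is never attempted for key3.
original = full_matches(r"\S+:(\S+|\[[^]]+\])")

# Bracket alternative first: the whole "[...]" value is matched.
swapped = full_matches(r"\S+:(\[[^]]+\]|\S+)")

# Same order, with the key limited to non-whitespace, non-colon characters.
negated = full_matches(r"[^\s:]+:(\[[^]]+]|\S+)")

print(original)  # the third match stops at "key3:[@value#3,"
print(swapped)   # the third match is "key3:[@value#3, value4]"
print(negated)   # same matches as `swapped` for this input
```

On this input, neither nyet nor :bar is matched by any of the three patterns, because each requires at least one key character directly before the colon.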

    For the groups, you could use an alternation with a separate capture group in each alternative, and check whether group 2 or group 3 is set to get the value.

    ([^\s:]+):(?:\[([^][]+)]|(\S+))
    

    The pattern matches:

    • ([^\s:]+) Capture group 1, match 1+ chars other than a whitespace char or :
    • : Match the :
    • (?: Non capture group
      • \[([^][]+)] Match [, capture in group 2 1+ chars other than [ and ], and match the closing ]
      • | or
      • (\S+) Capture 1+ non whitespace chars in group 3
    • ) Close non capture group
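
Reading the value from whichever group participated could look like the following. It is a sketch under the assumption that Python's re module is used; the names kv_pattern and pairs are illustrative and not part of the original answer.

```python
import re

text = "key1:value1 key#2:@value#2 nyet key3:[@value#3, value4] key4:[value5] :bar"

# Group 1: key, group 2: bracketed value (without brackets), group 3: plain value.
kv_pattern = re.compile(r"([^\s:]+):(?:\[([^][]+)]|(\S+))")

pairs = []
for m in kv_pattern.finditer(text):
    key = m.group(1)
    # Only one of group 2 / group 3 takes part in a match; the other is None.
    value = m.group(2) if m.group(2) is not None else m.group(3)
    pairs.append((key, value))

for key, value in pairs:
    print(f"{key!r} -> {value!r}")
```

Checking for None rather than truthiness keeps the logic explicit about which alternative matched.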



    If conditionals are supported, you can check whether group 2 has captured a [. If it did, capture any chars except the brackets in group 3; otherwise capture 1+ non whitespace chars.

    The values you want are then in group 1 and group 3.

    ([^\s:]+):(?:(\[)(?=[^][]*]))?((?(2)[^][]+|\S+))\]?
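
Python's re module is one flavor that supports the (?(2)...|...) conditional, so the pattern can be tried there as a rough sketch. The names conditional and cond_pairs are illustrative assumptions, and other engines may not accept this syntax.

```python
import re

text = "key1:value1 key#2:@value#2 nyet key3:[@value#3, value4] key4:[value5] :bar"

# Group 2 captures an opening "[" only when a closing "]" follows later.
# The conditional (?(2)...|...) then picks the value pattern for group 3.
conditional = re.compile(r"([^\s:]+):(?:(\[)(?=[^][]*]))?((?(2)[^][]+|\S+))\]?")

# The key is always in group 1 and the value is always in group 3.
cond_pairs = [(m.group(1), m.group(3)) for m in conditional.finditer(text)]

for key, value in cond_pairs:
    print(f"{key!r} -> {value!r}")
```

Compared with the three-group pattern, the value always lands in the same group, at the cost of a pattern that is harder to read and less portable.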
    