
regexp: multiline, non-greedy match until optional string


Using Go's regexp package, I'm trying to extract a predefined, ordered set of multiline key-value pairs from raw text. The last pair is optional, e.g.,

 Key1:
  SomeValue1
  MoreValue1
 Key2:
  SomeValue2
  MoreValue2
 OptionalKey3:
  SomeValue3
  MoreValue3

(Here, I want to extract all the values as named groups.)

If I use the default greedy pattern (?s:Key1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?), it never sees OptionalKey3 and matches the rest of the text as Key2.

If I use the non-greedy pattern (?s:Key1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?), it doesn't even see SomeValue2 and stops immediately: https://regex101.com/r/QE2g3o/1

Is there a way to optionally match OptionalKey3 while still capturing all the other keys?
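
Both behaviours described above can be reproduced with a small Go sketch. The `sample` string and the `capture` helper are illustrative assumptions (not part of the original input), and the sketch assumes each key starts at the beginning of a line with no indentation.

```go
package main

import (
	"fmt"
	"regexp"
)

// sample is a hypothetical input mirroring the layout shown in the question.
const sample = "Key1:\nSomeValue1\nMoreValue1\n" +
	"Key2:\nSomeValue2\nMoreValue2\n" +
	"OptionalKey3:\nSomeValue3\nMoreValue3\n"

var (
	// Greedy Key2: the two patterns are copied from the question.
	greedyRe = regexp.MustCompile(`(?s:Key1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?)`)
	// Lazy Key2, still without any end anchor.
	lazyRe = regexp.MustCompile(`(?s:Key1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?)`)
)

// capture returns the text of the named group in the first match of re in s.
// It returns "" when there is no match or the group did not participate.
func capture(re *regexp.Regexp, s, name string) string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	for i, n := range re.SubexpNames() {
		if n == name {
			return m[i]
		}
	}
	return ""
}

func main() {
	// Greedy: Key2 swallows the OptionalKey3 section, so OptionalKey3 stays empty.
	fmt.Printf("greedy Key2=%q OptionalKey3=%q\n",
		capture(greedyRe, sample, "Key2"), capture(greedyRe, sample, "OptionalKey3"))

	// Lazy: nothing forces Key2 to grow, so it matches the empty string.
	fmt.Printf("lazy   Key2=%q OptionalKey3=%q\n",
		capture(lazyRe, sample, "Key2"), capture(lazyRe, sample, "OptionalKey3"))
}
```

In the lazy case, the optional group is allowed to match nothing and the pattern has no end anchor, so the engine is satisfied with an empty Key2.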


Solution

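  • Anchor the pattern to the whole input with \A and \z, so that the lazy Key2 group is forced to expand until either OptionalKey3: or the end of the string follows: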

    (?s)\AKey1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?\z
    

    See regex proof.

    EXPLANATION

    --------------------------------------------------------------------------------
      (?s)                     set flags for this block (with . matching
                               \n) (case-sensitive) (with ^ and $
                               matching normally) (matching whitespace
                               and # normally)
    --------------------------------------------------------------------------------
      \A                       the beginning of the string
    --------------------------------------------------------------------------------
      Key1:                    'Key1:'
    --------------------------------------------------------------------------------
      \n                       '\n' (newline)
    --------------------------------------------------------------------------------
      (?P<Key1>                 group and capture to "Key1":
    --------------------------------------------------------------------------------
        .*                       any character (0 or more times (matching
                                 the most amount possible))
    --------------------------------------------------------------------------------
      )                        end of "Key1"
    --------------------------------------------------------------------------------
      Key2:                    'Key2:'
    --------------------------------------------------------------------------------
      \n                       '\n' (newline)
    --------------------------------------------------------------------------------
      (?P<Key2>                group and capture to "Key2":
    --------------------------------------------------------------------------------
        .*?                      any character (0 or more times (matching
                                 the least amount possible))
    --------------------------------------------------------------------------------
      )                        end of "Key2"
    --------------------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
    --------------------------------------------------------------------------------
        OptionalKey3:            'OptionalKey3:'
    --------------------------------------------------------------------------------
        \n                       '\n' (newline)
    --------------------------------------------------------------------------------
        (?P<OptionalKey3>         group and capture to "OptionalKey3":
    --------------------------------------------------------------------------------
          .*                       any character (0 or more times
                                   (matching the most amount possible))
    --------------------------------------------------------------------------------
        )                        end of "OptionalKey3"
    --------------------------------------------------------------------------------
      )?                       end of grouping
    --------------------------------------------------------------------------------
      \z                       the end of the string
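
As a rough illustration of how the anchored pattern might be used from Go, the sketch below compiles it and collects the named groups into a map. The `extract` helper, the sample inputs, and the choice to trim surrounding whitespace from each value are assumptions made for this example. The sketch also assumes the input begins directly with `Key1:` and that key lines are not indented.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pattern is the anchored expression from the answer above.
var pattern = regexp.MustCompile(
	`(?s)\AKey1:\n(?P<Key1>.*)Key2:\n(?P<Key2>.*?)(?:OptionalKey3:\n(?P<OptionalKey3>.*))?\z`)

// extract (a hypothetical helper) returns the whitespace-trimmed value of
// every named group that took part in the match. The boolean is false when
// the text does not match the pattern at all.
func extract(text string) (map[string]string, bool) {
	loc := pattern.FindStringSubmatchIndex(text)
	if loc == nil {
		return nil, false
	}
	values := make(map[string]string)
	for i, name := range pattern.SubexpNames() {
		// Skip the whole-match entry and groups that did not participate
		// (their start index is reported as -1).
		if name == "" || loc[2*i] < 0 {
			continue
		}
		values[name] = strings.TrimSpace(text[loc[2*i]:loc[2*i+1]])
	}
	return values, true
}

func main() {
	withOptional := "Key1:\n SomeValue1\n MoreValue1\n" +
		"Key2:\n SomeValue2\n MoreValue2\n" +
		"OptionalKey3:\n SomeValue3\n MoreValue3\n"
	withoutOptional := "Key1:\n SomeValue1\n MoreValue1\n" +
		"Key2:\n SomeValue2\n MoreValue2\n"

	for _, text := range []string{withOptional, withoutOptional} {
		values, ok := extract(text)
		fmt.Printf("matched=%v values=%q\n", ok, values)
	}
}
```

Using FindStringSubmatchIndex rather than FindStringSubmatch makes it possible to tell an absent OptionalKey3 apart from one that is present but empty. When the optional section is missing, the map has no OptionalKey3 entry.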