java regex regex-group

Regexp to match multiline sections starting with identifier up to an end identifier


How do I write a regexp that matches all multi-line sections (with varying numbers of lines) that start with a given identifier, stopping when an end-of-message keyword is reached?

For example, I want to extract all sections that start with the keyword 'START', up to 'END_OF_MSG', from a given text block:

HELLO
START ABC DEF GHI JKL
QWER RANDOM TEXT 213%@#!
UIOP RANDOMZXCVB123456
START ABC DEF GHI JKL
ZZZZZ RANDOMTEXT213%@#!
11111 RANDOMZXCVB123456
$$$$$$ SOMEMORETEXT
START ABC DEF GHI JKL
QWER RANDOMTEXT213%@#!
$$$$$ RANDOMZXCVB123456
END_OF_MSG

I'd like the regexp to produce three sections:

START ABC DEF GHI JKL
QWER RANDOM TEXT 213%@#!
UIOP RANDOMZXCVB123456
START ABC DEF GHI JKL
ZZZZZ RANDOMTEXT213%@#!
11111 RANDOMZXCVB123456
$$$$$$ SOMEMORETEXT
START ABC DEF GHI JKL
QWER RANDOMTEXT213%@#!
$$$$$ RANDOMZXCVB123456

So far I've worked out a regexp that seems to do this almost correctly:

(?m)^START(.|\n)*?((?=^START)|END_OF_MSG)

The issue is that the last section also includes the END_OF_MSG identifier, which I'd like to skip. I also suspect this regexp is not the most efficient way of grabbing those sections. Any ideas on how to improve it?

Example available here: Regex101
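
The behaviour described above can be reproduced with a minimal Java sketch. The class name `LazyAttempt`, the method name `findAll`, and the shortened sample text are assumptions made for illustration; only the pattern itself is taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyAttempt {
    // Pattern from the question; (?m) lets ^ match at the start of each line.
    static final Pattern ATTEMPT =
            Pattern.compile("(?m)^START(.|\\n)*?((?=^START)|END_OF_MSG)");

    // Collects every full match found in the text.
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = ATTEMPT.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical sample in the same shape as the question's input.
        String text = "HELLO\nSTART A\nfoo\nSTART B\nbar\nEND_OF_MSG";
        List<String> matches = findAll(text);
        // The last match still carries the end marker, which is the reported problem.
        System.out.println(matches.get(matches.size() - 1).endsWith("END_OF_MSG"));
    }
}
```

Because the final alternative consumes END_OF_MSG instead of only looking ahead for it, the marker becomes part of the last match. The earlier matches also keep their trailing newline.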


Solution

  • You can match START followed by the rest of the line, then match all following lines that do not start with START or END_OF_MSG, using a negative lookahead.

    ^START\b.*(?:\R(?!START\b|END_OF_MSG\b).*)*
    

    Explanation

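    • ^ Start of the line (assuming multiline mode is enabled, e.g. with (?m); without it, ^ matches only at the start of the string)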
    • START\b.* Match START, a word boundary and the rest of the line
    • (?: Non capture group
      • \R Match a newline sequence
      • (?!START\b|END_OF_MSG\b).* Match the whole line if it does not start with any of the alternatives using a negative lookahead
    • )* Close the group and repeat it 0+ times to match all the lines

    In Java, with doubled backslashes:

    ^START\\b.*(?:\\R(?!START\\b|END_OF_MSG\\b).*)*
    

    Regex demo | Java demo
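
As a minimal sketch of how the pattern above might be applied in Java: the class name `SectionExtractor` and the method name `extractSections` are hypothetical, and the pattern is assumed to be compiled with `Pattern.MULTILINE` so that `^` matches at the start of every line rather than only at the start of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionExtractor {
    // MULTILINE is an assumption here: it makes ^ match at each line start.
    static final Pattern SECTION = Pattern.compile(
            "^START\\b.*(?:\\R(?!START\\b|END_OF_MSG\\b).*)*",
            Pattern.MULTILINE);

    // Returns each section beginning with START, excluding the END_OF_MSG line.
    static List<String> extractSections(String text) {
        List<String> sections = new ArrayList<>();
        Matcher m = SECTION.matcher(text);
        while (m.find()) {
            sections.add(m.group());
        }
        return sections;
    }

    public static void main(String[] args) {
        // Sample input from the question.
        String text = String.join("\n",
                "HELLO",
                "START ABC DEF GHI JKL",
                "QWER RANDOM TEXT 213%@#!",
                "UIOP RANDOMZXCVB123456",
                "START ABC DEF GHI JKL",
                "ZZZZZ RANDOMTEXT213%@#!",
                "11111 RANDOMZXCVB123456",
                "$$$$$$ SOMEMORETEXT",
                "START ABC DEF GHI JKL",
                "QWER RANDOMTEXT213%@#!",
                "$$$$$ RANDOMZXCVB123456",
                "END_OF_MSG");
        for (String section : extractSections(text)) {
            System.out.println(section);
            System.out.println("---"); // separator between sections, for readability only
        }
    }
}
```

Since each repetition matches a whole line at a time and the lookahead is checked only once per line, this avoids the character-by-character alternation of `(.|\n)*?`, and the matches carry neither a trailing newline nor the END_OF_MSG marker.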