The following is an example of a list of multiline records, each starting with a fixed string label (LABEL):
<Irrelevant line>
...
<Irrelevant line>
LABEL ...
...
...
LABEL ...
...
...
LABEL ...
...
...
LABEL ...
...
...
Is there a Java regular expression that can match the above and extract each record, i.e.
LABEL ...
...
...
Also, is this the fastest way of extracting those records, or would reading line by line and checking the start of each line be faster?
To iterate over all the LABEL groups, use this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern regex = Pattern.compile("(?sm)LABEL.*?(?=^LABEL|\\Z)");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // regexMatcher.group() holds the current LABEL group
}
See the demo for the various matches.
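For a version you can paste and run, here is a minimal sketch that wraps the same pattern in a helper. The class name LabelRecords, the method extract and the sample input are made up for illustration; only the pattern comes from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabelRecords {
    // Same pattern as in the answer; compiled once so it can be reused.
    private static final Pattern RECORD =
            Pattern.compile("(?sm)LABEL.*?(?=^LABEL|\\Z)");

    // Hypothetical helper: collects every LABEL record found in the text.
    static List<String> extract(String text) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            records.add(m.group());
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample: one irrelevant line followed by two records.
        String sample = "noise\nLABEL a\n1\n2\nLABEL b\n3\n";
        for (String record : extract(sample)) {
            System.out.println("[" + record + "]");
        }
    }
}
```

Note that every record except the last keeps its trailing line break, while the last one stops before the final line break because \Z matches just before it. If LABEL could also appear in the middle of a line, anchoring the first LABEL as well, i.e. (?sm)^LABEL.*?(?=^LABEL|\\Z), should restrict matches to line starts.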
Explanation
(?s) activates DOTALL mode, allowing the dot to match across lines
(?m) turns on multi-line mode, allowing ^ and $ to match on each line
LABEL matches literal characters
.*? lazily matches all chars up to...
(?=^LABEL|\\Z) is a lookahead asserting that what follows is the next LABEL at the start of a line, or the end of the string
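As for the line-by-line alternative raised in the question, here is a rough sketch of what it could look like, so the two approaches can be timed against each other on real data. The names LabelLineScanner and extract are hypothetical, and I have not benchmarked this against the regex, so treat any speed difference as something to measure rather than assume.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LabelLineScanner {
    // Hypothetical helper: groups lines into records, starting a new
    // record whenever a line begins with LABEL.
    static List<String> extract(Iterable<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null; // stays null until the first LABEL line
        for (String line : lines) {
            if (line.startsWith("LABEL")) {
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                current.append('\n').append(line);
            }
            // Lines before the first LABEL are skipped as irrelevant.
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample: one irrelevant line followed by two records.
        List<String> lines = Arrays.asList("noise", "LABEL a", "1", "2", "LABEL b", "3");
        for (String record : extract(lines)) {
            System.out.println("[" + record + "]");
        }
    }
}
```

For a file, the lines could come from Files.readAllLines or be read one at a time with a BufferedReader, which avoids loading the whole input into a single string first. Unlike the regex matches, these records carry no trailing line break.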