Regex capture multi-line groups

I'm struggling to create a regex that captures the text between two keywords in a multi-line file.

In particular, consider the following file:

#%META
# date: 2022-08-27
# generated-by: Me
# id: 1
#%ENDS

#%BODY
....
#%ENDS

#%META
# date: 2022-08-27
# generated-by: Another Me
# id: 2
#%ENDS

#%BODY
....
#%ENDS

I want to extract what is between the #%META and #%ENDS keywords, if possible without the leading #. That is, the desired result is to capture both:

date: 2022-08-27
generated-by: Me
id: 1

and

date: 2022-08-27
generated-by: Another Me
id: 2

I came up with the following regex: (?<=#%META\n)([\S\s]*?)(?=#%ENDS\n).

However, it neither identifies the two chunks of text to be matched nor removes the leading #.

Could anyone help with that?

Thanks a lot! :)

Solution

You could first use a pattern to capture everything between #%META and #%ENDS, and then post-process each capture group 1 value by removing the leading # and any spaces that follow it.

^#%META((?>\R(?!#%(?:META|ENDS)$).*)+)\R#%ENDS$

Explanation

^ Start of a line (the pattern is used with the m flag)
#%META Match literally
( Capture group 1
- (?> Atomic group
  - \R Match any Unicode newline sequence
  - (?!#%(?:META|ENDS)$) Negative lookahead, assert that the next line is not exactly #%META or #%ENDS
  - .* Match the whole line
- )+ Close the atomic group and repeat 1+ times
) Close group 1
\R Match any Unicode newline sequence
#%ENDS Match literally
$ End of a line (with the m flag)
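
The anchors only behave as line anchors because of the m flag. The following is a minimal sketch of that difference, using a shortened input that is assumed here purely for illustration (it is not the file from the question):

```php
<?php
// Assumed sample input: two META blocks separated by a blank line.
$str = "#%META\n# id: 1\n#%ENDS\n\n#%META\n# id: 2\n#%ENDS";

$pattern = '^#%META((?>\R(?!#%(?:META|ENDS)$).*)+)\R#%ENDS$';

// With the m flag, ^ and $ match at every line boundary.
$withM = preg_match_all('/' . $pattern . '/m', $str, $a);

// Without it, ^ and $ only match at the start and end of the whole string.
$withoutM = preg_match_all('/' . $pattern . '/', $str, $b);

echo "with m: $withM, without m: $withoutM\n";
```

On this input the first call should find both blocks, while the second should return a single match running from the first #%META to the last #%ENDS, because the lookahead can then only succeed on the final line.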

Example

$re = '/^#%META((?>\R(?!#%(?:META|ENDS)$).*)+)\R#%ENDS$/m';
$str = '#%META
# date: 2022-08-27
# generated-by: Me
# id: 1
#%ENDS

#%BODY
....
#%ENDS

#%META
# date: 2022-08-27
# generated-by: Another Me
# id: 2
#%ENDS

#%BODY
....
#%ENDS';

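if (preg_match_all($re, $str, $matches)) {
    // $matches[1] holds the text captured between #%META and #%ENDS for each block.
    $result = array_map(function ($s) {
        // Trim the leading newline, then strip "#" and any following
        // horizontal whitespace from the start of every line.
        return preg_replace('/^#\h*/m', '', trim($s));
    }, $matches[1]);
    var_export($result);
}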

Output

array (
  0 => 'date: 2022-08-27
generated-by: Me
id: 1',
  1 => 'date: 2022-08-27
generated-by: Another Me
id: 2',
)
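
If the individual fields are needed rather than the raw text, each cleaned block could be split further. This is only a sketch: parseMetaBlock is a hypothetical helper name, and it assumes every line has the form "key: value" as in the sample file.

```php
<?php
// Hypothetical helper: turn one cleaned block into a key => value array.
function parseMetaBlock(string $block): array
{
    $fields = [];
    foreach (preg_split('/\R/', $block) as $line) {
        // Split on the first colon only, so values may themselves contain colons.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $fields[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $fields;
}

// One of the cleaned blocks from the output above.
$block = "date: 2022-08-27\ngenerated-by: Me\nid: 1";
var_export(parseMetaBlock($block));
```

Lines without a colon are skipped silently here; whether that is the right behaviour depends on how strict the file format is meant to be.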