
Regex capture multi-line groups

I'm struggling to create a regex that captures what is included between two keywords in a multi-line file.

In particular, consider the following file:

#%META
# date: 2022-08-27
# generated-by: Me
# id: 1
#%ENDS

#%META
# date: 2022-08-27
# generated-by: Another Me
# id: 2
#%ENDS


I want to capture what lies between the #%META and #%ENDS keywords, if possible without the leading #. The desired result is to capture both of the following:

date: 2022-08-27
generated-by: Me
id: 1


date: 2022-08-27
generated-by: Another Me
id: 2

I came up with the following regex: (?<=#%META\n)([\S\s]*?)(?=#%ENDS\n).

However, it does not identify the two chunks of text to be matched, and it does not remove the leading #.

Could anyone help with that?

Thanks a lot! :)


  • You might use a pattern that first captures everything between #%META and #%ENDS, and then post-process the capture group 1 values by removing each leading # followed by optional horizontal whitespace.

    ^#%META((?>\R(?!#%(?:META|ENDS)$).*)+)\R#%ENDS$

    • ^ Start of a line (the pattern is used with the m flag)
    • #%META Match literally
    • ( Capture group 1
      • (?> Atomic group
        • \R Match any Unicode newline sequence
        • (?!#%(?:META|ENDS)$) Negative lookahead, assert that the line is not #%META or #%ENDS
        • .* Match the whole line
      • )+ Close the atomic group and repeat 1+ times
    • ) Close group 1
    • \R Match any Unicode newline sequence
    • #%ENDS Match literally
    • $ End of a line (the pattern is used with the m flag)
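
The breakdown above can also be kept next to the pattern itself by writing it in extended (x) mode. This is only a sketch of the same pattern, not a different approach: in x mode whitespace is ignored and an unescaped # starts a comment, so the literal # of the markers has to be written as \#. The $sample string is a shortened, hypothetical stand-in for the file in the question.

```php
<?php
// Same pattern in extended (x) mode, with the explanation inlined as comments.
$re = '/
    ^\#%META                    # opening marker at the start of a line
    (                           # group 1: the lines between the markers
      (?>                       # atomic group, one iteration per line
        \R                      # a newline sequence
        (?!\#%(?:META|ENDS)$)   # the next line is not a marker line
        .*                      # the rest of that line
      )+
    )
    \R                          # newline before the closing marker
    \#%ENDS$                    # closing marker at the end of a line
/mx';

// Hypothetical sample input, modelled on the file in the question.
$sample = "#%META\n# id: 1\n#%ENDS\n\n#%META\n# id: 2\n#%ENDS\n";

// $count holds the number of #%META ... #%ENDS blocks that were matched.
$count = preg_match_all($re, $sample, $m);
var_dump($count);
```

Group 1 still starts with the newline that follows #%META, which is why the values are passed through trim() before the leading # is removed.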



    $re = '/^#%META((?>\R(?!#%(?:META|ENDS)$).*)+)\R#%ENDS$/m';
    $str = '#%META
    # date: 2022-08-27
    # generated-by: Me
    # id: 1
    #%ENDS

    #%META
    # date: 2022-08-27
    # generated-by: Another Me
    # id: 2
    #%ENDS';

    if (preg_match_all($re, $str, $matches)) {
        // Strip the leading "#" and any horizontal whitespace from every line of group 1
        $result = array_map(function ($s) {
            return preg_replace("/^#\h*/m", "", trim($s));
        }, $matches[1]);

        var_export($result);
    }


    Output

    array (
      0 => 'date: 2022-08-27
    generated-by: Me
    id: 1',
      1 => 'date: 2022-08-27
    generated-by: Another Me
    id: 2',
    )
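
If the individual fields are needed rather than the raw text of each block, the cleaned strings can be split further. The following is a minimal sketch under the assumption that every line has the form "key: value"; parseBlock() is a hypothetical helper and not part of the answer above, and $result is hard-coded here only so that the example runs on its own.

```php
<?php
// The cleaned blocks, as produced by the preg_match_all / array_map step.
$result = [
    "date: 2022-08-27\ngenerated-by: Me\nid: 1",
    "date: 2022-08-27\ngenerated-by: Another Me\nid: 2",
];

// Hypothetical helper: turn one cleaned block into a key => value array.
function parseBlock(string $block): array
{
    $fields = [];
    foreach (preg_split('/\R/', $block) as $line) {
        // Split on the first colon only, so values may contain colons themselves.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $fields[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $fields;
}

$records = array_map('parseBlock', $result);
print_r($records);
```

Lines without a colon are skipped silently in this sketch; whether that is acceptable depends on how strict the file format is meant to be.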