php regex pcre yaml-front-matter

Splitting out front matter from a Twig template with PCRE


Let's say I want to split the metadata from the Twig template in a file like this:

---
some metadata
multiple lines
---
Twig template
More data

I came up with /\A---\R(.+?\R)?---\R(.*)\Z/s, which more or less does the job, but I am wondering whether it can become pathological in backtracking.


Solution

  • Your regex seems to work well.

    You can make it somewhat more efficient by "unrolling" the first lazy dot pattern:

    /\A---\R(.*(?:\R(?!---\R).*)*\R)?---\R(?s)(.*)\Z/
    

    Note: no modifiers are necessary after the closing delimiter; the only one used is the inline (?s) modifier inside the pattern.
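As a minimal sketch of how the unrolled pattern might be used from PHP: the helper name `splitFrontMatter` and the sample input are assumptions made for illustration, not part of the original answer. The check on `preg_last_error()` addresses the backtracking concern from the question, since `preg_match()` returns `false` when PCRE gives up (for example, when the backtrack limit is hit).

```php
<?php
// Hypothetical helper: returns [metadata, template], or null when the
// input does not start with a front matter block.
function splitFrontMatter(string $source): ?array
{
    // Unrolled pattern from the answer; the inline (?s) only affects group 2.
    $pattern = '/\A---\R(.*(?:\R(?!---\R).*)*\R)?---\R(?s)(.*)\Z/';

    $result = preg_match($pattern, $source, $m);
    if ($result === false) {
        // preg_match() failed, e.g. with PREG_BACKTRACK_LIMIT_ERROR.
        throw new RuntimeException('preg_match failed, error code ' . preg_last_error());
    }
    if ($result === 0) {
        return null; // no front matter found
    }

    // $m[1] is the metadata (empty string if the block is empty),
    // $m[2] is everything after the closing --- line.
    return [$m[1], $m[2]];
}

// Sample input mirroring the example in the question.
$source = "---\nsome metadata\nmultiple lines\n---\nTwig template\nMore data\n";
[$meta, $template] = splitFrontMatter($source);
```

Returning `null` for "no match" and throwing on a PCRE error keeps the two cases apart, which is easy to miss because both `0` and `false` are falsy in PHP.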

    Details

    • \A - start of string
    • ---\R - a full --- line with a linebreak
    • (.*(?:\R(?!---\R).*)*\R)? - an optional Capturing group 1:
      • .* - the whole line
      • (?:\R(?!---\R).*)* - 0 or more repetitions of
        • \R(?!---\R) - a line break that is not followed by a --- line and its line break
        • .* - the whole line
      • \R - a linebreak sequence
    • ---\R - a full --- line with a linebreak
    • (?s) - an inline DOTALL modifier that makes the dots to its right match line break chars
    • (.*) - Group 2: any 0+ chars, as many as possible
    • \Z - end of string (or the position just before a final line break at the end of the string).
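The breakdown above can also be kept next to the pattern itself. The sketch below writes the same regex in free-spacing mode; the `x` flag, the `~` delimiter, and the sample input with Windows line endings are additions for illustration and are not part of the original answer.

```php
<?php
// Same pattern as above, written with the x (extended) flag so that
// whitespace is ignored and # starts a comment inside the pattern.
$pattern = '~
    \A ---\R                    # opening --- line with its line break
    (                           # group 1 (optional): the metadata block
        .*                      #   first metadata line
        (?: \R (?!---\R) .* )*  #   further lines, stopping before a --- line
        \R                      #   line break that ends the metadata
    )?
    ---\R                       # closing --- line with its line break
    (?s) (.*)                   # group 2: the rest; dots now match line breaks
    \Z                          # end of string
~x';

// Assumed sample input using CRLF line endings, which \R also matches.
$source = "---\r\ntitle: Home\r\n---\r\n<h1>{{ title }}</h1>\r\n";

$matched  = preg_match($pattern, $source, $m);
$metadata = $m[1] ?? null; // text between the two --- lines
$template = $m[2] ?? null; // everything after the closing --- line
```

Because `\R` matches any line break sequence, the captured metadata keeps whatever line endings the input used.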