php, regex, twig

RegEx to extract block from twig template


In PHP, I want to extract the text inside a twig block, and I thought that a regex would be the most efficient way to do it.

Let's say I have a file "index.twig" with this content:

{% block content %}
Content of the page...
{% endblock %}

This code works perfectly fine:

$input = file_get_contents("index.twig"); 
preg_match_all('/\{%\s*block\s*content\s*\%}([^\%}]*)\{%\s*endblock\s*\%}/', $input, $output);

$output will contain the expected result.

However, if the input file is something like:

{% block content %}
{{ a_tag }}
Content of the page...
{% endblock %}

In this case, the closing }} breaks the regex and $output is empty.

Any clue for the correct regex?

Or is there another way to extract the content of the block?

I would like to get:

{{ a_tag }}
Content of the page...

Solution

  • [^\%}]* is a negated character class: it matches any character except the ones listed, which in this case are % (which you don't have to escape) and }.

    Because the class excludes }, it cannot match {{ a_tag }} between the block tags.
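
A minimal sketch of that failure, using the pattern from the question unchanged. The two inline strings are assumptions standing in for the contents of index.twig:

```php
<?php
// Pattern copied from the question.
$pattern = '/\{%\s*block\s*content\s*\%}([^\%}]*)\{%\s*endblock\s*\%}/';

// Hypothetical template contents, inlined instead of read from a file.
$plain   = "{% block content %}\nContent of the page...\n{% endblock %}";
$withTag = "{% block content %}\n{{ a_tag }}\nContent of the page...\n{% endblock %}";

// preg_match returns 1 on a match and 0 when nothing matches.
$plainMatches   = preg_match($pattern, $plain);
$withTagMatches = preg_match($pattern, $withTag);

var_dump($plainMatches);   // int(1)
var_dump($withTagMatches); // int(0): [^\%}]* stops at the first } of {{ a_tag }}
```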


    One way to get the content is to match from the opening block tag up to the first occurrence of the endblock tag. In between, you match every line that does not start with the endblock pattern.

    Instead of using \s, which also matches newlines, you could use \h to match a horizontal whitespace char and \R to match any Unicode newline sequence.

    {%\h*block\h*content\h*%}\R((?:(?!{%\h*endblock\h*%}).*\R)*){%\h*endblock\h*%}
    

    The pattern will match:

    • {%\h*block\h*content\h*%}\R Match the opening {% block content %} tag and a newline
    • ( Capture group 1
      • (?:(?!{%\h*endblock\h*%}).*\R)* Match a whole line and its newline if the line does not start with the endblock pattern, repeated zero or more times
    • ) Close group 1
    • {%\h*endblock\h*%} Match the closing {% endblock %} tag
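
In PHP, the pattern could be used roughly as follows. This is a sketch, not the only way to do it: the inline $input string stands in for file_get_contents("index.twig"), and the ~ delimiter and the rtrim call are choices made here for illustration.

```php
<?php
// Hypothetical template contents; in the question this comes from index.twig.
$input = "{% block content %}\n{{ a_tag }}\nContent of the page...\n{% endblock %}\n";

// The pattern from above, wrapped in ~ delimiters.
$pattern = '~{%\h*block\h*content\h*%}\R((?:(?!{%\h*endblock\h*%}).*\R)*){%\h*endblock\h*%}~';

$content = null;
if (preg_match($pattern, $input, $match)) {
    // Group 1 holds every line between the opening and closing tags,
    // including the final newline, which is trimmed here.
    $content = rtrim($match[1], "\r\n");
    echo $content, PHP_EOL;
}
```

With this input, $content holds the two lines {{ a_tag }} and Content of the page... from the question. Note that the pattern expects the opening and closing tags to sit on lines of their own, since each captured line must end with a newline.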
