php regex whitespace preg-match-all bad-gateway

Relatively simple preg_match_all causes 502 Bad Gateway


I have a preg_match_all call with this pattern:

preg_match_all(
    '/\[(if) ([^\]]*)\]
    ((?:(?!\[if).|(?R))*?)
    \[endif\]/sx',
    $text,
    $matches
);

It's quite a simple pattern, I guess: it looks for the syntax [if condition] sometext [endif], but it also supports nested ifs, e.g. [if condition1] aa [if condition2] bb [endif] [endif]. I used the s modifier so that the dot also matches newlines (I want it to work across multiple lines) and x for easier reading (but removing x does not fix the issue).
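For reference, here is a minimal sketch of how the pattern behaves on the small nested example from the paragraph above. The variable names are only for illustration, and this short input is not the one that triggers the 502. Because whitespace inside the pattern is ignored under the x modifier, the captured condition keeps its leading space.

```php
<?php
// Original pattern from the question, unchanged (s and x modifiers).
$pattern = '/\[(if) ([^\]]*)\]
    ((?:(?!\[if).|(?R))*?)
    \[endif\]/sx';

// Small nested input taken from the description above.
$text = '[if condition1] aa [if condition2] bb [endif] [endif]';

$count = preg_match_all($pattern, $text, $matches);

var_dump($count);          // number of top-level matches found
var_dump($matches[2][0]);  // captured condition of the outer block
var_dump($matches[3][0]);  // captured body, including the nested block
```

The outer block is reported as a single match, and the nested block appears inside its captured body rather than as a separate match.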

It works fine for most of the input data I have, but for some specific input it causes a 502 Bad Gateway error on the nginx server, without any errors or exceptions in the logs. I'm using nginx + php-fpm (5.6.15-1+deb.sury.org~trusty+1), but the same happens on PHP 7.

Here is PHP code that triggers the 502 Bad Gateway error, so you can easily check it. It is very simple: just a variable and the regex.

http://pastebin.com/G54Xa0as

Please make sure that you copy the content 1:1, with all spaces, tabs, etc.

The very strange thing is that removing almost any single line, or even deleting one indent (a few spaces in any place), makes it work.

I have no more ideas about what is wrong here. I was able to create this single file to demonstrate the issue, but I have no idea how to fix it.


Solution

  • Your regex contains a negative lookahead that "tempers" the dot pattern. However, the lookahead only checks for the opening delimiter, not the end delimiter, so the dot is also allowed to run across [endif] and the pattern becomes rather "heavy".

    I suggest adding the end delimiter ([endif]) to the lookahead check:

    \[(if)\s+([^\]]*+)\]((?>(?!\[(?:end)?if\b).|(?R))*)\[endif\]
                                 ^^^^^^^^
    

    See demo

    Or, you can even unroll the tempered greedy token as

    \[(if)\s+([^\]]*+)\]((?>[^[]++(?:\[(?!(?:end)?if\b)[^[]*)*|(?R))*)\[endif\]
    

    See the regex demo (however, if a [ can follow [if...], it won't work).

    Also, note that your regex has a space after (if). Since you are using the /x modifier, it is not treated as a literal space but is ignored. That is why I changed it to \s+.
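As a quick sanity check, the two patterns above can be run on a small nested input, roughly as sketched here. The helper name findIfBlocks and the sample text are assumptions made for illustration. This only shows that both variants agree on a short input. It does not reproduce the pastebin case.

```php
<?php
// Pattern with the end delimiter added to the lookahead (from the answer).
$tempered = '/\[(if)\s+([^\]]*+)\]((?>(?!\[(?:end)?if\b).|(?R))*)\[endif\]/s';

// Unrolled variant of the same idea (from the answer).
$unrolled = '/\[(if)\s+([^\]]*+)\]((?>[^[]++(?:\[(?!(?:end)?if\b)[^[]*)*|(?R))*)\[endif\]/s';

// Illustrative input: one outer block containing one nested block.
$text = '[if condition1] aa [if condition2] bb [endif] [endif]';

// Hypothetical helper: collects condition/body pairs for top-level blocks.
function findIfBlocks($pattern, $text)
{
    $blocks = [];
    // preg_match_all returns false on error and 0 when nothing matches.
    if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $blocks[] = ['condition' => $m[2], 'body' => $m[3]];
        }
    }
    return $blocks;
}

$a = findIfBlocks($tempered, $text);
$b = findIfBlocks($unrolled, $text);

var_dump($a === $b);  // do both variants agree on this input?
print_r($a);
```

With \s+ in place of the ignored space, the captured condition no longer carries a leading space.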