
What is the "unit" of pcre.backtrack_limit?


I'm running into a PREG_BACKTRACK_LIMIT_ERROR from preg_replace() with a complicated regular expression because pcre.backtrack_limit, which defaults to 1,000,000, is too low. Raising it to 10,000,000 makes this particular application work.

My question is: what, loosely speaking, is the "unit" of the backtracking limit? Does the 1,000,000 figure correspond to a memory size? If not, what does it signify? I'm trying to work out what a reasonable setting is for my environment.

Reference on pcre.backtrack_limit: https://www.php.net/manual/en/pcre.configuration.php#ini.pcre.backtrack-limit

Reference on backtracking: In regular expressions, what is a backtracking / back referencing?


Solution

  • According to the PCRE source code, this error is returned once "match()" has been called match_limit times, which is 1,000,000 with PHP's default pcre.backtrack_limit:

    /* First check that we haven't called match() too many times, or that we
    haven't exceeded the recursive call limit. */
    
    if (md->match_call_count++ >= md->match_limit) RRETURN(PCRE_ERROR_MATCHLIMIT);
    

    PHP then converts PCRE_ERROR_MATCHLIMIT into its "PHP_PCRE_BACKTRACK_LIMIT_ERROR" code.

    According to the pcreapi manpage (see https://serverfault.com/a/408272/140833 ):

    Internally, PCRE uses a function called match() which it calls repeatedly (sometimes recursively). The limit set by match_limit is imposed on the number of times this function is called during a match, which has the effect of limiting the amount of backtracking that can take place. For patterns that are not anchored, the count restarts from zero for each position in the subject string.

    So the unit is not a memory size. I think it is something like "number of match() calls", which roughly tracks the number of backtracking attempts. I'm not sure it is 1-to-1 with that, though.
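
    One way to explore what the limit counts is to measure it. The following is a minimal sketch that searches for the smallest pcre.backtrack_limit under which a given match completes without error. The helpers completes_within_limit() and smallest_sufficient_limit() are hypothetical names introduced for illustration, not part of PHP, and the search assumes that raising the limit never turns a completing match into a failing one. The numbers printed depend on the PHP and PCRE versions and on whether the PCRE JIT is enabled, so treat them as relative measurements only.

    ```php
    <?php

    // Hypothetical helper: true if preg_match() finishes (match or no match)
    // without a PCRE error under the given pcre.backtrack_limit.
    function completes_within_limit(string $pattern, string $subject, int $limit): bool
    {
        $previous = ini_set('pcre.backtrack_limit', (string) $limit);
        $result = preg_match($pattern, $subject);
        $error = preg_last_error();
        if ($previous !== false) {
            // Restore the setting that was active before the probe.
            ini_set('pcre.backtrack_limit', $previous);
        }
        return $result !== false && $error === PREG_NO_ERROR;
    }

    // Hypothetical helper: binary search for the smallest limit that suffices.
    // Assumes sufficiency is monotonic in the limit. Returns null if even
    // $ceiling is not enough.
    function smallest_sufficient_limit(string $pattern, string $subject, int $ceiling = 10000000): ?int
    {
        if (!completes_within_limit($pattern, $subject, $ceiling)) {
            return null;
        }
        $low = 1;
        $high = $ceiling;
        while ($low < $high) {
            $mid = intdiv($low + $high, 2);
            if (completes_within_limit($pattern, $subject, $mid)) {
                $high = $mid;      // $mid is enough, try smaller
            } else {
                $low = $mid + 1;   // $mid is too small
            }
        }
        return $low;
    }

    // Compare subjects of growing length. The subject lengths are arbitrary
    // illustrative choices; none of these subjects can match the pattern.
    foreach ([100, 200, 400] as $len) {
        $subject = str_repeat('x', $len) . 'zy';
        $needed = smallest_sufficient_limit('/x+x+y/', $subject);
        echo $len . " x's: smallest sufficient limit = "
            . ($needed === null ? 'above ceiling' : (string) $needed) . "\n";
    }
    ```

    If the reported value grows with the amount of matching work rather than with the size of the subject in bytes, that supports reading the limit as a step count and not as a memory budget.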

    Here's a demo isolating the error case with a simple "Catastrophic Backtracking" regex:

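    <?php
    
    // Use a deliberately tiny limit so the failure is easy to trigger.
    ini_set('pcre.backtrack_limit', '100');
    
    // Try two subject lengths, one on each side of the point where the error appears.
    for ($len = 1000; $len <= 1001; $len++) {
    
        // The subject contains no "y", so /x+x+y/ can never match.
        $x = str_repeat("x", $len);
        $ret = preg_match("/x+x+y/", $x);
    
        echo "len = " . $len . "\n";
        echo "preg_match = " . $ret . "\n";
        echo "PREG_BACKTRACK_LIMIT_ERROR = " . PREG_BACKTRACK_LIMIT_ERROR . "\n";
        echo "preg_last_error = " . preg_last_error() . "\n";
        echo "\n";
    }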
    

    Running this code at https://3v4l.org/EpaNC produces the output below. In the failing case preg_match() returns false, which echo prints as an empty string:

    len = 1000
    preg_match = 0
    PREG_BACKTRACK_LIMIT_ERROR = 2
    preg_last_error = 0
    
    len = 1001
    preg_match = 
    PREG_BACKTRACK_LIMIT_ERROR = 2
    preg_last_error = 2
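
    Since the question is about preg_replace(), it may help to see how the same error surfaces there: preg_replace() returns null when an error occurs, so the failure is easy to miss unless the result is checked. Below is a minimal sketch of such a check. The wrapper name preg_replace_checked() is hypothetical, and throwing a RuntimeException is just one possible way to report the problem.

    ```php
    <?php

    // Hypothetical wrapper: preg_replace() for a single string subject that
    // throws instead of silently returning null on a PCRE error.
    function preg_replace_checked(string $pattern, string $replacement, string $subject): string
    {
        $result = preg_replace($pattern, $replacement, $subject);

        if ($result === null) {
            $code = preg_last_error();
            // Single out the backtrack limit case, since that is the one
            // that can be addressed by raising pcre.backtrack_limit.
            $reason = $code === PREG_BACKTRACK_LIMIT_ERROR
                ? 'pcre.backtrack_limit (' . ini_get('pcre.backtrack_limit') . ') exceeded'
                : 'PCRE error code ' . $code;
            throw new RuntimeException('preg_replace failed: ' . $reason);
        }

        return $result;
    }

    // Normal use behaves like preg_replace().
    echo preg_replace_checked('/\s+/', ' ', "a  \t b"), "\n"; // prints "a b"
    ```

    With a check like this in place, raising pcre.backtrack_limit becomes a deliberate response to a reported error rather than a guess.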