
Why is this regex not greedy in PHP?


This regex should match bullet lists, just like in Markdown:

/((?:(?:(?:^[\+\*\-] )(?:[^\r\n]+))(?:\r|\n?))+)/m

It works in JavaScript (with the g flag added), but I'm having trouble porting it to PHP: it doesn't behave greedily. Here's my example code:

$string = preg_replace_callback('`((?:(?:(?:^\* )(?:[^\r\n]+))(?:\r|\n?))+)`m', array($this, 'bullet_list'), $string);

function bullet_list($matches) { var_dump($matches); return $matches[0]; }

When I feed it a list of three lines, it displays this:

array(2) { [0]=> string(6) "* one " [1]=> string(6) "* one " }
array(2) { [0]=> string(6) "* two " [1]=> string(6) "* two " }
array(2) { [0]=> string(8) "* three " [1]=> string(8) "* three " }

Apparently var_dump is called three times instead of just once. I expect a single call, since the regex is greedy and should match as many lines as possible. I have tested it on regex101.com, and it works there. How do I make it work properly?
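A minimal standalone reproduction, as a sketch: it uses a closure instead of the class method so it runs outside a class, and it assumes the input has Windows-style \r\n line endings. The question does not say which line endings the input uses; the string(6) length reported for "* one" only hints at one extra invisible character.

```php
<?php
// Standalone reproduction of the question's behaviour.
// Assumption: the input uses Windows-style "\r\n" line endings.
$input = "* one\r\n* two\r\n* three\r\n";

// Same pattern as in the question.
$pattern = '`((?:(?:(?:^\* )(?:[^\r\n]+))(?:\r|\n?))+)`m';

$calls = 0;
$seen = [];
$output = preg_replace_callback($pattern, function ($matches) use (&$calls, &$seen) {
    $calls++;
    $seen[] = $matches[0];
    var_dump($matches[0]);
    return $matches[0]; // return the match unchanged so $output equals $input
}, $input);

echo "callback called $calls time(s)\n";
```

With "\n" line endings instead, the same pattern should match the whole list in one go and the callback should run only once.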


Solution

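  • This regex won't work correctly if your input text has \r\n (Windows-style) line endings.

    The part (?:\r|\n?) matches either an \r or an \n, but not both, so with \r\n line endings it consumes only the \r. The next repetition of the outer group must start with ^, and with PHP's default newline setting ^ in multiline mode matches only after an \n, not between \r and \n. The + therefore stops after a single line. That is also why each string in your output is one character longer than the visible text (string(6) for "* one"): the extra character is the \r. (regex101 treats newlines as \n only, so it works there.)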

    Does the following work?

    /(?:(?:(?:^[+*-] )(?:[^\r\n]+))[\r\n]*)+/m
    

    Or, with all the unnecessary non-capturing groups removed (thanks, @M42!):

    /(?:^[+*-] [^\r\n]+[\r\n]*)+/m
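A quick check of the simplified pattern against both line-ending styles, as a sketch; the helper collect_list_matches is a hypothetical name used only for this illustration. Because the pattern has no capturing group, the whole list arrives in $matches[0] and $matches[1] no longer exists.

```php
<?php
// The simplified pattern from the answer.
$pattern = '/(?:^[+*-] [^\r\n]+[\r\n]*)+/m';

// Hypothetical helper: runs preg_replace_callback and records every match.
function collect_list_matches(string $pattern, string $text): array
{
    $found = [];
    preg_replace_callback($pattern, function ($matches) use (&$found) {
        // No capturing group in the pattern, so only $matches[0] is set.
        $found[] = $matches[0];
        return $matches[0];
    }, $text);
    return $found;
}

$unix    = collect_list_matches($pattern, "* one\n* two\n* three\n");
$windows = collect_list_matches($pattern, "* one\r\n* two\r\n* three\r\n");

echo count($unix), " match(es) with \\n line endings\n";
echo count($windows), " match(es) with \\r\\n line endings\n";
```

With either input the callback should fire once and receive the entire list, including the trailing line break, because [\r\n]* consumes any run of \r and \n characters. One side effect of that choice: blank lines are swallowed too, so two lists separated only by a blank line end up in a single match.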