PHP preg_replace_callback REGEX (.*?) fails to match


I'm trying to match/replace the following input text with regular expressions in PHP:

{#var1>var2}
  {#>empty}inside empty{#>empty}
  before rows
  {#>firstrow}inside firstrow{#>firstrow}
  {#>row}inside row{#>row}
  {#>lastrow}inside lastrow{#>lastrow}
  after rows
{#}

where var1>var2 is an array:

$var1['var2'] = array('key1' => 'value1', 'key2' => 'value2', ...)

I have the following class to parse text with the regular expression (using preg_replace_callback):

class parse {

  public static function text($text) {
    $text = preg_replace_callback('/\{(#+)([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)((?:\>[a-zA-Z0-9_\x7f-\xff]*)*)\}\s*(\{\1\>empty\}\s*(.*?)\s*\{\1\>empty\})?\s*(.*?)\s*(\{\1\>firstrow\}\s*(.*?)\s*\{\1\>firstrow\})?\s*(\{\1\>row\}\s*(.*?)\s*\{\1\>row\})?\s*(\{\1\>lastrow\}\s*(.*?)\s*\{\1\>lastrow\})?\s*(.*?)\s*\{\1\}/s', array('parse', 'replace_array'), $text);
    return $text;
  }

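  public static function replace_array($matches) {
    print_r($matches);   // debug: dump the captured groups
    return $matches[0];  // the callback must return a string; leave the text unchanged for now
  }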
}

I get the (incorrect) output:

Array (
  [0] => {#var1>var2} {#>empty}inside empty{#>empty} before rows {#>firstrow}inside firstrow{#>firstrow} {#>row}inside row{#>row} {#>lastrow}inside lastrow{#>lastrow} after rows {#}
  [1] => #
  [2] => var1
  [3] => >var2
  [4] => {#>empty}inside empty{#>empty}
  [5] => inside empty
  [6] =>
  [7] =>
  [8] =>
  [9] =>
  [10] =>
  [11] =>
  [12] =>
  [13] => before rows {#>firstrow}inside firstrow{#>firstrow} {#>row}inside row{#>row} {#>lastrow}inside lastrow{#>lastrow} after rows
) 

When I remove the "before rows" from the input text, I get the correct result:

Array (
  [0] => {#var1>var2} {#>empty}inside empty{#>empty} {#>firstrow}inside firstrow{#>firstrow} {#>row}inside row{#>row} {#>lastrow}inside lastrow{#>lastrow} after rows {#}
  [1] => #
  [2] => var1
  [3] => >var2
  [4] => {#>empty}inside empty{#>empty}
  [5] => inside empty
  [6] =>
  [7] => {#>firstrow}inside firstrow{#>firstrow}
  [8] => inside firstrow
  [9] => {#>row}inside row{#>row}
  [10] => inside row
  [11] => {#>lastrow}inside lastrow{#>lastrow}
  [12] => inside lastrow
  [13] => after rows
)

I've been searching for a day already, and I suspect this is a silly little mistake, but I can't find it. Any help?


Solution

  • This works for me:

    \{(#+)([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)((?:\>[a-zA-Z0-9_\x7f-\xff]*)*)\}\s*(\{\1\>empty\}\s*(.*?)\s*\{\1\>empty\})?\s*([^\n]*)\s*(\{\1\>firstrow\}\s*(.*?)\s*\{\1\>firstrow\})?\s*(\{\1\>row\}\s*(.*?)\s*\{\1\>row\})?\s*(\{\1\>lastrow\}\s*(.*?)\s*\{\1\>lastrow\})?\s*(.*?)\s*\{\1\}
    

    As far as I can tell (and it's really hard to tell) the problem was this part

    {\1\>empty\})?\s*(.*?)\s*
    

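    specifically the (.*?). You're using the s modifier, so the dot matches newlines as well, but that is not what keeps it from capturing "before rows". Because the group is non-greedy, it first tries to match nothing at all. Every block after it is optional, so the engine skips all of them and lets the final (.*?) grow until it reaches {#}. That gives a successful match, so the engine never backtracks to make this group longer, and everything lands in group 13.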
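
To see this behaviour in isolation, here is a minimal sketch with a reduced pattern of the same shape: lazy group, one optional block, lazy group, closing tag. The `{#x}` opening tag and the shortened input are assumptions made for illustration, not part of the original template.

```php
<?php
// Reduced, hypothetical input: one optional {#>row} block between two free-text parts.
$text = implode("\n", [
    '{#x}',
    '  before rows',
    '  {#>row}inside row{#>row}',
    '  after rows',
    '{#}',
]);

// Same shape as the original pattern: lazy group, optional block, lazy group, closing tag.
$pattern = '/\{#x\}\s*(.*?)\s*(\{#>row\}\s*(.*?)\s*\{#>row\})?\s*(.*?)\s*\{#\}/s';

preg_match($pattern, $text, $m);

// Group 1 (lazy) stays empty, the optional block (groups 2 and 3) is skipped,
// and group 4 (the trailing lazy group) holds everything up to the closing {#}.
var_dump($m[1], $m[2], $m[4]);
```

The first alternative the engine tries (empty lazy group, skipped block) already leads to a full match, so nothing forces it to try a longer group 1.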

    What I did was replace it with:

     {\1\>empty\})?\s*([^\n]*)\s*
    

    Basically this tells it to give me everything up to the next newline. Unlike the lazy dot, [^\n]* is greedy, so it consumes "before rows" before the optional blocks are tried.

    Not sure my reasoning is 100% correct but my pattern should work as you can see here.

    http://regex101.com/r/dS4fG9
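
One caveat that seems worth checking: because `[^\n]*` is greedy, it appears to swallow the `{#>firstrow}` line itself when there is no "before rows" line. A possible alternative, sketched below, is a greedy group that stops in front of the next `{#>` or `{#}` tag. The `build_pattern` helper and the tempered sub-pattern are assumptions for illustration, not part of the original answer. Group 6 then carries trailing whitespace, so it is trimmed here.

```php
<?php
// Hypothetical helper (not from the original posts): assembles the full pattern
// with a pluggable sub-pattern for group 6, the free text before the row blocks.
function build_pattern(string $group6): string
{
    $name  = '[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*';
    $path  = '(?:\>[a-zA-Z0-9_\x7f-\xff]*)*';
    // One optional {#>tag}...{#>tag} block; each call contributes two capture groups.
    $block = function (string $tag): string {
        return '(\{\1\>' . $tag . '\}\s*(.*?)\s*\{\1\>' . $tag . '\})?';
    };

    return '/\{(#+)(' . $name . ')(' . $path . ')\}\s*'
        . $block('empty') . '\s*(' . $group6 . ')\s*'
        . $block('firstrow') . '\s*' . $block('row') . '\s*' . $block('lastrow')
        . '\s*(.*?)\s*\{\1\}/s';
}

$lines = [
    '{#var1>var2}',
    '  {#>empty}inside empty{#>empty}',
    '  before rows',
    '  {#>firstrow}inside firstrow{#>firstrow}',
    '  {#>row}inside row{#>row}',
    '  {#>lastrow}inside lastrow{#>lastrow}',
    '  after rows',
    '{#}',
];
$withBefore    = implode("\n", $lines);
$withoutBefore = implode("\n", array_diff($lines, ['  before rows']));

$lineOnly = build_pattern('[^\n]*');              // group 6 as in the answer above
$tempered = build_pattern('(?:(?!\{\1[>}]).)*');  // assumed alternative: stop before the next tag

preg_match($lineOnly, $withoutBefore, $a);  // group 6 takes the whole firstrow line
preg_match($tempered, $withBefore, $b);     // group 6 holds "before rows" plus trailing whitespace
preg_match($tempered, $withoutBefore, $c);  // group 6 is empty, firstrow is still captured

var_dump($a[6], trim($b[6]), $b[8], $c[6], $c[8]);
```

The negative lookahead keeps group 6 from running past any `{#>...}` or `{#}` tag of the same nesting level, so the optional blocks that follow still get a chance to match.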