I'm trying to match/replace the following input text with regular expressions in PHP:
{#var1>var2}
{#>empty}inside empty{#>empty}
before rows
{#>firstrow}inside firstrow{#>firstrow}
{#>row}inside row{#>row}
{#>lastrow}inside lastrow{#>lastrow}
after rows
{#}
where var1>var2 is an array:
$var1['var2'] = array('key1' => 'value1', 'key2' => 'value2', ...)
I have the following class to parse text with the regular expression (using preg_replace_callback):
class parse {
    public static function text($text) {
        $text = preg_replace_callback('/\{(#+)([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)((?:\>[a-zA-Z0-9_\x7f-\xff]*)*)\}\s*(\{\1\>empty\}\s*(.*?)\s*\{\1\>empty\})?\s*(.*?)\s*(\{\1\>firstrow\}\s*(.*?)\s*\{\1\>firstrow\})?\s*(\{\1\>row\}\s*(.*?)\s*\{\1\>row\})?\s*(\{\1\>lastrow\}\s*(.*?)\s*\{\1\>lastrow\})?\s*(.*?)\s*\{\1\}/s', array('parse', 'replace_array'), $text);
        return $text;
    }

    // Debug callback: only dumps the captured groups for now.
    public static function replace_array($matches) {
        print_r($matches);
    }
}
I get the (incorrect) output:
Array (
[0] => {#var1>var2} {#>empty}inside empty{#>empty} before rows {#>firstrow}inside firstrow{#>firstrow} {#>row}inside row{#>row} {#>lastrow}inside lastrow{#>lastrow} after rows {#}
[1] => #
[2] => var1
[3] => >var2
[4] => {#>empty}inside empty{#>empty}
[5] => inside empty
[6] =>
[7] =>
[8] =>
[9] =>
[10] =>
[11] =>
[12] =>
[13] => before rows {#>firstrow}inside firstrow{#>firstrow} {#>row}inside row{#>row} {#>lastrow}inside lastrow{#>lastrow} after rows
)
When I remove the "before rows" from the input text, I get the correct result:
Array (
[0] => {#var1>var2} {#>empty}inside empty{#>empty} {#>firstrow}inside firstrow{#>firstrow} {#>row}inside row{#>row} {#>lastrow}inside lastrow{#>lastrow} after rows {#}
[1] => #
[2] => var1
[3] => >var2
[4] => {#>empty}inside empty{#>empty}
[5] => inside empty
[6] =>
[7] => {#>firstrow}inside firstrow{#>firstrow}
[8] => inside firstrow
[9] => {#>row}inside row{#>row}
[10] => inside row
[11] => {#>lastrow}inside lastrow{#>lastrow}
[12] => inside lastrow
[13] => after rows
)
I've been searching for a day already, and I suspect this is a small, silly problem, but I cannot find it. Any help?
This works for me:
\{(#+)([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)((?:\>[a-zA-Z0-9_\x7f-\xff]*)*)\}\s*(\{\1\>empty\}\s*(.*?)\s*\{\1\>empty\})?\s*([^\n]*)\s*(\{\1\>firstrow\}\s*(.*?)\s*\{\1\>firstrow\})?\s*(\{\1\>row\}\s*(.*?)\s*\{\1\>row\})?\s*(\{\1\>lastrow\}\s*(.*?)\s*\{\1\>lastrow\})?\s*(.*?)\s*\{\1\}
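As far as I can tell (and it's really hard to tell), the problem is this part:
{\1\>empty\})?\s*(.*?)\s*
specifically the (.*?). It doesn't capture the "before rows" text because it is non-greedy: it first tries to match nothing at all, and since every group that follows (firstrow, row, lastrow) is optional, the rest of the pattern still succeeds with those groups left empty. The final (.*?) then swallows everything up to the closing {#}. Note that with the s modifier the dot also matches newlines, so the newline is not what stops it.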
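To see how a lazy group behaves when only optional groups follow it, here is a minimal sketch. The pattern and the {row} / {end} tags are simplified placeholders made up for this illustration, not the original template syntax.

```php
<?php
// Simplified stand-in for the original pattern (hypothetical tags):
// a lazy group, an optional group, and a final lazy group before {end}.
$pattern = '/^(.*?)\s*(\{row\}.*?\{row\})?\s*(.*?)\s*\{end\}/s';

// With leading text: the first lazy group matches nothing, the optional
// group cannot match at "before" and is skipped, and the last lazy group
// expands until it reaches {end}.
preg_match($pattern, "before\n{row}x{row}\nafter\n{end}", $withText);

// Without leading text: the optional group matches right away.
preg_match($pattern, "{row}x{row}\nafter\n{end}", $withoutText);

var_dump($withText[1], $withText[2], $withText[3]);
var_dump($withoutText[1], $withoutText[2], $withoutText[3]);
```

In the first call, groups 1 and 2 come back empty and group 3 holds everything before {end}, which mirrors the output in the question.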
What I did was replace it with:
{\1\>empty\})?\s*([^\n]*)\s*
This tells it to greedily take everything up to the next newline, since the lazy dot can't be relied on here.
I'm not sure my reasoning is 100% correct, but the pattern should work on your sample input.
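A quick way to check the modified pattern is to run it with preg_match against the sample input from the question. This is only a sketch: the / delimiters and the s modifier are copied from the question's code, and the second input (the sample without its "before rows" line) is added here to show one assumption the [^\n]* group relies on.

```php
<?php
// Pattern from this answer, wrapped in the same delimiters and s modifier
// that the question uses.
$pattern = '/\{(#+)([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)((?:\>[a-zA-Z0-9_\x7f-\xff]*)*)\}\s*(\{\1\>empty\}\s*(.*?)\s*\{\1\>empty\})?\s*([^\n]*)\s*(\{\1\>firstrow\}\s*(.*?)\s*\{\1\>firstrow\})?\s*(\{\1\>row\}\s*(.*?)\s*\{\1\>row\})?\s*(\{\1\>lastrow\}\s*(.*?)\s*\{\1\>lastrow\})?\s*(.*?)\s*\{\1\}/s';

$lines = [
    '{#var1>var2}',
    '{#>empty}inside empty{#>empty}',
    'before rows',
    '{#>firstrow}inside firstrow{#>firstrow}',
    '{#>row}inside row{#>row}',
    '{#>lastrow}inside lastrow{#>lastrow}',
    'after rows',
    '{#}',
];

// Sample input from the question.
preg_match($pattern, implode("\n", $lines), $m);

// Same input with the "before rows" line (index 2) removed.
$withoutBefore = array_merge(array_slice($lines, 0, 2), array_slice($lines, 3));
preg_match($pattern, implode("\n", $withoutBefore), $n);

echo "with before rows:    [6] = {$m[6]}, [8] = {$m[8]}\n";
echo "without before rows: [6] = {$n[6]}, [8] = {$n[8]}\n";
```

On the sample input, group 6 holds "before rows" and the firstrow, row and lastrow groups are filled as expected. When the "before rows" line is missing, the greedy [^\n]* takes the whole firstrow line instead, so this fix assumes the block always has exactly one line of text before the rows.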