I have a couple of "shortcode" blocks in a text, which I want to replace with some HTML entities on the fly using preg_replace_callback.
The syntax of a shortcode is simple:
[block:type-of-the-block attribute-name1:value attribute-name2:value ...]
Attributes with values may be provided in any order. Sample regex pattern I use to find these shortcode blocks:
/\[
(?:block:(?<block>piechart))
(?:
(?:\s+value:(?<value>[0-9]+)) |
(?:\s+stroke:(?<stroke>[0-9]+)) |
(?:\s+angle:(?<angle>[0-9]+)) |
(?:\s+colorset:(?<colorset>reds|yellows|blues))
)*
\]/xumi
Now, here comes the funny thing: PHP matches non-existent named groups. For a string like this:
[block:piechart colorset:reds value:20]
...the resulting $matches array is (note the empty strings in "stroke" and "angle"):
array(11) {
[0]=>
string(39) "[block:piechart colorset:reds value:20]"
["block"]=>
string(8) "piechart"
[1]=>
string(8) "piechart"
["value"]=>
string(2) "20"
[2]=>
string(2) "20"
["stroke"]=>
string(0) ""
[3]=>
string(0) ""
["angle"]=>
string(0) ""
[4]=>
string(0) ""
["colorset"]=>
string(4) "reds"
[5]=>
string(4) "reds"
}
Here's the code for testing (you can execute it online here as well: https://onlinephp.io/c/2429a):
$pattern = "
/\[
(?:block:(?<block>piechart))
(?:
(?:\s+value:(?<value>[0-9]+)) |
(?:\s+stroke:(?<stroke>[0-9]+)) |
(?:\s+angle:(?<angle>[0-9]+)) |
(?:\s+colorset:(?<colorset>reds|yellows|blues))
)*
\]/xumi";
$subject = "here is a block to be replaced [block:piechart value:25 angle:720] [block] and another one [block:piechart colorset:reds value:20]";
preg_replace_callback($pattern, 'callbackFunction', $subject);
function callbackFunction($matches)
{
    var_dump($matches);
    // process matched values, return some replacement...
    $replacement = "...";
    return $replacement;
}
Is it normal that PHP creates entries in the $matches array for every named group as soon as the whole pattern matches, and leaves them as empty strings when a group did not actually take part in the match? What am I doing wrong? How can I prevent PHP from creating these false entries, which simply shouldn't be there?
Any help or explanation would be deeply appreciated! Thanks!
This behaviour is as expected, although not well documented. In the manual under "Subpatterns":
When the whole pattern matches, that portion of the subject string that matched the subpattern is passed back to the caller
and:
Consider the following regex matched against the string Sunday:
(?:(Sat)ur|(Sun))day
Here Sun is stored in backreference 2, while backreference 1 is empty
and also in the documentation of the PREG_UNMATCHED_AS_NULL flag (new as of version 7.2.0). From the manual:
If this flag is passed, unmatched subpatterns are reported as null; otherwise they are reported as an empty string.
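To make the difference concrete, here is a minimal sketch that runs the manual's own Sunday pattern through preg_match with and without the flag. The variable names $default and $flagged are only illustrative:

```php
<?php
// Pattern taken from the manual excerpt quoted above.
$pattern = '/(?:(Sat)ur|(Sun))day/';

// Default behaviour: group 1 did not participate in the match,
// so it is reported as an empty string.
preg_match($pattern, 'Sunday', $default);
var_dump($default[1]); // string(0) ""

// With PREG_UNMATCHED_AS_NULL: the same group is reported as null,
// which makes it distinguishable from a group that matched "".
preg_match($pattern, 'Sunday', $flagged, PREG_UNMATCHED_AS_NULL);
var_dump($flagged[1]); // NULL
var_dump($flagged[2]); // string(3) "Sun"
```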
This flag gives you a way to work around the behaviour (note that the flags argument of preg_replace_callback is only available from PHP 7.4 onwards):
preg_replace_callback($pattern, 'callbackFunction', $subject, -1, $count, PREG_UNMATCHED_AS_NULL);
If you take this approach, you can filter the $matches array in your callback with array_filter to remove the NULL values:
$matches = array_filter($matches, function ($v) { return !is_null($v); });
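Putting the pieces together, the following is a rough sketch of how the whole thing might look with the pattern from the question. It assumes PHP 7.4 or later. The $seen array, the extra filter that drops the numeric keys, and the "<piechart>" replacement string are illustrative additions, not part of the original code:

```php
<?php
$pattern = '/\[
    (?:block:(?<block>piechart))
    (?:
        (?:\s+value:(?<value>[0-9]+)) |
        (?:\s+stroke:(?<stroke>[0-9]+)) |
        (?:\s+angle:(?<angle>[0-9]+)) |
        (?:\s+colorset:(?<colorset>reds|yellows|blues))
    )*
\]/xumi';

$subject = 'a [block:piechart colorset:reds value:20] b';
$seen = []; // hypothetical collector, only here to inspect the result

$result = preg_replace_callback(
    $pattern,
    function (array $matches) use (&$seen) {
        // Drop groups that did not participate (reported as null).
        $attrs = array_filter($matches, function ($v) { return !is_null($v); });
        // Optional: keep only the named keys, dropping 0, 1, 2, ...
        $attrs = array_filter($attrs, 'is_string', ARRAY_FILTER_USE_KEY);
        $seen[] = $attrs;
        return '<piechart>'; // placeholder replacement
    },
    $subject,
    -1,
    $count,
    PREG_UNMATCHED_AS_NULL
);

var_dump($seen[0]); // only "block", "value" and "colorset" remain
```

On older PHP versions, where the flag cannot be passed to preg_replace_callback, filtering on `$v !== ''` instead is one possible fallback, with the caveat that it cannot tell an unmatched group from one that matched an empty string.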