php security preg-replace preg-replace-callback

preg_replace & preg_replace_callback security concerns

Ok, I know this is a bad practice, but part of the code was existing, and I have to extend it to run custom functions with one argument.

So, our pages are stored in the db, and when they are displayed in our template, we are currently using three different preg_replace function with e modifiers, on the whole html page.

This seems slow, so I'd like to change it to only use one preg_replace call, and be able to supply a custom function with 1 argument in a bbcode way: Example:

[FUNC:testfunc(test string)]

So, this is what I came up with... I'm not sure which method is more secure, preg_rplace with the e modifer, or preg_replace_callback:

<?php
$str = '
<h2>Title That should Not Be Affected</h2>
<p>[FUNC:linkbox(/somestuff/newpage.html)]</p>
<a href="[FUNC:getvar(url)]">[FUNC:getvar(title)]</a>
<p>Random Html THat should not be affected</p>
<p>[FUNC:linkbox(/somestuff/otherpage.html)]</p>
';

$str2 = preg_replace_callback('~\[FUNC:(.*?)\((.*?)\)\]~', 'callback_caller', $str);

$str = preg_replace('~\[FUNC:(.*?)\((.*?)\)\]~e', 'emodcaller("\\1", "\\2")', $str);

echo $str.'<br><br>'.$str2;

function callback_caller($args){

    if(!isset($args[0], $args[1]))
        return false;

    $func = strip_tags($args[1]);
    $param = isset($args[2]) ? strip_tags($args[2]) : '';

    //Only allow calling of known functions
    switch($func){

        case 'linkbox':
            return linkbox($param);
        break;

        case 'getvar':
            return getvar($param);
        break;

        case 'default':
            return '';
        break;

    }

}

function emodcaller($fun, $arg){
    $arg = strip_tags($arg);

    //Only allow calling of known functions
    switch($fun){
        case 'linkbox':
            return linkbox($arg);
        break;

        case 'getvar':
            return getvar($arg);
        break;

        case 'default':
            return '';
        break;

    }

}

function linkbox($addy){
    return 'Linkbox Called: '.$addy;
}

function getvar($arg) {

    switch($arg){
        case 'url':
            return '/index.html';
        break;

        case 'title':
            return 'This is a test title';
        break;
    }

}
?>

THe thing I like about using the e modifier, is that I could add another parameter onto the function call directly in php if I needed to, like this:

preg_replace('~\[FUNC:(.*?)\((.*?)\)\]~e', 'emodcaller("\\1", "\\2", $page->getid())', $str);

Is one method safer than the other? Are they both huge security risks? I have to have some implementation of these..

Edit: I figured out that you can pass the callback function additional parameters using an anonymous function like:

$content = preg_replace_callback(
'~(?:\<p\>)?\[FUNC:(.*?)\((.*?)\)\](?:\<\/p\>)?~',
function($matches) use ($article) {
return callback_caller($matches, $article);
}, 
$content);

This passes the callback_caller() function my entire article class for use. Is creating an anonymous function like this for every match bad, performance-wise?

Solution

Depending on how many times you're executing this, creating the anonymous functions may be bad.

You can always use a method call for a callback like this:

$content = preg_replace_callback($rx, array($obj, 'method_name'), $content);

That allows you to avoid having to pass any arguments into the method if they are already object properties.

I tend to favor preg_replace_callback() over preg_replace() with 'e', because it avoids the pitfalls of correctly escaping any pattern matches in the eval'd code and avoids the overhead of having to eval at all (and if you're using an opcode cache like APC, it means that there's zero compilation vs the eval which compiles on every call). It also makes code a little easier to read - eval'd code always looks a little uglier because you have to escape the string.

That said, there's nothing inherently wrong with using preg_replace() with 'e'.