
How to refactor a function to use preg_replace_callback() with multiple regex strings and replacement strings in multiple arrays?


I am refactoring some legacy PHP code (version 5.5) and need to convert a function from preg_replace() to preg_replace_callback(), because the /e (eval) modifier is deprecated. However, the only two preg_replace() calls in it receive arrays of search patterns and arrays of replacements.

I am aware I need to pass a callback function to preg_replace_callback(), but how should I go about this when each pattern needs a different replacement?

function format($in, $options = 0)
{
    if (!$options) {
        $options = FORMAT_BREAKS | FORMAT_HTMLCHARS | FORMAT_CENSOR;
    }

    if ($options & FORMAT_CENSOR) {
        if (!$this->replaces_loaded) {
            $this->get_replaces();
        }

        if ($this->censor) {
            $in = preg_replace($this->censor, '####', $in);
        }
    }

    if ($options & FORMAT_MBCODE) {
        $search = array(
            '~(^|\s)([a-z0-9-_.]+@[a-z0-9-.]+\.[a-z0-9-_.]+)~i',
            '~(^|\s)(http|https|ftp)://(\w+[^\s\[\]]+)~ise'
        );

        $replace = array(
            '\\1[email]\\2[/email]',
            '\'\\1[url]\' .wordwrap(\'\\2://\\3\', 1, \' \', 1) . \'[/url]\''
        );

        $brackets = (strpos($in, '[') !== false) && (strpos($in, ']') !== false);

        if ($brackets) {
            $b_search = array(
                '~\[code](.*?)\[/code]~ise',
                '~\[php](.*?)\[/php]~ise',
                '~\[php=([0-9]+?)](.*?)\[/php]~ise',
                '~\[img](http|https|ftp)://(.*?)\[/img]~ise',
                '~\[url](.*?)\[/url]~ise',
                '~\[url=(http|https|ftp)://(.+?)](.+?)\[/url]~ise'
            );

            $b_replace = array(
                '\'[code]\' . function (array $matches){ return base64_encode(\'\\1\') . \'[/code]\'',
                '\'[php]\' . base64_encode(\'\\1\') . \'[/php]\'',
                '\'[php=\\1]\' . base64_encode(\'\\2\') . \'[/php]\'',
                '\'[img]\' . wordwrap(\'\\1://\\2\', 1, \' \', 1) . \'[/img]\'',
                '\'[url]\' . wordwrap(\'\\1\\2\', 1, \' \', 1) . \'[/url]\'',
                '\'[url=\' . wordwrap(\'\\1://\\2\', 1, \' \', 1) . \']\\3[/url]\''
            );

            $search  = array_merge($search, $b_search);
            $replace = array_merge($replace, $b_replace);

            error_log(print_r($replace));
        }
        
        $in = preg_replace($search, $replace, $in);

        $brackets = (strpos($in, '[') !== false) && (strpos($in, ']') !== false); //We may have auto-parsed a URL, adding a bracket
    }

    $strtr = array();

    if ($options & FORMAT_HTMLCHARS) {
        $strtr['&'] = '&amp;';
        $strtr['"'] = '&quot;';
        $strtr['\''] = '&#039;';
        $strtr['<'] = '&lt;';
        $strtr['>'] = '&gt;';
    }

    if ($options & FORMAT_BREAKS) {
        $strtr["\n"] = "<br />\n";
    }

    if ($this->user['user_view_emoticons'] && ($options & FORMAT_EMOTICONS)) {
        if (!$this->replaces_loaded) {
            $this->get_replaces();
        }

        $strtr = array_merge($strtr, $this->emotes['replacement']);
    }

    $in = strtr($in, $strtr);

    if (($options & FORMAT_MBCODE) && $brackets) {
        $search = array(
            '~\[(/)?([bi])]~i',
            '~\[u]~i',
            '~\[s]~i',
            '~\[/[us]]~i',
            '~\[url](h t t p|h t t p s|f t p) : / /(.+?)\[/url]~ise',
            '~\[url=(h t t p|h t t p s|f t p) : / /(.+?)](.+?)\[/url]~ise',
            '~\[email]([a-z0-9-_.]+@[a-z0-9-.]+\.[a-z0-9-_.]+)?\[/email]~i',
            '~\[email=([^<]+?)](.*?)\[/email]~i',
            '~\[img](h t t p|h t t p s|f t p) : / /(.*?)\[/img]~ise',
            '~\[(right|center)](.*?)\[/\1]~is',
            '~\[code](.*?)\[/code]~ise',
            '~\[php](.*?)\[/php]~ise',
            '~\[php=([0-9]+?)](.*?)\[/php]~ise',
            '~\[color=(\S+?)](.*?)\[/color]~is',
            '~\[font=(.+?)](.*?)\[/font]~is',
            '~\[size=([0-9]+?)](.*?)\[/size]~is'
        );

        $replace = array(
            '<\\1\\2>',
            '<span style=\'text-decoration:underline\'>',
            '<span style=\'text-decoration:line-through\'>',
            '</span>',
            '\'<a href="\' . str_replace(\' \', \'\', \'\\1://\\2\') . \'" onclick="window.open(this.href,\\\'' . $this->sets['link_target'] . '\\\');return false;">\' . str_replace(\' \', \'\', \'\\1://\\2\') . \'</a>\'',
            '\'<a href="\' . str_replace(\' \', \'\', \'\\1://\\2\') . \'" onclick="window.open(this.href,\\\'' . $this->sets['link_target'] . '\\\');return false;">\\3</a>\'',
            '<a href="mailto:\\1">\\1</a>',
            '<a href="mailto:\\1">\\2</a>',
            '\'<img src="\' . str_replace(\' \', \'\', \'\\1://\\2\') . \'" alt="\' . str_replace(\' \', \'\', \'\\1://\\2\') . \'" />\'',
            '<div align="\\1">\\2</div>',
            '$this->format_code(\'\\1\', 0)',
            '$this->format_code(\'\\1\', 1)',
            '$this->format_code(\'\\2\', 1, \'\\1\')',
            '<span style=\'color:\\1\'>\\2</span>',
            '<span style=\'font-family:\\1\'>\\2</span>',
            '<span style=\'font-size:\\1ex\'>\\2</span>'
        );

        if ((substr_count($in, '[quote]') + substr_count($in, '[quote=')) == substr_count($in, '[/quote]')) {
            $search[] = '~\[quote=(.+?)]~i';
            $search[] = '~\[quote]~i';
            $search[] = '~\[/quote]~i';

            $replace[] = '<table style="width:90%; margin-left:5%; margin-right:5%;" border="0" cellpadding="3" cellspacing="0"><tr><td><b>\\1 ' . $this->lang->main_said . ':</b></td></tr><tr><td class="quote">';
            $replace[] = '<table style="width:90%; margin-left:5%; margin-right:5%;" border="0" cellpadding="3" cellspacing="0"><tr><td><b>' . $this->lang->main_quote . ':</b></td></tr><tr><td class="quote">';
            $replace[] = '</td></tr></table>';
        }

        $in = preg_replace_callback($search, $replace, $in);
        $in = str_replace(array('  ', "\t", '&amp;#'), array('&nbsp; ', '&nbsp; &nbsp; ', '&#'), $in);
    }

    return $in;
}

I tried placing anonymous functions directly into the replacement arrays, but got the error: Object of class Closure could not be converted to string. Perhaps I went about it incorrectly?

        $replace = array(
            '\\1[email]\\2[/email]',
            "'\'\\1[url]\'" . function (array $matches) {return wordwrap($matches[1], $matches[2], $matches[3], $matches[4]); }  . " \'[/url]\''"
        );

Advice would be much appreciated.


Solution

  • Create an associative array whose keys are the regex patterns and whose values are the callbacks. Do not prepend or append strings to the callbacks; build the whole replacement inside each callback.

    I am not going to rewrite that behemoth from my phone, so I'll demonstrate a single replacement.

    Feed your array of patterns and callbacks to preg_replace_callback_array().

    Code:

    $patternCallbacks = [
        '~\[code](.*?)\[/code]~is' =>
            function($m) {
                return '[code]' . base64_encode($m[1]) . '[/code]';
            },
        // add more elements as needed...
    ];
    
    echo preg_replace_callback_array(
             $patternCallbacks,
             'This is my [code]script[/code] to display'
         );
    

    Output:

    This is my [code]c2NyaXB0[/code] to display
    

    Edit: since you are on PHP 5.5 and preg_replace_callback_array() is only available from PHP 7.0, you will need to make iterated calls of preg_replace_callback() instead.

    Code:

    $patternCallbacks = [
        '~\[code](.*?)\[/code]~is' =>
            function($m) {
                return '[code]' . base64_encode($m[1]) . '[/code]';
            },
        '~\[php(?:=\d+)?]\K(.*?)\[/php]~is' =>
            function($m) {
                return base64_encode($m[1]) . '[/php]';
            },
    ];
    
    $text = <<<TEXT
    This is my [code]script[/code] to display.
    It has [php]unnumbered tag
     code[/php] and [php=8]numbered tag code[/php].
    TEXT;
    
    foreach ($patternCallbacks as $pattern => $callback) {
        $text = preg_replace_callback($pattern, $callback, $text);
    }
    echo $text;
    

    Output:

    This is my [code]c2NyaXB0[/code] to display.
    It has [php]dW5udW1iZXJlZCB0YWcKIGNvZGU=[/php] and [php=8]bnVtYmVyZWQgdGFnIGNvZGU=[/php].
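
Most of the patterns in the question do not use the e modifier at all, so they can stay as plain replacement strings. Below is a minimal sketch of one way to mix both kinds in a single ordered list. It assumes a hypothetical helper named apply_replacements() (not part of PHP or of the original code) and sticks to PHP 5.5-compatible syntax. Only the first two patterns from the question are converted here, with the e flag dropped from the second.

```php
<?php
// Hypothetical helper (an assumption for illustration): applies
// [pattern, replacement] pairs in order, like preg_replace() does with arrays.
function apply_replacements(array $rules, $text)
{
    foreach ($rules as $rule) {
        list($pattern, $replacement) = $rule;
        if ($replacement instanceof Closure) {
            // Former /e replacement, now expressed as a callback.
            $text = preg_replace_callback($pattern, $replacement, $text);
        } else {
            // Ordinary backreference string: preg_replace() still works.
            $text = preg_replace($pattern, $replacement, $text);
        }
    }
    return $text;
}

$rules = array(
    // No /e here, so the replacement stays a plain string.
    array(
        '~(^|\s)([a-z0-9-_.]+@[a-z0-9-.]+\.[a-z0-9-_.]+)~i',
        '\\1[email]\\2[/email]'
    ),
    // Was ~ise with an evaluated string; the literal '[url]' and '[/url]'
    // parts move inside the callback instead of being concatenated to it.
    array(
        '~(^|\s)(http|https|ftp)://(\w+[^\s\[\]]+)~is',
        function ($m) {
            return $m[1] . '[url]'
                . wordwrap($m[2] . '://' . $m[3], 1, ' ', true)
                . '[/url]';
        }
    ),
);

echo apply_replacements($rules, 'Mail me@example.com or see http://example.com/a');
// prints: Mail [email]me@example.com[/email] or see [url]h t t p : / / e x a m p l e . c o m / a[/url]
```

Rules run in order, matching how preg_replace() walks its pattern array. Inside a class method, closures are bound to $this automatically since PHP 5.4, so a callback can call $this->format_code($m[1], 0) directly. One behavioral difference worth checking: the e modifier escaped quotes and backslashes in the backreferences before evaluating, whereas a callback receives the raw matched text.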