
How to assign an ID to each replaced string in preg_replace and get a list of matched words


I already have code that works, but I need to add two features. It replaces every bad word in a sentence with dots, leaving the first letter of the word visible to the reader.

The new features I need to add:

  1. Wrap each replaced string in an HTML span with a unique, auto-incrementing ID

  2. Collect all matched words (including repeated instances) in a PHP variable, in the order they appear.

This is my current code:

function sanitize_badwords($string) {
    $list = array(
        "dumb",
        "stupid",
        "brainless"
    );

    # use array_map to build an array of regexes, one per word
    $relist = array_map(function($s) {
        return '/(?:\b(' . $s[0] . ')(?=' . substr($s, 1) . '\b)|(?!\A)\G)\pL/';
    }, $list);

    # call preg_replace using list of regex
    return preg_replace($relist, '<span id="bad_'.$counter.'">$1.</span>', $string);
}

echo sanitize_badwords('You are kind of dumb and brainless. Very dumb!');

The current code prints:

You are kind of d... and b......... Very d...!

After implementing the first feature, the result should be:

You are kind of <span id="bad_1">d...</span> and <span id="bad_2">b........</span>. Very <span id="bad_3">d...</span>!

The second feature should give me a single PHP array containing all the matched words (including repeated instances):

$matches = array('dumb', 'brainless', 'dumb');

I need this because I cannot print bad words in the crawlable HTML for ToS reasons, but I still need to display the bad word on mouseover via JavaScript later (I can easily convert the contents of $matches to a JavaScript array and attach each entry to the hover state of the matching bad_ span).


Solution

  • You could use preg_replace_callback() and pass $counter to the callback by reference, so that each match increments it:

    $list = array("dumb", "stupid", "brainless");
    $string = 'You are kind of dumb and brainless. Very dumb!';
    
    
    // Sort longest first, so a longer word is tried before a shorter one
    // that shares its prefix (thanks @revo)
    usort($list, function($a, $b) { return strlen($b) - strlen($a); });
    
    $counter = 0 ; // Initialize the counter
    // Escape regex metacharacters, including the ~ delimiter used below
    $list_q = array_map(function($w) { return preg_quote($w, '~'); }, $list);
    
    
    // Transform the string
    $string = preg_replace_callback('~(' . implode('|',$list_q) . ')~', 
        function($matches) use (&$counter) {
           $counter++;
           return '<span id="bad_' . $counter . '">'
               . substr($matches[0], 0, 1)
               . str_repeat('.', strlen($matches[0]) - 1)
               . '</span>' ;
    }, $string);
    
    echo $string;
    

    This outputs:

    You are kind of <span id="bad_1">d...</span> and <span id="bad_2">b........</span>. Very <span id="bad_3">d...</span>!
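
The pattern above has no word boundaries, so it would also match inside longer words such as "dumbbell", whereas the regex in the question used \b. A minimal sketch of a whole-word variant follows; the function name mask_badwords_bounded and the case-insensitive "i" flag are assumptions added for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: same masking as above, restricted to whole words.
function mask_badwords_bounded(string $string, array $list): string
{
    $counter = 0;

    // Quote each word and escape the "~" delimiter as well.
    $quoted = array_map(function ($w) { return preg_quote($w, '~'); }, $list);

    // \b ... \b limits matches to whole words; "i" ignores case (assumption).
    $pattern = '~\b(?:' . implode('|', $quoted) . ')\b~i';

    return preg_replace_callback($pattern, function ($m) use (&$counter) {
        $counter++;
        return '<span id="bad_' . $counter . '">'
            . substr($m[0], 0, 1)
            . str_repeat('.', strlen($m[0]) - 1)
            . '</span>';
    }, $string);
}

echo mask_badwords_bounded(
    'A dumbbell is not dumb. DUMB!',
    ['dumb', 'stupid', 'brainless']
);
// prints: A dumbbell is not <span id="bad_1">d...</span>. <span id="bad_2">D...</span>!
```

With word boundaries in place, sorting the list by length matters less, because a short word can no longer match the start of a longer one.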
    

    As a function that also stores the matches in the $references variable, passed by reference:

    function sanitize_badwords($string, &$references) {
    
        static $counter  ;
        static $list  ;
        static $list_q  ;
    
        if (!isset($counter)) {
            $counter = 0 ;
            $list = array("dumb", "stupid", "brainless");
    
            // Sort longest first, so a longer word is tried before a shorter
            // one that shares its prefix (thanks @revo)
            usort($list, function($a, $b) { return strlen($b) - strlen($a); });
    
            // Escape regex metacharacters, including the ~ delimiter
            $list_q = array_map(function($w) { return preg_quote($w, '~'); }, $list);
        }
    
        return preg_replace_callback('~('.implode('|',$list_q).')~',
            function($matches) use (&$counter, &$references){
                $counter++;
                $references[$counter] = $matches[0];
                return '<span id="bad_'.$counter.'">'
                   . substr($matches[0],0,1)
                   . str_repeat('.', strlen($matches[0])-1)
                   . '</span>' ;
    
        }, $string) ;
    }
    
    $matches = [] ;
    echo sanitize_badwords('You are kind of dumb and brainless. Very dumb!', $matches) ;
    
    
    print_r($matches);
    

    This outputs:

    You are kind of <span id="bad_1">d...</span> and <span id="bad_2">b........</span>. Very <span id="bad_3">d...</span>!
    
    Array
    (
        [1] => dumb
        [2] => brainless
        [3] => dumb
    )
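
The question mentions converting $matches to a JavaScript array for the hover state. One way that step might look is sketched below, assuming the 1-based keys shown above; the helper name badwords_to_js and the JavaScript variable badWords are hypothetical. The JSON_HEX_* flags are there so the encoded string cannot close a surrounding script tag.

```php
<?php
// $matches as filled by sanitize_badwords() above (1-based keys).
$matches = [1 => 'dumb', 2 => 'brainless', 3 => 'dumb'];

// Hypothetical helper: map each span ID ("bad_1", "bad_2", ...) to its word
// and encode the result as JSON that is safe to embed in HTML.
function badwords_to_js(array $matches): string
{
    $byId = [];
    foreach ($matches as $n => $word) {
        $byId['bad_' . $n] = $word;
    }
    return json_encode(
        $byId,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
    );
}

echo '<script>var badWords = ' . badwords_to_js($matches) . ';</script>';
// prints: <script>var badWords = {"bad_1":"dumb","bad_2":"brainless","bad_3":"dumb"};</script>
```

Keying the object by span ID lets the client-side code look up the word with the element's own id. The same JSON could instead be served from a separate request if the words must stay out of the page source entirely.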