I already have code that works, but I need to add two additional features. The code replaces every bad word in a sentence with dots, leaving the first letter of the word visible to the reader.
The new features I need to add:
1. Wrap each replaced string in an HTML span with a unique, auto-incrementing ID inside preg_replace.
2. Collect all matched words (including repeated instances) in a PHP variable, in the order they appear.
This is my current code:
function sanitize_badwords($string) {
$list = array(
"dumb",
"stupid",
"brainless"
);
# use array_map to generate a regex of array for each word
$relist = array_map(function($s) {
return '/(?:\b(' . $s[0] . ')(?=' . substr($s, 1) . '\b)|(?!\A)\G)\pL/';
}, $list);
# call preg_replace using list of regex
return preg_replace($relist, '<span id="bad_'.$counter.'">$1.</span>', $string);
}
echo sanitize_badwords('You are kind of dumb and brainless. Very dumb!');
The current code prints:
You are kind of d... and b......... Very d....!
After implementing the first feature, the result should be:
You are kind of <span id="bad_1">d...</span> and <span id="bad_2">b........</span>. Very <span id="bad_3">d...</span>!
The second feature should allow me to have a single php array that has all the matched words (including repeated instances):
$matches = array('dumb', 'brainless', 'dumb');
The reason I need this is that I cannot print bad words in the crawlable HTML for ToS reasons, but I still need to display the bad word on mouseover via JavaScript later (I can easily convert the contents of $matches to a JavaScript array and attach each word to the hover state of the matching bad_ span).
You could use preg_replace_callback() and pass $counter by reference so the callback can increment it:
$list = array("dumb", "stupid", "brainless");
$string = 'You are kind of dumb and brainless. Very dumb!';
// Sort longest first so a longer word is tried before any shorter word it starts with - many thanks @revo
usort($list, function($a,$b) { return strlen($b) - strlen($a); });
$counter = 0 ; // Initialize the counter
// Escape regex metacharacters, including the ~ delimiter used below
$list_q = array_map(function($w) { return preg_quote($w, '~'); }, $list);
// Transform the string
$string = preg_replace_callback('~(' . implode('|',$list_q) . ')~',
function($matches) use (&$counter) {
$counter++;
return '<span id="bad_' . $counter . '">'
. substr($matches[0], 0, 1)
. str_repeat('.', strlen($matches[0]) - 1)
. '</span>' ;
}, $string);
echo $string;
This outputs:
You are kind of <span id="bad_1">d...</span> and <span id="bad_2">b........</span>. Very <span id="bad_3">d...</span>!
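The usort() step matters because a regex alternation tries its branches from left to right and keeps the first one that matches. Here is a minimal sketch of the effect, assuming a hypothetical list in which one word is a prefix of another ("brain" is not in the original list, and first_match() is a helper name made up for this illustration):

```php
<?php
// Hypothetical list: "brain" is a prefix of "brainless" (illustration only).
$unsorted = array("brain", "brainless");

// Sort a copy longest-first, as the usort() call in the answer is meant to do.
$sorted = $unsorted;
usort($sorted, function ($a, $b) { return strlen($b) - strlen($a); });

// Hypothetical helper: returns the text matched by the alternation of $words.
function first_match(array $words, $subject) {
    $quoted = array_map(function ($w) { return preg_quote($w, '~'); }, $words);
    preg_match('~(' . implode('|', $quoted) . ')~', $subject, $m);
    return $m[0];
}

echo first_match($unsorted, 'brainless'), "\n"; // brain
echo first_match($sorted, 'brainless'), "\n";   // brainless
```

With the unsorted list only the first five letters would be masked, leaving "less" visible after the span.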
Using a function that stores the matches in the $references variable:
function sanitize_badwords($string, &$references) {
static $counter ;
static $list ;
static $list_q ;
if (!isset($counter)) {
$counter = 0 ;
$list = array("dumb", "stupid", "brainless");
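// Sort longest first so a longer word is tried before any shorter word it starts with - many thanks @revo
usort($list, function($a,$b) { return strlen($b) - strlen($a); });
// Escape regex metacharacters, including the ~ delimiter used below
$list_q = array_map(function($w) { return preg_quote($w, '~'); }, $list);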
}
return preg_replace_callback('~('.implode('|',$list_q).')~',
function($matches) use (&$counter, &$references){
$counter++;
$references[$counter] = $matches[0];
return '<span id="bad_'.$counter.'">'
. substr($matches[0],0,1)
. str_repeat('.', strlen($matches[0])-1)
. '</span>' ;
}, $string) ;
}
$matches = [] ;
echo sanitize_badwords('You are kind of dumb and brainless. Very dumb!', $matches) ;
print_r($matches);
This outputs:
You are kind of <span id="bad_1">d...</span> and <span id="bad_2">b........</span>. Very <span id="bad_3">d...</span>!
Array
(
[1] => dumb
[2] => brainless
[3] => dumb
)
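For the mouseover step described in the question, one option is to pass the collected words to JavaScript with json_encode(). The sketch below assumes $matches has been filled as shown above. The JavaScript variable name badWords is an illustrative assumption and not part of the original code.

```php
<?php
// Assumed input: the array filled by sanitize_badwords(), keyed by span number.
$matches = array(1 => 'dumb', 2 => 'brainless', 3 => 'dumb');

// JSON_FORCE_OBJECT keeps the 1-based keys, so badWords["2"] belongs to #bad_2.
// JSON_HEX_TAG escapes < and > so a word cannot close the <script> element.
$json = json_encode($matches, JSON_FORCE_OBJECT | JSON_HEX_TAG);

echo $json, "\n"; // {"1":"dumb","2":"brainless","3":"dumb"}

// Hypothetical embedding; "badWords" is an assumed variable name.
echo '<script>var badWords = ', $json, ';</script>', "\n";
```

On the client, the numeric part of a span's id (for example 2 for bad_2) can then be used as the key into badWords to retrieve the original word.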