So I have an array of profanities that I am checking for in a string.
E.g.
$string = 'naughty string';
$words = [
'naughty',
'example',
'words'
];
$pattern = '/(' . implode('|', $words) . ')/i';
preg_match_all($pattern, $string, $matches);
$matched = implode(', ', $matches[0]);
But I also want to check for profanities that are split up with spaces:
E.g.
n a u g h t y
Yes, I can do this by adding the spaced-out versions to the array:
$words = [
'naughty',
'n a u g h t y',
'example',
'e x a m p l e',
'words',
'w o r d s'
];
But I have a huge array of "bad" words. Is there an easier way of doing this?
------ EDIT ------
This isn't meant to be super accurate. In my application every space becomes a new line, so a string like n a u g h t y string would result in this:
n
a
u
g
h
t
y
string
To answer the question as asked, create a pattern like b\s*a\s*d instead of just bad:
$string = 'some bad and b a d and more ugly and very u g l y words';
$words = [
'bad',
'ugly'
];
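// implode() takes the separator first; the reversed order fails on PHP 8.
// The closing \b sits outside the group so it applies to every alternative.
$pattern = '/\b(' . implode('|',
    array_map(function($w) {
        return implode('\s*', str_split($w));
    }, $words)) . ')\b/i';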
print preg_replace($pattern, '***', $string);
// some *** and *** and more *** and very *** words
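Since the question collects the matches rather than replacing them, the same idea can be used with preg_match_all. The following is only a sketch: buildSpacedPattern is a hypothetical helper name, not something from the question, and it assumes the words are single-byte (ASCII) because str_split works on bytes. It also runs each character through preg_quote so that a word containing regex metacharacters stays literal.

```php
<?php
// Hypothetical helper: builds one case-insensitive pattern that matches each
// word with optional whitespace between its characters.
// Assumes ASCII words, since str_split() splits on bytes, not characters.
function buildSpacedPattern(array $words): string
{
    $alternatives = array_map(function (string $word): string {
        // Escape each character so regex metacharacters stay literal.
        $chars = array_map(function (string $c): string {
            return preg_quote($c, '/');
        }, str_split($word));
        return implode('\s*', $chars);
    }, $words);

    return '/\b(?:' . implode('|', $alternatives) . ')\b/i';
}

$string = 'some bad and b a d and more ugly and very u g l y words';
$pattern = buildSpacedPattern(['bad', 'ugly']);

preg_match_all($pattern, $string, $matches);

// Collapse the whitespace inside each match so "b a d" is reported as "bad".
$found = array_map(function (string $m): string {
    return preg_replace('/\s+/', '', $m);
}, $matches[0]);

$matched = implode(', ', array_unique($found));
print $matched;
// bad, ugly
```

Because \s* also matches newlines, this should still work if every space in the input has already been turned into a line break.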
On a more general note, you can't reliably remove profanities, especially in the Unicode world. There's no way you can filter out something like ƒⓤçκ.
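To make that limitation concrete, here is a small sketch using the placeholder word bad from the example above. The sample strings are made up for illustration; the last one swaps the Latin a for the visually identical Cyrillic letter U+0430.

```php
<?php
// Pattern for the placeholder word "bad", built the same way as above.
// The u modifier makes PCRE treat the pattern and subject as UTF-8.
$pattern = '/\b(b\s*a\s*d)\b/iu';

$samples = [
    'bad',         // plain
    'b a d',       // spaced out
    'b@d',         // symbol substitution
    "b\u{0430}d",  // Cyrillic U+0430 in place of Latin "a"
];

$results = [];
foreach ($samples as $sample) {
    $results[$sample] = preg_match($pattern, $sample) === 1 ? 'caught' : 'missed';
    print $sample . ' => ' . $results[$sample] . "\n";
}
```

Only the first two samples are caught. Every substitution would need its own rule, and the set of possible lookalikes is far too large to enumerate by hand.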