Tags: php, arrays, regex, preg-match-all, profanity

Regex check for words and words with spaces separating letters


So I have an array of profanities that I am checking for in a string.

E.g.

$string = 'naughty string';
$words = [
    'naughty',
    'example',
    'words'
];
$pattern = '/(' . implode('|', $words) . ')/i'; // separator first: the reversed argument order was removed in PHP 8
preg_match_all($pattern, $string, $matches);
$matched = implode(', ', $matches[0]);

But I also want to check profanities split with spaces:

E.g.

n a u g h t y

Yes, I can do this by adding the spaced-out variants to the array:

$words = [
    'naughty',
    'n a u g h t y',
    'example',
    'e x a m p l e',
    'words',
    'w o r d s'
];

But I have a huge array of "bad" words and was wondering if there is an easier way of doing this?

------ EDIT ------

So this isn't meant to be super accurate. In my application every space becomes a new line, so a string like this: n a u g h t y string would result in this:

n

a

u

g

h

t

y

string


Solution

  • To answer the question as asked, create a pattern like b\s*a\s*d instead of just bad:

    $string = 'some bad and b a d and more ugly and very u g l y words';
    
    $words = [
        'bad',
        'ugly'
    ];
    
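    // Keep both \b anchors outside the group so they apply to every alternative.
    $pattern = '/\b(?:' . implode('|',
        array_map(function ($w) {
            return implode('\s*', str_split($w)); // separator first (required since PHP 8)
        }, $words)) . ')\b/i';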
    
    print preg_replace($pattern, '***', $string); 
    // some *** and *** and more *** and very *** words
    

    On a more general note, you can't reliably remove profanities, especially in the Unicode world. There's no way you can filter out something like ƒⓤçκ.
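
Since the question is about detecting words rather than masking them, the same spaced-out pattern can also be passed to preg_match_all. The following is only a sketch under a few assumptions: the helper names buildSpacedPattern and findListedWords are made up for illustration, the word list is plain ASCII, and each character is run through preg_quote in case the list contains regex metacharacters. Because \s also matches newlines, it should cover the "every space is a new line" case from the edit as well.

```php
<?php
// Hypothetical helper: build one regex that matches each word with optional
// whitespace (spaces, tabs, newlines) between its letters.
function buildSpacedPattern(array $words): string
{
    $alternatives = array_map(function (string $word): string {
        // Quote every character so regex metacharacters in the list stay literal.
        $chars = array_map(function (string $c): string {
            return preg_quote($c, '/');
        }, str_split($word));
        return implode('\s*', $chars);
    }, $words);

    return '/\b(?:' . implode('|', $alternatives) . ')\b/i';
}

// Hypothetical helper: return the distinct list words found in $text.
function findListedWords(string $text, array $words): array
{
    preg_match_all(buildSpacedPattern($words), $text, $matches);
    // Strip the whitespace from each hit so "n a u g h t y" reports as "naughty".
    $normalized = array_map(function (string $hit): string {
        return strtolower(preg_replace('/\s+/', '', $hit));
    }, $matches[0]);

    return array_values(array_unique($normalized));
}

$found = findListedWords("a n a u g h t y\nstring with Example words", ['naughty', 'example', 'words']);
print implode(', ', $found);
// naughty, example, words
```

Note that the pattern will also flag innocent text that happens to spell a listed word across a word break (for example "b ad" for "bad"), so it is probably best treated as a rough filter, in line with the "not super accurate" requirement.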