I'm implementing a profanity filter using a Trie data structure. Every swear word is added to the Trie. When I have a string to remove profanities from, I explode the string on punctuation and check every word against the Trie. If a word is found, I replace it with asterisks. Then I implode the string. The issue is, how do I keep track of the punctuation? In other words, how do I make sure the resulting string still has its punctuation?
If you are using preg_split() to split up your string, consider using the PREG_SPLIT_DELIM_CAPTURE flag to capture the punctuation along with the matches.
Consider:
$str = "This. string/ has? punctuation!";
print_r(preg_split('/(\W+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE));
/*
Array
(
    [0] => This
    [1] => .
    [2] => string
    [3] => /
    [4] => has
    [5] => ?
    [6] => punctuation
    [7] => !
    [8] =>
)
*/
See http://php.net/preg_split for more information.
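Putting the pieces together, here is a minimal sketch of the whole split, censor, and rejoin cycle. The names censor(), $isProfane, and $badWords are made up for illustration, and the plain array lookup is only a stand-in for your Trie. Because the pattern has one capturing group, the split result alternates word, delimiter, word, and so on, which means the even indexes are the words and implode('') restores the original punctuation and spacing.

```php
<?php
// Hypothetical helper: $isProfane stands in for the Trie lookup from the question.
function censor(string $text, callable $isProfane): string
{
    // The capturing group plus PREG_SPLIT_DELIM_CAPTURE keeps the delimiters
    // (punctuation and whitespace) as their own elements in the result.
    $tokens = preg_split('/(\W+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($tokens as $i => $token) {
        // Even indexes are words (possibly empty); odd indexes are delimiters.
        if ($i % 2 === 0 && $token !== '' && $isProfane($token)) {
            $tokens[$i] = str_repeat('*', strlen($token));
        }
    }

    // Joining with an empty string puts every delimiter back where it was.
    return implode('', $tokens);
}

// Stand-in word list; a real filter would query the Trie here instead.
$badWords = ['darn' => true, 'heck' => true];
$isProfane = function (string $word) use ($badWords): bool {
    return isset($badWords[strtolower($word)]);
};

echo censor("Well, darn! What the heck?", $isProfane), "\n";
// Well, ****! What the ****?
```

Note that strlen() counts bytes, so this sketch assumes single-byte text. For UTF-8 input you would probably want the /u modifier on the pattern and mb_strlen() for the asterisk count.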