Tags: php, filter, punctuation, profanity

String has been split using punctuation as delimiters; how to reassemble and put the punctuation back in?


I'm implementing a profanity filter using a Trie data structure. Every swear word is added to the Trie. When I have a string to remove profanities from, I explode the string using punctuation as delimiters and check every word against the Trie. If a word is found, I replace it with asterisks. Then I implode the string. The issue is: how do I keep track of the punctuation? In other words, how do I make sure the resulting string still has its punctuation?


Solution

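  • If you are using preg_split() to split up your string, consider using the PREG_SPLIT_DELIM_CAPTURE flag so that the punctuation is returned alongside the words. The flag only captures the parts of the delimiter pattern that are wrapped in parentheses, so the pattern needs a capturing group.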

    Consider:

    $str = "This. string/ has? punctuation!";
    print_r(preg_split('/(\W+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE));
    
    /*
      Array
      (
          [0] => This
          [1] => . 
          [2] => string
          [3] => / 
          [4] => has
          [5] => ? 
          [6] => punctuation
          [7] => !
          [8] => 
      )
    */
    

    See http://php.net/preg_split for more information.
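
To put the string back together, one option is to replace only the word tokens and then implode() with an empty glue, since the captured delimiters are already in the array. The following is a minimal sketch, not the asker's actual code: isProfane() and censor() are hypothetical names, the $blocked array is a stand-in for the Trie lookup, and the text is assumed to be ASCII (for UTF-8 input you would probably want the /u pattern modifier and mb_strlen() instead).

```php
<?php
// Hypothetical stand-in for the Trie lookup described in the question:
// $blocked maps lowercase swear words to true.
function isProfane(string $word, array $blocked): bool
{
    return isset($blocked[strtolower($word)]);
}

// Hypothetical helper: masks blocked words and keeps all punctuation.
function censor(string $text, array $blocked): string
{
    // The parentheses make preg_split() return the delimiters as tokens too.
    $tokens = preg_split('/(\W+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($tokens as $i => $token) {
        // With a single capturing group and no other flags, words sit at
        // even indexes and delimiters at odd indexes.
        if ($i % 2 === 0 && $token !== '' && isProfane($token, $blocked)) {
            // strlen() counts bytes, which equals characters for ASCII input.
            $tokens[$i] = str_repeat('*', strlen($token));
        }
    }

    // Empty glue: the original punctuation and spacing are already tokens.
    return implode('', $tokens);
}

$blocked = ['darn' => true, 'heck' => true];
echo censor('Well, darn! What the heck?', $blocked), PHP_EOL;
// Prints: Well, ****! What the ****?
```

Because nothing is discarded during the split, the output has exactly the same punctuation and whitespace as the input; only the matched words change.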