
Splitting a paragraph into sentences while keeping the punctuation (not a duplicate)


Here is a point where I am stuck again, using a regular expression with PHP's preg_split() function.

Here is the code :

preg_split('~("[^"]*")|[!?.।]+\s*|\R+~u', $paragraph, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

I am trying to split a paragraph into sentences, and this code does that job for me.
Here is a link to my previous question.

But now I need to keep the punctuation intact (the question marks, full stops, etc.).

Using PREG_SPLIT_DELIM_CAPTURE was supposed to do that, but somehow it is not working that way. I get only the sentences, without the full stops or question marks.
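For reference, the behaviour described above can be reproduced with a minimal sketch. The sample text is an assumption, since the question does not show its input. PREG_SPLIT_DELIM_CAPTURE only returns the parts of the delimiter that sit inside a capturing group, and in this pattern only the quoted-string alternative is parenthesized, so the punctuation is consumed and dropped.

```php
<?php

// Sample text (an assumption; the question does not show its input).
$paragraph = "hello! how are you? how is life\nlive life, live free. \"isnt it?\"";

// Pattern from the question: only ("[^"]*") is a capturing group.
$pattern = '~("[^"]*")|[!?.।]+\s*|\R+~u';

$sentences = preg_split($pattern, $paragraph, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

// The quoted string comes back because it was captured, but "!", "?" and "."
// are part of the uncaptured delimiter and are therefore missing.
print_r($sentences);
```

With this input the array contains `hello`, `how are you`, `how is life`, `live life, live free` and `"isnt it?"`.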


Solution

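  • PREG_SPLIT_DELIM_CAPTURE is not what keeps the punctuation here. That flag only returns the parts of the delimiter matched by a capturing group (in your pattern, the quoted string) as separate array elements. To keep the punctuation attached to its sentence you need \K, which resets the start of the match so that the punctuation matched before it is no longer part of the delimiter: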

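    <?php

    // \K resets the match start: the punctuation stays with the sentence
    // and only the whitespace after it is consumed as the delimiter.
    $paragraph = <<<STR
    hello! how are you? how is life
    live life, live free. "isnt it?"
    STR;

    var_dump(preg_split('~("[^"]*")|[!?.।]+\K\s*|\R+~u', $paragraph, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY));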
    

    Output:

    array(5) {
      [0]=>
      string(6) "hello!"
      [1]=>
      string(12) "how are you?"
      [2]=>
      string(11) "how is life"
      [3]=>
      string(21) "live life, live free."
      [4]=>
      string(10) ""isnt it?""
    }
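If the split is needed in more than one place, the pattern above could be wrapped in a small function. This is only a sketch: the name `split_sentences` and the choice to return an empty array on failure are assumptions, not part of the original answer.

```php
<?php

/**
 * Hypothetical helper (not from the original answer) wrapping the \K pattern.
 */
function split_sentences(string $paragraph): array
{
    // ("[^"]*")  : quoted strings are captured, so PREG_SPLIT_DELIM_CAPTURE
    //              returns them as their own elements.
    // [!?.।]+\K  : match the punctuation, then reset the match start so the
    //              punctuation stays attached to the preceding sentence.
    // \s* / \R+  : whitespace and line breaks are the parts actually removed.
    $pattern = '~("[^"]*")|[!?.।]+\K\s*|\R+~u';

    $parts = preg_split($pattern, $paragraph, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure (for example, invalid UTF-8 with the u modifier).
    return $parts === false ? [] : $parts;
}

print_r(split_sentences("Is it late? Yes। Go home"));
```

For this input the result is `Is it late?`, `Yes।` and `Go home`. The PREG_SPLIT_DELIM_CAPTURE flag is kept only so that quoted strings, which are matched by the capturing group, are still returned.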