php, regex, preg-split

PHP + Split paragraph into array


I cannot find a solution to this. Please help. I need to split this "paragraph" into an array of sentences:

$paragraph = "a. b. c. hello o.c.. hello world -in.. hello. world. 8.5 hello world. ";

The resulting array should look like:

0=>a.
1=>b.
2=>c.
3=>hello o.c.
4=>hello world -in.
5=>hello.
6=>world.
7=>8.5 hello world.

I got this far:

preg_split('/(?<=[.?!;:])\s+/', $paragraph, -1, PREG_SPLIT_NO_EMPTY);

But this does not allow a decimal number.


Solution

  • You can use the (*SKIP)(*FAIL) verbs to make the regex discard a match whenever the alternative in front of them matches. For example:

    (in|o\.c)\.\h+(*SKIP)(*FAIL)|(?<=[.?!])\s+
    

    This tells the regex not to split where in. or o.c. followed by horizontal whitespace is matched. Otherwise it splits on the whitespace that follows ., !, or ?.

    PHP Demo: https://eval.in/542856
    Regex101 Demo: https://regex101.com/r/eS0tR7/1
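
As a minimal sketch of how the pattern above could be plugged into preg_split, the snippet below runs it on a made-up sample string (the text and the variable names $text, $pattern, and $sentences are assumptions for illustration, not taken from the question). It treats "in." and "o.c." as abbreviations that should not end a sentence.

```php
<?php
// Hypothetical sample text, not the paragraph from the question.
$text = "It measures 8.5 in. wide. Sold by Acme o.c. today! Really? Yes.";

// First alternative: match "in." or "o.c." plus horizontal whitespace, then
// (*SKIP)(*FAIL) throws that match away, so no split happens at that spot.
// Second alternative: split on whitespace that is preceded by ., ? or !.
$pattern = '/(in|o\.c)\.\h+(*SKIP)(*FAIL)|(?<=[.?!])\s+/';

// PREG_SPLIT_NO_EMPTY drops empty pieces, e.g. from trailing whitespace.
$sentences = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($sentences);
```

The decimal in "8.5" is left alone because the dot is not followed by whitespace. Note that the pattern has no word boundary in front of in, so a sentence ending in a word such as "begin." would also be protected from splitting; prefixing the group with \b is one possible way to tighten that.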