php preg-split

Excluding some patterns from a preg_split() match in PHP


I am trying to split raw text into sentences. I use preg_split() and split the text on each occurrence of ?, . and ;. As expected, this breaks on special cases where . does not end a sentence, for example "Dr.", "Mr.", etc.

How can I exclude such words or patterns from splitting?

preg_split('/(\. )|(\? )|(\; )!(Mr\.)/', $content);

Solution

  • You can add a negative lookbehind to the regex so that the dot does not split when it is preceded by "Mr" or "Dr":

    preg_split('/((?<!(Mr|Dr))\.|\?|;) /', $content);
    

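    I also simplified the regex a little. You should also consider replacing the single space at the end of the expression with (?:\s|$) (any whitespace character or the end of the input); the group is needed so that the alternation applies only to that final part. This way the last sentence is split off even when nothing follows its terminator.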

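The lookbehind approach, combined with the whitespace-or-end-of-input suggestion, can be sketched roughly as follows. The helper name split_sentences, the sample text, and the use of PREG_SPLIT_NO_EMPTY are assumptions made for illustration; they do not appear in the original answer.

```php
<?php
// Hypothetical helper; the name and the abbreviation list are illustrative.
function split_sentences(string $text): array
{
    // A "." ends a sentence only when it is not preceded by "Mr" or "Dr".
    // "?" and ";" always end one. (?:\s|$) requires one whitespace
    // character or the end of the input after the terminator.
    $pattern = '/(?:(?<!Mr|Dr)\.|\?|;)(?:\s|$)/';

    // PREG_SPLIT_NO_EMPTY drops the empty piece left after the final delimiter.
    return preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Made-up sample input.
$sentences = split_sentences("Dr. Smith met Mr. Jones. Did they talk? Yes; they did.");
print_r($sentences);
```

Note that preg_split() consumes the delimiters, so the returned pieces no longer carry their trailing ".", "?" or ";". Further abbreviations can be added as extra alternatives inside the lookbehind.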