php regex preg-split

Split string on dots not preceded by a digit without losing digit in split


Given the following sentence:

This is 10. way of doing this. And this is 43. street.

I want preg_split() to give this:

Array (
 [0] => "This is 10. way of doing this"
 [1] => "And this is 43. street"
)

I am using:

preg_split("/[^\d+]\./i", $sentence)

But this gives me:

Array (
 [0] => "This is 10. way of doing thi"
 [1] => "And this is 43. stree"
)

As you can see, the last character of each sentence is removed. I know why this happens, but I don't know how to prevent it from happening. Any ideas? Can lookaheads and lookbehinds help here? I am not really familiar with those.


Solution

  • You want to use a negative lookbehind assertion for that:

    preg_split("/(?<!\d)\./i",$sentence)
    

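    The difference is that [^\d+] consumes a character (any single character that is neither a digit nor a plus sign), so that character becomes part of the delimiter match and preg_split() throws it away together with the dot. The (?<!\d) lookbehind is also checked, but it is "zero-width": it does not become part of the delimiter match, so only the dot itself is removed.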
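
The lookbehind pattern from the answer, used on its own, still leaves a leading space on the second piece and an empty string after the final dot. The following is a minimal sketch of one way to get exactly the array asked for in the question. The trailing `\s*` and the `PREG_SPLIT_NO_EMPTY` flag are additions not present in the answer above, and the variable names `$naive` and `$parts` are only illustrative.

```php
<?php
// Sample sentence from the question.
$sentence = "This is 10. way of doing this. And this is 43. street.";

// Pattern from the answer: split on a dot that is not preceded by a digit.
// The lookbehind (?<!\d) is zero-width, so only the dot itself is removed.
$naive = preg_split('/(?<!\d)\./', $sentence);
// $naive is ["This is 10. way of doing this", " And this is 43. street", ""]

// Assumed refinement: \s* also consumes whitespace after the dot, and
// PREG_SPLIT_NO_EMPTY drops the empty piece left after the last dot.
$parts = preg_split('/(?<!\d)\.\s*/', $sentence, -1, PREG_SPLIT_NO_EMPTY);
// $parts is ["This is 10. way of doing this", "And this is 43. street"]

print_r($parts);
```

Single quotes are used around the pattern so that PHP passes the backslashes to the regex engine unchanged. The `/i` modifier from the original pattern is omitted here because the pattern contains no letters for it to affect.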