How can I split a sentence into words and punctuation marks?

For example, I want to split this sentence:

I am a sentence.

Into an array with 5 parts: I, am, a, sentence, and the period (.).

I'm currently using preg_split after trying explode, but I can't find a pattern that keeps the punctuation as its own element.

This is what I've tried:

$sentence = explode(" ", $sentence);
returns array(4) {
  string(1) "I"
  string(2) "am"
  string(1) "a"
  string(9) "sentence."
}

And also this:

$sentence = preg_split("/[.?!\s]/", $sentence);
returns array(5) {
  string(1) "I"
  string(2) "am"
  string(1) "a"
  string(8) "sentence"
  string(0) ""
}

How can this be done?


  • You can split on word boundaries:

    $sentence = preg_split("/(?<=\w)\b\s*/", 'I am a sentence.');

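    The lookbehind (?<=\w) requires the character just before the split point to be a word character, \b then matches the boundary at the end of that word, and \s* consumes any whitespace that follows. Because the period is not a word character, it is left as its own element. This returns: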


    array(5) {
      string(1) "I"
      string(2) "am"
      string(1) "a"
      string(8) "sentence"
      string(1) "."
    }
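
One caveat with the word-boundary split: whitespace is only consumed when it directly follows a word, so for input like 'Hello, world.' the comma and the following word should come back together as ", world". If punctuation can appear mid-sentence, a possible alternative is to collect tokens with preg_match_all instead of splitting. The sketch below is only an illustration under that assumption; the helper name tokenize is hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper (not from the original answer): collect the tokens
// directly instead of describing where to split.
function tokenize(string $sentence): array
{
    // \w+      a run of word characters (letters, digits, underscore)
    // [^\w\s]  a single character that is neither a word character nor whitespace
    // The u modifier treats the pattern and the subject as UTF-8.
    $count = preg_match_all('/\w+|[^\w\s]/u', $sentence, $matches);

    // preg_match_all returns false on failure (for example, invalid UTF-8).
    if ($count === false) {
        return [];
    }

    return $matches[0];
}

print_r(tokenize('I am a sentence.')); // I, am, a, sentence, .
print_r(tokenize('Hello, world!'));    // Hello, ",", world, !
```

Each punctuation character becomes its own element here, so "?!" yields two tokens and an apostrophe splits a contraction such as "I'm" into three. Adjust the character classes if that is not what you want.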