javascript regex regex-group

Match sentences and whitespace separately


Take the following text:

This is a sentence. This is a sentence...    This is a sentence! This is a sentence? This is a sentence.This is a sentence. This is a sentence

I'd like to match this so I have an array like the following:

[
  "This is a sentence.",
  " ",
  "This is a sentence...",
  "    ",
  "This is a sentence!",
  " ",
  "This is a sentence?",
  " ",
  "This is a sentence.",
  "",
  "This is a sentence.",
  " ",
  "This is a sentence",
]

With my current regex, however:

str.match(/[^.!?]+[.!?]*(\s*)/g);

I get the following:

[
  "This is a sentence. ",
  "This is a sentence...    ", 
  "This is a sentence! ",
  "This is a sentence? ", 
  "This is a sentence.", 
  "This is a sentence. ", 
  "This is a sentence"
]

How can I achieve this with a JavaScript RegExp?

Thanks in advance!


Solution

  • Just add [^\s] at the beginning and change (\s*) to |\s+.

    The final regex becomes:

    str.match(/[^\s][^.!?]+[.!?]*|\s+/g)

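    • [^\s] (equivalent to \S) forces each sentence match to start on a non-whitespace character, so leading whitespace is never absorbed into a sentence
    • |\s+ adds a second alternative, so each run of whitespace is matched as its own array element
    • Neither alternative can match an empty string, so the "" between the two adjacent sentences in the desired array is not produced by the regex alone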
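
To check the behaviour, here is a minimal sketch that runs the regex above against the sample text from the question. The helper withEmptySeparators is hypothetical (it is not part of the original answer); it is only one possible way to add the "" entry between two sentences that have no whitespace between them, assuming that entry is really needed.

```javascript
// Sample text from the question.
const str =
  "This is a sentence. This is a sentence...    This is a sentence! " +
  "This is a sentence? This is a sentence.This is a sentence. This is a sentence";

// Either a sentence that starts with a non-whitespace character,
// or a run of whitespace. match() returns null when nothing matches,
// so fall back to an empty array.
const parts = str.match(/[^\s][^.!?]+[.!?]*|\s+/g) || [];

// Hypothetical helper (an assumption, not from the original answer):
// insert "" between two consecutive sentence items so the result has the
// same shape as the array requested in the question.
function withEmptySeparators(items) {
  const out = [];
  for (const item of items) {
    const prev = out[out.length - 1];
    // A sentence contains at least one non-whitespace character.
    if (prev !== undefined && /\S/.test(prev) && /\S/.test(item)) {
      out.push("");
    }
    out.push(item);
  }
  return out;
}

const padded = withEmptySeparators(parts);

console.log(parts.length);  // 12: sentences and whitespace runs only
console.log(padded.length); // 13: one "" added between the adjacent sentences
```

Without the helper, the regex returns the sentences and the whitespace runs as separate items, and the two sentences joined by ".This" simply follow each other with nothing in between.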