The following regex (taken from here) splits a string by character length (e.g. 20 characters) while being word-aware (live demo):
\b[\w\s]{20,}?(?=\s)|.+$
This means that if the given length would cut a word in the middle, the whole word is taken instead:
const str = "this is an input example of one sentence that contains a bit of words and must be split";
const substringMaxLength = 20;
const regex = new RegExp(`\\b[\\w\\s]{${substringMaxLength},}?(?=\\s)|.+$`, 'g');
const substrings = str.match(regex);
console.log(substrings);
// [ "this is an input example", " of one sentence that", " contains a bit of words", " and must be split" ]
However, as can be seen when running the snippet above, every substring after the first starts with a leading whitespace character. Can that whitespace be excluded, so that we end up with this?
[
"this is an input example",
"of one sentence that",
"contains a bit of words",
"and must be split"
]
I tried adding [^\s], (?:\s), and (?!\s) in various places, but couldn't get it to work.
How can it be done?
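You can require that every match starts with \w, for both alternatives of your current regex. A match can then no longer begin on the separating whitespace, so the engine skips over it and starts the next match at the following word. Because \w already consumes one character, the quantifier is reduced to one less than the requested length: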
const str = "this is an input example of one sentence that contains a bit of words and must be split";
const substringMaxLength = 20;
const regex = new RegExp(`\\b\\w(?:[\\w\\s]{${substringMaxLength-1},}?(?=\\s)|.*$)`, 'g');
const substrings = str.match(regex);
console.log(substrings);
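If you need this in more than one place, the same regex can be wrapped in a small function. This is only a minimal sketch: the name splitWordAware and its parameter minLength are made up for illustration, and it assumes the length is a positive integer.

```javascript
// Hypothetical helper (the name and signature are assumptions, not from the answer above).
function splitWordAware(text, minLength) {
  // Assumption: minLength is an integer >= 1, so {minLength - 1,} is a valid quantifier.
  if (!Number.isInteger(minLength) || minLength < 1) {
    throw new RangeError('minLength must be a positive integer');
  }
  // Same pattern as above: every match must start with a word character.
  const regex = new RegExp(`\\b\\w(?:[\\w\\s]{${minLength - 1},}?(?=\\s)|.*$)`, 'g');
  // match() returns null when nothing matches; normalize that to an empty array.
  return text.match(regex) || [];
}

const chunks = splitWordAware(
  'this is an input example of one sentence that contains a bit of words and must be split',
  20
);
console.log(chunks);
// → [ 'this is an input example', 'of one sentence that', 'contains a bit of words', 'and must be split' ]
```

Keep in mind that [\w\s] does not match punctuation, so if a chunk runs into a comma or period before reaching the requested length, the .*$ alternative takes over and the rest of the string comes back as a single substring.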