I have this HTML string:
this simple the<b>html string</b> text test that<b>need</b>to<b>spl</b>it it too
I want to split it and get a result array like this:
this simple
the<b>html string</b>
text test
that<b>need</b>to<b>spl</b>it
it too
I tried this:
var string ='this simple the<b>html string</b> text test that<b>need</b>to<b>spl</b>it it too';
var regex = XRegExp('((?:[\\p{L}\\p{Mn}]+|)<\\s*.*?[^>]*>.*?<\/.*?>(?:[\\p{L}\\p{Mn}]+|))', "g");
var result = string.split(regex);
It didn't work. I don't want to split word by word. Is there a way to do it?
Use
string.split(/\s*(?<!\S)([^\s<>]+(?:\s+[^\s<>]+)*)(?!\S)\s*/).filter(Boolean);
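The capturing group makes split() keep each matched run of plain words as its own element in the resulting array, and filter(Boolean) removes the empty strings that split() leaves when a match touches the start or end of the string.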
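To see why the capturing group matters, compare split() with and without one. The inputs below are made-up strings used only for illustration: without parentheses the matched separators are thrown away, with them they are spliced into the result, and a match at the very start leaves an empty string that filter(Boolean) removes.

```javascript
// Hypothetical sample input, used only to illustrate split() behaviour.
const sample = '1a2b3';

// Without a capturing group, the matched separators are discarded.
const withoutGroup = sample.split(/[a-z]/);
// With a capturing group, each captured separator is kept in the array.
const withGroup = sample.split(/([a-z])/);

console.log(withoutGroup); // [ '1', '2', '3' ]
console.log(withGroup);    // [ '1', 'a', '2', 'b', '3' ]

// A match at the very start leaves an empty string in front,
// which filter(Boolean) drops because '' is falsy.
const leading = 'a1b2'.split(/([a-z])/);
console.log(leading);                 // [ '', 'a', '1', 'b', '2' ]
console.log(leading.filter(Boolean)); // [ 'a', '1', 'b', '2' ]
```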
REGEX EXPLANATION
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^\s<>]+                 any character except: whitespace (\n,
                             \r, \t, \f, and " "), '<', '>' (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
--------------------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ")
                               (1 or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
      [^\s<>]+                 any character except: whitespace (\n,
                               \r, \t, \f, and " "), '<', '>' (1 or
                               more times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
    )*                       end of grouping
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
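Put together, the lookarounds mean a run of words is only captured when whitespace (or the start/end of the string) sits on both sides of it, so a word glued to a tag, like the `the` in `the<b>` or the `it` in `</b>it`, is never captured. One way to check this is to run the same core pattern without the surrounding `\s*` as a global match and list what it finds (the variable names here are just for illustration):

```javascript
const html = 'this simple the<b>html string</b> text test that<b>need</b>to<b>spl</b>it it too';

// Same core as the answer's regex, minus the leading/trailing \s*,
// with the g flag so matchAll can walk every match.
const plainRuns = /(?<!\S)[^\s<>]+(?:\s+[^\s<>]+)*(?!\S)/g;

const runs = [...html.matchAll(plainRuns)].map((m) => m[0]);
console.log(runs); // [ 'this simple', 'text test', 'it too' ]
```

These are exactly the captured pieces that split() places between the tag-containing chunks.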
JavaScript:
const string = 'this simple the<b>html string</b> text test that<b>need</b>to<b>spl</b>it it too';
const regex = /\s*(?<!\S)([^\s<>]+(?:\s+[^\s<>]+)*)(?!\S)\s*/;
console.log(string.split(regex).filter(Boolean));
Output:
[
"this simple",
"the<b>html string</b>",
"text test",
"that<b>need</b>to<b>spl</b>it",
"it too"
]
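Lookbehind assertions such as `(?<!\S)` were added to JavaScript in ES2018, so engines that predate that support reject the regex with a SyntaxError. If that matters, one possible workaround (a sketch, not part of the answer above) is to consume the leading whitespace with `(?:^|\s+)` instead of looking behind for it, and to write the trailing check as `(?=\s|$)`. The helper name `splitOnPlainText` is hypothetical.

```javascript
// Hypothetical helper: same splitting rule as the answer, written without lookbehind.
// (?:^|\s+) stands in for (?<!\S): the run must start the string or follow whitespace.
// (?=\s|$) is equivalent to (?!\S): the run must be followed by whitespace or the end.
function splitOnPlainText(str) {
  const regex = /(?:^|\s+)([^\s<>]+(?:\s+[^\s<>]+)*)(?=\s|$)\s*/;
  // The capture keeps each plain-text run; filter(Boolean) drops empty strings.
  return str.split(regex).filter(Boolean);
}

const input = 'this simple the<b>html string</b> text test that<b>need</b>to<b>spl</b>it it too';
console.log(splitOnPlainText(input));
// [ 'this simple', 'the<b>html string</b>', 'text test', 'that<b>need</b>to<b>spl</b>it', 'it too' ]
```

On the sample string this gives the same five pieces as the lookbehind version; when ES2018 support can be assumed, the lookbehind form in the answer is the more direct way to express "preceded by whitespace or the start".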