I have a string that looks something like this:
<strong>word</strong>: or <em>word</em> or <p><strong>word</strong>: this is a sentence</p> etc...
I am trying to parse each string into an array of words, without the HTML elements.
For example, the string:
<strong>word</strong>:
should end up being an array that looks like this:
['word', ':']
The string:
<p><strong>word</strong>: this is a sentence</p>
should end up being an array that looks like this:
['word', ':', 'this', 'is', 'a', 'sentence']
Is there any way to do this in JavaScript? My code below creates an array of individual characters rather than words separated by spaces.
//w = the string I want to parse
var p = document.querySelector("p").innerText;
var result = p.split(' ').map(function(w) {
  if (w === '')
    return w;
  else {
    var tempDivElement = document.createElement("div");
    tempDivElement.innerHTML = w;
    const wordArr = Array.from(tempDivElement.textContent);
    return wordArr;
  }
});
console.log(result)
<p><strong>word</strong>: this is a sentence</p>
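I would make the temp div first, set its innerHTML to the whole string, and extract the inner text. (Your version returns characters because Array.from() on a string splits it into individual characters.) Then use match() to find words (note \w matches letters, numbers and underscore). This will treat punctuation like : as separate words, which seems to be what you want.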
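// The HTML string to parse
const p = '<strong>word</strong>: or <em>word</em> or <p><strong>word</strong>: this is a sentence</p>';
// Let the browser parse the markup in a detached div, then read back only the text
const tempDivElement = document.createElement("div");
tempDivElement.innerHTML = p;
const t = tempDivElement.innerText;
// \w+ matches a run of word characters; \S matches any other single non-space character
const words = t.match(/\w+|\S/g) || []; // match() returns null when nothing matches
console.log(words);
// ['word', ':', 'or', 'word', 'or', 'word', ':', 'this', 'is', 'a', 'sentence']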
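The same steps can be wrapped in small helpers if you need them in more than one place. This is only a sketch: the names tokenize and htmlToWords are made up for illustration, and htmlToWords assumes a browser-like document object is available (or passed in).

```javascript
// Hypothetical helper: split plain text into words and single punctuation marks.
function tokenize(text) {
  // match() returns null when there is no match, so fall back to an empty array.
  return text.match(/\w+|\S/g) || [];
}

// Hypothetical helper: strip the markup first, then tokenize the remaining text.
// Assumes a browser-like `document`; pass one in explicitly in other environments.
function htmlToWords(html, doc = globalThis.document) {
  const container = doc.createElement("div");
  container.innerHTML = html;
  return tokenize(container.textContent);
}
```

In a browser, htmlToWords('<p><strong>word</strong>: this is a sentence</p>') should give ['word', ':', 'this', 'is', 'a', 'sentence'].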
If you just want the words, match only on \w:
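// The HTML string to parse
const p = '<strong>word</strong>: or <em>word</em> or <p><strong>word</strong>: this is a sentence</p>';
// Let the browser parse the markup in a detached div, then read back only the text
const tempDivElement = document.createElement("div");
tempDivElement.innerHTML = p;
const t = tempDivElement.innerText;
// \w+ alone keeps only the words and drops the punctuation
const words = t.match(/\w+/g) || []; // match() returns null when nothing matches
console.log(words);
// ['word', 'or', 'word', 'or', 'word', 'this', 'is', 'a', 'sentence']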
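One caveat: \w only covers ASCII letters, digits and the underscore, so accented or non-Latin words get cut apart. If that matters for your text, a Unicode-aware pattern is one option. The sketch below assumes a JavaScript engine that supports Unicode property escapes (the u flag with \p{...}); the function name unicodeWords is made up for illustration.

```javascript
// Hypothetical helper: like /\w+/g, but letters and digits from any script count.
function unicodeWords(text) {
  // \p{L} = any letter, \p{N} = any number; the `u` flag enables \p{...}.
  return text.match(/[\p{L}\p{N}_]+/gu) || [];
}

console.log("caf\u00e9 na\u00efve".match(/\w+/g)); // [ 'caf', 'na', 've' ]
console.log(unicodeWords("caf\u00e9 na\u00efve")); // [ 'café', 'naïve' ]
```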