javascript arrays innertext

Is there a way to create an array of individual words from innerText via JavaScript?


I have a string that looks something like this:

<strong>word</strong>: or <em>word</em> or <p><strong>word</strong>: this is a sentence</p> etc...

I am trying to parse each string into an array of its words, without the HTML elements.
For example, the string:

<strong>word</strong>:

should end up being an array that looks like this:

['word', ':']

The string:

<p><strong>word</strong>: this is a sentence</p>

should end up being an array that looks like this:

['word', ':', 'this', 'is', 'a', 'sentence']      

Is there any way to do this via JavaScript? My code below creates an array of individual characters rather than an array of words separated by spaces.

// p = the text I want to parse (innerText of the <p> element below)
var p = document.querySelector("p").innerText;

var result = p.split(' ').map(function(w) {
  if (w === '')
    return w;
  else {
    var tempDivElement = document.createElement("div");
    tempDivElement.innerHTML = w;

    const wordArr = Array.from(tempDivElement.textContent);
    return wordArr;
  }
});
console.log(result)

<p><strong>word</strong>: this is a sentence</p>


Solution

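  • Your code produces characters because Array.from() applied to a string returns its individual characters. I would create the temporary div first, set its innerHTML to the whole string, and read back its inner text, which strips the tags. Then use match() to find the words (note that \w matches letters, digits, and underscore). The \S alternative matches any other single non-whitespace character, so punctuation such as : comes out as a separate token, which seems to be what you want.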

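    // The markup to parse, as a string
    const p = '<strong>word</strong>: or <em>word</em> or <p><strong>word</strong>: this is a sentence</p>';

    // Let the browser parse the markup inside a detached div
    const tempDivElement = document.createElement("div");
    tempDivElement.innerHTML = p;

    // innerText returns the text with the tags stripped
    const t = tempDivElement.innerText;
    // \w+ matches a run of word characters; \S matches any other single non-whitespace character
    const words = t.match(/\w+|\S/g);
    console.log(words);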

    If you just want the words, without the punctuation, match only on \w+:

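    // The markup to parse, as a string
    const p = '<strong>word</strong>: or <em>word</em> or <p><strong>word</strong>: this is a sentence</p>';

    // Let the browser parse the markup inside a detached div
    const tempDivElement = document.createElement("div");
    tempDivElement.innerHTML = p;

    // innerText returns the text with the tags stripped
    const t = tempDivElement.innerText;
    // \w+ alone keeps only runs of word characters and drops punctuation
    const words = t.match(/\w+/g);
    console.log(words);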
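
The two variants above could be folded into one reusable helper. What follows is only a sketch under some assumptions: the names splitWords, htmlToWords, and keepPunctuation are hypothetical and not part of the answer. It reads textContent (as the question's code does) rather than innerText. It also swaps \w for the Unicode classes \p{L} and \p{N}, because \w only covers ASCII letters, digits, and underscore, so accented words would otherwise be split. Finally, it guards against match() returning null when the text contains no tokens.

```javascript
// Hypothetical helper names; a sketch, not the original answer's code.

// Split plain text into word tokens. With keepPunctuation set, every other
// non-whitespace character (such as ":") becomes its own token.
function splitWords(text, keepPunctuation = true) {
  // \p{L} and \p{N} match letters and digits in any script (requires the u flag).
  const pattern = keepPunctuation
    ? /[\p{L}\p{N}_]+|\S/gu
    : /[\p{L}\p{N}_]+/gu;
  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(pattern) || [];
}

// Strip the markup by letting the browser parse it, then split the text.
// This part needs a DOM (document), so it only works in a browser.
function htmlToWords(html, keepPunctuation = true) {
  const tempDivElement = document.createElement("div");
  tempDivElement.innerHTML = html;
  return splitWords(tempDivElement.textContent, keepPunctuation);
}

console.log(splitWords("word: this is a sentence"));
// → ['word', ':', 'this', 'is', 'a', 'sentence']
console.log(splitWords("word: this is a sentence", false));
// → ['word', 'this', 'is', 'a', 'sentence']
console.log(splitWords("   "));
// → []

// Only call the DOM-based helper when a document is available.
if (typeof document !== "undefined") {
  console.log(htmlToWords("<p><strong>word</strong>: this is a sentence</p>"));
}
```

Keeping the regex logic in a function that takes plain text means it can be checked without a browser, while the DOM step stays a thin wrapper around it.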