javascript, dom, selectors-api

Get innerText and split by <br>


Below is a minimal example of some HTML from which I am trying to extract the text content. My desired outcome is the array ['keep1', 'keep2', 'keep3', 'keep4', 'keep5'], so I am dropping anything that is a child element of the div, then splitting the div's text into an array on the <br /> tags.

Usually I would use .innerText on the div, which helpfully gets all the text and drops child elements, but as far as I am aware it is not suitable in this case because I would then lose the <br /> tags that I need for splitting into an array. Below is the best I could come up with, but it doesn't handle cases where child elements are not surrounded by <br />. Is there a better way to do this?

const text = document
  .querySelector("div")
  .innerHTML.split("<br>")
  .map(e => e.trim())
  .filter(e => e[0] != "<" && e != "");
console.log(text);
<div>
  <br /> keep1 <br /> keep2
  <span>drop</span> keep3
  <br /> keep4
  <br />
  <h4>drop2</h4>
  <br />keep5
</div>


Solution

  • One possible approach is as below:

    // we use the spread syntax inside of an Array-literal to convert the
    // iterable result of document.querySelector().childNodes into an
    // Array:
    const text = [...
      // here we retrieve the first/only <div> element from the document
      // and return the live NodeList of all its child-nodes:
      document.querySelector('div').childNodes
      // we then use Array.prototype.filter() to filter the returned collection:
    ].filter(
      // we use an Arrow function to test each node passed to the
      // Array.prototype.filter() method ('node' is a reference to the
      // current node of the Array of nodes);
      // node.nodeType === 3: we keep only text-nodes (every node has a
      // nodeType, and the nodeType of a text-node is 3, also available
      // as Node.TEXT_NODE),
      // finally - to prevent empty array-element-values - we check that
      // the nodeValue (the text-content of the text-node) has a length
      // greater than zero once leading and trailing white-space is removed:
      (node) => node.nodeType === 3 && node.nodeValue.trim().length > 0
      // we then use Array.prototype.map() to return a new Array based on the existing
      // Array of text-nodes:
    ).map(
      // again we pass the array-element into the function,
      // and here we trim the leading/trailing white-space of the node's value,
      // by passing the string to String.prototype.trim():
      (node) => node.nodeValue.trim()
    );
    
    console.log(text); // ["keep1","keep2","keep3","keep4","keep5"]
    <div>
      <br /> keep1 <br /> keep2
      <span>drop</span> keep3
      <br /> keep4
      <br />
      <h4>drop2</h4>
      <br />keep5
    </div>
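The same filter-and-map idea can be wrapped in a small reusable function. This is only a sketch: the name `directTextSegments` and the `TEXT_NODE` constant are made up for illustration and are not part of the answer above. It assumes you only want the text-nodes that are direct children of the element, so text nested inside a child such as the `<span>` or `<h4>` is ignored.

```javascript
// Hypothetical helper, not from the original answer: returns the trimmed,
// non-empty text of every text-node that is a direct child of `parent`.
const TEXT_NODE = 3; // same numeric value as Node.TEXT_NODE in browsers

function directTextSegments(parent) {
  return Array.from(parent.childNodes)
    // keep only text-nodes; element children (<br>, <span>, <h4>) are dropped
    .filter((node) => node.nodeType === TEXT_NODE)
    // strip the leading/trailing white-space around each piece of text
    .map((node) => node.nodeValue.trim())
    // discard the white-space-only nodes that sit between tags
    .filter((value) => value.length > 0);
}

// Only runs where a DOM is available (for example, in a browser):
if (typeof document !== "undefined") {
  console.log(directTextSegments(document.querySelector("div")));
}
```

Because the function only reads `childNodes`, `nodeType` and `nodeValue`, it can be pointed at any element, not just the first `<div>` in the document.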

