javascript jquery html google-chrome-extension split

Modify HTML while preserving the existing elements and event listeners

I'm new here, so I'll try to explain my problem. I'm developing a Chrome extension that manipulates the DOM. I need to split up every single word inside a <p> element so that I can later apply CSS to each word, while preserving any other elements (<a>, <em>, <strong>, etc.) that may be inside the <p>.

Example of possible text in a web page:

<p> 
   Sed ut <a> perspiciatis unde omnis </a> 
   iste natus <em> error sit </em> 
   voluptatem <strong> accusantium </strong> 
   doloremque laudantium 
</p>

Using jQuery, I thought of wrapping each word in a <span> with a class attribute that I can target with CSS. I found this code, which splits the words of a <p> correctly but ignores any other elements inside the <p>.

Code used (that doesn't do what I need):


 $("p").each(function() {
    var originalText = $(this).text().split(' ');
    var spannedText = [];

    for (var i = 0; i < originalText.length; i += 1) {
        if(originalText[i] != ""){
           spannedText[i] = ('<span class="...">' + originalText.slice(i,i+1).join(' ') + '</span>');
         }
     }

     $(this).html(spannedText.join(' '));
 });

In the example shown above, this code generates the following output, removing the other elements:

<p> 
    <span>Sed</span> 
    <span>ut</span> 
    <span>perspiciatis</span> 
    <span>unde</span> 
    <span>omnis</span> 
    <span>iste</span> 
    <span>natus</span> 
    <span>error</span> 
    <span>sit</span> 
    <span>voluptatem</span> 
    <span>accusantium</span>
    <span>doloremque</span> 
    <span>laudantium</span> 
</p>

It is close to the solution I need, but in this case all the tags present in the example (<a>, <em>, <strong>) are removed and replaced with <span> tags.

Instead, I want to keep the HTML structure of the <p> and only insert <span>...</span> around each word.

This is the output I would like to achieve:

<p> 
    <span>Sed</span> 
    <span>ut</span> 
    <a> <span>perspiciatis</span> <span>unde</span> <span>omnis</span> </a>
    <span>iste</span> 
    <span>natus</span> 
    <em> <span>error</span> <span>sit</span> </em>
    <span>voluptatem</span> 
    <strong> <span>accusantium</span> </strong>
    <span>doloremque</span> 
    <span>laudantium</span> 
</p>

Can you help me?

Solution

Never replace HTML via innerHTML or jQuery's html()

Replacing the HTML destroys all event listeners that JavaScript added to the child elements, and it forces the browser to re-parse the whole subtree, which is CPU-intensive and can be slow on low-end devices. Don't do this.

Process only the text nodes recursively:

const span = document.createElement('span');
span.className = 'foo';
span.appendChild(document.createTextNode(''));

// these will display <span> as a literal text per HTML specification
const skipTags = ['textarea', 'rp'];

for (const p of document.getElementsByTagName('p')) {
  const walker = document.createTreeWalker(p, NodeFilter.SHOW_TEXT);
  // collect the nodes first because we can't insert new span nodes while walking
  const textNodes = [];
  for (let n; (n = walker.nextNode());) {
    if (n.nodeValue.trim() && !skipTags.includes(n.parentNode.localName)) {
      textNodes.push(n);
    }
  }
  for (const n of textNodes) {
    const fragment = document.createDocumentFragment();
    for (const s of n.nodeValue.split(/(\s+)/)) {
      if (s.trim()) {
        span.firstChild.nodeValue = s;
        fragment.appendChild(span.cloneNode(true));
      } else {
        fragment.appendChild(document.createTextNode(s));
      }
    }
    n.parentNode.replaceChild(fragment, n);
  }
}

Since we may be replacing thousands of nodes, this code tries to be fast: it uses the TreeWalker API, clones a template span instead of building each one from scratch, keeps potentially very long runs of spaces and line breaks as single text nodes by splitting on the simple regular expression \s+, and uses a DocumentFragment to insert the new nodes in one mutation operation. And, of course, it doesn't use jQuery.
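To see why the split uses a capturing group, here is a small DOM-free sketch of the tokenizing step only. The helper name tokenize and the sample string are made up for illustration and are not part of the code above; unlike the loop above, this sketch also drops the empty strings that split() returns at the edges.

```javascript
// Hypothetical helper (an illustration, not part of the answer's code):
// breaks a text node's value into word tokens and whitespace tokens.
function tokenize(text) {
  return text
    // The capturing group makes split() keep each whitespace run in the result.
    .split(/(\s+)/)
    // split() yields '' at the edges when the text starts or ends with whitespace.
    .filter((s) => s !== '')
    // A token is a word if anything is left after trimming whitespace.
    .map((s) => ({ value: s, isWord: s.trim() !== '' }));
}

const sample = ' Sed ut\n   perspiciatis ';
const tokens = tokenize(sample);

// Only the word tokens would be wrapped in a span.
const words = tokens.filter((t) => t.isWord).map((t) => t.value);

// Joining every token gives back the original text, whitespace included.
const rebuilt = tokens.map((t) => t.value).join('');

console.log(words);
console.log(rebuilt === sample);
```

Because the whitespace tokens are kept rather than discarded, the spacing and line breaks of the original text survive once the words are wrapped.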

P.S. For much more complex matching and processing, there are dedicated libraries such as mark.js.