javascript, html, right-to-left, bidi

Get exact browser rendered text (RTL and LTR direction mix)


Is there a way to retrieve the text exactly as a browser renders it (in the context of right-to-left text direction)?

<html dir="rtl">
<body>
  <p id='ko'>Hello (world)</p>
  <p id='ok'>Hello <bdo dir='ltr'>(world)</bdo></p>
</body>
</html>

Will render:

  • in Chrome: [screenshot]

  • in Firefox: [screenshot]

But both document.getElementById('ok').textContent === document.getElementById('ko').textContent and document.getElementById('ok').innerText === document.getElementById('ko').innerText are true (for both browsers).

Is there a way to get the actual text that is displayed in the webpage?

https://jsfiddle.net/019kvo56/1/


Solution

  • There is a direction CSS property that you can read from e.g. getComputedStyle(elem), but it only applies at the element level, so it can't tell you exactly how the browser rendered the text nodes.

    So what you need to do is :

    • first grab all the text nodes from your container (best done with a TreeWalker)
    • select each of their characters with a Range object
    • get each character's rendered position from the Range's getBoundingClientRect() method
    • sort the characters by position (top to bottom, then left to right)
    • read their text values back in that order

    Here is a live demo :

    function getDisplayedText(container) {
    
      var r = document.createRange(); // to get our nodes positions
    
      var nodes = []; // first grab all the nodes
      var treeWalker = document.createTreeWalker(container, NodeFilter.SHOW_TEXT, null, false);
      while (treeWalker.nextNode()) nodes.push(treeWalker.currentNode);
    
      var chars = []; // then get all its contained characters
      nodes.forEach(n => {
        n.data.split('').forEach((c, i) => {
          r.setStart(n, i); // move the range to this character
          r.setEnd(n, i+1);
          chars.push({
            text: c,
            rect: r.getBoundingClientRect() // save our range's DOMRect
          })
        })
      });
    
      return chars.filter(c => c.rect.height) // keep only the displayed ones (i.e no script textContent)
        .sort((a, b) => { // sort ttb ltr
          if (a.rect.top === b.rect.top) {
            return a.rect.left - b.rect.left;
          }
          return a.rect.top - b.rect.top;
        })
        .map(n => n.text)
        .join('');
    }
    
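    // look the elements up explicitly instead of relying on id-named globals
    console.log('ko : ', getDisplayedText(document.getElementById('ko')));
    console.log('ok : ', getDisplayedText(document.getElementById('ok')));
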
    <div dir="rtl">
      <p id='ko'>Hello (world)</p>
      <p id='ok'>Hello <bdo dir='ltr'>(world)</bdo></p>
    </div>

    As to why WebKit renders the last ) flipped and in first position... I've got no idea whether that is correct or not.
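
One caveat about the sort step in getDisplayedText: it treats two characters as being on the same line only when their rect.top values are exactly equal. Sub-pixel differences (for instance with mixed fonts) could then split one visual line in two. Below is a minimal sketch of a more tolerant ordering. The helper name sortVisually and the 2px default tolerance are assumptions for illustration, not part of the original answer, and the right tolerance will depend on your layout.

```javascript
// Hypothetical helper (not from the original answer): orders character boxes
// top-to-bottom, then left-to-right. Boxes whose vertical centers lie within
// `tolerance` pixels of the first box of a line are treated as sharing that line.
// Each entry of `chars` is expected to look like { text, rect: { top, left, height } }.
function sortVisually(chars, tolerance = 2) {
  // Sort by vertical center first so lines can be built in a single pass.
  const byCenter = chars
    .map(c => ({ text: c.text, rect: c.rect, center: c.rect.top + c.rect.height / 2 }))
    .sort((a, b) => a.center - b.center);

  const lines = [];
  for (const c of byCenter) {
    const line = lines[lines.length - 1];
    if (line && Math.abs(c.center - line.center) <= tolerance) {
      line.items.push(c); // close enough vertically: same visual line
    } else {
      lines.push({ center: c.center, items: [c] }); // start a new line
    }
  }

  // Within each line, order by horizontal position, then read the text back.
  return lines
    .flatMap(line => line.items.sort((a, b) => a.rect.left - b.rect.left))
    .map(c => c.text)
    .join('');
}
```

To try it, the .sort(...).map(...).join('') chain at the end of getDisplayedText could be replaced with sortVisually(chars.filter(c => c.rect.height)).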