Tags: javascript, html, contenteditable, spell-checking, rangy

How to map visible text indices to their location in an HTML tree


I am trying to implement a web-based rich text editor that automatically decorates the user's text as they type (think spellcheck).

The issue is that the server only processes raw text and returns each annotation as an index plus a length into that raw text.

So the complete flow looks like this:

  1. When the spellcheck routine triggers, it converts the contents of the HTML structure into raw text.
  2. It queries the server for spellcheck annotations.
  3. From the returned indices, it finds the corresponding portion of the HTML and wraps it in underline tags.
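
To make step 3 concrete, here is a minimal sketch of the index-to-node mapping. It assumes the raw text is simply the concatenation of the text nodes in document order, which is not how Rangy's TextRange module builds its visible text, so indices from one would not necessarily line up with the other. The names `locateInTextNodes` and `locateInElement` are hypothetical, and offsets are counted in UTF-16 code units, as the DOM does.

```javascript
// Hypothetical helper: given the lengths of consecutive text nodes (in
// document order) and an index into their concatenated text, return which
// node the index falls in and the offset inside that node.
function locateInTextNodes(lengths, index) {
  if (index < 0) return null;
  let remaining = index;
  for (let i = 0; i < lengths.length; i++) {
    if (remaining < lengths[i]) {
      return { nodeIndex: i, offset: remaining };
    }
    remaining -= lengths[i];
  }
  // An index equal to the total length points just past the last character.
  if (remaining === 0 && lengths.length > 0) {
    const last = lengths.length - 1;
    return { nodeIndex: last, offset: lengths[last] };
  }
  return null; // index is out of range
}

// Browser-only wrapper: collect the text nodes under `root` with a TreeWalker
// and resolve the index to a { node, offset } pair usable with Range.setStart.
function locateInElement(root, index) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  const nodes = [];
  while (walker.nextNode()) {
    nodes.push(walker.currentNode);
  }
  const hit = locateInTextNodes(nodes.map((n) => n.data.length), index);
  return hit && { node: nodes[hit.nodeIndex], offset: hit.offset };
}
```

For markup such as `Hel<b>lo</b> world`, the text node lengths are 3, 2 and 6, so index 4 resolves to the second text node at offset 1. A word split by a tag simply yields a start and an end position in different nodes.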

For step 1 I am using Rangy, specifically its TextRange module. For step 3, however, I can't find a proper way to convert text indices into the corresponding HTML node and offset.

I'm looking for a fairly robust solution that can handle Unicode characters, words split in the middle by a tag, and any other unusual HTML structure.

FYI, I am using the Pell rich text editor, but the problem is the same with any contenteditable-based editor, and if another one solves this problem I will happily switch.

What's the best way to achieve this goal?


Solution

  • Turns out I totally missed Rangy's selectCharacters() method, which solves this problem.

    const content = document.getElementById("content");
    const range = rangy.createRange();
    // indices are in Unicode code points, not bytes
    range.selectCharacters(content, /* from */ 0, /* to */ 5);
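
Building on that, here is a rough sketch of how the server's annotations might be turned into Rangy ranges. It assumes each annotation has the shape `{ index, length }` as described in the question, and that the end index passed to `selectCharacters()` is the boundary `index + length`. The helper names `toCharacterSpans` and `selectAnnotations` are hypothetical, and wrapping each range in an underline tag is left out.

```javascript
// Hypothetical helper: convert server annotations ({ index, length }) into
// { from, to } character spans, skipping empty annotations.
function toCharacterSpans(annotations) {
  return annotations
    .filter((a) => a.length > 0)
    .map((a) => ({ from: a.index, to: a.index + a.length }));
}

// Browser-only: requires Rangy core and the TextRange module to be loaded.
// Returns one Rangy range per annotation, positioned inside `container`.
function selectAnnotations(container, annotations) {
  return toCharacterSpans(annotations).map(({ from, to }) => {
    const range = rangy.createRange();
    range.selectCharacters(container, from, to);
    return range;
  });
}
```

Because only ranges are created and the text itself is not changed, the indices of later annotations remain valid while earlier ones are processed.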